Training: 2022-01-14 13:40:55,227-rank_id: 0
Training: 2022-01-14 13:41:08,009-: loss arcface
Training: 2022-01-14 13:41:08,010-: network mbf
Training: 2022-01-14 13:41:08,010-: resume False
Training: 2022-01-14 13:41:08,010-: output work_dirs/ms1mv3_mobileface_lr02
Training: 2022-01-14 13:41:08,010-: embedding_size 512
Training: 2022-01-14 13:41:08,010-: sample_rate 1.0
Training: 2022-01-14 13:41:08,010-: fp16 True
Training: 2022-01-14 13:41:08,010-: momentum 0.9
Training: 2022-01-14 13:41:08,010-: weight_decay 0.0001
Training: 2022-01-14 13:41:08,011-: batch_size 256
Training: 2022-01-14 13:41:08,011-: lr 0.2
Training: 2022-01-14 13:41:08,011-: dali False
Training: 2022-01-14 13:41:08,011-: verbose 5000
Training: 2022-01-14 13:41:08,011-: frequent 10
Training: 2022-01-14 13:41:08,011-: score None
Training: 2022-01-14 13:41:08,011-: rec /train_tmp/ms1m-retinaface-t1
Training: 2022-01-14 13:41:08,011-: num_classes 93431
Training: 2022-01-14 13:41:08,011-: num_image 5179510
Training: 2022-01-14 13:41:08,011-: num_epoch 40
Training: 2022-01-14 13:41:08,011-: warmup_epoch 2
Training: 2022-01-14 13:41:08,011-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-14 13:41:08,011-: warmup_step 5058
Training: 2022-01-14 13:41:08,011-: total_step 101160
Training: 2022-01-14 13:42:18,075-Reducer buckets have been rebuilt in this iteration.
Training: 2022-01-14 13:42:21,349-Speed 12877.29 samples/sec Loss 41.0112 LearningRate 0.0008 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 4096 Required: 19 hours
Training: 2022-01-14 13:42:22,915-Speed 13086.64 samples/sec Loss 41.0093 LearningRate 0.0012 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 4096 Required: 14 hours
Training: 2022-01-14 13:42:24,482-Speed 13072.91 samples/sec Loss 40.9579 LearningRate 0.0016 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 4096 Required: 12 hours
Training: 2022-01-14 13:42:26,054-Speed 13039.19 samples/sec Loss 40.9894 LearningRate 0.0020 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 4096 Required: 11 hours
Training: 2022-01-14 13:42:27,647-Speed 12865.71 samples/sec Loss 40.9625 LearningRate 0.0024 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 4096 Required: 10 hours
Training: 2022-01-14 13:42:29,886-Speed 9154.10 samples/sec Loss 40.9756 LearningRate 0.0028 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 4096 Required: 9 hours
Training: 2022-01-14 13:42:31,482-Speed 12854.13 samples/sec Loss 40.9507 LearningRate 0.0032 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 4096 Required: 9 hours
Training: 2022-01-14 13:42:33,072-Speed 12889.48 samples/sec Loss 40.8658 LearningRate 0.0036 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 4096 Required: 8 hours
Training: 2022-01-14 13:42:34,659-Speed 12912.14 samples/sec Loss 40.8802 LearningRate 0.0040 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 4096 Required: 8 hours
Training: 2022-01-14 13:42:36,244-Speed 12932.03 samples/sec Loss 40.8394 LearningRate 0.0043 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-01-14 13:42:37,842-Speed 12827.57 samples/sec Loss 40.7040 LearningRate 0.0047 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-01-14 13:42:39,433-Speed 12883.96 samples/sec Loss 40.5941 LearningRate 0.0051 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-01-14 13:42:41,027-Speed 12853.59 samples/sec Loss 40.3550 LearningRate 0.0055 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-01-14 13:42:42,613-Speed 12920.39 samples/sec Loss 40.0046 LearningRate 0.0059 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 8192 Required: 7 hours
Training: 2022-01-14 13:42:44,189-Speed 13011.65 samples/sec Loss 39.6351 LearningRate 0.0063 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 8192 Required: 6 hours
Training: 2022-01-14 13:42:45,796-Speed 12744.66 samples/sec Loss 39.2199 LearningRate 0.0067 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 8192 Required: 6 hours
Training: 2022-01-14 13:42:47,383-Speed 12911.65 samples/sec Loss 38.8339 LearningRate 0.0071 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 8192 Required: 6 hours
Training: 2022-01-14 13:42:48,992-Speed 12743.26 samples/sec Loss 38.4806 LearningRate 0.0075 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 8192 Required: 6 hours
Training: 2022-01-14 13:42:50,532-Speed 13304.36 samples/sec Loss 38.2588 LearningRate 0.0079 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 8192 Required: 6 hours
Training: 2022-01-14 13:42:52,108-Speed 13009.71 samples/sec Loss 38.0594 LearningRate 0.0083 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:42:53,691-Speed 12953.04 samples/sec Loss 37.9383 LearningRate 0.0087 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:42:55,275-Speed 12940.53 samples/sec Loss 37.8477 LearningRate 0.0091 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:42:56,868-Speed 12861.19 samples/sec Loss 37.7748 LearningRate 0.0095 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:42:58,445-Speed 13002.27 samples/sec Loss 37.7501 LearningRate 0.0099 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:43:00,032-Speed 12907.15 samples/sec Loss 37.6720 LearningRate 0.0103 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:43:01,649-Speed 12672.04 samples/sec Loss 37.6668 LearningRate 0.0107 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:43:03,209-Speed 13166.10 samples/sec Loss 37.6281 LearningRate 0.0111 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:43:04,768-Speed 13149.78 samples/sec Loss 37.6304 LearningRate 0.0115 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:43:06,361-Speed 12859.36 samples/sec Loss 37.5899 LearningRate 0.0119 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 16384 Required: 6 hours
Training: 2022-01-14 13:43:07,950-Speed 12899.28 samples/sec Loss 37.6025 LearningRate 0.0123 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:09,536-Speed 12923.92 samples/sec Loss 37.6229 LearningRate 0.0127 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:11,116-Speed 12968.62 samples/sec Loss 37.5783 LearningRate 0.0130 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:12,684-Speed 13072.25 samples/sec Loss 37.5656 LearningRate 0.0134 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:14,298-Speed 12696.95 samples/sec Loss 37.5660 LearningRate 0.0138 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:15,874-Speed 13000.91 samples/sec Loss 37.5873 LearningRate 0.0142 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:17,457-Speed 12953.21 samples/sec Loss 37.5756 LearningRate 0.0146 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:19,010-Speed 13201.11 samples/sec Loss 37.5717 LearningRate 0.0150 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:20,621-Speed 12719.85 samples/sec Loss 37.5794 LearningRate 0.0154 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:22,205-Speed 12946.56 samples/sec Loss 37.5977 LearningRate 0.0158 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 32768 Required: 5 hours
Training: 2022-01-14 13:43:23,803-Speed 12837.75 samples/sec Loss 37.5633 LearningRate 0.0162 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:25,394-Speed 12881.51 samples/sec Loss 37.5655 LearningRate 0.0166 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:26,960-Speed 13082.97 samples/sec Loss 37.5582 LearningRate 0.0170 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:28,545-Speed 12936.10 samples/sec Loss 37.5590 LearningRate 0.0174 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:30,130-Speed 12928.76 samples/sec Loss 37.5441 LearningRate 0.0178 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:31,782-Speed 12403.78 samples/sec Loss 37.5631 LearningRate 0.0182 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:33,363-Speed 12966.12 samples/sec Loss 37.5180 LearningRate 0.0186 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:34,919-Speed 13167.93 samples/sec Loss 37.5329 LearningRate 0.0190 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:36,512-Speed 12860.21 samples/sec Loss 37.5148 LearningRate 0.0194 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:38,093-Speed 12969.90 samples/sec Loss 37.4554 LearningRate 0.0198 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:43:39,645-Speed 13201.09 samples/sec Loss 37.4743 LearningRate 0.0202 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:41,236-Speed 12883.53 samples/sec Loss 37.4519 LearningRate 0.0206 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:42,841-Speed 12772.46 samples/sec Loss 37.4561 LearningRate 0.0210 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:44,450-Speed 12734.72 samples/sec Loss 37.3697 LearningRate 0.0214 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:46,033-Speed 12953.60 samples/sec Loss 37.3471 LearningRate 0.0217 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:47,610-Speed 12993.16 samples/sec Loss 37.3015 LearningRate 0.0221 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:49,182-Speed 13034.71 samples/sec Loss 37.2194 LearningRate 0.0225 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:50,768-Speed 12922.93 samples/sec Loss 37.1741 LearningRate 0.0229 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:52,350-Speed 12953.41 samples/sec Loss 37.1555 LearningRate 0.0233 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:53,946-Speed 12843.20 samples/sec Loss 37.0728 LearningRate 0.0237 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:43:55,531-Speed 12934.16 samples/sec Loss 37.0126 LearningRate 0.0241 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:43:57,087-Speed 13169.83 samples/sec Loss 36.9407 LearningRate 0.0245 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:43:58,676-Speed 12894.49 samples/sec Loss 36.8718 LearningRate 0.0249 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:44:00,269-Speed 12863.57 samples/sec Loss 36.8346 LearningRate 0.0253 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:01,884-Speed 12689.90 samples/sec Loss 36.7672 LearningRate 0.0257 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:03,471-Speed 12922.97 samples/sec Loss 36.7505 LearningRate 0.0261 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:05,031-Speed 13136.46 samples/sec Loss 36.6801 LearningRate 0.0265 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:06,621-Speed 12888.37 samples/sec Loss 36.5643 LearningRate 0.0269 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:08,233-Speed 12712.98 samples/sec Loss 36.4715 LearningRate 0.0273 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:09,804-Speed 13050.47 samples/sec Loss 36.4422 LearningRate 0.0277 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:11,387-Speed 12943.75 samples/sec Loss 36.3549 LearningRate 0.0281 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:12,979-Speed 12990.02 samples/sec Loss 36.3233 LearningRate 0.0285 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:14,571-Speed 12875.06 samples/sec Loss 36.2326 LearningRate 0.0289 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:16,146-Speed 13005.22 samples/sec Loss 36.0950 LearningRate 0.0293 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:44:17,726-Speed 12970.65 samples/sec Loss 36.0533 LearningRate 0.0297 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:44:19,281-Speed 13177.75 samples/sec Loss 35.9241 LearningRate 0.0301 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:20,867-Speed 12929.68 samples/sec Loss 35.9211 LearningRate 0.0304 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:22,442-Speed 13008.61 samples/sec Loss 35.8355 LearningRate 0.0308 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:24,045-Speed 12788.57 samples/sec Loss 35.7156 LearningRate 0.0312 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:25,624-Speed 12975.95 samples/sec Loss 35.5708 LearningRate 0.0316 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:27,192-Speed 13067.57 samples/sec Loss 35.5332 LearningRate 0.0320 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:28,786-Speed 12865.17 samples/sec Loss 35.4608 LearningRate 0.0324 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:30,371-Speed 12930.88 samples/sec Loss 35.3091 LearningRate 0.0328 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:31,953-Speed 12952.37 samples/sec Loss 35.2637 LearningRate 0.0332 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:33,518-Speed 13103.26 samples/sec Loss 35.1219 LearningRate 0.0336 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:35,102-Speed 12939.76 samples/sec Loss 35.0243 LearningRate 0.0340 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:44:36,681-Speed 12971.12 samples/sec Loss 34.8919 LearningRate 0.0344 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:38,278-Speed 12843.85 samples/sec Loss 34.8457 LearningRate 0.0348 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:39,866-Speed 12915.80 samples/sec Loss 34.6637 LearningRate 0.0352 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:41,484-Speed 12689.78 samples/sec Loss 34.6087 LearningRate 0.0356 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:43,082-Speed 12827.20 samples/sec Loss 34.4821 LearningRate 0.0360 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:44,689-Speed 12753.44 samples/sec Loss 34.3654 LearningRate 0.0364 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:46,250-Speed 13130.29 samples/sec Loss 34.2535 LearningRate 0.0368 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:47,811-Speed 13126.88 samples/sec Loss 34.1366 LearningRate 0.0372 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:49,422-Speed 12723.83 samples/sec Loss 34.0524 LearningRate 0.0376 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:51,000-Speed 12985.13 samples/sec Loss 33.9193 LearningRate 0.0380 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:44:52,563-Speed 13110.05 samples/sec Loss 33.8137 LearningRate 0.0384 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:44:54,187-Speed 12617.69 samples/sec Loss 33.7003 LearningRate 0.0388 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:44:55,769-Speed 12954.01 samples/sec Loss 33.5221 LearningRate 0.0391 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:44:57,368-Speed 12816.34 samples/sec Loss 33.4419 LearningRate 0.0395 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:44:58,948-Speed 12972.87 samples/sec Loss 33.3111 LearningRate 0.0399 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:45:00,499-Speed 13235.57 samples/sec Loss 33.2227 LearningRate 0.0403 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:02,088-Speed 12896.51 samples/sec Loss 33.0835 LearningRate 0.0407 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:03,675-Speed 12936.57 samples/sec Loss 32.9698 LearningRate 0.0411 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:05,255-Speed 12964.78 samples/sec Loss 32.7683 LearningRate 0.0415 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:06,825-Speed 13082.65 samples/sec Loss 32.7244 LearningRate 0.0419 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:08,421-Speed 12846.39 samples/sec Loss 32.6280 LearningRate 0.0423 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:10,019-Speed 12829.08 samples/sec Loss 32.4296 LearningRate 0.0427 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:11,607-Speed 12906.03 samples/sec Loss 32.3239 LearningRate 0.0431 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:13,198-Speed 12880.37 samples/sec Loss 32.1701 LearningRate 0.0435 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:14,785-Speed 12915.93 samples/sec Loss 32.1234 LearningRate 0.0439 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:16,333-Speed 13242.76 samples/sec Loss 31.9080 LearningRate 0.0443 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:17,914-Speed 12956.66 samples/sec Loss 31.8226 LearningRate 0.0447 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:19,528-Speed 12696.05 samples/sec Loss 31.6860 LearningRate 0.0451 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:21,085-Speed 13162.92 samples/sec Loss 31.5762 LearningRate 0.0455 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:22,659-Speed 13020.29 samples/sec Loss 31.3677 LearningRate 0.0459 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:24,240-Speed 12963.65 samples/sec Loss 31.3229 LearningRate 0.0463 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:25,828-Speed 12906.20 samples/sec Loss 31.1921 LearningRate 0.0467 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:27,438-Speed 12726.86 samples/sec Loss 31.0442 LearningRate 0.0471 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:29,019-Speed 12968.20 samples/sec Loss 30.7992 LearningRate 0.0474 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:30,589-Speed 13051.96 samples/sec Loss 30.8365 LearningRate 0.0478 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:32,192-Speed 12781.06 samples/sec Loss 30.7373 LearningRate 0.0482 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:45:33,760-Speed 13073.28 samples/sec Loss 30.5070 LearningRate 0.0486 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:35,323-Speed 13108.27 samples/sec Loss 30.4037 LearningRate 0.0490 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:36,933-Speed 12731.18 samples/sec Loss 30.2023 LearningRate 0.0494 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:38,496-Speed 13115.95 samples/sec Loss 30.0591 LearningRate 0.0498 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:40,064-Speed 13069.96 samples/sec Loss 29.9791 LearningRate 0.0502 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:41,627-Speed 13109.54 samples/sec Loss 29.8690 LearningRate 0.0506 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:43,240-Speed 12707.43 samples/sec Loss 29.6607 LearningRate 0.0510 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:44,823-Speed 12952.24 samples/sec Loss 29.5476 LearningRate 0.0514 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:46,411-Speed 12898.71 samples/sec Loss 29.4054 LearningRate 0.0518 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:48,012-Speed 12799.99 samples/sec Loss 29.3306 LearningRate 0.0522 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:49,587-Speed 13009.64 samples/sec Loss 29.1913 LearningRate 0.0526 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:51,164-Speed 12999.56 samples/sec Loss 29.0981 LearningRate 0.0530 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:52,742-Speed 12982.55 samples/sec Loss 28.9200 LearningRate 0.0534 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:54,349-Speed 12754.25 samples/sec Loss 28.7407 LearningRate 0.0538 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:55,936-Speed 12908.34 samples/sec Loss 28.5912 LearningRate 0.0542 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:57,512-Speed 13010.94 samples/sec Loss 28.5209 LearningRate 0.0546 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:45:59,113-Speed 12789.92 samples/sec Loss 28.4532 LearningRate 0.0550 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:00,682-Speed 13064.44 samples/sec Loss 28.3189 LearningRate 0.0554 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:02,239-Speed 13182.19 samples/sec Loss 28.1202 LearningRate 0.0558 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:03,821-Speed 12961.60 samples/sec Loss 27.8911 LearningRate 0.0561 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:05,400-Speed 12977.72 samples/sec Loss 27.8423 LearningRate 0.0565 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:46:06,975-Speed 13008.67 samples/sec Loss 27.6816 LearningRate 0.0569 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:08,556-Speed 12965.43 samples/sec Loss 27.5713 LearningRate 0.0573 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:10,126-Speed 13054.72 samples/sec Loss 27.5140 LearningRate 0.0577 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:11,708-Speed 12955.46 samples/sec Loss 27.3332 LearningRate 0.0581 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:13,287-Speed 12980.86 samples/sec Loss 27.2034 LearningRate 0.0585 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:14,863-Speed 13007.23 samples/sec Loss 27.1298 LearningRate 0.0589 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:16,428-Speed 13091.89 samples/sec Loss 26.9386 LearningRate 0.0593 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:18,068-Speed 12495.26 samples/sec Loss 26.8309 LearningRate 0.0597 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:19,637-Speed 13059.07 samples/sec Loss 26.6572 LearningRate 0.0601 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:21,238-Speed 12802.44 samples/sec Loss 26.6808 LearningRate 0.0605 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:22,780-Speed 13290.57 samples/sec Loss 26.4650 LearningRate 0.0609 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:24,372-Speed 12874.52 samples/sec Loss 26.3040 LearningRate 0.0613 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:25,961-Speed 12895.48 samples/sec Loss 26.1529 LearningRate 0.0617 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:27,557-Speed 12852.64 samples/sec Loss 26.0498 LearningRate 0.0621 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:29,144-Speed 12918.68 samples/sec Loss 25.9616 LearningRate 0.0625 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:30,726-Speed 12948.50 samples/sec Loss 25.8202 LearningRate 0.0629 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:32,314-Speed 12908.87 samples/sec Loss 25.7246 LearningRate 0.0633 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:33,889-Speed 13005.41 samples/sec Loss 25.4996 LearningRate 0.0637 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:35,470-Speed 12962.37 samples/sec Loss 25.2930 LearningRate 0.0641 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:37,029-Speed 13149.60 samples/sec Loss 25.2432 LearningRate 0.0645 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:38,615-Speed 12922.18 samples/sec Loss 25.1874 LearningRate 0.0648 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:40,206-Speed 12886.59 samples/sec Loss 25.1639 LearningRate 0.0652 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:41,818-Speed 12710.62 samples/sec Loss 25.0009 LearningRate 0.0656 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:43,391-Speed 13022.33 samples/sec Loss 24.7589 LearningRate 0.0660 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:44,968-Speed 13004.41 samples/sec Loss 24.6888 LearningRate 0.0664 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:46,552-Speed 12934.33 samples/sec Loss 24.6503 LearningRate 0.0668 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:48,128-Speed 13004.30 samples/sec Loss 24.5185 LearningRate 0.0672 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:49,699-Speed 13041.41 samples/sec Loss 24.4085 LearningRate 0.0676 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:51,271-Speed 13037.73 samples/sec Loss 24.1973 LearningRate 0.0680 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:52,858-Speed 12910.92 samples/sec Loss 24.1477 LearningRate 0.0684 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:54,440-Speed 12957.74 samples/sec Loss 24.0387 LearningRate 0.0688 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:46:56,007-Speed 13072.38 samples/sec Loss 24.0001 LearningRate 0.0692 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:46:57,577-Speed 13061.04 samples/sec Loss 23.8708 LearningRate 0.0696 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:46:59,144-Speed 13080.22 samples/sec Loss 23.6704 LearningRate 0.0700 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:00,719-Speed 13006.69 samples/sec Loss 23.5394 LearningRate 0.0704 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:02,322-Speed 12783.45 samples/sec Loss 23.5062 LearningRate 0.0708 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:03,913-Speed 12886.12 samples/sec Loss 23.3218 LearningRate 0.0712 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:05,488-Speed 13010.73 samples/sec Loss 23.3423 LearningRate 0.0716 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:07,080-Speed 12869.83 samples/sec Loss 23.1982 LearningRate 0.0720 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:08,659-Speed 12984.78 samples/sec Loss 23.0880 LearningRate 0.0724 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:10,237-Speed 12990.11 samples/sec Loss 22.9063 LearningRate 0.0728 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:11,829-Speed 12865.24 samples/sec Loss 22.8054 LearningRate 0.0732 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-01-14 13:47:13,396-Speed 13084.20 samples/sec Loss 22.7083 LearningRate 0.0735 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:14,975-Speed 12981.88 samples/sec Loss 22.6588 LearningRate 0.0739 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:16,557-Speed 12946.82 samples/sec Loss 22.5338 LearningRate 0.0743 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:18,112-Speed 13179.00 samples/sec Loss 22.3661 LearningRate 0.0747 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:19,709-Speed 12837.40 samples/sec Loss 22.1458 LearningRate 0.0751 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:21,289-Speed 12966.42 samples/sec Loss 22.2316 LearningRate 0.0755 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:22,868-Speed 12978.69 samples/sec Loss 22.1242 LearningRate 0.0759 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:24,452-Speed 12943.67 samples/sec Loss 22.1014 LearningRate 0.0763 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:26,010-Speed 13150.68 samples/sec Loss 21.9135 LearningRate 0.0767 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:27,590-Speed 12966.97 samples/sec Loss 21.8315 LearningRate 0.0771 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:29,167-Speed 12997.96 samples/sec Loss 21.8325 LearningRate 0.0775 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:47:30,748-Speed 12961.94 samples/sec Loss 21.7478 LearningRate 0.0779 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:32,362-Speed 12701.08 samples/sec Loss 21.5366 LearningRate 0.0783 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:33,940-Speed 12984.00 samples/sec Loss 21.5069 LearningRate 0.0787 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:35,523-Speed 12942.67 samples/sec Loss 21.3825 LearningRate 0.0791 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:37,103-Speed 12975.46 samples/sec Loss 21.3083 LearningRate 0.0795 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:38,704-Speed 12799.62 samples/sec Loss 21.1560 LearningRate 0.0799 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:40,267-Speed 13117.88 samples/sec Loss 21.1132 LearningRate 0.0803 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:41,849-Speed 12952.98 samples/sec Loss 20.9994 LearningRate 0.0807 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:43,439-Speed 12886.14 samples/sec Loss 20.9412 LearningRate 0.0811 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:45,017-Speed 12997.46 samples/sec Loss 20.8304 LearningRate 0.0815 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:46,589-Speed 13028.01 samples/sec Loss 20.8146 LearningRate 0.0819 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-01-14 13:47:48,154-Speed 13098.50 samples/sec Loss 20.6984 LearningRate 0.0822 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:49,739-Speed 12928.34 samples/sec Loss 20.6116 LearningRate 0.0826 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:51,320-Speed 12965.12 samples/sec Loss 20.4819 LearningRate 0.0830 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:52,916-Speed 12833.84 samples/sec Loss 20.4950 LearningRate 0.0834 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:54,479-Speed 13109.39 samples/sec Loss 20.3041 LearningRate 0.0838 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:56,051-Speed 13037.78 samples/sec Loss 20.2887 LearningRate 0.0842 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:57,622-Speed 13051.23 samples/sec Loss 20.1444 LearningRate 0.0846 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:47:59,215-Speed 12867.11 samples/sec Loss 20.1554 LearningRate 0.0850 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:48:00,760-Speed 13264.43 samples/sec Loss 20.1011 LearningRate 0.0854 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:48:02,327-Speed 13070.20 samples/sec Loss 19.9462 LearningRate 0.0858 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:48:03,908-Speed 12960.46 samples/sec Loss 19.7756 LearningRate 0.0862 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-01-14 13:48:05,499-Speed 12883.31 samples/sec Loss 19.9370 LearningRate 0.0866 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-01-14 13:48:07,053-Speed 13182.12 samples/sec Loss 19.7225 LearningRate 0.0870 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-01-14 13:48:08,666-Speed 12711.72 samples/sec Loss 19.7399 LearningRate 0.0874 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-01-14 13:48:10,235-Speed 13067.45 samples/sec Loss 19.5517 LearningRate 0.0878 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 4 hours
Training: 2022-01-14 13:48:11,825-Speed 12894.90 samples/sec Loss 19.5179 LearningRate
0.0882 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:13,387-Speed 13122.13 samples/sec Loss 19.4485 LearningRate 0.0886 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:14,951-Speed 13102.14 samples/sec Loss 19.3711 LearningRate 0.0890 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:16,522-Speed 13046.90 samples/sec Loss 19.3076 LearningRate 0.0894 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:18,094-Speed 13056.52 samples/sec Loss 19.3347 LearningRate 0.0898 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:19,677-Speed 12939.78 samples/sec Loss 19.1904 LearningRate 0.0902 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:48:21,267-Speed 12914.55 samples/sec Loss 19.1129 LearningRate 0.0905 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:22,843-Speed 13001.71 samples/sec Loss 18.9626 LearningRate 0.0909 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:24,432-Speed 12906.64 samples/sec Loss 18.9143 LearningRate 0.0913 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:25,976-Speed 13276.25 samples/sec Loss 18.8430 LearningRate 0.0917 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:27,569-Speed 12867.16 samples/sec Loss 18.8000 LearningRate 0.0921 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:29,138-Speed 13061.74 samples/sec Loss 18.7784 LearningRate 0.0925 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:30,776-Speed 12511.69 samples/sec Loss 18.7816 LearningRate 0.0929 Epoch: 0 Global Step: 2350 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:32,371-Speed 12850.98 samples/sec Loss 18.7621 LearningRate 0.0933 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:33,952-Speed 12960.21 samples/sec Loss 18.6145 LearningRate 0.0937 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:35,523-Speed 13043.56 samples/sec Loss 18.4993 LearningRate 0.0941 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:37,085-Speed 13121.80 samples/sec Loss 18.4542 LearningRate 0.0945 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:38,715-Speed 12567.90 samples/sec Loss 18.4937 LearningRate 0.0949 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:40,290-Speed 13011.47 samples/sec Loss 18.2909 LearningRate 0.0953 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:41,874-Speed 12938.46 samples/sec Loss 18.3141 LearningRate 0.0957 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:43,458-Speed 12932.49 samples/sec Loss 18.1107 LearningRate 0.0961 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:45,001-Speed 13289.02 samples/sec Loss 17.9921 LearningRate 0.0965 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:46,569-Speed 13071.97 samples/sec Loss 18.1365 LearningRate 0.0969 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:48,151-Speed 12954.88 samples/sec Loss 18.0203 LearningRate 0.0973 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:49,730-Speed 12975.10 samples/sec Loss 17.9684 LearningRate 0.0977 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-01-14 13:48:51,286-Speed 13169.90 samples/sec Loss 17.9705 LearningRate 0.0981 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:52,908-Speed 12635.62 samples/sec Loss 17.9252 LearningRate 0.0985 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:48:54,451-Speed 13286.33 samples/sec Loss 17.8295 LearningRate 0.0989 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:56,040-Speed 12892.78 samples/sec Loss 17.8235 LearningRate 0.0992 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:48:57,715-Speed 12239.61 samples/sec Loss 17.7666 LearningRate 0.0996 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:49:13,216-Speed 1321.35 samples/sec Loss 17.5155 LearningRate 0.1000 Epoch: 1 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:14,825-Speed 12743.73 samples/sec Loss 16.5399 LearningRate 0.1004 Epoch: 1 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:16,433-Speed 12743.05 samples/sec Loss 16.5204 LearningRate 0.1008 Epoch: 1 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:18,012-Speed 12983.90 samples/sec Loss 16.3550 LearningRate 0.1012 Epoch: 1 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:19,591-Speed 12979.64 samples/sec Loss 16.5922 LearningRate 0.1016 Epoch: 1 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:21,181-Speed 12895.78 samples/sec Loss 16.4153 LearningRate 0.1020 Epoch: 1 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:22,769-Speed 12901.29 samples/sec Loss 16.4899 LearningRate 0.1024 Epoch: 1 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:24,357-Speed 12907.50 
samples/sec Loss 16.4255 LearningRate 0.1028 Epoch: 1 Global Step: 2600 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:49:25,924-Speed 13078.12 samples/sec Loss 16.5491 LearningRate 0.1032 Epoch: 1 Global Step: 2610 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:49:27,480-Speed 13166.71 samples/sec Loss 16.4484 LearningRate 0.1036 Epoch: 1 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:29,075-Speed 12877.97 samples/sec Loss 16.4324 LearningRate 0.1040 Epoch: 1 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:30,670-Speed 12851.45 samples/sec Loss 16.4273 LearningRate 0.1044 Epoch: 1 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:32,258-Speed 12903.99 samples/sec Loss 16.3514 LearningRate 0.1048 Epoch: 1 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:33,856-Speed 12822.35 samples/sec Loss 16.3102 LearningRate 0.1052 Epoch: 1 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:35,439-Speed 12952.57 samples/sec Loss 16.3180 LearningRate 0.1056 Epoch: 1 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:37,022-Speed 12945.27 samples/sec Loss 16.2656 LearningRate 0.1060 Epoch: 1 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:38,630-Speed 12742.00 samples/sec Loss 16.2202 LearningRate 0.1064 Epoch: 1 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:40,223-Speed 12880.34 samples/sec Loss 16.3020 LearningRate 0.1068 Epoch: 1 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:41,796-Speed 13024.94 samples/sec Loss 16.3201 LearningRate 0.1072 Epoch: 1 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:43,399-Speed 12783.24 samples/sec Loss 16.2837 LearningRate 
0.1076 Epoch: 1 Global Step: 2720 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:49:44,957-Speed 13157.82 samples/sec Loss 16.2888 LearningRate 0.1079 Epoch: 1 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:46,552-Speed 12848.42 samples/sec Loss 16.0705 LearningRate 0.1083 Epoch: 1 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:48,165-Speed 12704.10 samples/sec Loss 16.1406 LearningRate 0.1087 Epoch: 1 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:49,747-Speed 12958.64 samples/sec Loss 16.2031 LearningRate 0.1091 Epoch: 1 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:51,345-Speed 12815.60 samples/sec Loss 15.9034 LearningRate 0.1095 Epoch: 1 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:52,941-Speed 12838.15 samples/sec Loss 16.0609 LearningRate 0.1099 Epoch: 1 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:54,505-Speed 13102.86 samples/sec Loss 16.0430 LearningRate 0.1103 Epoch: 1 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:56,081-Speed 13007.46 samples/sec Loss 15.9614 LearningRate 0.1107 Epoch: 1 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:57,687-Speed 12765.60 samples/sec Loss 16.0294 LearningRate 0.1111 Epoch: 1 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:49:59,271-Speed 12932.46 samples/sec Loss 15.8742 LearningRate 0.1115 Epoch: 1 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:00,859-Speed 12910.00 samples/sec Loss 15.8796 LearningRate 0.1119 Epoch: 1 Global Step: 2830 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:02,446-Speed 12907.08 samples/sec Loss 15.9532 LearningRate 0.1123 Epoch: 1 Global Step: 2840 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:04,028-Speed 12955.78 samples/sec Loss 15.9063 LearningRate 0.1127 Epoch: 1 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:05,605-Speed 12995.59 samples/sec Loss 15.9664 LearningRate 0.1131 Epoch: 1 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:07,164-Speed 13145.05 samples/sec Loss 15.9582 LearningRate 0.1135 Epoch: 1 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:08,748-Speed 12945.76 samples/sec Loss 15.7208 LearningRate 0.1139 Epoch: 1 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:10,320-Speed 13034.77 samples/sec Loss 15.7051 LearningRate 0.1143 Epoch: 1 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:11,897-Speed 12990.04 samples/sec Loss 15.7571 LearningRate 0.1147 Epoch: 1 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:13,448-Speed 13214.49 samples/sec Loss 15.5439 LearningRate 0.1151 Epoch: 1 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:15,006-Speed 13147.15 samples/sec Loss 15.6356 LearningRate 0.1155 Epoch: 1 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:16,587-Speed 12965.91 samples/sec Loss 15.6871 LearningRate 0.1159 Epoch: 1 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:18,153-Speed 13087.22 samples/sec Loss 15.6610 LearningRate 0.1163 Epoch: 1 Global Step: 2940 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:19,723-Speed 13057.23 samples/sec Loss 15.7099 LearningRate 0.1166 Epoch: 1 Global Step: 2950 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:21,286-Speed 13109.75 samples/sec Loss 15.6177 LearningRate 0.1170 Epoch: 1 Global Step: 2960 Fp16 Grad Scale: 262144 Required: 5 hours Training: 
2022-01-14 13:50:22,859-Speed 13025.00 samples/sec Loss 15.6468 LearningRate 0.1174 Epoch: 1 Global Step: 2970 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:24,435-Speed 13007.01 samples/sec Loss 15.5266 LearningRate 0.1178 Epoch: 1 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:26,013-Speed 13018.87 samples/sec Loss 15.5884 LearningRate 0.1182 Epoch: 1 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:27,600-Speed 12909.49 samples/sec Loss 15.4351 LearningRate 0.1186 Epoch: 1 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:29,179-Speed 12982.14 samples/sec Loss 15.5995 LearningRate 0.1190 Epoch: 1 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:30,727-Speed 13233.25 samples/sec Loss 15.4857 LearningRate 0.1194 Epoch: 1 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:32,295-Speed 13072.88 samples/sec Loss 15.4244 LearningRate 0.1198 Epoch: 1 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:33,871-Speed 12999.91 samples/sec Loss 15.4622 LearningRate 0.1202 Epoch: 1 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:35,443-Speed 13036.32 samples/sec Loss 15.4007 LearningRate 0.1206 Epoch: 1 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:37,016-Speed 13023.82 samples/sec Loss 15.3326 LearningRate 0.1210 Epoch: 1 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:38,586-Speed 13057.56 samples/sec Loss 15.4115 LearningRate 0.1214 Epoch: 1 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:50:40,179-Speed 12868.64 samples/sec Loss 15.3368 LearningRate 0.1218 Epoch: 1 Global Step: 3080 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:41,746-Speed 13072.38 
samples/sec Loss 15.3489 LearningRate 0.1222 Epoch: 1 Global Step: 3090 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:43,327-Speed 12969.82 samples/sec Loss 15.2756 LearningRate 0.1226 Epoch: 1 Global Step: 3100 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:44,900-Speed 13025.76 samples/sec Loss 15.2356 LearningRate 0.1230 Epoch: 1 Global Step: 3110 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:46,477-Speed 12996.13 samples/sec Loss 15.1858 LearningRate 0.1234 Epoch: 1 Global Step: 3120 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:48,042-Speed 13093.09 samples/sec Loss 15.2157 LearningRate 0.1238 Epoch: 1 Global Step: 3130 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:49,627-Speed 12935.11 samples/sec Loss 15.1528 LearningRate 0.1242 Epoch: 1 Global Step: 3140 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:51,210-Speed 12946.03 samples/sec Loss 15.1771 LearningRate 0.1246 Epoch: 1 Global Step: 3150 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:52,782-Speed 13032.85 samples/sec Loss 15.0732 LearningRate 0.1250 Epoch: 1 Global Step: 3160 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:54,326-Speed 13272.94 samples/sec Loss 15.1043 LearningRate 0.1253 Epoch: 1 Global Step: 3170 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:55,896-Speed 13058.78 samples/sec Loss 15.0166 LearningRate 0.1257 Epoch: 1 Global Step: 3180 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:57,487-Speed 12872.23 samples/sec Loss 15.0712 LearningRate 0.1261 Epoch: 1 Global Step: 3190 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:50:59,077-Speed 12891.50 samples/sec Loss 15.1505 LearningRate 0.1265 Epoch: 1 Global Step: 3200 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:51:00,647-Speed 13049.72 samples/sec Loss 15.1274 LearningRate 
0.1269 Epoch: 1 Global Step: 3210 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:51:02,208-Speed 13129.63 samples/sec Loss 15.0296 LearningRate 0.1273 Epoch: 1 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:03,811-Speed 12777.41 samples/sec Loss 14.8383 LearningRate 0.1277 Epoch: 1 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:05,386-Speed 13015.45 samples/sec Loss 14.9919 LearningRate 0.1281 Epoch: 1 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:06,978-Speed 12870.38 samples/sec Loss 14.8994 LearningRate 0.1285 Epoch: 1 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:08,580-Speed 12796.69 samples/sec Loss 14.8132 LearningRate 0.1289 Epoch: 1 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:10,145-Speed 13098.93 samples/sec Loss 14.7867 LearningRate 0.1293 Epoch: 1 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:11,729-Speed 12937.10 samples/sec Loss 14.6964 LearningRate 0.1297 Epoch: 1 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:13,317-Speed 12899.65 samples/sec Loss 14.7039 LearningRate 0.1301 Epoch: 1 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:14,921-Speed 12775.67 samples/sec Loss 14.7620 LearningRate 0.1305 Epoch: 1 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:16,485-Speed 13102.52 samples/sec Loss 14.8771 LearningRate 0.1309 Epoch: 1 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:18,059-Speed 13017.65 samples/sec Loss 14.7861 LearningRate 0.1313 Epoch: 1 Global Step: 3320 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:51:19,661-Speed 12796.72 samples/sec Loss 14.7296 LearningRate 0.1317 Epoch: 1 Global Step: 3330 Fp16 Grad 
Scale: 262144 Required: 5 hours Training: 2022-01-14 13:51:21,249-Speed 12907.66 samples/sec Loss 14.7478 LearningRate 0.1321 Epoch: 1 Global Step: 3340 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:51:22,821-Speed 13033.69 samples/sec Loss 14.8221 LearningRate 0.1325 Epoch: 1 Global Step: 3350 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:51:24,434-Speed 12709.64 samples/sec Loss 14.6069 LearningRate 0.1329 Epoch: 1 Global Step: 3360 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:51:26,024-Speed 12885.70 samples/sec Loss 14.6503 LearningRate 0.1333 Epoch: 1 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:27,584-Speed 13140.04 samples/sec Loss 14.6469 LearningRate 0.1336 Epoch: 1 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:29,155-Speed 13040.12 samples/sec Loss 14.6714 LearningRate 0.1340 Epoch: 1 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:30,747-Speed 12872.15 samples/sec Loss 14.5623 LearningRate 0.1344 Epoch: 1 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:32,323-Speed 13007.14 samples/sec Loss 14.5625 LearningRate 0.1348 Epoch: 1 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:51:33,908-Speed 12934.35 samples/sec Loss 14.6006 LearningRate 0.1352 Epoch: 1 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:35,483-Speed 13010.80 samples/sec Loss 14.5666 LearningRate 0.1356 Epoch: 1 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:37,040-Speed 13153.26 samples/sec Loss 14.3333 LearningRate 0.1360 Epoch: 1 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:38,605-Speed 13097.84 samples/sec Loss 14.4561 LearningRate 0.1364 Epoch: 1 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-01-14 13:51:40,167-Speed 13122.50 samples/sec Loss 14.5656 LearningRate 0.1368 Epoch: 1 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:41,748-Speed 12962.30 samples/sec Loss 14.3268 LearningRate 0.1372 Epoch: 1 Global Step: 3470 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:51:43,331-Speed 12939.45 samples/sec Loss 14.4077 LearningRate 0.1376 Epoch: 1 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:44,904-Speed 13029.49 samples/sec Loss 14.3864 LearningRate 0.1380 Epoch: 1 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:46,483-Speed 12979.02 samples/sec Loss 14.4114 LearningRate 0.1384 Epoch: 1 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:48,073-Speed 12891.79 samples/sec Loss 14.3826 LearningRate 0.1388 Epoch: 1 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:49,654-Speed 12962.42 samples/sec Loss 14.3027 LearningRate 0.1392 Epoch: 1 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:51,238-Speed 12940.67 samples/sec Loss 14.2057 LearningRate 0.1396 Epoch: 1 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:52,839-Speed 12808.15 samples/sec Loss 14.2602 LearningRate 0.1400 Epoch: 1 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:54,426-Speed 12911.75 samples/sec Loss 14.3118 LearningRate 0.1404 Epoch: 1 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:56,002-Speed 13000.90 samples/sec Loss 14.3275 LearningRate 0.1408 Epoch: 1 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:57,619-Speed 12674.14 samples/sec Loss 14.2366 LearningRate 0.1412 Epoch: 1 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:51:59,203-Speed 12940.99 
samples/sec Loss 14.2963 LearningRate 0.1416 Epoch: 1 Global Step: 3580 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:00,772-Speed 13053.89 samples/sec Loss 14.1881 LearningRate 0.1420 Epoch: 1 Global Step: 3590 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:02,387-Speed 12691.78 samples/sec Loss 14.1724 LearningRate 0.1423 Epoch: 1 Global Step: 3600 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:03,931-Speed 13267.61 samples/sec Loss 14.2293 LearningRate 0.1427 Epoch: 1 Global Step: 3610 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:05,505-Speed 13026.19 samples/sec Loss 14.1002 LearningRate 0.1431 Epoch: 1 Global Step: 3620 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:07,096-Speed 12876.09 samples/sec Loss 14.1493 LearningRate 0.1435 Epoch: 1 Global Step: 3630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:08,708-Speed 12716.67 samples/sec Loss 14.0729 LearningRate 0.1439 Epoch: 1 Global Step: 3640 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:10,277-Speed 13065.11 samples/sec Loss 14.0944 LearningRate 0.1443 Epoch: 1 Global Step: 3650 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:11,842-Speed 13088.56 samples/sec Loss 14.0147 LearningRate 0.1447 Epoch: 1 Global Step: 3660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:13,415-Speed 13029.87 samples/sec Loss 14.1677 LearningRate 0.1451 Epoch: 1 Global Step: 3670 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:14,982-Speed 13083.66 samples/sec Loss 13.9558 LearningRate 0.1455 Epoch: 1 Global Step: 3680 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:16,571-Speed 12893.37 samples/sec Loss 14.0428 LearningRate 0.1459 Epoch: 1 Global Step: 3690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:18,149-Speed 12987.35 samples/sec Loss 13.9881 LearningRate 
0.1463 Epoch: 1 Global Step: 3700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:19,737-Speed 12905.43 samples/sec Loss 13.9478 LearningRate 0.1467 Epoch: 1 Global Step: 3710 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:21,295-Speed 13152.67 samples/sec Loss 13.9371 LearningRate 0.1471 Epoch: 1 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:22,900-Speed 12771.96 samples/sec Loss 13.9012 LearningRate 0.1475 Epoch: 1 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:24,507-Speed 12749.36 samples/sec Loss 13.9068 LearningRate 0.1479 Epoch: 1 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:26,078-Speed 13052.48 samples/sec Loss 13.9552 LearningRate 0.1483 Epoch: 1 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:27,648-Speed 13049.14 samples/sec Loss 13.8107 LearningRate 0.1487 Epoch: 1 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:29,245-Speed 12833.10 samples/sec Loss 13.8202 LearningRate 0.1491 Epoch: 1 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:30,804-Speed 13140.97 samples/sec Loss 13.8565 LearningRate 0.1495 Epoch: 1 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:32,363-Speed 13147.37 samples/sec Loss 13.8990 LearningRate 0.1499 Epoch: 1 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:33,970-Speed 12752.33 samples/sec Loss 13.7554 LearningRate 0.1503 Epoch: 1 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:35,558-Speed 12905.80 samples/sec Loss 13.7557 LearningRate 0.1507 Epoch: 1 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:52:37,143-Speed 12928.53 samples/sec Loss 13.6994 LearningRate 0.1510 Epoch: 1 Global Step: 3820 Fp16 Grad 
Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:38,743-Speed 12812.19 samples/sec Loss 13.6919 LearningRate 0.1514 Epoch: 1 Global Step: 3830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:40,343-Speed 12809.86 samples/sec Loss 13.7424 LearningRate 0.1518 Epoch: 1 Global Step: 3840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:41,901-Speed 13150.32 samples/sec Loss 13.6199 LearningRate 0.1522 Epoch: 1 Global Step: 3850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:43,491-Speed 12886.24 samples/sec Loss 13.5844 LearningRate 0.1526 Epoch: 1 Global Step: 3860 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:45,078-Speed 12916.29 samples/sec Loss 13.6575 LearningRate 0.1530 Epoch: 1 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:46,662-Speed 12932.80 samples/sec Loss 13.5621 LearningRate 0.1534 Epoch: 1 Global Step: 3880 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:48,281-Speed 12660.05 samples/sec Loss 13.6840 LearningRate 0.1538 Epoch: 1 Global Step: 3890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:49,853-Speed 13042.46 samples/sec Loss 13.6074 LearningRate 0.1542 Epoch: 1 Global Step: 3900 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:51,433-Speed 12970.99 samples/sec Loss 13.5867 LearningRate 0.1546 Epoch: 1 Global Step: 3910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:52,985-Speed 13209.15 samples/sec Loss 13.5853 LearningRate 0.1550 Epoch: 1 Global Step: 3920 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:54,576-Speed 12880.22 samples/sec Loss 13.5898 LearningRate 0.1554 Epoch: 1 Global Step: 3930 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:56,172-Speed 12839.59 samples/sec Loss 13.4618 LearningRate 0.1558 Epoch: 1 Global Step: 3940 Fp16 Grad Scale: 262144 Required: 4 hours Training: 
2022-01-14 13:52:57,738-Speed 13091.05 samples/sec Loss 13.4344 LearningRate 0.1562 Epoch: 1 Global Step: 3950 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:52:59,352-Speed 12697.11 samples/sec Loss 13.6207 LearningRate 0.1566 Epoch: 1 Global Step: 3960 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:00,930-Speed 12985.47 samples/sec Loss 13.4876 LearningRate 0.1570 Epoch: 1 Global Step: 3970 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:02,506-Speed 13002.16 samples/sec Loss 13.4503 LearningRate 0.1574 Epoch: 1 Global Step: 3980 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:04,123-Speed 12676.24 samples/sec Loss 13.4749 LearningRate 0.1578 Epoch: 1 Global Step: 3990 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:05,689-Speed 13087.10 samples/sec Loss 13.4119 LearningRate 0.1582 Epoch: 1 Global Step: 4000 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:07,277-Speed 12906.51 samples/sec Loss 13.5177 LearningRate 0.1586 Epoch: 1 Global Step: 4010 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:08,834-Speed 13166.69 samples/sec Loss 13.5065 LearningRate 0.1590 Epoch: 1 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:10,392-Speed 13153.62 samples/sec Loss 13.3530 LearningRate 0.1594 Epoch: 1 Global Step: 4030 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:11,991-Speed 12812.16 samples/sec Loss 13.3322 LearningRate 0.1597 Epoch: 1 Global Step: 4040 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:13,574-Speed 12943.35 samples/sec Loss 13.4775 LearningRate 0.1601 Epoch: 1 Global Step: 4050 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:15,148-Speed 13035.75 samples/sec Loss 13.2498 LearningRate 0.1605 Epoch: 1 Global Step: 4060 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:16,722-Speed 13016.12 
samples/sec Loss 13.3842 LearningRate 0.1609 Epoch: 1 Global Step: 4070 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:18,309-Speed 12909.15 samples/sec Loss 13.2978 LearningRate 0.1613 Epoch: 1 Global Step: 4080 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:19,860-Speed 13220.32 samples/sec Loss 13.2935 LearningRate 0.1617 Epoch: 1 Global Step: 4090 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:21,447-Speed 12914.81 samples/sec Loss 13.2731 LearningRate 0.1621 Epoch: 1 Global Step: 4100 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:23,012-Speed 13091.20 samples/sec Loss 13.3779 LearningRate 0.1625 Epoch: 1 Global Step: 4110 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:24,548-Speed 13352.32 samples/sec Loss 13.3562 LearningRate 0.1629 Epoch: 1 Global Step: 4120 Fp16 Grad Scale: 524288 Required: 4 hours Training: 2022-01-14 13:53:26,106-Speed 13147.89 samples/sec Loss 13.3413 LearningRate 0.1633 Epoch: 1 Global Step: 4130 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:27,678-Speed 13035.90 samples/sec Loss 13.2554 LearningRate 0.1637 Epoch: 1 Global Step: 4140 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:29,291-Speed 12708.57 samples/sec Loss 13.1896 LearningRate 0.1641 Epoch: 1 Global Step: 4150 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:30,868-Speed 12991.88 samples/sec Loss 13.2881 LearningRate 0.1645 Epoch: 1 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:32,447-Speed 12981.45 samples/sec Loss 13.2704 LearningRate 0.1649 Epoch: 1 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:34,029-Speed 12951.18 samples/sec Loss 13.0955 LearningRate 0.1653 Epoch: 1 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:35,612-Speed 12944.93 samples/sec Loss 13.2636 LearningRate 
0.1657 Epoch: 1 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:37,184-Speed 13039.87 samples/sec Loss 13.1098 LearningRate 0.1661 Epoch: 1 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:38,793-Speed 12729.31 samples/sec Loss 13.2398 LearningRate 0.1665 Epoch: 1 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:40,359-Speed 13087.24 samples/sec Loss 13.2064 LearningRate 0.1669 Epoch: 1 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:41,920-Speed 13125.47 samples/sec Loss 13.1811 LearningRate 0.1673 Epoch: 1 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:43,528-Speed 12746.55 samples/sec Loss 13.1616 LearningRate 0.1677 Epoch: 1 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:45,101-Speed 13033.71 samples/sec Loss 13.0914 LearningRate 0.1681 Epoch: 1 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 13:53:46,678-Speed 12992.17 samples/sec Loss 13.0595 LearningRate 0.1684 Epoch: 1 Global Step: 4260 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:48,277-Speed 12818.43 samples/sec Loss 13.0206 LearningRate 0.1688 Epoch: 1 Global Step: 4270 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:49,859-Speed 12955.47 samples/sec Loss 13.0924 LearningRate 0.1692 Epoch: 1 Global Step: 4280 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:51,443-Speed 12937.31 samples/sec Loss 13.1164 LearningRate 0.1696 Epoch: 1 Global Step: 4290 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:53,020-Speed 12993.39 samples/sec Loss 13.0640 LearningRate 0.1700 Epoch: 1 Global Step: 4300 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:54,591-Speed 13048.40 samples/sec Loss 13.0487 LearningRate 0.1704 Epoch: 1 Global Step: 4310 Fp16 Grad 
Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:56,202-Speed 12716.00 samples/sec Loss 13.0570 LearningRate 0.1708 Epoch: 1 Global Step: 4320 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:57,752-Speed 13222.08 samples/sec Loss 13.0256 LearningRate 0.1712 Epoch: 1 Global Step: 4330 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:53:59,308-Speed 13172.65 samples/sec Loss 12.9736 LearningRate 0.1716 Epoch: 1 Global Step: 4340 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:00,917-Speed 12742.18 samples/sec Loss 12.9707 LearningRate 0.1720 Epoch: 1 Global Step: 4350 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:02,471-Speed 13179.69 samples/sec Loss 13.0407 LearningRate 0.1724 Epoch: 1 Global Step: 4360 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:04,033-Speed 13126.54 samples/sec Loss 13.0938 LearningRate 0.1728 Epoch: 1 Global Step: 4370 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:05,608-Speed 13008.58 samples/sec Loss 12.9633 LearningRate 0.1732 Epoch: 1 Global Step: 4380 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:07,188-Speed 12972.61 samples/sec Loss 12.8715 LearningRate 0.1736 Epoch: 1 Global Step: 4390 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:08,775-Speed 12916.44 samples/sec Loss 12.9482 LearningRate 0.1740 Epoch: 1 Global Step: 4400 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:10,381-Speed 12752.36 samples/sec Loss 12.8545 LearningRate 0.1744 Epoch: 1 Global Step: 4410 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:11,954-Speed 13036.63 samples/sec Loss 12.8059 LearningRate 0.1748 Epoch: 1 Global Step: 4420 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:13,525-Speed 13045.33 samples/sec Loss 12.8305 LearningRate 0.1752 Epoch: 1 Global Step: 4430 Fp16 Grad Scale: 262144 Required: 4 hours Training: 
2022-01-14 13:54:15,094-Speed 13059.10 samples/sec Loss 12.9019 LearningRate 0.1756 Epoch: 1 Global Step: 4440 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:16,680-Speed 12916.22 samples/sec Loss 12.8297 LearningRate 0.1760 Epoch: 1 Global Step: 4450 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:18,261-Speed 12967.23 samples/sec Loss 12.8689 LearningRate 0.1764 Epoch: 1 Global Step: 4460 Fp16 Grad Scale: 524288 Required: 4 hours Training: 2022-01-14 13:54:19,783-Speed 13463.81 samples/sec Loss 12.7675 LearningRate 0.1767 Epoch: 1 Global Step: 4470 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:21,353-Speed 13083.21 samples/sec Loss 12.7870 LearningRate 0.1771 Epoch: 1 Global Step: 4480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:22,961-Speed 12772.82 samples/sec Loss 12.8776 LearningRate 0.1775 Epoch: 1 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:24,530-Speed 13061.57 samples/sec Loss 12.8122 LearningRate 0.1779 Epoch: 1 Global Step: 4500 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:26,123-Speed 12863.18 samples/sec Loss 12.8159 LearningRate 0.1783 Epoch: 1 Global Step: 4510 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:27,728-Speed 12768.47 samples/sec Loss 12.8993 LearningRate 0.1787 Epoch: 1 Global Step: 4520 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:29,338-Speed 12729.39 samples/sec Loss 12.6938 LearningRate 0.1791 Epoch: 1 Global Step: 4530 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:30,899-Speed 13130.76 samples/sec Loss 12.7414 LearningRate 0.1795 Epoch: 1 Global Step: 4540 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:32,475-Speed 13004.53 samples/sec Loss 12.7905 LearningRate 0.1799 Epoch: 1 Global Step: 4550 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:34,086-Speed 12719.39 
samples/sec Loss 12.6729 LearningRate 0.1803 Epoch: 1 Global Step: 4560 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:35,664-Speed 12986.32 samples/sec Loss 12.8093 LearningRate 0.1807 Epoch: 1 Global Step: 4570 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:37,220-Speed 13168.40 samples/sec Loss 12.7055 LearningRate 0.1811 Epoch: 1 Global Step: 4580 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:38,787-Speed 13076.35 samples/sec Loss 12.6500 LearningRate 0.1815 Epoch: 1 Global Step: 4590 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:40,375-Speed 12907.11 samples/sec Loss 12.6783 LearningRate 0.1819 Epoch: 1 Global Step: 4600 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:41,979-Speed 12772.29 samples/sec Loss 12.7283 LearningRate 0.1823 Epoch: 1 Global Step: 4610 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:43,556-Speed 12998.66 samples/sec Loss 12.6215 LearningRate 0.1827 Epoch: 1 Global Step: 4620 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:45,171-Speed 12685.16 samples/sec Loss 12.6581 LearningRate 0.1831 Epoch: 1 Global Step: 4630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:46,745-Speed 13024.06 samples/sec Loss 12.4972 LearningRate 0.1835 Epoch: 1 Global Step: 4640 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:48,336-Speed 12879.20 samples/sec Loss 12.5455 LearningRate 0.1839 Epoch: 1 Global Step: 4650 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:49,938-Speed 12796.57 samples/sec Loss 12.6386 LearningRate 0.1843 Epoch: 1 Global Step: 4660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:51,484-Speed 13259.32 samples/sec Loss 12.5239 LearningRate 0.1847 Epoch: 1 Global Step: 4670 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:53,062-Speed 12983.16 samples/sec Loss 12.6768 LearningRate 
0.1851 Epoch: 1 Global Step: 4680 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:54,661-Speed 12822.07 samples/sec Loss 12.5659 LearningRate 0.1854 Epoch: 1 Global Step: 4690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:56,233-Speed 13027.94 samples/sec Loss 12.6364 LearningRate 0.1858 Epoch: 1 Global Step: 4700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:57,826-Speed 12862.19 samples/sec Loss 12.5591 LearningRate 0.1862 Epoch: 1 Global Step: 4710 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:54:59,393-Speed 13076.88 samples/sec Loss 12.6620 LearningRate 0.1866 Epoch: 1 Global Step: 4720 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:00,980-Speed 12916.85 samples/sec Loss 12.5104 LearningRate 0.1870 Epoch: 1 Global Step: 4730 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:02,566-Speed 12924.35 samples/sec Loss 12.5548 LearningRate 0.1874 Epoch: 1 Global Step: 4740 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:04,172-Speed 12757.48 samples/sec Loss 12.5128 LearningRate 0.1878 Epoch: 1 Global Step: 4750 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:05,748-Speed 13007.36 samples/sec Loss 12.4993 LearningRate 0.1882 Epoch: 1 Global Step: 4760 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:07,347-Speed 12809.16 samples/sec Loss 12.4871 LearningRate 0.1886 Epoch: 1 Global Step: 4770 Fp16 Grad Scale: 524288 Required: 4 hours Training: 2022-01-14 13:55:08,917-Speed 13053.01 samples/sec Loss 12.4788 LearningRate 0.1890 Epoch: 1 Global Step: 4780 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:10,513-Speed 12843.74 samples/sec Loss 12.4195 LearningRate 0.1894 Epoch: 1 Global Step: 4790 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:12,080-Speed 13078.74 samples/sec Loss 12.6067 LearningRate 0.1898 Epoch: 1 Global Step: 4800 Fp16 Grad 
Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:13,645-Speed 13109.38 samples/sec Loss 12.3526 LearningRate 0.1902 Epoch: 1 Global Step: 4810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:15,247-Speed 12794.76 samples/sec Loss 12.3823 LearningRate 0.1906 Epoch: 1 Global Step: 4820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:16,824-Speed 12989.55 samples/sec Loss 12.3994 LearningRate 0.1910 Epoch: 1 Global Step: 4830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:18,414-Speed 12894.82 samples/sec Loss 12.3536 LearningRate 0.1914 Epoch: 1 Global Step: 4840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:19,976-Speed 13121.07 samples/sec Loss 12.4491 LearningRate 0.1918 Epoch: 1 Global Step: 4850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:21,532-Speed 13162.23 samples/sec Loss 12.3781 LearningRate 0.1922 Epoch: 1 Global Step: 4860 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:23,105-Speed 13034.48 samples/sec Loss 12.2422 LearningRate 0.1926 Epoch: 1 Global Step: 4870 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:24,648-Speed 13283.94 samples/sec Loss 12.3701 LearningRate 0.1930 Epoch: 1 Global Step: 4880 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:26,218-Speed 13050.41 samples/sec Loss 12.3684 LearningRate 0.1934 Epoch: 1 Global Step: 4890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:27,810-Speed 12871.89 samples/sec Loss 12.4951 LearningRate 0.1938 Epoch: 1 Global Step: 4900 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:29,394-Speed 12936.47 samples/sec Loss 12.3522 LearningRate 0.1941 Epoch: 1 Global Step: 4910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:30,953-Speed 13145.52 samples/sec Loss 12.3487 LearningRate 0.1945 Epoch: 1 Global Step: 4920 Fp16 Grad Scale: 262144 Required: 4 hours Training: 
2022-01-14 13:55:32,520-Speed 13076.80 samples/sec Loss 12.4997 LearningRate 0.1949 Epoch: 1 Global Step: 4930 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:34,105-Speed 12932.08 samples/sec Loss 12.2744 LearningRate 0.1953 Epoch: 1 Global Step: 4940 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:35,666-Speed 13130.77 samples/sec Loss 12.3437 LearningRate 0.1957 Epoch: 1 Global Step: 4950 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:37,223-Speed 13161.03 samples/sec Loss 12.2828 LearningRate 0.1961 Epoch: 1 Global Step: 4960 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:38,834-Speed 12716.74 samples/sec Loss 12.3308 LearningRate 0.1965 Epoch: 1 Global Step: 4970 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:40,406-Speed 13034.40 samples/sec Loss 12.2594 LearningRate 0.1969 Epoch: 1 Global Step: 4980 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:42,008-Speed 12792.96 samples/sec Loss 12.1819 LearningRate 0.1973 Epoch: 1 Global Step: 4990 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:55:43,575-Speed 13079.61 samples/sec Loss 12.2983 LearningRate 0.1977 Epoch: 1 Global Step: 5000 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 13:56:05,924-[lfw][5000]XNorm: 15.919271 Training: 2022-01-14 13:56:05,925-[lfw][5000]Accuracy-Flip: 0.98800+-0.00678 Training: 2022-01-14 13:56:05,926-[lfw][5000]Accuracy-Highest: 0.98800 Training: 2022-01-14 13:56:31,552-[cfp_fp][5000]XNorm: 13.534586 Training: 2022-01-14 13:56:31,553-[cfp_fp][5000]Accuracy-Flip: 0.88957+-0.01629 Training: 2022-01-14 13:56:31,553-[cfp_fp][5000]Accuracy-Highest: 0.88957 Training: 2022-01-14 13:56:53,854-[agedb_30][5000]XNorm: 15.570030 Training: 2022-01-14 13:56:53,855-[agedb_30][5000]Accuracy-Flip: 0.91033+-0.02077 Training: 2022-01-14 13:56:53,855-[agedb_30][5000]Accuracy-Highest: 0.91033 Training: 2022-01-14 13:56:55,448-Speed 284.95 
samples/sec Loss 12.2019 LearningRate 0.1981 Epoch: 1 Global Step: 5010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:56:57,019-Speed 13047.45 samples/sec Loss 12.2887 LearningRate 0.1985 Epoch: 1 Global Step: 5020 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:56:58,603-Speed 12932.13 samples/sec Loss 12.2180 LearningRate 0.1989 Epoch: 1 Global Step: 5030 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:57:00,149-Speed 13258.09 samples/sec Loss 12.2172 LearningRate 0.1993 Epoch: 1 Global Step: 5040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:01,770-Speed 12641.03 samples/sec Loss 12.2355 LearningRate 0.1997 Epoch: 1 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:15,919-Speed 1447.75 samples/sec Loss 11.8643 LearningRate 0.2000 Epoch: 2 Global Step: 5060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:17,545-Speed 12607.79 samples/sec Loss 11.2925 LearningRate 0.2000 Epoch: 2 Global Step: 5070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:19,123-Speed 12986.80 samples/sec Loss 11.2248 LearningRate 0.1999 Epoch: 2 Global Step: 5080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:20,710-Speed 12916.22 samples/sec Loss 11.1552 LearningRate 0.1999 Epoch: 2 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:22,285-Speed 13020.94 samples/sec Loss 11.1915 LearningRate 0.1998 Epoch: 2 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:23,858-Speed 13030.55 samples/sec Loss 11.2089 LearningRate 0.1998 Epoch: 2 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:25,495-Speed 12515.63 samples/sec Loss 11.2707 LearningRate 0.1997 Epoch: 2 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:27,065-Speed 13051.26 samples/sec Loss 11.1242 LearningRate 0.1997 
Epoch: 2 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:28,649-Speed 12934.71 samples/sec Loss 11.2332 LearningRate 0.1997 Epoch: 2 Global Step: 5140 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:57:30,249-Speed 12815.78 samples/sec Loss 11.2961 LearningRate 0.1996 Epoch: 2 Global Step: 5150 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:57:31,816-Speed 13077.63 samples/sec Loss 11.2627 LearningRate 0.1996 Epoch: 2 Global Step: 5160 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:57:33,415-Speed 12816.43 samples/sec Loss 11.3167 LearningRate 0.1995 Epoch: 2 Global Step: 5170 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:57:35,006-Speed 12881.49 samples/sec Loss 11.3932 LearningRate 0.1995 Epoch: 2 Global Step: 5180 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:57:36,612-Speed 12771.50 samples/sec Loss 11.2906 LearningRate 0.1995 Epoch: 2 Global Step: 5190 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:57:38,164-Speed 13201.94 samples/sec Loss 11.2579 LearningRate 0.1994 Epoch: 2 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:39,766-Speed 12797.62 samples/sec Loss 11.3928 LearningRate 0.1994 Epoch: 2 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:41,331-Speed 13088.79 samples/sec Loss 11.4356 LearningRate 0.1993 Epoch: 2 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:42,919-Speed 12909.96 samples/sec Loss 11.2791 LearningRate 0.1993 Epoch: 2 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:44,482-Speed 13110.30 samples/sec Loss 11.2516 LearningRate 0.1992 Epoch: 2 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:46,073-Speed 12875.93 samples/sec Loss 11.2881 LearningRate 0.1992 Epoch: 2 Global Step: 5250 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-01-14 13:57:47,695-Speed 12632.69 samples/sec Loss 11.2294 LearningRate 0.1992 Epoch: 2 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:49,272-Speed 13001.46 samples/sec Loss 11.4189 LearningRate 0.1991 Epoch: 2 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:57:50,841-Speed 13061.00 samples/sec Loss 11.3937 LearningRate 0.1991 Epoch: 2 Global Step: 5280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:57:52,416-Speed 13015.00 samples/sec Loss 11.3713 LearningRate 0.1990 Epoch: 2 Global Step: 5290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:57:54,030-Speed 12691.93 samples/sec Loss 11.3365 LearningRate 0.1990 Epoch: 2 Global Step: 5300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:57:55,608-Speed 13015.61 samples/sec Loss 11.3988 LearningRate 0.1990 Epoch: 2 Global Step: 5310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:57:57,207-Speed 12814.24 samples/sec Loss 11.3463 LearningRate 0.1989 Epoch: 2 Global Step: 5320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:57:58,777-Speed 13046.54 samples/sec Loss 11.3477 LearningRate 0.1989 Epoch: 2 Global Step: 5330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:58:00,327-Speed 13229.05 samples/sec Loss 11.4501 LearningRate 0.1988 Epoch: 2 Global Step: 5340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:58:01,957-Speed 12569.21 samples/sec Loss 11.5304 LearningRate 0.1988 Epoch: 2 Global Step: 5350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:58:03,557-Speed 12812.66 samples/sec Loss 11.3990 LearningRate 0.1987 Epoch: 2 Global Step: 5360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:58:05,249-Speed 12112.53 samples/sec Loss 11.3781 LearningRate 0.1987 Epoch: 2 Global Step: 5370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 
13:58:06,841-Speed 12874.58 samples/sec Loss 11.3511 LearningRate 0.1987 Epoch: 2 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:08,463-Speed 12640.35 samples/sec Loss 11.4376 LearningRate 0.1986 Epoch: 2 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:10,031-Speed 13065.40 samples/sec Loss 11.3736 LearningRate 0.1986 Epoch: 2 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:11,612-Speed 12963.11 samples/sec Loss 11.4220 LearningRate 0.1985 Epoch: 2 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:13,216-Speed 12779.28 samples/sec Loss 11.3517 LearningRate 0.1985 Epoch: 2 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:14,806-Speed 12894.23 samples/sec Loss 11.4027 LearningRate 0.1985 Epoch: 2 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:16,380-Speed 13021.04 samples/sec Loss 11.2763 LearningRate 0.1984 Epoch: 2 Global Step: 5440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:17,978-Speed 12823.34 samples/sec Loss 11.3203 LearningRate 0.1984 Epoch: 2 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:19,554-Speed 13005.41 samples/sec Loss 11.3766 LearningRate 0.1983 Epoch: 2 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:21,133-Speed 12972.63 samples/sec Loss 11.3578 LearningRate 0.1983 Epoch: 2 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:22,737-Speed 12773.41 samples/sec Loss 11.2668 LearningRate 0.1982 Epoch: 2 Global Step: 5480 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:58:24,327-Speed 12891.93 samples/sec Loss 11.3472 LearningRate 0.1982 Epoch: 2 Global Step: 5490 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:58:25,918-Speed 12879.05 samples/sec 
Loss 11.4078 LearningRate 0.1982 Epoch: 2 Global Step: 5500 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:58:27,500-Speed 12958.82 samples/sec Loss 11.3442 LearningRate 0.1981 Epoch: 2 Global Step: 5510 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:58:29,092-Speed 12873.02 samples/sec Loss 11.2933 LearningRate 0.1981 Epoch: 2 Global Step: 5520 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:58:30,669-Speed 12997.33 samples/sec Loss 11.3052 LearningRate 0.1980 Epoch: 2 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:32,239-Speed 13045.46 samples/sec Loss 11.3734 LearningRate 0.1980 Epoch: 2 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:33,849-Speed 12729.76 samples/sec Loss 11.2277 LearningRate 0.1980 Epoch: 2 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:35,422-Speed 13028.63 samples/sec Loss 11.3035 LearningRate 0.1979 Epoch: 2 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:37,008-Speed 12928.23 samples/sec Loss 11.2799 LearningRate 0.1979 Epoch: 2 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:38,585-Speed 12995.23 samples/sec Loss 11.3420 LearningRate 0.1978 Epoch: 2 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:40,192-Speed 12751.10 samples/sec Loss 11.3526 LearningRate 0.1978 Epoch: 2 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:41,794-Speed 12791.94 samples/sec Loss 11.3266 LearningRate 0.1978 Epoch: 2 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:43,383-Speed 12905.17 samples/sec Loss 11.3124 LearningRate 0.1977 Epoch: 2 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:44,966-Speed 12942.11 samples/sec Loss 11.2473 LearningRate 0.1977 Epoch: 2 
Global Step: 5620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:46,539-Speed 13023.78 samples/sec Loss 11.3931 LearningRate 0.1976 Epoch: 2 Global Step: 5630 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 13:58:48,129-Speed 12888.91 samples/sec Loss 11.3310 LearningRate 0.1976 Epoch: 2 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:49,713-Speed 12946.37 samples/sec Loss 11.3651 LearningRate 0.1975 Epoch: 2 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:51,290-Speed 12993.75 samples/sec Loss 11.1986 LearningRate 0.1975 Epoch: 2 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:52,857-Speed 13076.45 samples/sec Loss 11.3360 LearningRate 0.1975 Epoch: 2 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:54,458-Speed 12803.35 samples/sec Loss 11.2384 LearningRate 0.1974 Epoch: 2 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:56,013-Speed 13177.91 samples/sec Loss 11.1383 LearningRate 0.1974 Epoch: 2 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:57,607-Speed 12859.78 samples/sec Loss 11.3394 LearningRate 0.1973 Epoch: 2 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:58:59,220-Speed 12707.27 samples/sec Loss 11.2938 LearningRate 0.1973 Epoch: 2 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:59:00,771-Speed 13212.54 samples/sec Loss 11.1438 LearningRate 0.1973 Epoch: 2 Global Step: 5720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:02,362-Speed 12878.59 samples/sec Loss 11.3354 LearningRate 0.1972 Epoch: 2 Global Step: 5730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:03,960-Speed 12826.85 samples/sec Loss 11.2809 LearningRate 0.1972 Epoch: 2 Global Step: 5740 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-14 13:59:05,528-Speed 13060.34 samples/sec Loss 11.2794 LearningRate 0.1971 Epoch: 2 Global Step: 5750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:07,107-Speed 12988.54 samples/sec Loss 11.2924 LearningRate 0.1971 Epoch: 2 Global Step: 5760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:08,701-Speed 12857.81 samples/sec Loss 11.1278 LearningRate 0.1970 Epoch: 2 Global Step: 5770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:10,298-Speed 12831.25 samples/sec Loss 11.1199 LearningRate 0.1970 Epoch: 2 Global Step: 5780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:11,877-Speed 12982.78 samples/sec Loss 11.1417 LearningRate 0.1970 Epoch: 2 Global Step: 5790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:13,457-Speed 12973.29 samples/sec Loss 11.1540 LearningRate 0.1969 Epoch: 2 Global Step: 5800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:15,029-Speed 13038.91 samples/sec Loss 11.1596 LearningRate 0.1969 Epoch: 2 Global Step: 5810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:16,621-Speed 12870.81 samples/sec Loss 11.1961 LearningRate 0.1968 Epoch: 2 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:59:18,193-Speed 13037.47 samples/sec Loss 11.1093 LearningRate 0.1968 Epoch: 2 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:59:19,755-Speed 13124.15 samples/sec Loss 10.9988 LearningRate 0.1968 Epoch: 2 Global Step: 5840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:21,319-Speed 13103.92 samples/sec Loss 11.1734 LearningRate 0.1967 Epoch: 2 Global Step: 5850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:22,899-Speed 12970.93 samples/sec Loss 11.1556 LearningRate 0.1967 Epoch: 2 Global Step: 5860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 
13:59:24,499-Speed 12812.45 samples/sec Loss 11.0702 LearningRate 0.1966 Epoch: 2 Global Step: 5870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:26,080-Speed 12960.33 samples/sec Loss 11.1837 LearningRate 0.1966 Epoch: 2 Global Step: 5880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:27,653-Speed 13031.60 samples/sec Loss 11.0186 LearningRate 0.1966 Epoch: 2 Global Step: 5890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:29,260-Speed 12758.69 samples/sec Loss 11.0309 LearningRate 0.1965 Epoch: 2 Global Step: 5900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:30,846-Speed 12924.35 samples/sec Loss 11.1662 LearningRate 0.1965 Epoch: 2 Global Step: 5910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:32,476-Speed 12579.43 samples/sec Loss 11.0991 LearningRate 0.1964 Epoch: 2 Global Step: 5920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:34,035-Speed 13143.05 samples/sec Loss 11.0509 LearningRate 0.1964 Epoch: 2 Global Step: 5930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 13:59:35,629-Speed 12856.08 samples/sec Loss 11.0758 LearningRate 0.1963 Epoch: 2 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:37,232-Speed 12780.84 samples/sec Loss 11.1473 LearningRate 0.1963 Epoch: 2 Global Step: 5950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:38,802-Speed 13053.01 samples/sec Loss 11.0745 LearningRate 0.1963 Epoch: 2 Global Step: 5960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:40,378-Speed 13005.97 samples/sec Loss 11.0461 LearningRate 0.1962 Epoch: 2 Global Step: 5970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:41,991-Speed 12697.21 samples/sec Loss 10.9499 LearningRate 0.1962 Epoch: 2 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:43,566-Speed 13024.51 samples/sec Loss 11.1165 
LearningRate 0.1961 Epoch: 2 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:45,141-Speed 13009.42 samples/sec Loss 11.0808 LearningRate 0.1961 Epoch: 2 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:46,736-Speed 12844.47 samples/sec Loss 10.9797 LearningRate 0.1961 Epoch: 2 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:48,312-Speed 13004.30 samples/sec Loss 11.0349 LearningRate 0.1960 Epoch: 2 Global Step: 6020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:49,922-Speed 12721.90 samples/sec Loss 10.9977 LearningRate 0.1960 Epoch: 2 Global Step: 6030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 13:59:51,496-Speed 13022.96 samples/sec Loss 11.0801 LearningRate 0.1959 Epoch: 2 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:59:53,079-Speed 12942.08 samples/sec Loss 11.0357 LearningRate 0.1959 Epoch: 2 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:59:54,657-Speed 12985.06 samples/sec Loss 10.9806 LearningRate 0.1959 Epoch: 2 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:59:56,235-Speed 12984.26 samples/sec Loss 10.9996 LearningRate 0.1958 Epoch: 2 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:59:57,819-Speed 12931.46 samples/sec Loss 10.9950 LearningRate 0.1958 Epoch: 2 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 13:59:59,386-Speed 13080.66 samples/sec Loss 11.0157 LearningRate 0.1957 Epoch: 2 Global Step: 6090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:00,956-Speed 13050.52 samples/sec Loss 10.9960 LearningRate 0.1957 Epoch: 2 Global Step: 6100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:02,558-Speed 12790.68 samples/sec Loss 10.9354 LearningRate 0.1956 Epoch: 2 Global Step: 6110 
Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:04,128-Speed 13054.09 samples/sec Loss 11.0299 LearningRate 0.1956 Epoch: 2 Global Step: 6120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:05,694-Speed 13083.93 samples/sec Loss 11.0327 LearningRate 0.1956 Epoch: 2 Global Step: 6130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:07,266-Speed 13032.48 samples/sec Loss 10.9065 LearningRate 0.1955 Epoch: 2 Global Step: 6140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:08,834-Speed 13087.10 samples/sec Loss 10.9576 LearningRate 0.1955 Epoch: 2 Global Step: 6150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:10,438-Speed 12777.05 samples/sec Loss 11.0449 LearningRate 0.1954 Epoch: 2 Global Step: 6160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:12,038-Speed 12804.67 samples/sec Loss 10.8940 LearningRate 0.1954 Epoch: 2 Global Step: 6170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:13,600-Speed 13142.47 samples/sec Loss 10.7995 LearningRate 0.1954 Epoch: 2 Global Step: 6180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:15,184-Speed 12935.16 samples/sec Loss 10.9190 LearningRate 0.1953 Epoch: 2 Global Step: 6190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:16,741-Speed 13156.20 samples/sec Loss 10.8913 LearningRate 0.1953 Epoch: 2 Global Step: 6200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:18,340-Speed 12816.90 samples/sec Loss 10.9270 LearningRate 0.1952 Epoch: 2 Global Step: 6210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:19,915-Speed 13012.14 samples/sec Loss 10.9278 LearningRate 0.1952 Epoch: 2 Global Step: 6220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:21,487-Speed 13030.23 samples/sec Loss 10.8544 LearningRate 0.1952 Epoch: 2 Global Step: 6230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-01-14 14:00:23,086-Speed 12812.15 samples/sec Loss 10.8232 LearningRate 0.1951 Epoch: 2 Global Step: 6240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:24,640-Speed 13190.46 samples/sec Loss 10.7520 LearningRate 0.1951 Epoch: 2 Global Step: 6250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:26,195-Speed 13174.77 samples/sec Loss 10.7968 LearningRate 0.1950 Epoch: 2 Global Step: 6260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:27,811-Speed 12681.87 samples/sec Loss 10.9616 LearningRate 0.1950 Epoch: 2 Global Step: 6270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:29,388-Speed 12994.38 samples/sec Loss 10.9046 LearningRate 0.1949 Epoch: 2 Global Step: 6280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:30,966-Speed 12979.55 samples/sec Loss 10.7690 LearningRate 0.1949 Epoch: 2 Global Step: 6290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:32,545-Speed 12979.41 samples/sec Loss 10.8138 LearningRate 0.1949 Epoch: 2 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:34,112-Speed 13072.93 samples/sec Loss 10.7073 LearningRate 0.1948 Epoch: 2 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:35,692-Speed 12970.58 samples/sec Loss 10.8377 LearningRate 0.1948 Epoch: 2 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:37,275-Speed 12955.11 samples/sec Loss 10.7985 LearningRate 0.1947 Epoch: 2 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:38,840-Speed 13089.52 samples/sec Loss 10.8105 LearningRate 0.1947 Epoch: 2 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:40,420-Speed 12967.74 samples/sec Loss 10.8173 LearningRate 0.1947 Epoch: 2 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:41,984-Speed 13211.24 
samples/sec Loss 10.7359 LearningRate 0.1946 Epoch: 2 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:43,564-Speed 12969.46 samples/sec Loss 10.6397 LearningRate 0.1946 Epoch: 2 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:45,185-Speed 12634.96 samples/sec Loss 10.7654 LearningRate 0.1945 Epoch: 2 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:46,759-Speed 13024.04 samples/sec Loss 10.7696 LearningRate 0.1945 Epoch: 2 Global Step: 6390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:00:48,328-Speed 13058.97 samples/sec Loss 10.7187 LearningRate 0.1945 Epoch: 2 Global Step: 6400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:49,890-Speed 13121.62 samples/sec Loss 10.6950 LearningRate 0.1944 Epoch: 2 Global Step: 6410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:00:51,471-Speed 12961.89 samples/sec Loss 10.7340 LearningRate 0.1944 Epoch: 2 Global Step: 6420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:53,031-Speed 13131.58 samples/sec Loss 10.8448 LearningRate 0.1943 Epoch: 2 Global Step: 6430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:54,617-Speed 12922.31 samples/sec Loss 10.6624 LearningRate 0.1943 Epoch: 2 Global Step: 6440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:56,189-Speed 13026.71 samples/sec Loss 10.6696 LearningRate 0.1942 Epoch: 2 Global Step: 6450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:57,767-Speed 12986.35 samples/sec Loss 10.7685 LearningRate 0.1942 Epoch: 2 Global Step: 6460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:00:59,351-Speed 12934.46 samples/sec Loss 10.6586 LearningRate 0.1942 Epoch: 2 Global Step: 6470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:01:00,938-Speed 12914.89 samples/sec Loss 10.6731 LearningRate 0.1941 Epoch: 
2 Global Step: 6480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:01:02,508-Speed 13045.15 samples/sec Loss 10.6830 LearningRate 0.1941 Epoch: 2 Global Step: 6490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:01:04,088-Speed 12968.56 samples/sec Loss 10.6732 LearningRate 0.1940 Epoch: 2 Global Step: 6500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:01:05,642-Speed 13189.07 samples/sec Loss 10.7218 LearningRate 0.1940 Epoch: 2 Global Step: 6510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:01:07,217-Speed 13010.48 samples/sec Loss 10.6786 LearningRate 0.1940 Epoch: 2 Global Step: 6520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:08,796-Speed 12977.97 samples/sec Loss 10.6751 LearningRate 0.1939 Epoch: 2 Global Step: 6530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:10,356-Speed 13136.19 samples/sec Loss 10.7322 LearningRate 0.1939 Epoch: 2 Global Step: 6540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:11,937-Speed 12963.67 samples/sec Loss 10.6822 LearningRate 0.1938 Epoch: 2 Global Step: 6550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:13,521-Speed 12932.85 samples/sec Loss 10.6779 LearningRate 0.1938 Epoch: 2 Global Step: 6560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:15,127-Speed 12753.68 samples/sec Loss 10.5104 LearningRate 0.1938 Epoch: 2 Global Step: 6570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:16,728-Speed 12804.00 samples/sec Loss 10.6207 LearningRate 0.1937 Epoch: 2 Global Step: 6580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:18,301-Speed 13028.15 samples/sec Loss 10.6241 LearningRate 0.1937 Epoch: 2 Global Step: 6590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:19,872-Speed 13038.74 samples/sec Loss 10.6074 LearningRate 0.1936 Epoch: 2 Global Step: 6600 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-14 14:01:21,439-Speed 13080.57 samples/sec Loss 10.6318 LearningRate 0.1936 Epoch: 2 Global Step: 6610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:01:23,002-Speed 13106.45 samples/sec Loss 10.6381 LearningRate 0.1936 Epoch: 2 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:24,559-Speed 13162.78 samples/sec Loss 10.6209 LearningRate 0.1935 Epoch: 2 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:26,125-Speed 13081.58 samples/sec Loss 10.6104 LearningRate 0.1935 Epoch: 2 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:27,689-Speed 13096.26 samples/sec Loss 10.4955 LearningRate 0.1934 Epoch: 2 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:29,247-Speed 13148.13 samples/sec Loss 10.5378 LearningRate 0.1934 Epoch: 2 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:30,841-Speed 12862.80 samples/sec Loss 10.5850 LearningRate 0.1933 Epoch: 2 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:32,416-Speed 13007.21 samples/sec Loss 10.6507 LearningRate 0.1933 Epoch: 2 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:33,997-Speed 12962.48 samples/sec Loss 10.5047 LearningRate 0.1933 Epoch: 2 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:35,565-Speed 13063.31 samples/sec Loss 10.4910 LearningRate 0.1932 Epoch: 2 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:37,136-Speed 13041.76 samples/sec Loss 10.5178 LearningRate 0.1932 Epoch: 2 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:38,691-Speed 13177.67 samples/sec Loss 10.6078 LearningRate 0.1931 Epoch: 2 Global Step: 6720 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 
14:01:40,281-Speed 12889.08 samples/sec Loss 10.4539 LearningRate 0.1931 Epoch: 2 Global Step: 6730 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:01:41,881-Speed 12814.17 samples/sec Loss 10.5404 LearningRate 0.1931 Epoch: 2 Global Step: 6740 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:01:43,461-Speed 12968.33 samples/sec Loss 10.5601 LearningRate 0.1930 Epoch: 2 Global Step: 6750 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:01:45,039-Speed 12982.48 samples/sec Loss 10.5862 LearningRate 0.1930 Epoch: 2 Global Step: 6760 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:01:46,620-Speed 12963.65 samples/sec Loss 10.4901 LearningRate 0.1929 Epoch: 2 Global Step: 6770 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:01:48,196-Speed 12999.24 samples/sec Loss 10.4784 LearningRate 0.1929 Epoch: 2 Global Step: 6780 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:01:49,765-Speed 13057.16 samples/sec Loss 10.4146 LearningRate 0.1929 Epoch: 2 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:51,364-Speed 12814.80 samples/sec Loss 10.4024 LearningRate 0.1928 Epoch: 2 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:52,956-Speed 12868.55 samples/sec Loss 10.3265 LearningRate 0.1928 Epoch: 2 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:54,538-Speed 12955.17 samples/sec Loss 10.4658 LearningRate 0.1927 Epoch: 2 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:56,121-Speed 12943.27 samples/sec Loss 10.5292 LearningRate 0.1927 Epoch: 2 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:57,698-Speed 12993.69 samples/sec Loss 10.5169 LearningRate 0.1927 Epoch: 2 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:01:59,258-Speed 13136.31 samples/sec 
Loss 10.4762 LearningRate 0.1926 Epoch: 2 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:00,831-Speed 13027.70 samples/sec Loss 10.5186 LearningRate 0.1926 Epoch: 2 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:02,383-Speed 13202.62 samples/sec Loss 10.4258 LearningRate 0.1925 Epoch: 2 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:03,966-Speed 12940.19 samples/sec Loss 10.4597 LearningRate 0.1925 Epoch: 2 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:05,555-Speed 12889.48 samples/sec Loss 10.3615 LearningRate 0.1924 Epoch: 2 Global Step: 6890 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:02:07,123-Speed 13067.78 samples/sec Loss 10.3891 LearningRate 0.1924 Epoch: 2 Global Step: 6900 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:02:08,712-Speed 12897.33 samples/sec Loss 10.3580 LearningRate 0.1924 Epoch: 2 Global Step: 6910 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:02:10,296-Speed 12935.00 samples/sec Loss 10.3512 LearningRate 0.1923 Epoch: 2 Global Step: 6920 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:02:11,870-Speed 13021.24 samples/sec Loss 10.3382 LearningRate 0.1923 Epoch: 2 Global Step: 6930 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:02:13,413-Speed 13277.34 samples/sec Loss 10.3628 LearningRate 0.1922 Epoch: 2 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:14,978-Speed 13093.59 samples/sec Loss 10.3394 LearningRate 0.1922 Epoch: 2 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:16,551-Speed 13018.76 samples/sec Loss 10.3924 LearningRate 0.1922 Epoch: 2 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:18,141-Speed 12888.51 samples/sec Loss 10.3507 LearningRate 0.1921 Epoch: 2 
Global Step: 6970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:19,697-Speed 13172.86 samples/sec Loss 10.3977 LearningRate 0.1921 Epoch: 2 Global Step: 6980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:21,282-Speed 12925.78 samples/sec Loss 10.3820 LearningRate 0.1920 Epoch: 2 Global Step: 6990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:22,852-Speed 13051.39 samples/sec Loss 10.2552 LearningRate 0.1920 Epoch: 2 Global Step: 7000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:24,433-Speed 12959.05 samples/sec Loss 10.3240 LearningRate 0.1920 Epoch: 2 Global Step: 7010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:26,037-Speed 12774.48 samples/sec Loss 10.2209 LearningRate 0.1919 Epoch: 2 Global Step: 7020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:27,617-Speed 12968.20 samples/sec Loss 10.2971 LearningRate 0.1919 Epoch: 2 Global Step: 7030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:29,187-Speed 13047.96 samples/sec Loss 10.2825 LearningRate 0.1918 Epoch: 2 Global Step: 7040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:30,783-Speed 12848.73 samples/sec Loss 10.3405 LearningRate 0.1918 Epoch: 2 Global Step: 7050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:32,336-Speed 13192.45 samples/sec Loss 10.2978 LearningRate 0.1918 Epoch: 2 Global Step: 7060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:33,947-Speed 12721.03 samples/sec Loss 10.3354 LearningRate 0.1917 Epoch: 2 Global Step: 7070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:35,531-Speed 12928.96 samples/sec Loss 10.3737 LearningRate 0.1917 Epoch: 2 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:02:37,091-Speed 13139.76 samples/sec Loss 10.3330 LearningRate 0.1916 Epoch: 2 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-01-14 14:02:38,636-Speed 13261.49 samples/sec Loss 10.3672 LearningRate 0.1916 Epoch: 2 Global Step: 7100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:40,209-Speed 13029.14 samples/sec Loss 10.2869 LearningRate 0.1916 Epoch: 2 Global Step: 7110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:41,767-Speed 13147.97 samples/sec Loss 10.3268 LearningRate 0.1915 Epoch: 2 Global Step: 7120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:43,374-Speed 12749.26 samples/sec Loss 10.1983 LearningRate 0.1915 Epoch: 2 Global Step: 7130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:44,960-Speed 12922.31 samples/sec Loss 10.1956 LearningRate 0.1914 Epoch: 2 Global Step: 7140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:46,561-Speed 12799.74 samples/sec Loss 10.3151 LearningRate 0.1914 Epoch: 2 Global Step: 7150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:48,133-Speed 13048.26 samples/sec Loss 10.3582 LearningRate 0.1913 Epoch: 2 Global Step: 7160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:49,683-Speed 13220.07 samples/sec Loss 10.4112 LearningRate 0.1913 Epoch: 2 Global Step: 7170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:51,262-Speed 12976.88 samples/sec Loss 10.2778 LearningRate 0.1913 Epoch: 2 Global Step: 7180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:52,834-Speed 13030.83 samples/sec Loss 10.3130 LearningRate 0.1912 Epoch: 2 Global Step: 7190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:02:54,434-Speed 12804.51 samples/sec Loss 10.2673 LearningRate 0.1912 Epoch: 2 Global Step: 7200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:56,018-Speed 12933.90 samples/sec Loss 10.1462 LearningRate 0.1911 Epoch: 2 Global Step: 7210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:57,609-Speed 12883.58 
samples/sec Loss 10.3810 LearningRate 0.1911 Epoch: 2 Global Step: 7220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:02:59,160-Speed 13212.11 samples/sec Loss 10.2709 LearningRate 0.1911 Epoch: 2 Global Step: 7230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:03:00,735-Speed 13006.66 samples/sec Loss 10.2132 LearningRate 0.1910 Epoch: 2 Global Step: 7240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:03:02,295-Speed 13135.86 samples/sec Loss 10.3432 LearningRate 0.1910 Epoch: 2 Global Step: 7250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:03:03,877-Speed 12953.26 samples/sec Loss 10.2112 LearningRate 0.1909 Epoch: 2 Global Step: 7260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:03:05,453-Speed 13001.20 samples/sec Loss 10.1921 LearningRate 0.1909 Epoch: 2 Global Step: 7270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:03:07,010-Speed 13157.20 samples/sec Loss 10.2397 LearningRate 0.1909 Epoch: 2 Global Step: 7280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:03:08,555-Speed 13258.98 samples/sec Loss 10.2370 LearningRate 0.1908 Epoch: 2 Global Step: 7290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:03:10,128-Speed 13029.38 samples/sec Loss 10.1149 LearningRate 0.1908 Epoch: 2 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:11,692-Speed 13099.75 samples/sec Loss 10.2230 LearningRate 0.1907 Epoch: 2 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:13,265-Speed 13021.15 samples/sec Loss 10.2764 LearningRate 0.1907 Epoch: 2 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:14,835-Speed 13053.15 samples/sec Loss 10.2255 LearningRate 0.1907 Epoch: 2 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:16,402-Speed 13072.32 samples/sec Loss 10.1759 LearningRate 0.1906 Epoch: 
2 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:17,974-Speed 13036.18 samples/sec Loss 10.2325 LearningRate 0.1906 Epoch: 2 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:19,552-Speed 12989.56 samples/sec Loss 10.1694 LearningRate 0.1905 Epoch: 2 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:21,129-Speed 12992.08 samples/sec Loss 10.2152 LearningRate 0.1905 Epoch: 2 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:22,729-Speed 12804.97 samples/sec Loss 10.2042 LearningRate 0.1905 Epoch: 2 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:24,295-Speed 13082.86 samples/sec Loss 10.1238 LearningRate 0.1904 Epoch: 2 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:25,847-Speed 13199.96 samples/sec Loss 10.0913 LearningRate 0.1904 Epoch: 2 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:03:27,418-Speed 13048.48 samples/sec Loss 10.1998 LearningRate 0.1903 Epoch: 2 Global Step: 7410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:28,974-Speed 13164.28 samples/sec Loss 10.0830 LearningRate 0.1903 Epoch: 2 Global Step: 7420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:30,560-Speed 12920.72 samples/sec Loss 10.0722 LearningRate 0.1902 Epoch: 2 Global Step: 7430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:32,139-Speed 12994.84 samples/sec Loss 10.1651 LearningRate 0.1902 Epoch: 2 Global Step: 7440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:33,737-Speed 12818.91 samples/sec Loss 10.1437 LearningRate 0.1902 Epoch: 2 Global Step: 7450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:35,322-Speed 12926.77 samples/sec Loss 10.0992 LearningRate 0.1901 Epoch: 2 Global Step: 7460 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 14:03:36,905-Speed 12943.20 samples/sec Loss 10.1333 LearningRate 0.1901 Epoch: 2 Global Step: 7470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:38,477-Speed 13032.56 samples/sec Loss 10.1482 LearningRate 0.1900 Epoch: 2 Global Step: 7480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:40,063-Speed 12922.54 samples/sec Loss 10.0919 LearningRate 0.1900 Epoch: 2 Global Step: 7490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:41,647-Speed 12930.33 samples/sec Loss 10.2063 LearningRate 0.1900 Epoch: 2 Global Step: 7500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:03:43,241-Speed 12866.29 samples/sec Loss 10.0969 LearningRate 0.1899 Epoch: 2 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:03:44,794-Speed 13194.51 samples/sec Loss 10.0705 LearningRate 0.1899 Epoch: 2 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:03:46,361-Speed 13077.61 samples/sec Loss 10.0740 LearningRate 0.1898 Epoch: 2 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:03:47,930-Speed 13057.84 samples/sec Loss 10.0836 LearningRate 0.1898 Epoch: 2 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:03:49,517-Speed 12920.57 samples/sec Loss 10.1228 LearningRate 0.1898 Epoch: 2 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:03:51,058-Speed 13291.72 samples/sec Loss 10.0698 LearningRate 0.1897 Epoch: 2 Global Step: 7560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:03:52,634-Speed 12997.06 samples/sec Loss 9.9978 LearningRate 0.1897 Epoch: 2 Global Step: 7570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:03:54,229-Speed 12847.58 samples/sec Loss 10.1737 LearningRate 0.1896 Epoch: 2 Global Step: 7580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
14:04:09,878-Speed 1308.87 samples/sec Loss 9.7405 LearningRate 0.1896 Epoch: 3 Global Step: 7590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:04:11,529-Speed 12414.17 samples/sec Loss 9.0742 LearningRate 0.1896 Epoch: 3 Global Step: 7600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:04:13,115-Speed 12921.73 samples/sec Loss 9.0510 LearningRate 0.1895 Epoch: 3 Global Step: 7610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:04:14,688-Speed 13021.11 samples/sec Loss 9.0729 LearningRate 0.1895 Epoch: 3 Global Step: 7620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:04:16,287-Speed 12816.50 samples/sec Loss 9.1192 LearningRate 0.1894 Epoch: 3 Global Step: 7630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:04:17,879-Speed 12970.12 samples/sec Loss 9.1479 LearningRate 0.1894 Epoch: 3 Global Step: 7640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:04:19,469-Speed 12887.42 samples/sec Loss 9.1648 LearningRate 0.1894 Epoch: 3 Global Step: 7650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 14:04:21,112-Speed 12465.80 samples/sec Loss 9.1555 LearningRate 0.1893 Epoch: 3 Global Step: 7660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:22,689-Speed 13000.51 samples/sec Loss 9.1550 LearningRate 0.1893 Epoch: 3 Global Step: 7670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:24,273-Speed 12934.43 samples/sec Loss 9.1356 LearningRate 0.1892 Epoch: 3 Global Step: 7680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:25,866-Speed 12859.06 samples/sec Loss 9.1629 LearningRate 0.1892 Epoch: 3 Global Step: 7690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:27,449-Speed 12945.78 samples/sec Loss 9.3043 LearningRate 0.1892 Epoch: 3 Global Step: 7700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:29,029-Speed 12967.74 samples/sec Loss 9.1467 LearningRate 
0.1891 Epoch: 3 Global Step: 7710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:30,636-Speed 12760.52 samples/sec Loss 9.1556 LearningRate 0.1891 Epoch: 3 Global Step: 7720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:32,224-Speed 12899.74 samples/sec Loss 9.2555 LearningRate 0.1890 Epoch: 3 Global Step: 7730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:33,811-Speed 12922.90 samples/sec Loss 9.3038 LearningRate 0.1890 Epoch: 3 Global Step: 7740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:35,404-Speed 12860.76 samples/sec Loss 9.3159 LearningRate 0.1890 Epoch: 3 Global Step: 7750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:04:36,980-Speed 13001.06 samples/sec Loss 9.3357 LearningRate 0.1889 Epoch: 3 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:38,592-Speed 12709.35 samples/sec Loss 9.3233 LearningRate 0.1889 Epoch: 3 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:40,181-Speed 12897.41 samples/sec Loss 9.3228 LearningRate 0.1888 Epoch: 3 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:41,745-Speed 13100.96 samples/sec Loss 9.2371 LearningRate 0.1888 Epoch: 3 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:43,335-Speed 12883.43 samples/sec Loss 9.3279 LearningRate 0.1887 Epoch: 3 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:44,962-Speed 12595.52 samples/sec Loss 9.3453 LearningRate 0.1887 Epoch: 3 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:46,541-Speed 12979.68 samples/sec Loss 9.2769 LearningRate 0.1887 Epoch: 3 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:48,118-Speed 13016.57 samples/sec Loss 9.3512 LearningRate 0.1886 Epoch: 3 Global Step: 7830 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-14 14:04:49,692-Speed 13017.73 samples/sec Loss 9.3392 LearningRate 0.1886 Epoch: 3 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:51,288-Speed 12834.42 samples/sec Loss 9.1925 LearningRate 0.1885 Epoch: 3 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:04:52,896-Speed 12745.50 samples/sec Loss 9.4411 LearningRate 0.1885 Epoch: 3 Global Step: 7860 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:04:54,490-Speed 12877.65 samples/sec Loss 9.3567 LearningRate 0.1885 Epoch: 3 Global Step: 7870 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:04:56,079-Speed 12895.27 samples/sec Loss 9.4031 LearningRate 0.1884 Epoch: 3 Global Step: 7880 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:04:57,672-Speed 12868.04 samples/sec Loss 9.3373 LearningRate 0.1884 Epoch: 3 Global Step: 7890 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:04:59,245-Speed 13023.67 samples/sec Loss 9.4903 LearningRate 0.1883 Epoch: 3 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:00,826-Speed 12963.56 samples/sec Loss 9.4531 LearningRate 0.1883 Epoch: 3 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:02,422-Speed 12830.17 samples/sec Loss 9.3615 LearningRate 0.1883 Epoch: 3 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:03,995-Speed 13031.10 samples/sec Loss 9.3940 LearningRate 0.1882 Epoch: 3 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:05,587-Speed 12864.12 samples/sec Loss 9.4219 LearningRate 0.1882 Epoch: 3 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:07,187-Speed 12828.61 samples/sec Loss 9.4109 LearningRate 0.1881 Epoch: 3 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 
14:05:08,755-Speed 13068.24 samples/sec Loss 9.5057 LearningRate 0.1881 Epoch: 3 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:10,328-Speed 13029.28 samples/sec Loss 9.5401 LearningRate 0.1881 Epoch: 3 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:11,888-Speed 13128.18 samples/sec Loss 9.4494 LearningRate 0.1880 Epoch: 3 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:13,463-Speed 13013.08 samples/sec Loss 9.5936 LearningRate 0.1880 Epoch: 3 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:15,061-Speed 12821.06 samples/sec Loss 9.4620 LearningRate 0.1879 Epoch: 3 Global Step: 8000 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:05:16,637-Speed 13001.21 samples/sec Loss 9.4339 LearningRate 0.1879 Epoch: 3 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:18,228-Speed 12883.46 samples/sec Loss 9.4700 LearningRate 0.1879 Epoch: 3 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:19,829-Speed 12801.59 samples/sec Loss 9.5169 LearningRate 0.1878 Epoch: 3 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:21,418-Speed 12898.64 samples/sec Loss 9.4304 LearningRate 0.1878 Epoch: 3 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:22,984-Speed 13084.07 samples/sec Loss 9.4217 LearningRate 0.1877 Epoch: 3 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:05:24,565-Speed 12956.49 samples/sec Loss 9.5733 LearningRate 0.1877 Epoch: 3 Global Step: 8060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:05:26,158-Speed 12868.00 samples/sec Loss 9.5290 LearningRate 0.1877 Epoch: 3 Global Step: 8070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:05:27,701-Speed 13275.18 samples/sec Loss 9.5141 
LearningRate 0.1876 Epoch: 3 Global Step: 8080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:29,311-Speed 12723.42 samples/sec Loss 9.3696 LearningRate 0.1876 Epoch: 3 Global Step: 8090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:30,893-Speed 12952.90 samples/sec Loss 9.4371 LearningRate 0.1875 Epoch: 3 Global Step: 8100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:32,468-Speed 13010.06 samples/sec Loss 9.5079 LearningRate 0.1875 Epoch: 3 Global Step: 8110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:34,054-Speed 12915.00 samples/sec Loss 9.4681 LearningRate 0.1875 Epoch: 3 Global Step: 8120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:35,634-Speed 12969.27 samples/sec Loss 9.5678 LearningRate 0.1874 Epoch: 3 Global Step: 8130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:37,209-Speed 13012.16 samples/sec Loss 9.5332 LearningRate 0.1874 Epoch: 3 Global Step: 8140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:38,771-Speed 13122.84 samples/sec Loss 9.4797 LearningRate 0.1873 Epoch: 3 Global Step: 8150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:40,338-Speed 13074.26 samples/sec Loss 9.4264 LearningRate 0.1873 Epoch: 3 Global Step: 8160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:41,919-Speed 12957.70 samples/sec Loss 9.5080 LearningRate 0.1873 Epoch: 3 Global Step: 8170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:05:43,505-Speed 12924.10 samples/sec Loss 9.6065 LearningRate 0.1872 Epoch: 3 Global Step: 8180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:45,055-Speed 13215.29 samples/sec Loss 9.5146 LearningRate 0.1872 Epoch: 3 Global Step: 8190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:46,634-Speed 12975.61 samples/sec Loss 9.4205 LearningRate 0.1871 Epoch: 3 Global Step: 8200 Fp16 Grad Scale: 
32768 Required: 4 hours Training: 2022-01-14 14:05:48,220-Speed 12920.17 samples/sec Loss 9.4819 LearningRate 0.1871 Epoch: 3 Global Step: 8210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:49,812-Speed 12878.60 samples/sec Loss 9.5172 LearningRate 0.1871 Epoch: 3 Global Step: 8220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:51,377-Speed 13085.33 samples/sec Loss 9.4260 LearningRate 0.1870 Epoch: 3 Global Step: 8230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:52,958-Speed 12962.69 samples/sec Loss 9.5588 LearningRate 0.1870 Epoch: 3 Global Step: 8240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:54,533-Speed 13012.19 samples/sec Loss 9.4856 LearningRate 0.1869 Epoch: 3 Global Step: 8250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:56,125-Speed 12868.47 samples/sec Loss 9.5957 LearningRate 0.1869 Epoch: 3 Global Step: 8260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:57,703-Speed 12986.38 samples/sec Loss 9.4576 LearningRate 0.1869 Epoch: 3 Global Step: 8270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:05:59,295-Speed 12869.33 samples/sec Loss 9.6459 LearningRate 0.1868 Epoch: 3 Global Step: 8280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:00,867-Speed 13057.48 samples/sec Loss 9.4431 LearningRate 0.1868 Epoch: 3 Global Step: 8290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:02,437-Speed 13052.71 samples/sec Loss 9.4207 LearningRate 0.1867 Epoch: 3 Global Step: 8300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:04,025-Speed 12896.30 samples/sec Loss 9.5564 LearningRate 0.1867 Epoch: 3 Global Step: 8310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:05,613-Speed 12908.46 samples/sec Loss 9.4647 LearningRate 0.1867 Epoch: 3 Global Step: 8320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:07,185-Speed 
13036.52 samples/sec Loss 9.4619 LearningRate 0.1866 Epoch: 3 Global Step: 8330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:08,767-Speed 12946.22 samples/sec Loss 9.5222 LearningRate 0.1866 Epoch: 3 Global Step: 8340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:10,328-Speed 13127.86 samples/sec Loss 9.5970 LearningRate 0.1865 Epoch: 3 Global Step: 8350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:11,911-Speed 12943.36 samples/sec Loss 9.5661 LearningRate 0.1865 Epoch: 3 Global Step: 8360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:13,504-Speed 12866.09 samples/sec Loss 9.5805 LearningRate 0.1865 Epoch: 3 Global Step: 8370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:15,093-Speed 12892.38 samples/sec Loss 9.4682 LearningRate 0.1864 Epoch: 3 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:16,656-Speed 13108.24 samples/sec Loss 9.5686 LearningRate 0.1864 Epoch: 3 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:18,243-Speed 12909.99 samples/sec Loss 9.5388 LearningRate 0.1863 Epoch: 3 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:19,818-Speed 13007.99 samples/sec Loss 9.5133 LearningRate 0.1863 Epoch: 3 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:21,408-Speed 12889.19 samples/sec Loss 9.5621 LearningRate 0.1863 Epoch: 3 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:22,983-Speed 13012.65 samples/sec Loss 9.5957 LearningRate 0.1862 Epoch: 3 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:24,557-Speed 13017.56 samples/sec Loss 9.4804 LearningRate 0.1862 Epoch: 3 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:26,146-Speed 12891.76 samples/sec Loss 9.4316 LearningRate 0.1861 Epoch: 
3 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:27,700-Speed 13185.52 samples/sec Loss 9.5069 LearningRate 0.1861 Epoch: 3 Global Step: 8460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:29,288-Speed 12902.03 samples/sec Loss 9.6285 LearningRate 0.1861 Epoch: 3 Global Step: 8470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:30,851-Speed 13114.20 samples/sec Loss 9.5713 LearningRate 0.1860 Epoch: 3 Global Step: 8480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:32,421-Speed 13044.25 samples/sec Loss 9.4959 LearningRate 0.1860 Epoch: 3 Global Step: 8490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:34,002-Speed 12963.47 samples/sec Loss 9.4389 LearningRate 0.1859 Epoch: 3 Global Step: 8500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:35,569-Speed 13074.18 samples/sec Loss 9.5367 LearningRate 0.1859 Epoch: 3 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:37,163-Speed 12861.49 samples/sec Loss 9.5335 LearningRate 0.1858 Epoch: 3 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:38,753-Speed 12878.68 samples/sec Loss 9.5388 LearningRate 0.1858 Epoch: 3 Global Step: 8530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:40,320-Speed 13078.90 samples/sec Loss 9.6338 LearningRate 0.1858 Epoch: 3 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:41,888-Speed 13063.13 samples/sec Loss 9.5418 LearningRate 0.1857 Epoch: 3 Global Step: 8550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:06:43,467-Speed 12984.33 samples/sec Loss 9.5242 LearningRate 0.1857 Epoch: 3 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:45,045-Speed 12978.67 samples/sec Loss 9.5931 LearningRate 0.1856 Epoch: 3 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-14 14:06:46,629-Speed 12933.48 samples/sec Loss 9.5318 LearningRate 0.1856 Epoch: 3 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:48,227-Speed 12825.34 samples/sec Loss 9.4576 LearningRate 0.1856 Epoch: 3 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:49,814-Speed 12909.08 samples/sec Loss 9.4113 LearningRate 0.1855 Epoch: 3 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:51,403-Speed 12900.76 samples/sec Loss 9.5175 LearningRate 0.1855 Epoch: 3 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:52,959-Speed 13163.87 samples/sec Loss 9.4523 LearningRate 0.1854 Epoch: 3 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:54,525-Speed 13083.63 samples/sec Loss 9.4752 LearningRate 0.1854 Epoch: 3 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:56,111-Speed 12919.91 samples/sec Loss 9.4820 LearningRate 0.1854 Epoch: 3 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:57,701-Speed 12888.13 samples/sec Loss 9.4467 LearningRate 0.1853 Epoch: 3 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:06:59,270-Speed 13060.35 samples/sec Loss 9.4568 LearningRate 0.1853 Epoch: 3 Global Step: 8660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:00,843-Speed 13024.25 samples/sec Loss 9.4322 LearningRate 0.1852 Epoch: 3 Global Step: 8670 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:02,426-Speed 12939.67 samples/sec Loss 9.4676 LearningRate 0.1852 Epoch: 3 Global Step: 8680 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:04,010-Speed 12941.70 samples/sec Loss 9.5043 LearningRate 0.1852 Epoch: 3 Global Step: 8690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:05,597-Speed 12905.91 
samples/sec Loss 9.5018 LearningRate 0.1851 Epoch: 3 Global Step: 8700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:07,205-Speed 12744.83 samples/sec Loss 9.4858 LearningRate 0.1851 Epoch: 3 Global Step: 8710 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:08,782-Speed 12992.71 samples/sec Loss 9.5317 LearningRate 0.1850 Epoch: 3 Global Step: 8720 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:10,339-Speed 13165.64 samples/sec Loss 9.6149 LearningRate 0.1850 Epoch: 3 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:11,914-Speed 13002.21 samples/sec Loss 9.5279 LearningRate 0.1850 Epoch: 3 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:13,486-Speed 13034.94 samples/sec Loss 9.4873 LearningRate 0.1849 Epoch: 3 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:15,045-Speed 13142.66 samples/sec Loss 9.4296 LearningRate 0.1849 Epoch: 3 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:16,632-Speed 12911.24 samples/sec Loss 9.5129 LearningRate 0.1848 Epoch: 3 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:18,221-Speed 12899.32 samples/sec Loss 9.5691 LearningRate 0.1848 Epoch: 3 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:19,802-Speed 12959.09 samples/sec Loss 9.4152 LearningRate 0.1848 Epoch: 3 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:21,391-Speed 12895.66 samples/sec Loss 9.4625 LearningRate 0.1847 Epoch: 3 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:22,973-Speed 12949.07 samples/sec Loss 9.3570 LearningRate 0.1847 Epoch: 3 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:24,544-Speed 13047.43 samples/sec Loss 9.3529 LearningRate 0.1846 Epoch: 3 
Global Step: 8820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:26,147-Speed 12785.15 samples/sec Loss 9.4889 LearningRate 0.1846 Epoch: 3 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:27,702-Speed 13174.08 samples/sec Loss 9.4542 LearningRate 0.1846 Epoch: 3 Global Step: 8840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:29,288-Speed 12913.87 samples/sec Loss 9.3507 LearningRate 0.1845 Epoch: 3 Global Step: 8850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:30,902-Speed 12700.70 samples/sec Loss 9.4188 LearningRate 0.1845 Epoch: 3 Global Step: 8860 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:32,479-Speed 12986.23 samples/sec Loss 9.3835 LearningRate 0.1844 Epoch: 3 Global Step: 8870 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:34,044-Speed 13094.90 samples/sec Loss 9.3805 LearningRate 0.1844 Epoch: 3 Global Step: 8880 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:35,633-Speed 12899.03 samples/sec Loss 9.3384 LearningRate 0.1844 Epoch: 3 Global Step: 8890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:37,221-Speed 12907.61 samples/sec Loss 9.4054 LearningRate 0.1843 Epoch: 3 Global Step: 8900 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:38,784-Speed 13106.10 samples/sec Loss 9.4449 LearningRate 0.1843 Epoch: 3 Global Step: 8910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:40,366-Speed 12948.94 samples/sec Loss 9.4372 LearningRate 0.1842 Epoch: 3 Global Step: 8920 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:41,960-Speed 12851.86 samples/sec Loss 9.3805 LearningRate 0.1842 Epoch: 3 Global Step: 8930 Fp16 Grad Scale: 524288 Required: 4 hours Training: 2022-01-14 14:07:43,528-Speed 13066.82 samples/sec Loss 9.4949 LearningRate 0.1842 Epoch: 3 Global Step: 8940 Fp16 Grad Scale: 262144 Required: 4 
hours Training: 2022-01-14 14:07:45,117-Speed 12894.85 samples/sec Loss 9.4177 LearningRate 0.1841 Epoch: 3 Global Step: 8950 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:46,689-Speed 13036.98 samples/sec Loss 9.3755 LearningRate 0.1841 Epoch: 3 Global Step: 8960 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:07:48,249-Speed 13134.22 samples/sec Loss 9.4823 LearningRate 0.1840 Epoch: 3 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:49,834-Speed 12936.97 samples/sec Loss 9.4373 LearningRate 0.1840 Epoch: 3 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:51,406-Speed 13032.75 samples/sec Loss 9.4423 LearningRate 0.1840 Epoch: 3 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:52,991-Speed 12930.43 samples/sec Loss 9.5446 LearningRate 0.1839 Epoch: 3 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:54,599-Speed 12741.46 samples/sec Loss 9.4013 LearningRate 0.1839 Epoch: 3 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:56,149-Speed 13222.83 samples/sec Loss 9.3791 LearningRate 0.1838 Epoch: 3 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:57,718-Speed 13051.57 samples/sec Loss 9.4711 LearningRate 0.1838 Epoch: 3 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:07:59,284-Speed 13092.12 samples/sec Loss 9.4073 LearningRate 0.1838 Epoch: 3 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:00,860-Speed 13000.71 samples/sec Loss 9.4442 LearningRate 0.1837 Epoch: 3 Global Step: 9050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:02,424-Speed 13094.60 samples/sec Loss 9.4173 LearningRate 0.1837 Epoch: 3 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:04,005-Speed 12964.32 
samples/sec Loss 9.4250 LearningRate 0.1836 Epoch: 3 Global Step: 9070 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:05,580-Speed 13002.11 samples/sec Loss 9.4383 LearningRate 0.1836 Epoch: 3 Global Step: 9080 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:07,157-Speed 12996.50 samples/sec Loss 9.3956 LearningRate 0.1836 Epoch: 3 Global Step: 9090 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:08,736-Speed 12974.77 samples/sec Loss 9.4255 LearningRate 0.1835 Epoch: 3 Global Step: 9100 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:10,324-Speed 12898.38 samples/sec Loss 9.4879 LearningRate 0.1835 Epoch: 3 Global Step: 9110 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:11,903-Speed 12983.85 samples/sec Loss 9.5255 LearningRate 0.1835 Epoch: 3 Global Step: 9120 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:13,496-Speed 12861.65 samples/sec Loss 9.2777 LearningRate 0.1834 Epoch: 3 Global Step: 9130 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:15,070-Speed 13017.03 samples/sec Loss 9.2952 LearningRate 0.1834 Epoch: 3 Global Step: 9140 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:16,630-Speed 13140.77 samples/sec Loss 9.3269 LearningRate 0.1833 Epoch: 3 Global Step: 9150 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:18,188-Speed 13147.40 samples/sec Loss 9.3392 LearningRate 0.1833 Epoch: 3 Global Step: 9160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:19,778-Speed 12891.07 samples/sec Loss 9.3990 LearningRate 0.1833 Epoch: 3 Global Step: 9170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:21,341-Speed 13110.32 samples/sec Loss 9.2865 LearningRate 0.1832 Epoch: 3 Global Step: 9180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:22,908-Speed 13078.38 samples/sec Loss 9.3105 LearningRate 0.1832 Epoch: 3 
Global Step: 9190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:24,460-Speed 13199.83 samples/sec Loss 9.4043 LearningRate 0.1831 Epoch: 3 Global Step: 9200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:26,053-Speed 12864.13 samples/sec Loss 9.2769 LearningRate 0.1831 Epoch: 3 Global Step: 9210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:27,647-Speed 12857.50 samples/sec Loss 9.3557 LearningRate 0.1831 Epoch: 3 Global Step: 9220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:29,208-Speed 13124.98 samples/sec Loss 9.2381 LearningRate 0.1830 Epoch: 3 Global Step: 9230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:30,774-Speed 13087.63 samples/sec Loss 9.2523 LearningRate 0.1830 Epoch: 3 Global Step: 9240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:32,359-Speed 12926.21 samples/sec Loss 9.2285 LearningRate 0.1829 Epoch: 3 Global Step: 9250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:08:33,947-Speed 12902.69 samples/sec Loss 9.4028 LearningRate 0.1829 Epoch: 3 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:35,520-Speed 13023.57 samples/sec Loss 9.3237 LearningRate 0.1829 Epoch: 3 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:37,112-Speed 12877.15 samples/sec Loss 9.2321 LearningRate 0.1828 Epoch: 3 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:38,682-Speed 13044.77 samples/sec Loss 9.2874 LearningRate 0.1828 Epoch: 3 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:40,261-Speed 12975.97 samples/sec Loss 9.3631 LearningRate 0.1827 Epoch: 3 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:41,872-Speed 12721.66 samples/sec Loss 9.4601 LearningRate 0.1827 Epoch: 3 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-14 14:08:43,430-Speed 13155.36 samples/sec Loss 9.4157 LearningRate 0.1827 Epoch: 3 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:45,009-Speed 12972.61 samples/sec Loss 9.3195 LearningRate 0.1826 Epoch: 3 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:46,565-Speed 13174.28 samples/sec Loss 9.3583 LearningRate 0.1826 Epoch: 3 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:48,168-Speed 12774.83 samples/sec Loss 9.3223 LearningRate 0.1825 Epoch: 3 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:08:49,729-Speed 13131.83 samples/sec Loss 9.3586 LearningRate 0.1825 Epoch: 3 Global Step: 9360 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:51,310-Speed 12955.01 samples/sec Loss 9.2828 LearningRate 0.1825 Epoch: 3 Global Step: 9370 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:52,880-Speed 13053.90 samples/sec Loss 9.3209 LearningRate 0.1824 Epoch: 3 Global Step: 9380 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:54,454-Speed 13019.98 samples/sec Loss 9.3117 LearningRate 0.1824 Epoch: 3 Global Step: 9390 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:56,021-Speed 13077.73 samples/sec Loss 9.3184 LearningRate 0.1823 Epoch: 3 Global Step: 9400 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:57,597-Speed 13006.87 samples/sec Loss 9.3921 LearningRate 0.1823 Epoch: 3 Global Step: 9410 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:08:59,169-Speed 13035.33 samples/sec Loss 9.1835 LearningRate 0.1823 Epoch: 3 Global Step: 9420 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:00,752-Speed 12941.19 samples/sec Loss 9.3008 LearningRate 0.1822 Epoch: 3 Global Step: 9430 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:02,327-Speed 13008.90 
samples/sec Loss 9.3642 LearningRate 0.1822 Epoch: 3 Global Step: 9440 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:03,905-Speed 12984.80 samples/sec Loss 9.3589 LearningRate 0.1821 Epoch: 3 Global Step: 9450 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:05,475-Speed 13053.13 samples/sec Loss 9.3776 LearningRate 0.1821 Epoch: 3 Global Step: 9460 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:07,071-Speed 12833.70 samples/sec Loss 9.2914 LearningRate 0.1821 Epoch: 3 Global Step: 9470 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:08,635-Speed 13102.75 samples/sec Loss 9.2629 LearningRate 0.1820 Epoch: 3 Global Step: 9480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:10,245-Speed 12726.34 samples/sec Loss 9.3123 LearningRate 0.1820 Epoch: 3 Global Step: 9490 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:11,798-Speed 13189.38 samples/sec Loss 9.0901 LearningRate 0.1819 Epoch: 3 Global Step: 9500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:13,384-Speed 12919.05 samples/sec Loss 9.1790 LearningRate 0.1819 Epoch: 3 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:14,954-Speed 13054.38 samples/sec Loss 9.3460 LearningRate 0.1819 Epoch: 3 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:16,513-Speed 13137.20 samples/sec Loss 9.2810 LearningRate 0.1818 Epoch: 3 Global Step: 9530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:18,087-Speed 13020.92 samples/sec Loss 9.2238 LearningRate 0.1818 Epoch: 3 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:19,669-Speed 12955.85 samples/sec Loss 9.2394 LearningRate 0.1817 Epoch: 3 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:21,236-Speed 13074.54 samples/sec Loss 9.2482 LearningRate 0.1817 Epoch: 3 
Global Step: 9560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:22,807-Speed 13045.32 samples/sec Loss 9.2200 LearningRate 0.1817 Epoch: 3 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:24,405-Speed 12818.18 samples/sec Loss 9.1228 LearningRate 0.1816 Epoch: 3 Global Step: 9580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:25,971-Speed 13085.21 samples/sec Loss 9.1718 LearningRate 0.1816 Epoch: 3 Global Step: 9590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:27,578-Speed 12752.87 samples/sec Loss 9.3825 LearningRate 0.1815 Epoch: 3 Global Step: 9600 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:29,148-Speed 13049.30 samples/sec Loss 9.2222 LearningRate 0.1815 Epoch: 3 Global Step: 9610 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:30,713-Speed 13096.91 samples/sec Loss 9.3004 LearningRate 0.1815 Epoch: 3 Global Step: 9620 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:32,291-Speed 12983.64 samples/sec Loss 9.2649 LearningRate 0.1814 Epoch: 3 Global Step: 9630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:33,916-Speed 12600.69 samples/sec Loss 9.1819 LearningRate 0.1814 Epoch: 3 Global Step: 9640 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:35,515-Speed 12820.41 samples/sec Loss 9.2777 LearningRate 0.1813 Epoch: 3 Global Step: 9650 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:37,094-Speed 12976.07 samples/sec Loss 9.3041 LearningRate 0.1813 Epoch: 3 Global Step: 9660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:38,656-Speed 13118.71 samples/sec Loss 9.3396 LearningRate 0.1813 Epoch: 3 Global Step: 9670 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:40,249-Speed 12861.54 samples/sec Loss 9.3120 LearningRate 0.1812 Epoch: 3 Global Step: 9680 Fp16 Grad Scale: 262144 Required: 4 
hours Training: 2022-01-14 14:09:41,860-Speed 12717.90 samples/sec Loss 9.0971 LearningRate 0.1812 Epoch: 3 Global Step: 9690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:43,431-Speed 13046.98 samples/sec Loss 9.1646 LearningRate 0.1811 Epoch: 3 Global Step: 9700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:45,015-Speed 12935.67 samples/sec Loss 9.2561 LearningRate 0.1811 Epoch: 3 Global Step: 9710 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:46,570-Speed 13177.27 samples/sec Loss 9.1874 LearningRate 0.1811 Epoch: 3 Global Step: 9720 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:09:48,138-Speed 13059.76 samples/sec Loss 9.1214 LearningRate 0.1810 Epoch: 3 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:49,731-Speed 12865.64 samples/sec Loss 9.2367 LearningRate 0.1810 Epoch: 3 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:51,309-Speed 12985.54 samples/sec Loss 9.2033 LearningRate 0.1809 Epoch: 3 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:52,896-Speed 12912.27 samples/sec Loss 9.1736 LearningRate 0.1809 Epoch: 3 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:54,475-Speed 12969.74 samples/sec Loss 9.1189 LearningRate 0.1809 Epoch: 3 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:56,037-Speed 13118.69 samples/sec Loss 9.1456 LearningRate 0.1808 Epoch: 3 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:57,623-Speed 12916.27 samples/sec Loss 9.1560 LearningRate 0.1808 Epoch: 3 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:09:59,190-Speed 13078.45 samples/sec Loss 9.2052 LearningRate 0.1807 Epoch: 3 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:00,757-Speed 13077.04 
samples/sec Loss 9.2123 LearningRate 0.1807 Epoch: 3 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:02,339-Speed 12948.53 samples/sec Loss 9.2103 LearningRate 0.1807 Epoch: 3 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:03,879-Speed 13305.90 samples/sec Loss 9.1524 LearningRate 0.1806 Epoch: 3 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:05,466-Speed 12916.27 samples/sec Loss 9.2543 LearningRate 0.1806 Epoch: 3 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:07,041-Speed 13010.08 samples/sec Loss 9.0847 LearningRate 0.1806 Epoch: 3 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:08,597-Speed 13166.16 samples/sec Loss 9.1597 LearningRate 0.1805 Epoch: 3 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:10,205-Speed 12742.82 samples/sec Loss 9.1965 LearningRate 0.1805 Epoch: 3 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:11,796-Speed 12885.58 samples/sec Loss 9.0986 LearningRate 0.1804 Epoch: 3 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:13,406-Speed 12718.90 samples/sec Loss 9.2304 LearningRate 0.1804 Epoch: 3 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:14,977-Speed 13040.43 samples/sec Loss 9.1235 LearningRate 0.1804 Epoch: 3 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:16,536-Speed 13148.22 samples/sec Loss 9.2192 LearningRate 0.1803 Epoch: 3 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:18,135-Speed 12812.96 samples/sec Loss 9.1385 LearningRate 0.1803 Epoch: 3 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:19,742-Speed 12750.08 samples/sec Loss 9.0658 LearningRate 0.1802 Epoch: 3 
Global Step: 9930 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:10:21,324-Speed 12947.95 samples/sec Loss 9.1286 LearningRate 0.1802 Epoch: 3 Global Step: 9940 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:10:22,918-Speed 12857.90 samples/sec Loss 9.1315 LearningRate 0.1802 Epoch: 3 Global Step: 9950 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:10:24,496-Speed 12984.36 samples/sec Loss 9.1573 LearningRate 0.1801 Epoch: 3 Global Step: 9960 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:10:26,067-Speed 13041.75 samples/sec Loss 9.1031 LearningRate 0.1801 Epoch: 3 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:27,623-Speed 13169.16 samples/sec Loss 9.0857 LearningRate 0.1800 Epoch: 3 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:29,198-Speed 13008.59 samples/sec Loss 9.0924 LearningRate 0.1800 Epoch: 3 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:30,773-Speed 13013.99 samples/sec Loss 9.1607 LearningRate 0.1800 Epoch: 3 Global Step: 10000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:10:52,676-[lfw][10000]XNorm: 13.725787 Training: 2022-01-14 14:10:52,677-[lfw][10000]Accuracy-Flip: 0.99383+-0.00358 Training: 2022-01-14 14:10:52,677-[lfw][10000]Accuracy-Highest: 0.99383 Training: 2022-01-14 14:11:18,100-[cfp_fp][10000]XNorm: 11.573416 Training: 2022-01-14 14:11:18,101-[cfp_fp][10000]Accuracy-Flip: 0.92371+-0.01350 Training: 2022-01-14 14:11:18,101-[cfp_fp][10000]Accuracy-Highest: 0.92371 Training: 2022-01-14 14:11:40,044-[agedb_30][10000]XNorm: 13.367748 Training: 2022-01-14 14:11:40,045-[agedb_30][10000]Accuracy-Flip: 0.94183+-0.01315 Training: 2022-01-14 14:11:40,045-[agedb_30][10000]Accuracy-Highest: 0.94183 Training: 2022-01-14 14:11:41,622-Speed 289.07 samples/sec Loss 9.0436 LearningRate 0.1799 Epoch: 3 Global Step: 10010 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-01-14 14:11:43,187-Speed 13096.88 samples/sec Loss 9.2145 LearningRate 0.1799 Epoch: 3 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:11:44,789-Speed 12788.89 samples/sec Loss 9.0889 LearningRate 0.1798 Epoch: 3 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:11:46,361-Speed 13043.12 samples/sec Loss 9.1472 LearningRate 0.1798 Epoch: 3 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:11:47,961-Speed 12813.30 samples/sec Loss 9.0886 LearningRate 0.1798 Epoch: 3 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:11:49,535-Speed 13023.51 samples/sec Loss 9.0457 LearningRate 0.1797 Epoch: 3 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:11:51,128-Speed 12867.35 samples/sec Loss 9.1120 LearningRate 0.1797 Epoch: 3 Global Step: 10070 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:11:52,731-Speed 12785.52 samples/sec Loss 9.0483 LearningRate 0.1796 Epoch: 3 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:11:54,322-Speed 12880.65 samples/sec Loss 9.2257 LearningRate 0.1796 Epoch: 3 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:11:55,897-Speed 13009.20 samples/sec Loss 9.0809 LearningRate 0.1796 Epoch: 3 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:11:57,512-Speed 12693.62 samples/sec Loss 9.1000 LearningRate 0.1795 Epoch: 3 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:12:12,277-Speed 1387.62 samples/sec Loss 8.7279 LearningRate 0.1795 Epoch: 4 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:12:13,935-Speed 12371.18 samples/sec Loss 8.1289 LearningRate 0.1794 Epoch: 4 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-01-14 14:12:15,652-Speed 11936.25 samples/sec Loss 8.2175 LearningRate 0.1794 Epoch: 4 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:12:17,240-Speed 12903.05 samples/sec Loss 8.0976 LearningRate 0.1794 Epoch: 4 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:12:18,863-Speed 12632.46 samples/sec Loss 8.1527 LearningRate 0.1793 Epoch: 4 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:12:20,458-Speed 12852.49 samples/sec Loss 8.1915 LearningRate 0.1793 Epoch: 4 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 14:12:22,059-Speed 12801.52 samples/sec Loss 8.2756 LearningRate 0.1792 Epoch: 4 Global Step: 10180 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:12:23,660-Speed 12800.17 samples/sec Loss 8.2551 LearningRate 0.1792 Epoch: 4 Global Step: 10190 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:12:25,309-Speed 12429.80 samples/sec Loss 8.2236 LearningRate 0.1792 Epoch: 4 Global Step: 10200 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:12:26,883-Speed 13027.22 samples/sec Loss 8.2636 LearningRate 0.1791 Epoch: 4 Global Step: 10210 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-14 14:12:28,494-Speed 12717.06 samples/sec Loss 8.3511 LearningRate 0.1791 Epoch: 4 Global Step: 10220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:12:30,100-Speed 12774.85 samples/sec Loss 8.2744 LearningRate 0.1791 Epoch: 4 Global Step: 10230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:12:31,682-Speed 12953.13 samples/sec Loss 8.2978 LearningRate 0.1790 Epoch: 4 Global Step: 10240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 14:12:33,278-Speed 12834.25 samples/sec Loss 8.2585 LearningRate 0.1790 Epoch: 4 Global Step: 10250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:34,844-Speed 13087.50 
samples/sec Loss 8.3256 LearningRate 0.1789 Epoch: 4 Global Step: 10260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:36,438-Speed 12857.28 samples/sec Loss 8.4257 LearningRate 0.1789 Epoch: 4 Global Step: 10270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:38,053-Speed 12682.76 samples/sec Loss 8.3315 LearningRate 0.1789 Epoch: 4 Global Step: 10280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:39,651-Speed 12827.51 samples/sec Loss 8.4834 LearningRate 0.1788 Epoch: 4 Global Step: 10290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:41,254-Speed 12784.38 samples/sec Loss 8.3891 LearningRate 0.1788 Epoch: 4 Global Step: 10300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:42,841-Speed 12913.37 samples/sec Loss 8.4269 LearningRate 0.1787 Epoch: 4 Global Step: 10310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:44,415-Speed 13018.23 samples/sec Loss 8.2914 LearningRate 0.1787 Epoch: 4 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:12:45,998-Speed 12943.37 samples/sec Loss 8.3210 LearningRate 0.1787 Epoch: 4 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:12:47,659-Speed 12334.74 samples/sec Loss 8.4925 LearningRate 0.1786 Epoch: 4 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:12:49,223-Speed 13106.96 samples/sec Loss 8.5262 LearningRate 0.1786 Epoch: 4 Global Step: 10350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:50,797-Speed 13016.05 samples/sec Loss 8.4504 LearningRate 0.1785 Epoch: 4 Global Step: 10360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:52,383-Speed 12927.69 samples/sec Loss 8.4533 LearningRate 0.1785 Epoch: 4 Global Step: 10370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:53,962-Speed 12978.99 samples/sec Loss 8.4555 LearningRate 0.1785 Epoch: 4 
Global Step: 10380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:55,542-Speed 12965.60 samples/sec Loss 8.4743 LearningRate 0.1784 Epoch: 4 Global Step: 10390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:57,109-Speed 13075.32 samples/sec Loss 8.4758 LearningRate 0.1784 Epoch: 4 Global Step: 10400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:12:58,684-Speed 13015.73 samples/sec Loss 8.4614 LearningRate 0.1783 Epoch: 4 Global Step: 10410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:00,295-Speed 12723.24 samples/sec Loss 8.4732 LearningRate 0.1783 Epoch: 4 Global Step: 10420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:01,884-Speed 12896.31 samples/sec Loss 8.4662 LearningRate 0.1783 Epoch: 4 Global Step: 10430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:03,480-Speed 12843.15 samples/sec Loss 8.5728 LearningRate 0.1782 Epoch: 4 Global Step: 10440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:05,081-Speed 12802.14 samples/sec Loss 8.5660 LearningRate 0.1782 Epoch: 4 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:06,697-Speed 12682.97 samples/sec Loss 8.4527 LearningRate 0.1781 Epoch: 4 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:08,265-Speed 13070.03 samples/sec Loss 8.5953 LearningRate 0.1781 Epoch: 4 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:09,845-Speed 12970.46 samples/sec Loss 8.4615 LearningRate 0.1781 Epoch: 4 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:11,442-Speed 12832.89 samples/sec Loss 8.5986 LearningRate 0.1780 Epoch: 4 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:13,030-Speed 12907.44 samples/sec Loss 8.5002 LearningRate 0.1780 Epoch: 4 Global Step: 10500 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-01-14 14:13:14,598-Speed 13070.18 samples/sec Loss 8.5745 LearningRate 0.1780 Epoch: 4 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:16,165-Speed 13070.95 samples/sec Loss 8.5419 LearningRate 0.1779 Epoch: 4 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:17,822-Speed 12373.38 samples/sec Loss 8.6499 LearningRate 0.1779 Epoch: 4 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:19,419-Speed 12836.87 samples/sec Loss 8.6509 LearningRate 0.1778 Epoch: 4 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:21,046-Speed 12593.43 samples/sec Loss 8.6250 LearningRate 0.1778 Epoch: 4 Global Step: 10550 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:13:22,645-Speed 12815.10 samples/sec Loss 8.6156 LearningRate 0.1778 Epoch: 4 Global Step: 10560 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:13:24,215-Speed 13058.42 samples/sec Loss 8.5668 LearningRate 0.1777 Epoch: 4 Global Step: 10570 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:13:25,805-Speed 12888.58 samples/sec Loss 8.7314 LearningRate 0.1777 Epoch: 4 Global Step: 10580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:27,393-Speed 12902.95 samples/sec Loss 8.6128 LearningRate 0.1776 Epoch: 4 Global Step: 10590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:28,989-Speed 12847.67 samples/sec Loss 8.6705 LearningRate 0.1776 Epoch: 4 Global Step: 10600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:30,601-Speed 12710.23 samples/sec Loss 8.6181 LearningRate 0.1776 Epoch: 4 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:32,184-Speed 12951.87 samples/sec Loss 8.6991 LearningRate 0.1775 Epoch: 4 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 
14:13:33,783-Speed 12813.38 samples/sec Loss 8.6933 LearningRate 0.1775 Epoch: 4 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:35,415-Speed 12556.56 samples/sec Loss 8.6177 LearningRate 0.1774 Epoch: 4 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:37,006-Speed 12881.86 samples/sec Loss 8.5921 LearningRate 0.1774 Epoch: 4 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:38,602-Speed 12843.12 samples/sec Loss 8.7276 LearningRate 0.1774 Epoch: 4 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:40,184-Speed 12961.77 samples/sec Loss 8.7593 LearningRate 0.1773 Epoch: 4 Global Step: 10670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:41,784-Speed 12801.90 samples/sec Loss 8.7303 LearningRate 0.1773 Epoch: 4 Global Step: 10680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:43,374-Speed 12894.18 samples/sec Loss 8.6653 LearningRate 0.1772 Epoch: 4 Global Step: 10690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:44,935-Speed 13129.25 samples/sec Loss 8.6764 LearningRate 0.1772 Epoch: 4 Global Step: 10700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:46,546-Speed 12719.45 samples/sec Loss 8.6915 LearningRate 0.1772 Epoch: 4 Global Step: 10710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:48,157-Speed 12723.99 samples/sec Loss 8.5466 LearningRate 0.1771 Epoch: 4 Global Step: 10720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:49,736-Speed 12978.37 samples/sec Loss 8.7087 LearningRate 0.1771 Epoch: 4 Global Step: 10730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:51,352-Speed 12673.40 samples/sec Loss 8.6927 LearningRate 0.1770 Epoch: 4 Global Step: 10740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:52,969-Speed 12712.31 samples/sec Loss 8.7092 
LearningRate 0.1770 Epoch: 4 Global Step: 10750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:54,541-Speed 13033.28 samples/sec Loss 8.6702 LearningRate 0.1770 Epoch: 4 Global Step: 10760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:13:56,124-Speed 12949.18 samples/sec Loss 8.5545 LearningRate 0.1769 Epoch: 4 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:57,679-Speed 13178.23 samples/sec Loss 8.7294 LearningRate 0.1769 Epoch: 4 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:13:59,259-Speed 12977.88 samples/sec Loss 8.7569 LearningRate 0.1769 Epoch: 4 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:00,884-Speed 12606.48 samples/sec Loss 8.6762 LearningRate 0.1768 Epoch: 4 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:02,463-Speed 12979.92 samples/sec Loss 8.6318 LearningRate 0.1768 Epoch: 4 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:04,041-Speed 12988.58 samples/sec Loss 8.6425 LearningRate 0.1767 Epoch: 4 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:05,628-Speed 12917.94 samples/sec Loss 8.6508 LearningRate 0.1767 Epoch: 4 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:07,221-Speed 12863.38 samples/sec Loss 8.7170 LearningRate 0.1767 Epoch: 4 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:08,827-Speed 12765.06 samples/sec Loss 8.7163 LearningRate 0.1766 Epoch: 4 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:10,416-Speed 12899.02 samples/sec Loss 8.7022 LearningRate 0.1766 Epoch: 4 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:11,975-Speed 13144.00 samples/sec Loss 8.7709 LearningRate 0.1765 Epoch: 4 Global Step: 
10870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:13,580-Speed 12769.25 samples/sec Loss 8.7088 LearningRate 0.1765 Epoch: 4 Global Step: 10880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:15,152-Speed 13037.54 samples/sec Loss 8.7077 LearningRate 0.1765 Epoch: 4 Global Step: 10890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:16,741-Speed 12893.24 samples/sec Loss 8.6606 LearningRate 0.1764 Epoch: 4 Global Step: 10900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:18,344-Speed 12788.64 samples/sec Loss 8.6617 LearningRate 0.1764 Epoch: 4 Global Step: 10910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:19,919-Speed 13012.09 samples/sec Loss 8.7623 LearningRate 0.1763 Epoch: 4 Global Step: 10920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:21,504-Speed 12925.66 samples/sec Loss 8.7064 LearningRate 0.1763 Epoch: 4 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:23,117-Speed 12708.75 samples/sec Loss 8.7834 LearningRate 0.1763 Epoch: 4 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:24,716-Speed 12818.80 samples/sec Loss 8.6439 LearningRate 0.1762 Epoch: 4 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:26,290-Speed 13013.98 samples/sec Loss 8.7505 LearningRate 0.1762 Epoch: 4 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:27,870-Speed 12971.63 samples/sec Loss 8.7084 LearningRate 0.1761 Epoch: 4 Global Step: 10970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:29,450-Speed 12970.93 samples/sec Loss 8.7120 LearningRate 0.1761 Epoch: 4 Global Step: 10980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:31,025-Speed 13011.16 samples/sec Loss 8.6310 LearningRate 0.1761 Epoch: 4 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-01-14 14:14:32,619-Speed 12863.00 samples/sec Loss 8.7087 LearningRate 0.1760 Epoch: 4 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:34,214-Speed 12840.87 samples/sec Loss 8.7639 LearningRate 0.1760 Epoch: 4 Global Step: 11010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:35,806-Speed 12875.98 samples/sec Loss 8.7807 LearningRate 0.1760 Epoch: 4 Global Step: 11020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:37,385-Speed 12983.97 samples/sec Loss 8.7102 LearningRate 0.1759 Epoch: 4 Global Step: 11030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:38,990-Speed 12768.44 samples/sec Loss 8.7026 LearningRate 0.1759 Epoch: 4 Global Step: 11040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:40,571-Speed 12967.86 samples/sec Loss 8.8133 LearningRate 0.1758 Epoch: 4 Global Step: 11050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:42,141-Speed 13055.32 samples/sec Loss 8.7598 LearningRate 0.1758 Epoch: 4 Global Step: 11060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:43,754-Speed 12705.22 samples/sec Loss 8.7550 LearningRate 0.1758 Epoch: 4 Global Step: 11070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:45,347-Speed 12858.59 samples/sec Loss 8.6804 LearningRate 0.1757 Epoch: 4 Global Step: 11080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:46,955-Speed 12751.10 samples/sec Loss 8.7023 LearningRate 0.1757 Epoch: 4 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:48,524-Speed 13066.08 samples/sec Loss 8.7660 LearningRate 0.1756 Epoch: 4 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:50,097-Speed 13025.24 samples/sec Loss 8.6752 LearningRate 0.1756 Epoch: 4 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:51,728-Speed 12564.39 
samples/sec Loss 8.7599 LearningRate 0.1756 Epoch: 4 Global Step: 11120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:53,291-Speed 13112.52 samples/sec Loss 8.6396 LearningRate 0.1755 Epoch: 4 Global Step: 11130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:54,882-Speed 12891.74 samples/sec Loss 8.7547 LearningRate 0.1755 Epoch: 4 Global Step: 11140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:56,462-Speed 12970.72 samples/sec Loss 8.7919 LearningRate 0.1754 Epoch: 4 Global Step: 11150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:14:58,024-Speed 13123.61 samples/sec Loss 8.7173 LearningRate 0.1754 Epoch: 4 Global Step: 11160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:14:59,642-Speed 12672.08 samples/sec Loss 8.7531 LearningRate 0.1754 Epoch: 4 Global Step: 11170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:01,222-Speed 12970.69 samples/sec Loss 8.6930 LearningRate 0.1753 Epoch: 4 Global Step: 11180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:02,792-Speed 13050.83 samples/sec Loss 8.6983 LearningRate 0.1753 Epoch: 4 Global Step: 11190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:04,412-Speed 12658.12 samples/sec Loss 8.7386 LearningRate 0.1753 Epoch: 4 Global Step: 11200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:05,988-Speed 12998.90 samples/sec Loss 8.6975 LearningRate 0.1752 Epoch: 4 Global Step: 11210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:07,584-Speed 12839.00 samples/sec Loss 8.7662 LearningRate 0.1752 Epoch: 4 Global Step: 11220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:09,142-Speed 13159.17 samples/sec Loss 8.6795 LearningRate 0.1751 Epoch: 4 Global Step: 11230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:10,726-Speed 12934.93 samples/sec Loss 8.7983 LearningRate 0.1751 Epoch: 
4 Global Step: 11240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:12,340-Speed 12703.13 samples/sec Loss 8.7630 LearningRate 0.1751 Epoch: 4 Global Step: 11250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:13,915-Speed 13008.50 samples/sec Loss 8.7951 LearningRate 0.1750 Epoch: 4 Global Step: 11260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:15,513-Speed 12829.35 samples/sec Loss 8.7319 LearningRate 0.1750 Epoch: 4 Global Step: 11270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:17,121-Speed 12742.01 samples/sec Loss 8.7217 LearningRate 0.1749 Epoch: 4 Global Step: 11280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:18,675-Speed 13188.10 samples/sec Loss 8.7321 LearningRate 0.1749 Epoch: 4 Global Step: 11290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:20,241-Speed 13088.74 samples/sec Loss 8.6699 LearningRate 0.1749 Epoch: 4 Global Step: 11300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:21,846-Speed 12767.55 samples/sec Loss 8.6659 LearningRate 0.1748 Epoch: 4 Global Step: 11310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:23,441-Speed 12848.79 samples/sec Loss 8.7825 LearningRate 0.1748 Epoch: 4 Global Step: 11320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:25,023-Speed 12959.36 samples/sec Loss 8.7340 LearningRate 0.1747 Epoch: 4 Global Step: 11330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:26,601-Speed 12987.74 samples/sec Loss 8.7959 LearningRate 0.1747 Epoch: 4 Global Step: 11340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:28,171-Speed 13048.95 samples/sec Loss 8.6481 LearningRate 0.1747 Epoch: 4 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:15:29,826-Speed 12390.74 samples/sec Loss 8.7158 LearningRate 0.1746 Epoch: 4 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 
4 hours Training: 2022-01-14 14:15:31,425-Speed 12819.51 samples/sec Loss 8.7975 LearningRate 0.1746 Epoch: 4 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:33,006-Speed 12958.07 samples/sec Loss 8.6874 LearningRate 0.1746 Epoch: 4 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:34,595-Speed 12899.54 samples/sec Loss 8.6952 LearningRate 0.1745 Epoch: 4 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:36,232-Speed 12521.37 samples/sec Loss 8.7992 LearningRate 0.1745 Epoch: 4 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:37,802-Speed 13073.14 samples/sec Loss 8.7294 LearningRate 0.1744 Epoch: 4 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:39,392-Speed 12885.06 samples/sec Loss 8.6748 LearningRate 0.1744 Epoch: 4 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:40,988-Speed 12839.94 samples/sec Loss 8.7082 LearningRate 0.1744 Epoch: 4 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:42,571-Speed 12961.10 samples/sec Loss 8.8050 LearningRate 0.1743 Epoch: 4 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:44,178-Speed 12747.23 samples/sec Loss 8.7288 LearningRate 0.1743 Epoch: 4 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:45,765-Speed 12918.41 samples/sec Loss 8.6560 LearningRate 0.1742 Epoch: 4 Global Step: 11460 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:15:47,325-Speed 13131.47 samples/sec Loss 8.6522 LearningRate 0.1742 Epoch: 4 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:48,974-Speed 12434.76 samples/sec Loss 8.7971 LearningRate 0.1742 Epoch: 4 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 
14:15:50,589-Speed 12689.51 samples/sec Loss 8.7087 LearningRate 0.1741 Epoch: 4 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:52,176-Speed 12917.20 samples/sec Loss 8.6730 LearningRate 0.1741 Epoch: 4 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:53,763-Speed 12915.07 samples/sec Loss 8.6825 LearningRate 0.1740 Epoch: 4 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:55,352-Speed 12898.78 samples/sec Loss 8.6728 LearningRate 0.1740 Epoch: 4 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:56,935-Speed 12948.17 samples/sec Loss 8.7248 LearningRate 0.1740 Epoch: 4 Global Step: 11530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:15:58,527-Speed 12872.09 samples/sec Loss 8.6495 LearningRate 0.1739 Epoch: 4 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:00,122-Speed 12849.18 samples/sec Loss 8.6109 LearningRate 0.1739 Epoch: 4 Global Step: 11550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:01,716-Speed 12859.35 samples/sec Loss 8.6835 LearningRate 0.1739 Epoch: 4 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:03,279-Speed 13106.58 samples/sec Loss 8.6800 LearningRate 0.1738 Epoch: 4 Global Step: 11570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:04,854-Speed 13016.04 samples/sec Loss 8.6537 LearningRate 0.1738 Epoch: 4 Global Step: 11580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:06,450-Speed 12841.66 samples/sec Loss 8.7567 LearningRate 0.1737 Epoch: 4 Global Step: 11590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:08,029-Speed 12975.18 samples/sec Loss 8.7031 LearningRate 0.1737 Epoch: 4 Global Step: 11600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:09,608-Speed 12972.47 samples/sec Loss 
8.6696 LearningRate 0.1737 Epoch: 4 Global Step: 11610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:11,254-Speed 12450.97 samples/sec Loss 8.6137 LearningRate 0.1736 Epoch: 4 Global Step: 11620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:12,824-Speed 13059.62 samples/sec Loss 8.5935 LearningRate 0.1736 Epoch: 4 Global Step: 11630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:14,432-Speed 12747.11 samples/sec Loss 8.7374 LearningRate 0.1735 Epoch: 4 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:16:15,988-Speed 13168.42 samples/sec Loss 8.7251 LearningRate 0.1735 Epoch: 4 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:16:17,589-Speed 12804.52 samples/sec Loss 8.7126 LearningRate 0.1735 Epoch: 4 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:16:19,168-Speed 12976.59 samples/sec Loss 8.6520 LearningRate 0.1734 Epoch: 4 Global Step: 11670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:20,780-Speed 12711.84 samples/sec Loss 8.7318 LearningRate 0.1734 Epoch: 4 Global Step: 11680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:22,365-Speed 12931.14 samples/sec Loss 8.7198 LearningRate 0.1733 Epoch: 4 Global Step: 11690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:23,963-Speed 12827.40 samples/sec Loss 8.6620 LearningRate 0.1733 Epoch: 4 Global Step: 11700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:25,542-Speed 12981.81 samples/sec Loss 8.6622 LearningRate 0.1733 Epoch: 4 Global Step: 11710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:27,148-Speed 12762.51 samples/sec Loss 8.6404 LearningRate 0.1732 Epoch: 4 Global Step: 11720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:28,721-Speed 13031.47 samples/sec Loss 8.6623 LearningRate 0.1732 Epoch: 4 Global Step: 
11730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:30,327-Speed 12764.80 samples/sec Loss 8.6823 LearningRate 0.1732 Epoch: 4 Global Step: 11740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:31,903-Speed 12999.65 samples/sec Loss 8.7115 LearningRate 0.1731 Epoch: 4 Global Step: 11750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:33,489-Speed 12927.87 samples/sec Loss 8.6297 LearningRate 0.1731 Epoch: 4 Global Step: 11760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:35,110-Speed 12643.73 samples/sec Loss 8.6932 LearningRate 0.1730 Epoch: 4 Global Step: 11770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:16:36,705-Speed 12847.88 samples/sec Loss 8.7396 LearningRate 0.1730 Epoch: 4 Global Step: 11780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:38,293-Speed 12934.60 samples/sec Loss 8.4865 LearningRate 0.1730 Epoch: 4 Global Step: 11790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:39,900-Speed 12753.18 samples/sec Loss 8.6331 LearningRate 0.1729 Epoch: 4 Global Step: 11800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:41,468-Speed 13071.03 samples/sec Loss 8.7816 LearningRate 0.1729 Epoch: 4 Global Step: 11810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:43,047-Speed 12983.57 samples/sec Loss 8.7361 LearningRate 0.1728 Epoch: 4 Global Step: 11820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:44,675-Speed 12586.11 samples/sec Loss 8.5806 LearningRate 0.1728 Epoch: 4 Global Step: 11830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:46,265-Speed 12891.98 samples/sec Loss 8.6146 LearningRate 0.1728 Epoch: 4 Global Step: 11840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:47,847-Speed 12954.44 samples/sec Loss 8.7419 LearningRate 0.1727 Epoch: 4 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-01-14 14:16:49,405-Speed 13155.99 samples/sec Loss 8.6211 LearningRate 0.1727 Epoch: 4 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:51,014-Speed 12731.79 samples/sec Loss 8.6380 LearningRate 0.1727 Epoch: 4 Global Step: 11870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:16:52,597-Speed 12952.88 samples/sec Loss 8.5281 LearningRate 0.1726 Epoch: 4 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:16:54,215-Speed 12666.47 samples/sec Loss 8.7013 LearningRate 0.1726 Epoch: 4 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:16:55,781-Speed 13094.68 samples/sec Loss 8.5052 LearningRate 0.1725 Epoch: 4 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:16:57,356-Speed 13011.02 samples/sec Loss 8.7082 LearningRate 0.1725 Epoch: 4 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:16:58,982-Speed 12608.84 samples/sec Loss 8.7581 LearningRate 0.1725 Epoch: 4 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:00,551-Speed 13062.66 samples/sec Loss 8.6673 LearningRate 0.1724 Epoch: 4 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:02,162-Speed 12715.90 samples/sec Loss 8.5198 LearningRate 0.1724 Epoch: 4 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:03,733-Speed 13048.50 samples/sec Loss 8.6701 LearningRate 0.1723 Epoch: 4 Global Step: 11950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:05,316-Speed 12941.30 samples/sec Loss 8.6832 LearningRate 0.1723 Epoch: 4 Global Step: 11960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:06,934-Speed 12663.08 samples/sec Loss 8.6818 LearningRate 0.1723 Epoch: 4 Global Step: 11970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:08,499-Speed 13095.38 
samples/sec Loss 8.6331 LearningRate 0.1722 Epoch: 4 Global Step: 11980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:10,057-Speed 13150.55 samples/sec Loss 8.6606 LearningRate 0.1722 Epoch: 4 Global Step: 11990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:11,621-Speed 13102.51 samples/sec Loss 8.5715 LearningRate 0.1721 Epoch: 4 Global Step: 12000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:13,210-Speed 12905.88 samples/sec Loss 8.6411 LearningRate 0.1721 Epoch: 4 Global Step: 12010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:14,767-Speed 13161.16 samples/sec Loss 8.6002 LearningRate 0.1721 Epoch: 4 Global Step: 12020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:16,354-Speed 12906.02 samples/sec Loss 8.6151 LearningRate 0.1720 Epoch: 4 Global Step: 12030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:17,943-Speed 12898.75 samples/sec Loss 8.7300 LearningRate 0.1720 Epoch: 4 Global Step: 12040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:17:19,540-Speed 12834.23 samples/sec Loss 8.5738 LearningRate 0.1720 Epoch: 4 Global Step: 12050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:21,145-Speed 12766.31 samples/sec Loss 8.6316 LearningRate 0.1719 Epoch: 4 Global Step: 12060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:22,714-Speed 13064.48 samples/sec Loss 8.4874 LearningRate 0.1719 Epoch: 4 Global Step: 12070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:24,297-Speed 12945.34 samples/sec Loss 8.6035 LearningRate 0.1718 Epoch: 4 Global Step: 12080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:25,852-Speed 13176.07 samples/sec Loss 8.6572 LearningRate 0.1718 Epoch: 4 Global Step: 12090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:27,469-Speed 12683.07 samples/sec Loss 8.5466 LearningRate 0.1718 Epoch: 4 
Global Step: 12100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:29,053-Speed 12938.52 samples/sec Loss 8.6970 LearningRate 0.1717 Epoch: 4 Global Step: 12110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:30,612-Speed 13142.78 samples/sec Loss 8.5774 LearningRate 0.1717 Epoch: 4 Global Step: 12120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:32,224-Speed 12713.75 samples/sec Loss 8.6856 LearningRate 0.1716 Epoch: 4 Global Step: 12130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:33,854-Speed 12576.25 samples/sec Loss 8.5484 LearningRate 0.1716 Epoch: 4 Global Step: 12140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:17:35,433-Speed 12972.91 samples/sec Loss 8.6818 LearningRate 0.1716 Epoch: 4 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:36,997-Speed 13101.91 samples/sec Loss 8.6373 LearningRate 0.1715 Epoch: 4 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:38,580-Speed 12943.81 samples/sec Loss 8.7334 LearningRate 0.1715 Epoch: 4 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:40,180-Speed 12812.56 samples/sec Loss 8.6961 LearningRate 0.1715 Epoch: 4 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:41,743-Speed 13115.64 samples/sec Loss 8.5697 LearningRate 0.1714 Epoch: 4 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:43,328-Speed 12932.39 samples/sec Loss 8.5978 LearningRate 0.1714 Epoch: 4 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:44,896-Speed 13067.45 samples/sec Loss 8.5832 LearningRate 0.1713 Epoch: 4 Global Step: 12210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:46,486-Speed 12891.74 samples/sec Loss 8.7177 LearningRate 0.1713 Epoch: 4 Global Step: 12220 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-01-14 14:17:48,084-Speed 12821.89 samples/sec Loss 8.6220 LearningRate 0.1713 Epoch: 4 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:49,668-Speed 12937.47 samples/sec Loss 8.6600 LearningRate 0.1712 Epoch: 4 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:51,263-Speed 12847.04 samples/sec Loss 8.5673 LearningRate 0.1712 Epoch: 4 Global Step: 12250 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:17:52,837-Speed 13019.84 samples/sec Loss 8.5117 LearningRate 0.1711 Epoch: 4 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:54,475-Speed 12511.06 samples/sec Loss 8.5576 LearningRate 0.1711 Epoch: 4 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:56,056-Speed 12967.08 samples/sec Loss 8.5518 LearningRate 0.1711 Epoch: 4 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:57,633-Speed 12994.49 samples/sec Loss 8.5750 LearningRate 0.1710 Epoch: 4 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:17:59,220-Speed 12909.94 samples/sec Loss 8.6261 LearningRate 0.1710 Epoch: 4 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:00,825-Speed 12772.82 samples/sec Loss 8.5814 LearningRate 0.1710 Epoch: 4 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:02,398-Speed 13032.42 samples/sec Loss 8.6414 LearningRate 0.1709 Epoch: 4 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:03,977-Speed 12970.69 samples/sec Loss 8.6023 LearningRate 0.1709 Epoch: 4 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:05,565-Speed 12904.42 samples/sec Loss 8.5604 LearningRate 0.1708 Epoch: 4 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 
14:18:07,161-Speed 12843.47 samples/sec Loss 8.6615 LearningRate 0.1708 Epoch: 4 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:08,764-Speed 12787.51 samples/sec Loss 8.4811 LearningRate 0.1708 Epoch: 4 Global Step: 12360 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:18:10,347-Speed 12944.05 samples/sec Loss 8.5792 LearningRate 0.1707 Epoch: 4 Global Step: 12370 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:18:11,927-Speed 12967.24 samples/sec Loss 8.5572 LearningRate 0.1707 Epoch: 4 Global Step: 12380 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:18:13,520-Speed 12869.48 samples/sec Loss 8.5616 LearningRate 0.1706 Epoch: 4 Global Step: 12390 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:18:15,105-Speed 12927.12 samples/sec Loss 8.5761 LearningRate 0.1706 Epoch: 4 Global Step: 12400 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:18:16,698-Speed 12866.08 samples/sec Loss 8.4976 LearningRate 0.1706 Epoch: 4 Global Step: 12410 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:18:18,284-Speed 12917.68 samples/sec Loss 8.6257 LearningRate 0.1705 Epoch: 4 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:19,863-Speed 12981.09 samples/sec Loss 8.5929 LearningRate 0.1705 Epoch: 4 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:21,485-Speed 12641.29 samples/sec Loss 8.5823 LearningRate 0.1705 Epoch: 4 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:23,062-Speed 12997.62 samples/sec Loss 8.6462 LearningRate 0.1704 Epoch: 4 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:24,655-Speed 12894.89 samples/sec Loss 8.5412 LearningRate 0.1704 Epoch: 4 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:26,250-Speed 12842.08 samples/sec 
Loss 8.5244 LearningRate 0.1703 Epoch: 4 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:27,849-Speed 12820.89 samples/sec Loss 8.5240 LearningRate 0.1703 Epoch: 4 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:29,430-Speed 12969.69 samples/sec Loss 8.5289 LearningRate 0.1703 Epoch: 4 Global Step: 12490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:31,005-Speed 13012.46 samples/sec Loss 8.5762 LearningRate 0.1702 Epoch: 4 Global Step: 12500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:32,586-Speed 12965.21 samples/sec Loss 8.6075 LearningRate 0.1702 Epoch: 4 Global Step: 12510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:34,157-Speed 13044.54 samples/sec Loss 8.4600 LearningRate 0.1701 Epoch: 4 Global Step: 12520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:35,717-Speed 13132.12 samples/sec Loss 8.5857 LearningRate 0.1701 Epoch: 4 Global Step: 12530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:37,304-Speed 12910.49 samples/sec Loss 8.6522 LearningRate 0.1701 Epoch: 4 Global Step: 12540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:38,887-Speed 12949.38 samples/sec Loss 8.5832 LearningRate 0.1700 Epoch: 4 Global Step: 12550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:40,492-Speed 12771.17 samples/sec Loss 8.6104 LearningRate 0.1700 Epoch: 4 Global Step: 12560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:42,106-Speed 12700.61 samples/sec Loss 8.5944 LearningRate 0.1700 Epoch: 4 Global Step: 12570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:43,691-Speed 12937.51 samples/sec Loss 8.5654 LearningRate 0.1699 Epoch: 4 Global Step: 12580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:18:45,289-Speed 12822.41 samples/sec Loss 8.6568 LearningRate 0.1699 Epoch: 4 Global Step: 
12590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:46,880-Speed 12875.37 samples/sec Loss 8.4846 LearningRate 0.1698 Epoch: 4 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:48,478-Speed 12826.42 samples/sec Loss 8.4828 LearningRate 0.1698 Epoch: 4 Global Step: 12610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:50,075-Speed 12832.78 samples/sec Loss 8.5544 LearningRate 0.1698 Epoch: 4 Global Step: 12620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:51,732-Speed 12365.47 samples/sec Loss 8.5644 LearningRate 0.1697 Epoch: 4 Global Step: 12630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:18:53,234-Speed 13646.93 samples/sec Loss 8.6067 LearningRate 0.1697 Epoch: 4 Global Step: 12640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:06,945-Speed 1493.97 samples/sec Loss 8.1354 LearningRate 0.1696 Epoch: 5 Global Step: 12650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:08,611-Speed 12305.78 samples/sec Loss 7.6787 LearningRate 0.1696 Epoch: 5 Global Step: 12660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:10,219-Speed 12752.04 samples/sec Loss 7.6124 LearningRate 0.1696 Epoch: 5 Global Step: 12670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:12,002-Speed 11495.77 samples/sec Loss 7.6393 LearningRate 0.1695 Epoch: 5 Global Step: 12680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:13,595-Speed 12866.87 samples/sec Loss 7.6872 LearningRate 0.1695 Epoch: 5 Global Step: 12690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:15,227-Speed 12563.20 samples/sec Loss 7.6749 LearningRate 0.1695 Epoch: 5 Global Step: 12700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:16,799-Speed 13034.88 samples/sec Loss 7.7022 LearningRate 0.1694 Epoch: 5 Global Step: 12710 Fp16 Grad Scale: 262144 Required: 4 
hours Training: 2022-01-14 14:19:18,398-Speed 12817.61 samples/sec Loss 7.6313 LearningRate 0.1694 Epoch: 5 Global Step: 12720 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:20,006-Speed 12749.81 samples/sec Loss 7.7301 LearningRate 0.1693 Epoch: 5 Global Step: 12730 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:21,568-Speed 13122.44 samples/sec Loss 7.7634 LearningRate 0.1693 Epoch: 5 Global Step: 12740 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:23,184-Speed 12675.89 samples/sec Loss 7.7331 LearningRate 0.1693 Epoch: 5 Global Step: 12750 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:24,791-Speed 12756.79 samples/sec Loss 7.6428 LearningRate 0.1692 Epoch: 5 Global Step: 12760 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:26,407-Speed 12681.95 samples/sec Loss 7.6904 LearningRate 0.1692 Epoch: 5 Global Step: 12770 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:27,989-Speed 12955.80 samples/sec Loss 7.6745 LearningRate 0.1692 Epoch: 5 Global Step: 12780 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:29,608-Speed 12658.16 samples/sec Loss 7.6795 LearningRate 0.1691 Epoch: 5 Global Step: 12790 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:31,180-Speed 13040.58 samples/sec Loss 7.7934 LearningRate 0.1691 Epoch: 5 Global Step: 12800 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:32,749-Speed 13086.31 samples/sec Loss 7.8371 LearningRate 0.1690 Epoch: 5 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:34,331-Speed 12954.58 samples/sec Loss 7.7814 LearningRate 0.1690 Epoch: 5 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:35,945-Speed 12694.86 samples/sec Loss 7.8989 LearningRate 0.1690 Epoch: 5 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 
14:19:37,517-Speed 13031.63 samples/sec Loss 7.8627 LearningRate 0.1689 Epoch: 5 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:39,098-Speed 12964.35 samples/sec Loss 7.8769 LearningRate 0.1689 Epoch: 5 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:40,682-Speed 12938.93 samples/sec Loss 7.8627 LearningRate 0.1688 Epoch: 5 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:42,256-Speed 13018.42 samples/sec Loss 7.8608 LearningRate 0.1688 Epoch: 5 Global Step: 12870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:43,844-Speed 12901.68 samples/sec Loss 7.8822 LearningRate 0.1688 Epoch: 5 Global Step: 12880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:45,447-Speed 12785.28 samples/sec Loss 7.8085 LearningRate 0.1687 Epoch: 5 Global Step: 12890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:47,029-Speed 12955.47 samples/sec Loss 7.9360 LearningRate 0.1687 Epoch: 5 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:48,625-Speed 12834.64 samples/sec Loss 7.9417 LearningRate 0.1687 Epoch: 5 Global Step: 12910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:50,211-Speed 12929.85 samples/sec Loss 7.9320 LearningRate 0.1686 Epoch: 5 Global Step: 12920 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:51,769-Speed 13156.24 samples/sec Loss 7.9012 LearningRate 0.1686 Epoch: 5 Global Step: 12930 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:19:53,363-Speed 12855.58 samples/sec Loss 7.9274 LearningRate 0.1685 Epoch: 5 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:54,952-Speed 12915.89 samples/sec Loss 8.0250 LearningRate 0.1685 Epoch: 5 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:56,552-Speed 12811.07 samples/sec 
Loss 8.0572 LearningRate 0.1685 Epoch: 5 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:58,143-Speed 12874.10 samples/sec Loss 7.9153 LearningRate 0.1684 Epoch: 5 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:19:59,751-Speed 12744.45 samples/sec Loss 8.0362 LearningRate 0.1684 Epoch: 5 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:01,343-Speed 12874.74 samples/sec Loss 7.9794 LearningRate 0.1683 Epoch: 5 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:02,955-Speed 12714.90 samples/sec Loss 8.0477 LearningRate 0.1683 Epoch: 5 Global Step: 13000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:04,535-Speed 12980.91 samples/sec Loss 7.9808 LearningRate 0.1683 Epoch: 5 Global Step: 13010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:06,093-Speed 13154.26 samples/sec Loss 8.1050 LearningRate 0.1682 Epoch: 5 Global Step: 13020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:07,693-Speed 12806.30 samples/sec Loss 8.1001 LearningRate 0.1682 Epoch: 5 Global Step: 13030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:09,286-Speed 12864.60 samples/sec Loss 8.0291 LearningRate 0.1682 Epoch: 5 Global Step: 13040 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:20:10,863-Speed 12996.13 samples/sec Loss 8.0991 LearningRate 0.1681 Epoch: 5 Global Step: 13050 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:20:12,480-Speed 12673.63 samples/sec Loss 8.1004 LearningRate 0.1681 Epoch: 5 Global Step: 13060 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:20:14,054-Speed 13023.38 samples/sec Loss 8.0477 LearningRate 0.1680 Epoch: 5 Global Step: 13070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:15,636-Speed 12960.34 samples/sec Loss 7.9205 LearningRate 0.1680 Epoch: 5 
Global Step: 13080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:17,268-Speed 12552.47 samples/sec Loss 8.0830 LearningRate 0.1680 Epoch: 5 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:18,846-Speed 12987.66 samples/sec Loss 8.0921 LearningRate 0.1679 Epoch: 5 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:20,415-Speed 13066.70 samples/sec Loss 8.0665 LearningRate 0.1679 Epoch: 5 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:22,023-Speed 12740.62 samples/sec Loss 8.1284 LearningRate 0.1679 Epoch: 5 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:23,619-Speed 12845.22 samples/sec Loss 8.0863 LearningRate 0.1678 Epoch: 5 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:25,214-Speed 12848.94 samples/sec Loss 8.0737 LearningRate 0.1678 Epoch: 5 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:26,791-Speed 12999.41 samples/sec Loss 8.1133 LearningRate 0.1677 Epoch: 5 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:28,396-Speed 12764.12 samples/sec Loss 8.0503 LearningRate 0.1677 Epoch: 5 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:29,979-Speed 12952.68 samples/sec Loss 8.1198 LearningRate 0.1677 Epoch: 5 Global Step: 13170 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:20:31,530-Speed 13209.65 samples/sec Loss 8.2105 LearningRate 0.1676 Epoch: 5 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:33,108-Speed 12991.06 samples/sec Loss 8.0941 LearningRate 0.1676 Epoch: 5 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:34,676-Speed 13066.00 samples/sec Loss 8.0505 LearningRate 0.1675 Epoch: 5 Global Step: 13200 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-01-14 14:20:36,273-Speed 12836.41 samples/sec Loss 8.0910 LearningRate 0.1675 Epoch: 5 Global Step: 13210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:37,826-Speed 13197.51 samples/sec Loss 8.0996 LearningRate 0.1675 Epoch: 5 Global Step: 13220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:39,412-Speed 12919.48 samples/sec Loss 8.0867 LearningRate 0.1674 Epoch: 5 Global Step: 13230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:40,979-Speed 13076.39 samples/sec Loss 8.1109 LearningRate 0.1674 Epoch: 5 Global Step: 13240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:42,555-Speed 13002.62 samples/sec Loss 8.1591 LearningRate 0.1674 Epoch: 5 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:44,136-Speed 12968.02 samples/sec Loss 8.1124 LearningRate 0.1673 Epoch: 5 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:45,724-Speed 12901.79 samples/sec Loss 8.1697 LearningRate 0.1673 Epoch: 5 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:47,301-Speed 12999.39 samples/sec Loss 8.1193 LearningRate 0.1672 Epoch: 5 Global Step: 13280 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:20:48,913-Speed 12712.66 samples/sec Loss 8.2171 LearningRate 0.1672 Epoch: 5 Global Step: 13290 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:20:50,458-Speed 13268.63 samples/sec Loss 8.1919 LearningRate 0.1672 Epoch: 5 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:20:52,033-Speed 13009.42 samples/sec Loss 8.2514 LearningRate 0.1671 Epoch: 5 Global Step: 13310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:20:53,627-Speed 12853.52 samples/sec Loss 8.2103 LearningRate 0.1671 Epoch: 5 Global Step: 13320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
14:20:55,223-Speed 12845.79 samples/sec Loss 8.1640 LearningRate 0.1671 Epoch: 5 Global Step: 13330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:20:56,819-Speed 12834.65 samples/sec Loss 8.2723 LearningRate 0.1670 Epoch: 5 Global Step: 13340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:20:58,403-Speed 12963.04 samples/sec Loss 8.1768 LearningRate 0.1670 Epoch: 5 Global Step: 13350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:21:00,008-Speed 12763.40 samples/sec Loss 8.0296 LearningRate 0.1669 Epoch: 5 Global Step: 13360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:21:01,590-Speed 12955.78 samples/sec Loss 8.1415 LearningRate 0.1669 Epoch: 5 Global Step: 13370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:21:03,182-Speed 12872.24 samples/sec Loss 8.1307 LearningRate 0.1669 Epoch: 5 Global Step: 13380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:21:04,757-Speed 13006.73 samples/sec Loss 8.1847 LearningRate 0.1668 Epoch: 5 Global Step: 13390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:21:06,317-Speed 13137.87 samples/sec Loss 8.1938 LearningRate 0.1668 Epoch: 5 Global Step: 13400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:21:07,911-Speed 12870.51 samples/sec Loss 8.2451 LearningRate 0.1667 Epoch: 5 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:09,521-Speed 12732.42 samples/sec Loss 8.1245 LearningRate 0.1667 Epoch: 5 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:11,139-Speed 12666.94 samples/sec Loss 8.1550 LearningRate 0.1667 Epoch: 5 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:12,759-Speed 12647.45 samples/sec Loss 8.3299 LearningRate 0.1666 Epoch: 5 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:14,340-Speed 12963.61 samples/sec Loss 8.2172 
LearningRate 0.1666 Epoch: 5 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:15,917-Speed 12995.86 samples/sec Loss 8.2429 LearningRate 0.1666 Epoch: 5 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:17,503-Speed 12918.49 samples/sec Loss 8.1666 LearningRate 0.1665 Epoch: 5 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:19,091-Speed 12907.38 samples/sec Loss 8.2114 LearningRate 0.1665 Epoch: 5 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:20,716-Speed 12611.27 samples/sec Loss 8.0890 LearningRate 0.1664 Epoch: 5 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:22,283-Speed 13080.84 samples/sec Loss 8.1767 LearningRate 0.1664 Epoch: 5 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:23,871-Speed 12903.88 samples/sec Loss 8.1632 LearningRate 0.1664 Epoch: 5 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:25,455-Speed 12942.60 samples/sec Loss 8.1359 LearningRate 0.1663 Epoch: 5 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:27,087-Speed 12552.56 samples/sec Loss 8.3196 LearningRate 0.1663 Epoch: 5 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:28,678-Speed 12878.45 samples/sec Loss 8.2685 LearningRate 0.1663 Epoch: 5 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:30,274-Speed 12840.52 samples/sec Loss 8.1513 LearningRate 0.1662 Epoch: 5 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:31,857-Speed 12943.36 samples/sec Loss 8.2609 LearningRate 0.1662 Epoch: 5 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:33,472-Speed 12696.69 samples/sec Loss 8.3060 LearningRate 0.1661 Epoch: 5 Global Step: 
13570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:35,064-Speed 12873.79 samples/sec Loss 8.2309 LearningRate 0.1661 Epoch: 5 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:36,642-Speed 12984.48 samples/sec Loss 8.2602 LearningRate 0.1661 Epoch: 5 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:38,210-Speed 13071.41 samples/sec Loss 8.1458 LearningRate 0.1660 Epoch: 5 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:39,802-Speed 12873.52 samples/sec Loss 8.1798 LearningRate 0.1660 Epoch: 5 Global Step: 13610 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:21:41,375-Speed 13022.83 samples/sec Loss 8.2008 LearningRate 0.1660 Epoch: 5 Global Step: 13620 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:21:42,932-Speed 13165.13 samples/sec Loss 8.3120 LearningRate 0.1659 Epoch: 5 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:44,491-Speed 13147.89 samples/sec Loss 8.1601 LearningRate 0.1659 Epoch: 5 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:46,105-Speed 12695.86 samples/sec Loss 8.2917 LearningRate 0.1658 Epoch: 5 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:47,713-Speed 12741.16 samples/sec Loss 8.2481 LearningRate 0.1658 Epoch: 5 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:49,281-Speed 13079.36 samples/sec Loss 8.1372 LearningRate 0.1658 Epoch: 5 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:50,856-Speed 13010.46 samples/sec Loss 8.2513 LearningRate 0.1657 Epoch: 5 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:52,440-Speed 12943.32 samples/sec Loss 8.2068 LearningRate 0.1657 Epoch: 5 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 4 
hours Training: 2022-01-14 14:21:54,000-Speed 13138.17 samples/sec Loss 8.2112 LearningRate 0.1656 Epoch: 5 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:55,623-Speed 12630.41 samples/sec Loss 8.1625 LearningRate 0.1656 Epoch: 5 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:57,204-Speed 12961.16 samples/sec Loss 8.1505 LearningRate 0.1656 Epoch: 5 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:21:58,796-Speed 12868.59 samples/sec Loss 8.1354 LearningRate 0.1655 Epoch: 5 Global Step: 13730 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:22:00,367-Speed 13043.10 samples/sec Loss 8.2795 LearningRate 0.1655 Epoch: 5 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:01,952-Speed 12931.21 samples/sec Loss 8.1988 LearningRate 0.1655 Epoch: 5 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:03,527-Speed 13009.28 samples/sec Loss 8.2598 LearningRate 0.1654 Epoch: 5 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:05,102-Speed 13014.93 samples/sec Loss 8.2074 LearningRate 0.1654 Epoch: 5 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:06,666-Speed 13106.19 samples/sec Loss 8.2140 LearningRate 0.1653 Epoch: 5 Global Step: 13780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:08,250-Speed 12940.06 samples/sec Loss 8.1101 LearningRate 0.1653 Epoch: 5 Global Step: 13790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:09,851-Speed 12799.26 samples/sec Loss 8.2174 LearningRate 0.1653 Epoch: 5 Global Step: 13800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:11,444-Speed 12884.55 samples/sec Loss 8.2600 LearningRate 0.1652 Epoch: 5 Global Step: 13810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:13,007-Speed 
13116.96 samples/sec Loss 8.2261 LearningRate 0.1652 Epoch: 5 Global Step: 13820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:14,602-Speed 12841.98 samples/sec Loss 8.1159 LearningRate 0.1652 Epoch: 5 Global Step: 13830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:16,197-Speed 12848.06 samples/sec Loss 8.1508 LearningRate 0.1651 Epoch: 5 Global Step: 13840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:17,788-Speed 12877.86 samples/sec Loss 8.2651 LearningRate 0.1651 Epoch: 5 Global Step: 13850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:19,358-Speed 13057.19 samples/sec Loss 8.2013 LearningRate 0.1650 Epoch: 5 Global Step: 13860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:20,991-Speed 12548.82 samples/sec Loss 8.2730 LearningRate 0.1650 Epoch: 5 Global Step: 13870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:22,559-Speed 13073.61 samples/sec Loss 8.2152 LearningRate 0.1650 Epoch: 5 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:24,145-Speed 12921.03 samples/sec Loss 8.2776 LearningRate 0.1649 Epoch: 5 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:25,725-Speed 12966.99 samples/sec Loss 8.1461 LearningRate 0.1649 Epoch: 5 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:27,299-Speed 13023.42 samples/sec Loss 8.2418 LearningRate 0.1649 Epoch: 5 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:28,870-Speed 13047.86 samples/sec Loss 8.2879 LearningRate 0.1648 Epoch: 5 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:30,490-Speed 12652.79 samples/sec Loss 8.2575 LearningRate 0.1648 Epoch: 5 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:32,077-Speed 12910.98 samples/sec Loss 8.1904 LearningRate 
0.1647 Epoch: 5 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:33,666-Speed 12894.20 samples/sec Loss 8.2380 LearningRate 0.1647 Epoch: 5 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:35,249-Speed 12946.33 samples/sec Loss 8.1859 LearningRate 0.1647 Epoch: 5 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:36,823-Speed 13023.05 samples/sec Loss 8.2492 LearningRate 0.1646 Epoch: 5 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:38,399-Speed 13006.15 samples/sec Loss 8.3158 LearningRate 0.1646 Epoch: 5 Global Step: 13980 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:22:39,983-Speed 12940.67 samples/sec Loss 8.3022 LearningRate 0.1646 Epoch: 5 Global Step: 13990 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:22:41,543-Speed 13132.03 samples/sec Loss 8.3332 LearningRate 0.1645 Epoch: 5 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:43,109-Speed 13117.86 samples/sec Loss 8.2368 LearningRate 0.1645 Epoch: 5 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:44,710-Speed 12803.16 samples/sec Loss 8.3451 LearningRate 0.1644 Epoch: 5 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:22:46,262-Speed 13206.57 samples/sec Loss 8.1399 LearningRate 0.1644 Epoch: 5 Global Step: 14030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:47,839-Speed 12990.10 samples/sec Loss 8.1419 LearningRate 0.1644 Epoch: 5 Global Step: 14040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:49,434-Speed 12855.68 samples/sec Loss 8.2823 LearningRate 0.1643 Epoch: 5 Global Step: 14050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:51,065-Speed 12567.25 samples/sec Loss 8.2437 LearningRate 0.1643 Epoch: 5 Global Step: 14060 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:52,644-Speed 12978.82 samples/sec Loss 8.2458 LearningRate 0.1642 Epoch: 5 Global Step: 14070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:54,224-Speed 12963.21 samples/sec Loss 8.1993 LearningRate 0.1642 Epoch: 5 Global Step: 14080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:55,813-Speed 12900.72 samples/sec Loss 8.1546 LearningRate 0.1642 Epoch: 5 Global Step: 14090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:57,394-Speed 12963.76 samples/sec Loss 8.2378 LearningRate 0.1641 Epoch: 5 Global Step: 14100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:22:58,993-Speed 12818.01 samples/sec Loss 8.1875 LearningRate 0.1641 Epoch: 5 Global Step: 14110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:00,602-Speed 12734.83 samples/sec Loss 8.2417 LearningRate 0.1641 Epoch: 5 Global Step: 14120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:02,178-Speed 13006.10 samples/sec Loss 8.2833 LearningRate 0.1640 Epoch: 5 Global Step: 14130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:03,756-Speed 12985.03 samples/sec Loss 8.1742 LearningRate 0.1640 Epoch: 5 Global Step: 14140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:05,342-Speed 12921.77 samples/sec Loss 8.1663 LearningRate 0.1639 Epoch: 5 Global Step: 14150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:06,978-Speed 12528.69 samples/sec Loss 8.1524 LearningRate 0.1639 Epoch: 5 Global Step: 14160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:08,566-Speed 12907.68 samples/sec Loss 8.2987 LearningRate 0.1639 Epoch: 5 Global Step: 14170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:10,161-Speed 12846.34 samples/sec Loss 8.1807 LearningRate 0.1638 Epoch: 5 Global Step: 14180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
14:23:11,758-Speed 12831.07 samples/sec Loss 8.1935 LearningRate 0.1638 Epoch: 5 Global Step: 14190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:13,337-Speed 12973.36 samples/sec Loss 8.2751 LearningRate 0.1638 Epoch: 5 Global Step: 14200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:14,934-Speed 12839.82 samples/sec Loss 8.1987 LearningRate 0.1637 Epoch: 5 Global Step: 14210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:16,535-Speed 12800.25 samples/sec Loss 8.1745 LearningRate 0.1637 Epoch: 5 Global Step: 14220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:18,116-Speed 12956.69 samples/sec Loss 8.2214 LearningRate 0.1636 Epoch: 5 Global Step: 14230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:19,698-Speed 12964.06 samples/sec Loss 8.2174 LearningRate 0.1636 Epoch: 5 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:21,290-Speed 12871.59 samples/sec Loss 8.2806 LearningRate 0.1636 Epoch: 5 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:22,865-Speed 13007.53 samples/sec Loss 8.2309 LearningRate 0.1635 Epoch: 5 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:24,471-Speed 12759.83 samples/sec Loss 8.2130 LearningRate 0.1635 Epoch: 5 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:26,063-Speed 12873.60 samples/sec Loss 8.1900 LearningRate 0.1635 Epoch: 5 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:27,637-Speed 13018.43 samples/sec Loss 8.2222 LearningRate 0.1634 Epoch: 5 Global Step: 14290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:29,232-Speed 12849.63 samples/sec Loss 8.2094 LearningRate 0.1634 Epoch: 5 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:30,812-Speed 12978.35 samples/sec Loss 
8.2170 LearningRate 0.1633 Epoch: 5 Global Step: 14310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:32,423-Speed 12737.82 samples/sec Loss 8.1818 LearningRate 0.1633 Epoch: 5 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:34,003-Speed 12979.51 samples/sec Loss 8.1304 LearningRate 0.1633 Epoch: 5 Global Step: 14330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:35,591-Speed 12900.13 samples/sec Loss 8.3471 LearningRate 0.1632 Epoch: 5 Global Step: 14340 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:23:37,186-Speed 12843.63 samples/sec Loss 8.1952 LearningRate 0.1632 Epoch: 5 Global Step: 14350 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:23:38,766-Speed 12979.37 samples/sec Loss 8.2127 LearningRate 0.1632 Epoch: 5 Global Step: 14360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:40,332-Speed 13082.05 samples/sec Loss 8.1773 LearningRate 0.1631 Epoch: 5 Global Step: 14370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:41,927-Speed 12844.70 samples/sec Loss 8.1805 LearningRate 0.1631 Epoch: 5 Global Step: 14380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:43,525-Speed 12828.43 samples/sec Loss 8.2184 LearningRate 0.1630 Epoch: 5 Global Step: 14390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:45,102-Speed 12991.66 samples/sec Loss 8.1439 LearningRate 0.1630 Epoch: 5 Global Step: 14400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:46,700-Speed 12825.09 samples/sec Loss 8.1628 LearningRate 0.1630 Epoch: 5 Global Step: 14410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:48,307-Speed 12747.32 samples/sec Loss 8.1951 LearningRate 0.1629 Epoch: 5 Global Step: 14420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:49,916-Speed 12737.42 samples/sec Loss 8.2492 LearningRate 0.1629 Epoch: 5 Global 
Step: 14430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:51,497-Speed 12964.73 samples/sec Loss 8.2285 LearningRate 0.1629 Epoch: 5 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:53,103-Speed 12764.21 samples/sec Loss 8.1864 LearningRate 0.1628 Epoch: 5 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:23:54,641-Speed 13324.73 samples/sec Loss 8.2231 LearningRate 0.1628 Epoch: 5 Global Step: 14460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:56,237-Speed 12842.59 samples/sec Loss 8.1532 LearningRate 0.1627 Epoch: 5 Global Step: 14470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:57,826-Speed 12894.94 samples/sec Loss 8.2477 LearningRate 0.1627 Epoch: 5 Global Step: 14480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:23:59,397-Speed 13043.15 samples/sec Loss 8.2435 LearningRate 0.1627 Epoch: 5 Global Step: 14490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:01,025-Speed 12591.19 samples/sec Loss 8.1405 LearningRate 0.1626 Epoch: 5 Global Step: 14500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:02,594-Speed 13062.09 samples/sec Loss 8.1612 LearningRate 0.1626 Epoch: 5 Global Step: 14510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:04,171-Speed 12997.92 samples/sec Loss 8.0965 LearningRate 0.1626 Epoch: 5 Global Step: 14520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:05,747-Speed 13001.20 samples/sec Loss 8.2026 LearningRate 0.1625 Epoch: 5 Global Step: 14530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:07,356-Speed 12733.39 samples/sec Loss 8.1593 LearningRate 0.1625 Epoch: 5 Global Step: 14540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:08,938-Speed 12959.38 samples/sec Loss 8.2241 LearningRate 0.1624 Epoch: 5 Global Step: 14550 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-01-14 14:24:10,506-Speed 13073.55 samples/sec Loss 8.1141 LearningRate 0.1624 Epoch: 5 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:12,096-Speed 12889.78 samples/sec Loss 8.1777 LearningRate 0.1624 Epoch: 5 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:13,660-Speed 13100.03 samples/sec Loss 8.1523 LearningRate 0.1623 Epoch: 5 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:15,254-Speed 12859.56 samples/sec Loss 8.2061 LearningRate 0.1623 Epoch: 5 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:16,855-Speed 12793.73 samples/sec Loss 8.0502 LearningRate 0.1623 Epoch: 5 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:18,431-Speed 13007.57 samples/sec Loss 8.2108 LearningRate 0.1622 Epoch: 5 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:20,030-Speed 12817.05 samples/sec Loss 8.1482 LearningRate 0.1622 Epoch: 5 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:21,618-Speed 12908.00 samples/sec Loss 8.0652 LearningRate 0.1621 Epoch: 5 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:23,183-Speed 13099.37 samples/sec Loss 8.1673 LearningRate 0.1621 Epoch: 5 Global Step: 14640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:24,769-Speed 12922.91 samples/sec Loss 8.1139 LearningRate 0.1621 Epoch: 5 Global Step: 14650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:26,353-Speed 12942.09 samples/sec Loss 8.1639 LearningRate 0.1620 Epoch: 5 Global Step: 14660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:27,952-Speed 12812.13 samples/sec Loss 8.1106 LearningRate 0.1620 Epoch: 5 Global Step: 14670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:29,524-Speed 
13035.76 samples/sec Loss 8.1233 LearningRate 0.1620 Epoch: 5 Global Step: 14680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:31,132-Speed 12748.61 samples/sec Loss 8.1807 LearningRate 0.1619 Epoch: 5 Global Step: 14690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:32,701-Speed 13054.92 samples/sec Loss 8.1808 LearningRate 0.1619 Epoch: 5 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:34,275-Speed 13025.94 samples/sec Loss 8.3091 LearningRate 0.1618 Epoch: 5 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:35,862-Speed 12916.70 samples/sec Loss 8.1543 LearningRate 0.1618 Epoch: 5 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:37,452-Speed 12888.57 samples/sec Loss 8.1322 LearningRate 0.1618 Epoch: 5 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:39,057-Speed 12775.31 samples/sec Loss 8.1484 LearningRate 0.1617 Epoch: 5 Global Step: 14740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:40,633-Speed 13002.05 samples/sec Loss 8.2054 LearningRate 0.1617 Epoch: 5 Global Step: 14750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:42,230-Speed 12831.85 samples/sec Loss 8.1575 LearningRate 0.1617 Epoch: 5 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:43,804-Speed 13023.76 samples/sec Loss 8.2711 LearningRate 0.1616 Epoch: 5 Global Step: 14770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:45,392-Speed 12903.32 samples/sec Loss 8.1907 LearningRate 0.1616 Epoch: 5 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:46,954-Speed 13122.77 samples/sec Loss 8.0360 LearningRate 0.1615 Epoch: 5 Global Step: 14790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:48,572-Speed 12665.51 samples/sec Loss 8.1089 LearningRate 0.1615 
Epoch: 5 Global Step: 14800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:50,168-Speed 12841.15 samples/sec Loss 8.1789 LearningRate 0.1615 Epoch: 5 Global Step: 14810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:51,737-Speed 13058.84 samples/sec Loss 8.1109 LearningRate 0.1614 Epoch: 5 Global Step: 14820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:53,328-Speed 12884.76 samples/sec Loss 8.2579 LearningRate 0.1614 Epoch: 5 Global Step: 14830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:54,935-Speed 12755.55 samples/sec Loss 8.2113 LearningRate 0.1614 Epoch: 5 Global Step: 14840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:56,528-Speed 12896.14 samples/sec Loss 8.0499 LearningRate 0.1613 Epoch: 5 Global Step: 14850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:24:58,106-Speed 12979.33 samples/sec Loss 8.1886 LearningRate 0.1613 Epoch: 5 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:24:59,688-Speed 12956.18 samples/sec Loss 8.1728 LearningRate 0.1612 Epoch: 5 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:25:01,261-Speed 13033.05 samples/sec Loss 8.1338 LearningRate 0.1612 Epoch: 5 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:25:02,854-Speed 12873.19 samples/sec Loss 8.2196 LearningRate 0.1612 Epoch: 5 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:25:04,437-Speed 12938.26 samples/sec Loss 8.2002 LearningRate 0.1611 Epoch: 5 Global Step: 14900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:25:06,025-Speed 12907.54 samples/sec Loss 8.1142 LearningRate 0.1611 Epoch: 5 Global Step: 14910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:25:07,613-Speed 12906.76 samples/sec Loss 8.0377 LearningRate 0.1611 Epoch: 5 Global Step: 14920 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-01-14 14:25:09,196-Speed 12944.66 samples/sec Loss 8.1349 LearningRate 0.1610 Epoch: 5 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:25:10,765-Speed 13061.15 samples/sec Loss 8.1259 LearningRate 0.1610 Epoch: 5 Global Step: 14940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:25:12,352-Speed 12919.33 samples/sec Loss 8.0102 LearningRate 0.1609 Epoch: 5 Global Step: 14950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:25:13,929-Speed 12997.61 samples/sec Loss 8.0591 LearningRate 0.1609 Epoch: 5 Global Step: 14960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:25:15,515-Speed 12917.11 samples/sec Loss 8.2396 LearningRate 0.1609 Epoch: 5 Global Step: 14970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:25:17,114-Speed 12817.77 samples/sec Loss 8.1354 LearningRate 0.1608 Epoch: 5 Global Step: 14980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:25:18,713-Speed 12817.90 samples/sec Loss 8.1559 LearningRate 0.1608 Epoch: 5 Global Step: 14990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:25:20,300-Speed 12913.59 samples/sec Loss 8.0471 LearningRate 0.1608 Epoch: 5 Global Step: 15000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:25:42,455-[lfw][15000]XNorm: 13.172281 Training: 2022-01-14 14:25:42,456-[lfw][15000]Accuracy-Flip: 0.99467+-0.00433 Training: 2022-01-14 14:25:42,456-[lfw][15000]Accuracy-Highest: 0.99467 Training: 2022-01-14 14:26:07,753-[cfp_fp][15000]XNorm: 11.060920 Training: 2022-01-14 14:26:07,754-[cfp_fp][15000]Accuracy-Flip: 0.93686+-0.01698 Training: 2022-01-14 14:26:07,754-[cfp_fp][15000]Accuracy-Highest: 0.93686 Training: 2022-01-14 14:26:29,739-[agedb_30][15000]XNorm: 12.850334 Training: 2022-01-14 14:26:29,740-[agedb_30][15000]Accuracy-Flip: 0.94883+-0.01169 Training: 2022-01-14 14:26:29,740-[agedb_30][15000]Accuracy-Highest: 0.94883 Training: 
2022-01-14 14:26:31,369-Speed 288.18 samples/sec Loss 8.1043 LearningRate 0.1607 Epoch: 5 Global Step: 15010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:26:32,963-Speed 12857.44 samples/sec Loss 8.1772 LearningRate 0.1607 Epoch: 5 Global Step: 15020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:26:34,524-Speed 13126.80 samples/sec Loss 8.2378 LearningRate 0.1606 Epoch: 5 Global Step: 15030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:26:36,088-Speed 13110.12 samples/sec Loss 8.1740 LearningRate 0.1606 Epoch: 5 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:37,693-Speed 12765.06 samples/sec Loss 8.1763 LearningRate 0.1606 Epoch: 5 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:39,318-Speed 12612.23 samples/sec Loss 8.2197 LearningRate 0.1605 Epoch: 5 Global Step: 15060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:40,911-Speed 12862.92 samples/sec Loss 8.1320 LearningRate 0.1605 Epoch: 5 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:42,482-Speed 13049.73 samples/sec Loss 8.0855 LearningRate 0.1605 Epoch: 5 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:44,077-Speed 12842.06 samples/sec Loss 8.0644 LearningRate 0.1604 Epoch: 5 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:45,685-Speed 12746.38 samples/sec Loss 8.1284 LearningRate 0.1604 Epoch: 5 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:47,259-Speed 13015.07 samples/sec Loss 8.0616 LearningRate 0.1603 Epoch: 5 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:48,864-Speed 12772.19 samples/sec Loss 8.0277 LearningRate 0.1603 Epoch: 5 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:50,439-Speed 13013.83 
samples/sec Loss 8.1489 LearningRate 0.1603 Epoch: 5 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:26:52,027-Speed 12906.75 samples/sec Loss 8.1352 LearningRate 0.1602 Epoch: 5 Global Step: 15140 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:26:53,653-Speed 12603.88 samples/sec Loss 8.1068 LearningRate 0.1602 Epoch: 5 Global Step: 15150 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:26:55,333-Speed 12196.57 samples/sec Loss 8.1315 LearningRate 0.1602 Epoch: 5 Global Step: 15160 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:26:56,853-Speed 13479.31 samples/sec Loss 8.2884 LearningRate 0.1601 Epoch: 5 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:11,539-Speed 1394.66 samples/sec Loss 7.6183 LearningRate 0.1601 Epoch: 6 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:13,160-Speed 12645.07 samples/sec Loss 7.1617 LearningRate 0.1601 Epoch: 6 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:14,747-Speed 12918.53 samples/sec Loss 7.1341 LearningRate 0.1600 Epoch: 6 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:16,352-Speed 12765.15 samples/sec Loss 7.1596 LearningRate 0.1600 Epoch: 6 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:17,923-Speed 13053.46 samples/sec Loss 7.1561 LearningRate 0.1599 Epoch: 6 Global Step: 15220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:19,563-Speed 12494.60 samples/sec Loss 7.2748 LearningRate 0.1599 Epoch: 6 Global Step: 15230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:21,197-Speed 12542.42 samples/sec Loss 7.2646 LearningRate 0.1599 Epoch: 6 Global Step: 15240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:22,794-Speed 12830.07 samples/sec Loss 7.2911 LearningRate 0.1598 
Epoch: 6 Global Step: 15250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:24,398-Speed 12789.62 samples/sec Loss 7.3176 LearningRate 0.1598 Epoch: 6 Global Step: 15260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:25,997-Speed 12818.70 samples/sec Loss 7.3970 LearningRate 0.1598 Epoch: 6 Global Step: 15270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:27,599-Speed 12789.06 samples/sec Loss 7.3494 LearningRate 0.1597 Epoch: 6 Global Step: 15280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:29,190-Speed 12882.66 samples/sec Loss 7.3069 LearningRate 0.1597 Epoch: 6 Global Step: 15290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:30,764-Speed 13019.75 samples/sec Loss 7.3382 LearningRate 0.1596 Epoch: 6 Global Step: 15300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:32,377-Speed 12707.59 samples/sec Loss 7.3081 LearningRate 0.1596 Epoch: 6 Global Step: 15310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:27:33,956-Speed 12978.30 samples/sec Loss 7.4017 LearningRate 0.1596 Epoch: 6 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:35,559-Speed 12814.83 samples/sec Loss 7.4030 LearningRate 0.1595 Epoch: 6 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:37,180-Speed 12642.09 samples/sec Loss 7.3095 LearningRate 0.1595 Epoch: 6 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:38,759-Speed 12985.29 samples/sec Loss 7.4311 LearningRate 0.1595 Epoch: 6 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:40,361-Speed 12795.86 samples/sec Loss 7.3680 LearningRate 0.1594 Epoch: 6 Global Step: 15360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:41,965-Speed 12772.24 samples/sec Loss 7.3840 LearningRate 0.1594 Epoch: 6 Global Step: 15370 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-01-14 14:27:43,551-Speed 12922.66 samples/sec Loss 7.4160 LearningRate 0.1593 Epoch: 6 Global Step: 15380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:45,148-Speed 12834.09 samples/sec Loss 7.5476 LearningRate 0.1593 Epoch: 6 Global Step: 15390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:27:46,698-Speed 13222.27 samples/sec Loss 7.5092 LearningRate 0.1593 Epoch: 6 Global Step: 15400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:27:48,275-Speed 12997.56 samples/sec Loss 7.5403 LearningRate 0.1592 Epoch: 6 Global Step: 15410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:27:49,897-Speed 12637.32 samples/sec Loss 7.5377 LearningRate 0.1592 Epoch: 6 Global Step: 15420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:27:51,478-Speed 12963.43 samples/sec Loss 7.5343 LearningRate 0.1592 Epoch: 6 Global Step: 15430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:27:53,088-Speed 12725.89 samples/sec Loss 7.5347 LearningRate 0.1591 Epoch: 6 Global Step: 15440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:27:54,707-Speed 12657.64 samples/sec Loss 7.5871 LearningRate 0.1591 Epoch: 6 Global Step: 15450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:27:56,294-Speed 12914.23 samples/sec Loss 7.5582 LearningRate 0.1590 Epoch: 6 Global Step: 15460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:27:57,894-Speed 12812.84 samples/sec Loss 7.5243 LearningRate 0.1590 Epoch: 6 Global Step: 15470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:27:59,485-Speed 12883.52 samples/sec Loss 7.6055 LearningRate 0.1590 Epoch: 6 Global Step: 15480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:28:01,060-Speed 13007.58 samples/sec Loss 7.5770 LearningRate 0.1589 Epoch: 6 Global Step: 15490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
14:28:02,652-Speed 12872.64 samples/sec Loss 7.5345 LearningRate 0.1589 Epoch: 6 Global Step: 15500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:04,226-Speed 13021.50 samples/sec Loss 7.5843 LearningRate 0.1589 Epoch: 6 Global Step: 15510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:05,837-Speed 12728.44 samples/sec Loss 7.5738 LearningRate 0.1588 Epoch: 6 Global Step: 15520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:07,427-Speed 12886.93 samples/sec Loss 7.5821 LearningRate 0.1588 Epoch: 6 Global Step: 15530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:09,040-Speed 12698.80 samples/sec Loss 7.5985 LearningRate 0.1588 Epoch: 6 Global Step: 15540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:10,638-Speed 12831.32 samples/sec Loss 7.5960 LearningRate 0.1587 Epoch: 6 Global Step: 15550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:12,234-Speed 12844.94 samples/sec Loss 7.5603 LearningRate 0.1587 Epoch: 6 Global Step: 15560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:13,852-Speed 12664.96 samples/sec Loss 7.6500 LearningRate 0.1586 Epoch: 6 Global Step: 15570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:15,437-Speed 12931.98 samples/sec Loss 7.6208 LearningRate 0.1586 Epoch: 6 Global Step: 15580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:17,011-Speed 13017.79 samples/sec Loss 7.7278 LearningRate 0.1586 Epoch: 6 Global Step: 15590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:28:18,644-Speed 12549.49 samples/sec Loss 7.6728 LearningRate 0.1585 Epoch: 6 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:20,197-Speed 13210.41 samples/sec Loss 7.6621 LearningRate 0.1585 Epoch: 6 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:21,764-Speed 13079.33 samples/sec Loss 7.6343 
LearningRate 0.1585 Epoch: 6 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:23,348-Speed 12934.89 samples/sec Loss 7.7065 LearningRate 0.1584 Epoch: 6 Global Step: 15630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:24,923-Speed 13016.10 samples/sec Loss 7.6354 LearningRate 0.1584 Epoch: 6 Global Step: 15640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:26,521-Speed 12825.00 samples/sec Loss 7.7062 LearningRate 0.1583 Epoch: 6 Global Step: 15650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:28,094-Speed 13020.44 samples/sec Loss 7.6184 LearningRate 0.1583 Epoch: 6 Global Step: 15660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:29,700-Speed 12776.06 samples/sec Loss 7.7616 LearningRate 0.1583 Epoch: 6 Global Step: 15670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:31,270-Speed 13053.98 samples/sec Loss 7.7135 LearningRate 0.1582 Epoch: 6 Global Step: 15680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:32,879-Speed 12735.54 samples/sec Loss 7.6682 LearningRate 0.1582 Epoch: 6 Global Step: 15690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:34,457-Speed 12983.98 samples/sec Loss 7.6885 LearningRate 0.1582 Epoch: 6 Global Step: 15700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:28:36,045-Speed 12920.68 samples/sec Loss 7.6350 LearningRate 0.1581 Epoch: 6 Global Step: 15710 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:28:37,618-Speed 13028.88 samples/sec Loss 7.6693 LearningRate 0.1581 Epoch: 6 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:39,258-Speed 12502.27 samples/sec Loss 7.6948 LearningRate 0.1580 Epoch: 6 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:40,856-Speed 12821.85 samples/sec Loss 7.6657 LearningRate 0.1580 Epoch: 6 Global Step: 
15740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:42,439-Speed 12946.65 samples/sec Loss 7.6662 LearningRate 0.1580 Epoch: 6 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:44,015-Speed 13003.02 samples/sec Loss 7.8018 LearningRate 0.1579 Epoch: 6 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:45,575-Speed 13140.99 samples/sec Loss 7.7503 LearningRate 0.1579 Epoch: 6 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:47,163-Speed 12903.74 samples/sec Loss 7.7041 LearningRate 0.1579 Epoch: 6 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:48,776-Speed 12698.43 samples/sec Loss 7.8386 LearningRate 0.1578 Epoch: 6 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:50,351-Speed 13025.20 samples/sec Loss 7.6724 LearningRate 0.1578 Epoch: 6 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:51,969-Speed 12659.47 samples/sec Loss 7.6376 LearningRate 0.1578 Epoch: 6 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:53,534-Speed 13106.15 samples/sec Loss 7.6595 LearningRate 0.1577 Epoch: 6 Global Step: 15820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:28:55,104-Speed 13086.40 samples/sec Loss 7.6914 LearningRate 0.1577 Epoch: 6 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:56,719-Speed 12688.24 samples/sec Loss 7.7068 LearningRate 0.1576 Epoch: 6 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:58,293-Speed 13020.83 samples/sec Loss 7.6802 LearningRate 0.1576 Epoch: 6 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:28:59,901-Speed 12828.59 samples/sec Loss 7.7044 LearningRate 0.1576 Epoch: 6 Global Step: 15860 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-01-14 14:29:01,499-Speed 12822.26 samples/sec Loss 7.7652 LearningRate 0.1575 Epoch: 6 Global Step: 15870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:03,108-Speed 12738.34 samples/sec Loss 7.7696 LearningRate 0.1575 Epoch: 6 Global Step: 15880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:04,702-Speed 12854.32 samples/sec Loss 7.8100 LearningRate 0.1575 Epoch: 6 Global Step: 15890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:06,280-Speed 12990.36 samples/sec Loss 7.7411 LearningRate 0.1574 Epoch: 6 Global Step: 15900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:07,879-Speed 12811.64 samples/sec Loss 7.8480 LearningRate 0.1574 Epoch: 6 Global Step: 15910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:09,509-Speed 12575.43 samples/sec Loss 7.7216 LearningRate 0.1573 Epoch: 6 Global Step: 15920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:11,097-Speed 12912.06 samples/sec Loss 7.7831 LearningRate 0.1573 Epoch: 6 Global Step: 15930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:12,661-Speed 13098.20 samples/sec Loss 7.8580 LearningRate 0.1573 Epoch: 6 Global Step: 15940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:14,265-Speed 12779.44 samples/sec Loss 7.7405 LearningRate 0.1572 Epoch: 6 Global Step: 15950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:29:15,836-Speed 13045.25 samples/sec Loss 7.8036 LearningRate 0.1572 Epoch: 6 Global Step: 15960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:17,440-Speed 12773.89 samples/sec Loss 7.8169 LearningRate 0.1572 Epoch: 6 Global Step: 15970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:19,024-Speed 12944.39 samples/sec Loss 7.8436 LearningRate 0.1571 Epoch: 6 Global Step: 15980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:20,600-Speed 13012.62 
samples/sec Loss 7.8559 LearningRate 0.1571 Epoch: 6 Global Step: 15990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:22,172-Speed 13034.26 samples/sec Loss 7.7405 LearningRate 0.1570 Epoch: 6 Global Step: 16000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:23,755-Speed 12946.38 samples/sec Loss 7.8145 LearningRate 0.1570 Epoch: 6 Global Step: 16010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:25,321-Speed 13093.80 samples/sec Loss 7.7237 LearningRate 0.1570 Epoch: 6 Global Step: 16020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:26,894-Speed 13028.34 samples/sec Loss 7.8139 LearningRate 0.1569 Epoch: 6 Global Step: 16030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:28,487-Speed 12881.98 samples/sec Loss 7.8490 LearningRate 0.1569 Epoch: 6 Global Step: 16040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:30,066-Speed 13006.99 samples/sec Loss 7.7876 LearningRate 0.1569 Epoch: 6 Global Step: 16050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:29:31,650-Speed 12939.58 samples/sec Loss 7.7680 LearningRate 0.1568 Epoch: 6 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:33,218-Speed 13070.11 samples/sec Loss 7.8628 LearningRate 0.1568 Epoch: 6 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:34,803-Speed 12927.20 samples/sec Loss 7.7914 LearningRate 0.1568 Epoch: 6 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:36,388-Speed 12930.58 samples/sec Loss 7.8300 LearningRate 0.1567 Epoch: 6 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:37,968-Speed 12982.51 samples/sec Loss 7.8725 LearningRate 0.1567 Epoch: 6 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:39,551-Speed 12942.88 samples/sec Loss 7.7980 LearningRate 0.1566 Epoch: 
6 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:41,164-Speed 12705.36 samples/sec Loss 7.8904 LearningRate 0.1566 Epoch: 6 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:42,723-Speed 13146.74 samples/sec Loss 7.8924 LearningRate 0.1566 Epoch: 6 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:44,299-Speed 13008.85 samples/sec Loss 7.7077 LearningRate 0.1565 Epoch: 6 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:45,894-Speed 12853.76 samples/sec Loss 7.7702 LearningRate 0.1565 Epoch: 6 Global Step: 16150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:47,460-Speed 13086.53 samples/sec Loss 7.8835 LearningRate 0.1565 Epoch: 6 Global Step: 16160 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:29:49,023-Speed 13109.15 samples/sec Loss 7.8677 LearningRate 0.1564 Epoch: 6 Global Step: 16170 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:29:50,642-Speed 12653.35 samples/sec Loss 7.8985 LearningRate 0.1564 Epoch: 6 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:52,213-Speed 13047.82 samples/sec Loss 7.8628 LearningRate 0.1563 Epoch: 6 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:53,780-Speed 13087.31 samples/sec Loss 7.9164 LearningRate 0.1563 Epoch: 6 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:55,378-Speed 12824.60 samples/sec Loss 7.7881 LearningRate 0.1563 Epoch: 6 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:56,942-Speed 13098.04 samples/sec Loss 7.9273 LearningRate 0.1562 Epoch: 6 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:29:58,524-Speed 12960.03 samples/sec Loss 7.8916 LearningRate 0.1562 Epoch: 6 Global Step: 16230 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-01-14 14:30:00,147-Speed 12631.47 samples/sec Loss 7.8821 LearningRate 0.1562 Epoch: 6 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:01,728-Speed 12958.97 samples/sec Loss 7.9278 LearningRate 0.1561 Epoch: 6 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:03,338-Speed 12727.02 samples/sec Loss 7.8585 LearningRate 0.1561 Epoch: 6 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:04,906-Speed 13075.80 samples/sec Loss 7.8774 LearningRate 0.1561 Epoch: 6 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:06,463-Speed 13167.08 samples/sec Loss 7.7648 LearningRate 0.1560 Epoch: 6 Global Step: 16280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:08,039-Speed 12997.34 samples/sec Loss 7.8163 LearningRate 0.1560 Epoch: 6 Global Step: 16290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:09,661-Speed 12659.02 samples/sec Loss 7.7397 LearningRate 0.1559 Epoch: 6 Global Step: 16300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:11,237-Speed 13009.36 samples/sec Loss 7.8202 LearningRate 0.1559 Epoch: 6 Global Step: 16310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:12,888-Speed 12426.53 samples/sec Loss 7.7492 LearningRate 0.1559 Epoch: 6 Global Step: 16320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:14,497-Speed 12741.07 samples/sec Loss 7.8957 LearningRate 0.1558 Epoch: 6 Global Step: 16330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:16,070-Speed 13031.49 samples/sec Loss 7.8074 LearningRate 0.1558 Epoch: 6 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:17,655-Speed 12924.98 samples/sec Loss 7.9227 LearningRate 0.1558 Epoch: 6 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
14:30:19,243-Speed 12907.39 samples/sec Loss 7.8871 LearningRate 0.1557 Epoch: 6 Global Step: 16360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:20,804-Speed 13135.81 samples/sec Loss 7.8244 LearningRate 0.1557 Epoch: 6 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:22,399-Speed 12845.98 samples/sec Loss 7.7937 LearningRate 0.1557 Epoch: 6 Global Step: 16380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:23,969-Speed 13056.56 samples/sec Loss 7.9563 LearningRate 0.1556 Epoch: 6 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:25,538-Speed 13079.62 samples/sec Loss 7.8698 LearningRate 0.1556 Epoch: 6 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:27,131-Speed 12856.76 samples/sec Loss 7.7994 LearningRate 0.1555 Epoch: 6 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:28,710-Speed 12985.86 samples/sec Loss 7.8223 LearningRate 0.1555 Epoch: 6 Global Step: 16420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:30,289-Speed 13018.41 samples/sec Loss 7.8738 LearningRate 0.1555 Epoch: 6 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:31,886-Speed 12835.77 samples/sec Loss 7.9881 LearningRate 0.1554 Epoch: 6 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:33,465-Speed 12978.51 samples/sec Loss 7.9433 LearningRate 0.1554 Epoch: 6 Global Step: 16450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:35,054-Speed 12896.21 samples/sec Loss 7.7714 LearningRate 0.1554 Epoch: 6 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:36,635-Speed 12959.50 samples/sec Loss 7.8299 LearningRate 0.1553 Epoch: 6 Global Step: 16470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:38,248-Speed 12716.88 samples/sec Loss 
7.9075 LearningRate 0.1553 Epoch: 6 Global Step: 16480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:30:39,799-Speed 13226.94 samples/sec Loss 7.7758 LearningRate 0.1552 Epoch: 6 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:41,359-Speed 13185.85 samples/sec Loss 7.8405 LearningRate 0.1552 Epoch: 6 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:42,926-Speed 13086.75 samples/sec Loss 7.8079 LearningRate 0.1552 Epoch: 6 Global Step: 16510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:44,516-Speed 12881.56 samples/sec Loss 7.7658 LearningRate 0.1551 Epoch: 6 Global Step: 16520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:46,085-Speed 13066.81 samples/sec Loss 7.8220 LearningRate 0.1551 Epoch: 6 Global Step: 16530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:47,647-Speed 13125.12 samples/sec Loss 7.8194 LearningRate 0.1551 Epoch: 6 Global Step: 16540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:49,228-Speed 12958.53 samples/sec Loss 7.9306 LearningRate 0.1550 Epoch: 6 Global Step: 16550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:50,819-Speed 12885.31 samples/sec Loss 7.9001 LearningRate 0.1550 Epoch: 6 Global Step: 16560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:52,402-Speed 12947.01 samples/sec Loss 7.9005 LearningRate 0.1550 Epoch: 6 Global Step: 16570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:53,992-Speed 12894.85 samples/sec Loss 7.8707 LearningRate 0.1549 Epoch: 6 Global Step: 16580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:55,587-Speed 12849.09 samples/sec Loss 7.9538 LearningRate 0.1549 Epoch: 6 Global Step: 16590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:30:57,157-Speed 13060.07 samples/sec Loss 7.8523 LearningRate 0.1548 Epoch: 6 Global Step: 
16600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:30:58,783-Speed 12600.80 samples/sec Loss 7.9598 LearningRate 0.1548 Epoch: 6 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:00,344-Speed 13133.56 samples/sec Loss 7.8011 LearningRate 0.1548 Epoch: 6 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:01,926-Speed 12959.58 samples/sec Loss 7.8162 LearningRate 0.1547 Epoch: 6 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:03,539-Speed 12698.15 samples/sec Loss 7.8749 LearningRate 0.1547 Epoch: 6 Global Step: 16640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:05,129-Speed 12891.75 samples/sec Loss 7.9907 LearningRate 0.1547 Epoch: 6 Global Step: 16650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:06,721-Speed 12879.23 samples/sec Loss 7.8311 LearningRate 0.1546 Epoch: 6 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:08,304-Speed 12941.55 samples/sec Loss 7.8986 LearningRate 0.1546 Epoch: 6 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:09,871-Speed 13118.82 samples/sec Loss 7.9358 LearningRate 0.1546 Epoch: 6 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:11,463-Speed 12863.52 samples/sec Loss 7.9390 LearningRate 0.1545 Epoch: 6 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:13,069-Speed 12762.71 samples/sec Loss 7.8981 LearningRate 0.1545 Epoch: 6 Global Step: 16700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:31:14,648-Speed 12978.27 samples/sec Loss 7.7799 LearningRate 0.1544 Epoch: 6 Global Step: 16710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:16,233-Speed 12928.01 samples/sec Loss 7.8566 LearningRate 0.1544 Epoch: 6 Global Step: 16720 Fp16 Grad Scale: 131072 Required: 4 
hours Training: 2022-01-14 14:31:17,809-Speed 13011.71 samples/sec Loss 7.8293 LearningRate 0.1544 Epoch: 6 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:19,428-Speed 12657.97 samples/sec Loss 7.8432 LearningRate 0.1543 Epoch: 6 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:21,006-Speed 12995.01 samples/sec Loss 7.9061 LearningRate 0.1543 Epoch: 6 Global Step: 16750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:22,612-Speed 12760.34 samples/sec Loss 7.7166 LearningRate 0.1543 Epoch: 6 Global Step: 16760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:24,188-Speed 13004.98 samples/sec Loss 7.8012 LearningRate 0.1542 Epoch: 6 Global Step: 16770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:25,775-Speed 12912.40 samples/sec Loss 7.7354 LearningRate 0.1542 Epoch: 6 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:27,373-Speed 12828.60 samples/sec Loss 7.8977 LearningRate 0.1541 Epoch: 6 Global Step: 16790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:28,935-Speed 13114.03 samples/sec Loss 7.9023 LearningRate 0.1541 Epoch: 6 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:30,523-Speed 12913.59 samples/sec Loss 7.8323 LearningRate 0.1541 Epoch: 6 Global Step: 16810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:31:32,062-Speed 13316.06 samples/sec Loss 7.8202 LearningRate 0.1540 Epoch: 6 Global Step: 16820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:33,664-Speed 12795.92 samples/sec Loss 7.9771 LearningRate 0.1540 Epoch: 6 Global Step: 16830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:35,225-Speed 13133.92 samples/sec Loss 7.8777 LearningRate 0.1540 Epoch: 6 Global Step: 16840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:36,790-Speed 
13095.31 samples/sec Loss 7.8175 LearningRate 0.1539 Epoch: 6 Global Step: 16850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:38,381-Speed 12874.48 samples/sec Loss 7.8041 LearningRate 0.1539 Epoch: 6 Global Step: 16860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:39,957-Speed 13015.06 samples/sec Loss 7.8499 LearningRate 0.1539 Epoch: 6 Global Step: 16870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:41,540-Speed 12947.13 samples/sec Loss 7.8172 LearningRate 0.1538 Epoch: 6 Global Step: 16880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:43,135-Speed 12852.08 samples/sec Loss 7.9139 LearningRate 0.1538 Epoch: 6 Global Step: 16890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:44,719-Speed 12939.22 samples/sec Loss 7.8201 LearningRate 0.1537 Epoch: 6 Global Step: 16900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:46,311-Speed 12872.42 samples/sec Loss 7.8564 LearningRate 0.1537 Epoch: 6 Global Step: 16910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:31:47,909-Speed 12833.68 samples/sec Loss 7.7118 LearningRate 0.1537 Epoch: 6 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:49,508-Speed 12955.47 samples/sec Loss 7.7515 LearningRate 0.1536 Epoch: 6 Global Step: 16930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:51,114-Speed 12765.30 samples/sec Loss 7.8332 LearningRate 0.1536 Epoch: 6 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:52,661-Speed 13254.33 samples/sec Loss 7.8484 LearningRate 0.1536 Epoch: 6 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:54,242-Speed 12976.42 samples/sec Loss 7.7971 LearningRate 0.1535 Epoch: 6 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:55,813-Speed 13044.73 samples/sec Loss 7.7599 LearningRate 
0.1535 Epoch: 6 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:57,398-Speed 12935.94 samples/sec Loss 7.8015 LearningRate 0.1535 Epoch: 6 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:31:58,970-Speed 13040.14 samples/sec Loss 7.8063 LearningRate 0.1534 Epoch: 6 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:00,537-Speed 13100.31 samples/sec Loss 7.9777 LearningRate 0.1534 Epoch: 6 Global Step: 17000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:02,170-Speed 12545.30 samples/sec Loss 7.8079 LearningRate 0.1533 Epoch: 6 Global Step: 17010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:03,737-Speed 13083.03 samples/sec Loss 7.9243 LearningRate 0.1533 Epoch: 6 Global Step: 17020 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:32:05,307-Speed 13050.87 samples/sec Loss 7.8752 LearningRate 0.1533 Epoch: 6 Global Step: 17030 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:32:06,877-Speed 13048.38 samples/sec Loss 7.8759 LearningRate 0.1532 Epoch: 6 Global Step: 17040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:08,456-Speed 12981.57 samples/sec Loss 7.9044 LearningRate 0.1532 Epoch: 6 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:10,026-Speed 13060.82 samples/sec Loss 7.8474 LearningRate 0.1532 Epoch: 6 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:11,599-Speed 13020.06 samples/sec Loss 7.8497 LearningRate 0.1531 Epoch: 6 Global Step: 17070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:13,227-Speed 12592.53 samples/sec Loss 7.8558 LearningRate 0.1531 Epoch: 6 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:14,809-Speed 12953.12 samples/sec Loss 7.8740 LearningRate 0.1531 Epoch: 6 Global Step: 17090 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:16,370-Speed 13123.74 samples/sec Loss 7.7819 LearningRate 0.1530 Epoch: 6 Global Step: 17100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:17,950-Speed 12999.03 samples/sec Loss 7.7805 LearningRate 0.1530 Epoch: 6 Global Step: 17110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:19,538-Speed 12904.58 samples/sec Loss 7.7637 LearningRate 0.1529 Epoch: 6 Global Step: 17120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:21,103-Speed 13095.46 samples/sec Loss 7.7464 LearningRate 0.1529 Epoch: 6 Global Step: 17130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:22,689-Speed 12919.34 samples/sec Loss 7.8232 LearningRate 0.1529 Epoch: 6 Global Step: 17140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:24,282-Speed 12864.61 samples/sec Loss 7.7901 LearningRate 0.1528 Epoch: 6 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:25,866-Speed 12931.44 samples/sec Loss 7.8213 LearningRate 0.1528 Epoch: 6 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:27,458-Speed 12879.94 samples/sec Loss 7.8674 LearningRate 0.1528 Epoch: 6 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:29,063-Speed 12763.94 samples/sec Loss 7.8543 LearningRate 0.1527 Epoch: 6 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:30,630-Speed 13087.33 samples/sec Loss 7.8770 LearningRate 0.1527 Epoch: 6 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:32,213-Speed 12954.68 samples/sec Loss 7.9173 LearningRate 0.1527 Epoch: 6 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:33,780-Speed 13070.48 samples/sec Loss 7.9122 LearningRate 0.1526 Epoch: 6 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-01-14 14:32:35,404-Speed 12619.19 samples/sec Loss 7.8936 LearningRate 0.1526 Epoch: 6 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:36,974-Speed 13055.60 samples/sec Loss 7.8048 LearningRate 0.1525 Epoch: 6 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:38,553-Speed 12976.32 samples/sec Loss 7.8359 LearningRate 0.1525 Epoch: 6 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:40,104-Speed 13219.40 samples/sec Loss 7.7244 LearningRate 0.1525 Epoch: 6 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:41,701-Speed 12834.85 samples/sec Loss 7.7696 LearningRate 0.1524 Epoch: 6 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:43,267-Speed 13113.40 samples/sec Loss 7.8826 LearningRate 0.1524 Epoch: 6 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:44,860-Speed 12858.03 samples/sec Loss 7.8176 LearningRate 0.1524 Epoch: 6 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:46,442-Speed 12954.70 samples/sec Loss 7.7974 LearningRate 0.1523 Epoch: 6 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:47,998-Speed 13176.96 samples/sec Loss 7.7950 LearningRate 0.1523 Epoch: 6 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:49,560-Speed 13114.62 samples/sec Loss 7.7638 LearningRate 0.1523 Epoch: 6 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:51,162-Speed 12799.19 samples/sec Loss 7.7971 LearningRate 0.1522 Epoch: 6 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:32:52,734-Speed 13044.97 samples/sec Loss 7.8553 LearningRate 0.1522 Epoch: 6 Global Step: 17330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:54,351-Speed 12669.98 
samples/sec Loss 7.8711 LearningRate 0.1521 Epoch: 6 Global Step: 17340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:55,927-Speed 13008.29 samples/sec Loss 7.8590 LearningRate 0.1521 Epoch: 6 Global Step: 17350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:57,510-Speed 12957.22 samples/sec Loss 7.7691 LearningRate 0.1521 Epoch: 6 Global Step: 17360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:32:59,070-Speed 13134.83 samples/sec Loss 7.8792 LearningRate 0.1520 Epoch: 6 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:00,649-Speed 12973.18 samples/sec Loss 7.8549 LearningRate 0.1520 Epoch: 6 Global Step: 17380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:02,239-Speed 12891.63 samples/sec Loss 7.7624 LearningRate 0.1520 Epoch: 6 Global Step: 17390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:03,845-Speed 12758.52 samples/sec Loss 7.8664 LearningRate 0.1519 Epoch: 6 Global Step: 17400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:05,421-Speed 13005.06 samples/sec Loss 7.8851 LearningRate 0.1519 Epoch: 6 Global Step: 17410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:07,010-Speed 12893.21 samples/sec Loss 7.8103 LearningRate 0.1519 Epoch: 6 Global Step: 17420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:08,579-Speed 13069.54 samples/sec Loss 7.8237 LearningRate 0.1518 Epoch: 6 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:10,155-Speed 13000.79 samples/sec Loss 7.8150 LearningRate 0.1518 Epoch: 6 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:11,739-Speed 12943.56 samples/sec Loss 7.7437 LearningRate 0.1517 Epoch: 6 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:13,344-Speed 12764.17 samples/sec Loss 7.7684 LearningRate 0.1517 Epoch: 6 
Global Step: 17460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:14,927-Speed 12951.22 samples/sec Loss 7.8699 LearningRate 0.1517 Epoch: 6 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:16,496-Speed 13070.91 samples/sec Loss 7.7674 LearningRate 0.1516 Epoch: 6 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:18,067-Speed 13044.58 samples/sec Loss 7.8943 LearningRate 0.1516 Epoch: 6 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:19,649-Speed 12957.70 samples/sec Loss 7.7836 LearningRate 0.1516 Epoch: 6 Global Step: 17500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:21,236-Speed 12918.12 samples/sec Loss 7.7385 LearningRate 0.1515 Epoch: 6 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:22,852-Speed 12683.09 samples/sec Loss 7.6628 LearningRate 0.1515 Epoch: 6 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:24,437-Speed 12926.39 samples/sec Loss 7.8456 LearningRate 0.1515 Epoch: 6 Global Step: 17530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:26,026-Speed 12899.55 samples/sec Loss 7.8061 LearningRate 0.1514 Epoch: 6 Global Step: 17540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:27,603-Speed 12998.65 samples/sec Loss 7.8015 LearningRate 0.1514 Epoch: 6 Global Step: 17550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:29,199-Speed 12837.32 samples/sec Loss 7.7846 LearningRate 0.1513 Epoch: 6 Global Step: 17560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:30,798-Speed 12820.25 samples/sec Loss 7.8309 LearningRate 0.1513 Epoch: 6 Global Step: 17570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:32,379-Speed 12987.10 samples/sec Loss 7.8675 LearningRate 0.1513 Epoch: 6 Global Step: 17580 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 14:33:33,933-Speed 13187.30 samples/sec Loss 7.8124 LearningRate 0.1512 Epoch: 6 Global Step: 17590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:35,498-Speed 13099.01 samples/sec Loss 7.8442 LearningRate 0.1512 Epoch: 6 Global Step: 17600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:37,076-Speed 12982.37 samples/sec Loss 7.8104 LearningRate 0.1512 Epoch: 6 Global Step: 17610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:38,665-Speed 12902.33 samples/sec Loss 7.8885 LearningRate 0.1511 Epoch: 6 Global Step: 17620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:40,229-Speed 13117.13 samples/sec Loss 7.7562 LearningRate 0.1511 Epoch: 6 Global Step: 17630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:41,830-Speed 12800.49 samples/sec Loss 7.9304 LearningRate 0.1511 Epoch: 6 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:43,410-Speed 12973.12 samples/sec Loss 7.9209 LearningRate 0.1510 Epoch: 6 Global Step: 17650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:33:45,004-Speed 12853.22 samples/sec Loss 7.6783 LearningRate 0.1510 Epoch: 6 Global Step: 17660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:46,591-Speed 12910.76 samples/sec Loss 7.7596 LearningRate 0.1510 Epoch: 6 Global Step: 17670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:48,192-Speed 12803.29 samples/sec Loss 7.7350 LearningRate 0.1509 Epoch: 6 Global Step: 17680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:49,825-Speed 12546.99 samples/sec Loss 7.6942 LearningRate 0.1509 Epoch: 6 Global Step: 17690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:33:51,339-Speed 13533.07 samples/sec Loss 7.7720 LearningRate 0.1508 Epoch: 6 Global Step: 17700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
14:34:07,254-Speed 1286.98 samples/sec Loss 7.2005 LearningRate 0.1508 Epoch: 7 Global Step: 17710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:08,875-Speed 12650.94 samples/sec Loss 6.8726 LearningRate 0.1508 Epoch: 7 Global Step: 17720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:10,440-Speed 13092.26 samples/sec Loss 6.8321 LearningRate 0.1507 Epoch: 7 Global Step: 17730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:12,010-Speed 13056.13 samples/sec Loss 6.8676 LearningRate 0.1507 Epoch: 7 Global Step: 17740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:13,645-Speed 12533.21 samples/sec Loss 6.8718 LearningRate 0.1507 Epoch: 7 Global Step: 17750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:15,229-Speed 12942.72 samples/sec Loss 6.9281 LearningRate 0.1506 Epoch: 7 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:34:16,805-Speed 13005.24 samples/sec Loss 6.9622 LearningRate 0.1506 Epoch: 7 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:34:18,383-Speed 12988.44 samples/sec Loss 7.0315 LearningRate 0.1506 Epoch: 7 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:34:19,981-Speed 12822.03 samples/sec Loss 6.9268 LearningRate 0.1505 Epoch: 7 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:34:21,558-Speed 13000.27 samples/sec Loss 6.9626 LearningRate 0.1505 Epoch: 7 Global Step: 17800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:23,125-Speed 13072.65 samples/sec Loss 6.9041 LearningRate 0.1504 Epoch: 7 Global Step: 17810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:24,742-Speed 12671.95 samples/sec Loss 6.9540 LearningRate 0.1504 Epoch: 7 Global Step: 17820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:26,310-Speed 13070.15 samples/sec Loss 7.0691 
LearningRate 0.1504 Epoch: 7 Global Step: 17830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:27,875-Speed 13137.52 samples/sec Loss 7.0025 LearningRate 0.1503 Epoch: 7 Global Step: 17840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:29,485-Speed 12726.97 samples/sec Loss 7.1616 LearningRate 0.1503 Epoch: 7 Global Step: 17850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:31,052-Speed 13086.18 samples/sec Loss 7.0165 LearningRate 0.1503 Epoch: 7 Global Step: 17860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:32,651-Speed 12808.84 samples/sec Loss 7.1158 LearningRate 0.1502 Epoch: 7 Global Step: 17870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:34,216-Speed 13099.23 samples/sec Loss 7.0437 LearningRate 0.1502 Epoch: 7 Global Step: 17880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:35,790-Speed 13029.06 samples/sec Loss 7.0718 LearningRate 0.1502 Epoch: 7 Global Step: 17890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:37,364-Speed 13017.36 samples/sec Loss 7.0759 LearningRate 0.1501 Epoch: 7 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:34:38,950-Speed 12921.17 samples/sec Loss 7.0240 LearningRate 0.1501 Epoch: 7 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:34:40,543-Speed 12866.47 samples/sec Loss 7.1406 LearningRate 0.1500 Epoch: 7 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:34:42,087-Speed 13275.62 samples/sec Loss 7.1951 LearningRate 0.1500 Epoch: 7 Global Step: 17930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:43,660-Speed 13025.68 samples/sec Loss 7.1840 LearningRate 0.1500 Epoch: 7 Global Step: 17940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:45,300-Speed 12497.86 samples/sec Loss 7.1647 LearningRate 0.1499 Epoch: 7 Global Step: 17950 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:46,875-Speed 13008.87 samples/sec Loss 7.1396 LearningRate 0.1499 Epoch: 7 Global Step: 17960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:48,463-Speed 12903.17 samples/sec Loss 7.2114 LearningRate 0.1499 Epoch: 7 Global Step: 17970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:50,028-Speed 13098.83 samples/sec Loss 7.1748 LearningRate 0.1498 Epoch: 7 Global Step: 17980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:51,605-Speed 12995.95 samples/sec Loss 7.2139 LearningRate 0.1498 Epoch: 7 Global Step: 17990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:53,183-Speed 12992.36 samples/sec Loss 7.1929 LearningRate 0.1498 Epoch: 7 Global Step: 18000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:54,782-Speed 12813.60 samples/sec Loss 7.2042 LearningRate 0.1497 Epoch: 7 Global Step: 18010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:56,348-Speed 13089.64 samples/sec Loss 7.1756 LearningRate 0.1497 Epoch: 7 Global Step: 18020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:34:57,945-Speed 12828.06 samples/sec Loss 7.1962 LearningRate 0.1497 Epoch: 7 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:34:59,507-Speed 13116.90 samples/sec Loss 7.2319 LearningRate 0.1496 Epoch: 7 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:35:01,087-Speed 12971.15 samples/sec Loss 7.2346 LearningRate 0.1496 Epoch: 7 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:35:02,676-Speed 12907.73 samples/sec Loss 7.3673 LearningRate 0.1495 Epoch: 7 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:35:04,236-Speed 13137.97 samples/sec Loss 7.4038 LearningRate 0.1495 Epoch: 7 Global Step: 18070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-14 14:35:05,802-Speed 13087.87 samples/sec Loss 7.2658 LearningRate 0.1495 Epoch: 7 Global Step: 18080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:07,383-Speed 12965.57 samples/sec Loss 7.2741 LearningRate 0.1494 Epoch: 7 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:08,951-Speed 13070.01 samples/sec Loss 7.2751 LearningRate 0.1494 Epoch: 7 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:10,525-Speed 13027.69 samples/sec Loss 7.2793 LearningRate 0.1494 Epoch: 7 Global Step: 18110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:12,122-Speed 12828.85 samples/sec Loss 7.3274 LearningRate 0.1493 Epoch: 7 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:13,690-Speed 13065.70 samples/sec Loss 7.4208 LearningRate 0.1493 Epoch: 7 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:15,273-Speed 12952.19 samples/sec Loss 7.3713 LearningRate 0.1493 Epoch: 7 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:16,860-Speed 12909.67 samples/sec Loss 7.3194 LearningRate 0.1492 Epoch: 7 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:18,451-Speed 12882.53 samples/sec Loss 7.2766 LearningRate 0.1492 Epoch: 7 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:20,022-Speed 13046.50 samples/sec Loss 7.3384 LearningRate 0.1491 Epoch: 7 Global Step: 18170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:35:21,644-Speed 12634.63 samples/sec Loss 7.3202 LearningRate 0.1491 Epoch: 7 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:23,241-Speed 12832.94 samples/sec Loss 7.3666 LearningRate 0.1491 Epoch: 7 Global Step: 18190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:24,806-Speed 13096.00 samples/sec 
Loss 7.3078 LearningRate 0.1490 Epoch: 7 Global Step: 18200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:26,381-Speed 13017.23 samples/sec Loss 7.3474 LearningRate 0.1490 Epoch: 7 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:27,973-Speed 12870.60 samples/sec Loss 7.2995 LearningRate 0.1490 Epoch: 7 Global Step: 18220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:29,568-Speed 12852.57 samples/sec Loss 7.3206 LearningRate 0.1489 Epoch: 7 Global Step: 18230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:31,143-Speed 13013.44 samples/sec Loss 7.4002 LearningRate 0.1489 Epoch: 7 Global Step: 18240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:32,751-Speed 12749.74 samples/sec Loss 7.2659 LearningRate 0.1489 Epoch: 7 Global Step: 18250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:34,306-Speed 13176.16 samples/sec Loss 7.4324 LearningRate 0.1488 Epoch: 7 Global Step: 18260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:35,875-Speed 13081.33 samples/sec Loss 7.3675 LearningRate 0.1488 Epoch: 7 Global Step: 18270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:37,468-Speed 12867.17 samples/sec Loss 7.4055 LearningRate 0.1488 Epoch: 7 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:35:39,025-Speed 13156.05 samples/sec Loss 7.4730 LearningRate 0.1487 Epoch: 7 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:35:40,600-Speed 13017.14 samples/sec Loss 7.4608 LearningRate 0.1487 Epoch: 7 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:35:42,205-Speed 12764.54 samples/sec Loss 7.4686 LearningRate 0.1486 Epoch: 7 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:35:43,768-Speed 13113.79 samples/sec Loss 7.4295 LearningRate 0.1486 Epoch: 7 Global 
Step: 18320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:45,343-Speed 13010.04 samples/sec Loss 7.3848 LearningRate 0.1486 Epoch: 7 Global Step: 18330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:46,933-Speed 12886.88 samples/sec Loss 7.4625 LearningRate 0.1485 Epoch: 7 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:48,534-Speed 12805.45 samples/sec Loss 7.4607 LearningRate 0.1485 Epoch: 7 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:50,097-Speed 13104.26 samples/sec Loss 7.4561 LearningRate 0.1485 Epoch: 7 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:51,666-Speed 13065.62 samples/sec Loss 7.5296 LearningRate 0.1484 Epoch: 7 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:53,247-Speed 12960.98 samples/sec Loss 7.4837 LearningRate 0.1484 Epoch: 7 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:54,853-Speed 12766.38 samples/sec Loss 7.3509 LearningRate 0.1484 Epoch: 7 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:56,400-Speed 13253.00 samples/sec Loss 7.4203 LearningRate 0.1483 Epoch: 7 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:58,012-Speed 12710.70 samples/sec Loss 7.4685 LearningRate 0.1483 Epoch: 7 Global Step: 18410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:35:59,575-Speed 13103.41 samples/sec Loss 7.4015 LearningRate 0.1483 Epoch: 7 Global Step: 18420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:01,132-Speed 13165.75 samples/sec Loss 7.4128 LearningRate 0.1482 Epoch: 7 Global Step: 18430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:02,725-Speed 12865.88 samples/sec Loss 7.4891 LearningRate 0.1482 Epoch: 7 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-14 14:36:04,305-Speed 12993.35 samples/sec Loss 7.3884 LearningRate 0.1481 Epoch: 7 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:05,861-Speed 13181.15 samples/sec Loss 7.4915 LearningRate 0.1481 Epoch: 7 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:07,469-Speed 12734.40 samples/sec Loss 7.5401 LearningRate 0.1481 Epoch: 7 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:09,035-Speed 13094.20 samples/sec Loss 7.5261 LearningRate 0.1480 Epoch: 7 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:10,610-Speed 13011.48 samples/sec Loss 7.4852 LearningRate 0.1480 Epoch: 7 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:12,179-Speed 13060.05 samples/sec Loss 7.3849 LearningRate 0.1480 Epoch: 7 Global Step: 18500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:13,755-Speed 13004.10 samples/sec Loss 7.5278 LearningRate 0.1479 Epoch: 7 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:15,332-Speed 12997.32 samples/sec Loss 7.4809 LearningRate 0.1479 Epoch: 7 Global Step: 18520 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:36:16,897-Speed 13101.41 samples/sec Loss 7.5653 LearningRate 0.1479 Epoch: 7 Global Step: 18530 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:36:18,460-Speed 13108.68 samples/sec Loss 7.5394 LearningRate 0.1478 Epoch: 7 Global Step: 18540 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:36:20,061-Speed 12801.08 samples/sec Loss 7.4385 LearningRate 0.1478 Epoch: 7 Global Step: 18550 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:36:21,633-Speed 13038.98 samples/sec Loss 7.5665 LearningRate 0.1477 Epoch: 7 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:23,230-Speed 
12832.45 samples/sec Loss 7.4365 LearningRate 0.1477 Epoch: 7 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:24,793-Speed 13116.98 samples/sec Loss 7.5041 LearningRate 0.1477 Epoch: 7 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:26,356-Speed 13108.83 samples/sec Loss 7.4738 LearningRate 0.1476 Epoch: 7 Global Step: 18590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:27,912-Speed 13196.06 samples/sec Loss 7.4828 LearningRate 0.1476 Epoch: 7 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:29,497-Speed 12926.15 samples/sec Loss 7.5500 LearningRate 0.1476 Epoch: 7 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:31,096-Speed 12826.94 samples/sec Loss 7.5499 LearningRate 0.1475 Epoch: 7 Global Step: 18620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:32,658-Speed 13123.12 samples/sec Loss 7.3906 LearningRate 0.1475 Epoch: 7 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:34,226-Speed 13062.44 samples/sec Loss 7.4414 LearningRate 0.1475 Epoch: 7 Global Step: 18640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:35,827-Speed 12806.35 samples/sec Loss 7.4881 LearningRate 0.1474 Epoch: 7 Global Step: 18650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:37,405-Speed 12990.31 samples/sec Loss 7.4371 LearningRate 0.1474 Epoch: 7 Global Step: 18660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:38,968-Speed 13110.49 samples/sec Loss 7.5302 LearningRate 0.1474 Epoch: 7 Global Step: 18670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:40,569-Speed 12823.30 samples/sec Loss 7.5439 LearningRate 0.1473 Epoch: 7 Global Step: 18680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:42,140-Speed 13044.34 samples/sec Loss 7.3747 LearningRate 
0.1473 Epoch: 7 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:43,717-Speed 12993.45 samples/sec Loss 7.5807 LearningRate 0.1472 Epoch: 7 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:45,290-Speed 13027.88 samples/sec Loss 7.4261 LearningRate 0.1472 Epoch: 7 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:46,880-Speed 12890.05 samples/sec Loss 7.5524 LearningRate 0.1472 Epoch: 7 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:48,437-Speed 13165.60 samples/sec Loss 7.5587 LearningRate 0.1471 Epoch: 7 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:50,018-Speed 12962.98 samples/sec Loss 7.6020 LearningRate 0.1471 Epoch: 7 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:36:51,627-Speed 12737.37 samples/sec Loss 7.5424 LearningRate 0.1471 Epoch: 7 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:53,188-Speed 13126.43 samples/sec Loss 7.4607 LearningRate 0.1470 Epoch: 7 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:54,776-Speed 12905.16 samples/sec Loss 7.5831 LearningRate 0.1470 Epoch: 7 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:56,352-Speed 13010.55 samples/sec Loss 7.5619 LearningRate 0.1470 Epoch: 7 Global Step: 18780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:57,910-Speed 13151.73 samples/sec Loss 7.5997 LearningRate 0.1469 Epoch: 7 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:36:59,470-Speed 13139.80 samples/sec Loss 7.5183 LearningRate 0.1469 Epoch: 7 Global Step: 18800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:01,033-Speed 13116.80 samples/sec Loss 7.5672 LearningRate 0.1469 Epoch: 7 Global Step: 18810 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:02,586-Speed 13189.40 samples/sec Loss 7.5320 LearningRate 0.1468 Epoch: 7 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:04,185-Speed 12820.67 samples/sec Loss 7.5950 LearningRate 0.1468 Epoch: 7 Global Step: 18830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:05,783-Speed 12822.19 samples/sec Loss 7.5471 LearningRate 0.1467 Epoch: 7 Global Step: 18840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:07,371-Speed 12911.31 samples/sec Loss 7.4941 LearningRate 0.1467 Epoch: 7 Global Step: 18850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:08,945-Speed 13013.80 samples/sec Loss 7.4769 LearningRate 0.1467 Epoch: 7 Global Step: 18860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:10,507-Speed 13121.17 samples/sec Loss 7.4697 LearningRate 0.1466 Epoch: 7 Global Step: 18870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:12,097-Speed 12895.84 samples/sec Loss 7.5146 LearningRate 0.1466 Epoch: 7 Global Step: 18880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:13,649-Speed 13199.35 samples/sec Loss 7.4880 LearningRate 0.1466 Epoch: 7 Global Step: 18890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:15,227-Speed 12985.05 samples/sec Loss 7.5956 LearningRate 0.1465 Epoch: 7 Global Step: 18900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:16,841-Speed 12699.79 samples/sec Loss 7.6097 LearningRate 0.1465 Epoch: 7 Global Step: 18910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:18,415-Speed 13023.87 samples/sec Loss 7.5693 LearningRate 0.1465 Epoch: 7 Global Step: 18920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:20,001-Speed 12925.55 samples/sec Loss 7.5117 LearningRate 0.1464 Epoch: 7 Global Step: 18930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
14:37:21,616-Speed 12695.23 samples/sec Loss 7.5632 LearningRate 0.1464 Epoch: 7 Global Step: 18940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:23,186-Speed 13053.88 samples/sec Loss 7.4953 LearningRate 0.1464 Epoch: 7 Global Step: 18950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:24,773-Speed 12904.49 samples/sec Loss 7.6325 LearningRate 0.1463 Epoch: 7 Global Step: 18960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:26,345-Speed 13050.92 samples/sec Loss 7.6373 LearningRate 0.1463 Epoch: 7 Global Step: 18970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:27,933-Speed 12901.73 samples/sec Loss 7.4920 LearningRate 0.1463 Epoch: 7 Global Step: 18980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:29,499-Speed 13084.27 samples/sec Loss 7.6439 LearningRate 0.1462 Epoch: 7 Global Step: 18990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:31,088-Speed 12901.33 samples/sec Loss 7.5020 LearningRate 0.1462 Epoch: 7 Global Step: 19000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:32,660-Speed 13040.57 samples/sec Loss 7.6122 LearningRate 0.1461 Epoch: 7 Global Step: 19010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:37:34,229-Speed 13056.54 samples/sec Loss 7.6424 LearningRate 0.1461 Epoch: 7 Global Step: 19020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:35,798-Speed 13073.17 samples/sec Loss 7.6283 LearningRate 0.1461 Epoch: 7 Global Step: 19030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:37,393-Speed 12843.45 samples/sec Loss 7.5446 LearningRate 0.1460 Epoch: 7 Global Step: 19040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:38,978-Speed 12924.25 samples/sec Loss 7.5337 LearningRate 0.1460 Epoch: 7 Global Step: 19050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:40,572-Speed 12860.69 samples/sec Loss 7.6072 
LearningRate 0.1460 Epoch: 7 Global Step: 19060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:42,138-Speed 13087.57 samples/sec Loss 7.5515 LearningRate 0.1459 Epoch: 7 Global Step: 19070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:43,714-Speed 13000.14 samples/sec Loss 7.6876 LearningRate 0.1459 Epoch: 7 Global Step: 19080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:45,293-Speed 12981.84 samples/sec Loss 7.4604 LearningRate 0.1459 Epoch: 7 Global Step: 19090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:46,898-Speed 12766.58 samples/sec Loss 7.5503 LearningRate 0.1458 Epoch: 7 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:48,467-Speed 13067.93 samples/sec Loss 7.4924 LearningRate 0.1458 Epoch: 7 Global Step: 19110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:50,043-Speed 13002.54 samples/sec Loss 7.5015 LearningRate 0.1458 Epoch: 7 Global Step: 19120 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:37:51,610-Speed 13077.32 samples/sec Loss 7.6016 LearningRate 0.1457 Epoch: 7 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:53,192-Speed 12959.56 samples/sec Loss 7.6180 LearningRate 0.1457 Epoch: 7 Global Step: 19140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:54,792-Speed 12808.56 samples/sec Loss 7.5141 LearningRate 0.1456 Epoch: 7 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:56,380-Speed 12915.83 samples/sec Loss 7.6096 LearningRate 0.1456 Epoch: 7 Global Step: 19160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:57,966-Speed 12921.68 samples/sec Loss 7.6155 LearningRate 0.1456 Epoch: 7 Global Step: 19170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:37:59,541-Speed 13010.65 samples/sec Loss 7.5604 LearningRate 0.1455 Epoch: 7 Global Step: 
19180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:01,131-Speed 12894.07 samples/sec Loss 7.7333 LearningRate 0.1455 Epoch: 7 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:02,715-Speed 12940.01 samples/sec Loss 7.5275 LearningRate 0.1455 Epoch: 7 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:04,313-Speed 12820.62 samples/sec Loss 7.5056 LearningRate 0.1454 Epoch: 7 Global Step: 19210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:05,908-Speed 12856.17 samples/sec Loss 7.5738 LearningRate 0.1454 Epoch: 7 Global Step: 19220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:07,473-Speed 13088.96 samples/sec Loss 7.4500 LearningRate 0.1454 Epoch: 7 Global Step: 19230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:09,057-Speed 12937.56 samples/sec Loss 7.6264 LearningRate 0.1453 Epoch: 7 Global Step: 19240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:10,654-Speed 12837.13 samples/sec Loss 7.5476 LearningRate 0.1453 Epoch: 7 Global Step: 19250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:12,215-Speed 13129.07 samples/sec Loss 7.6530 LearningRate 0.1453 Epoch: 7 Global Step: 19260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:13,783-Speed 13069.39 samples/sec Loss 7.5113 LearningRate 0.1452 Epoch: 7 Global Step: 19270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:15,367-Speed 12934.70 samples/sec Loss 7.5202 LearningRate 0.1452 Epoch: 7 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:16,924-Speed 13161.88 samples/sec Loss 7.5904 LearningRate 0.1451 Epoch: 7 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:18,540-Speed 12681.58 samples/sec Loss 7.5839 LearningRate 0.1451 Epoch: 7 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-14 14:38:20,081-Speed 13346.25 samples/sec Loss 7.6237 LearningRate 0.1451 Epoch: 7 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:21,642-Speed 13128.62 samples/sec Loss 7.6867 LearningRate 0.1450 Epoch: 7 Global Step: 19320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:23,214-Speed 13033.65 samples/sec Loss 7.5057 LearningRate 0.1450 Epoch: 7 Global Step: 19330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:24,788-Speed 13022.85 samples/sec Loss 7.5854 LearningRate 0.1450 Epoch: 7 Global Step: 19340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:26,377-Speed 12901.58 samples/sec Loss 7.4716 LearningRate 0.1449 Epoch: 7 Global Step: 19350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:27,992-Speed 12684.36 samples/sec Loss 7.6193 LearningRate 0.1449 Epoch: 7 Global Step: 19360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:29,571-Speed 12979.43 samples/sec Loss 7.6449 LearningRate 0.1449 Epoch: 7 Global Step: 19370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:31,146-Speed 13013.19 samples/sec Loss 7.5480 LearningRate 0.1448 Epoch: 7 Global Step: 19380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:32,746-Speed 12808.09 samples/sec Loss 7.5359 LearningRate 0.1448 Epoch: 7 Global Step: 19390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:34,310-Speed 13104.45 samples/sec Loss 7.5071 LearningRate 0.1448 Epoch: 7 Global Step: 19400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:35,895-Speed 12933.74 samples/sec Loss 7.5051 LearningRate 0.1447 Epoch: 7 Global Step: 19410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:37,501-Speed 12754.11 samples/sec Loss 7.6395 LearningRate 0.1447 Epoch: 7 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:39,077-Speed 13008.14 
samples/sec Loss 7.4571 LearningRate 0.1447 Epoch: 7 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:40,660-Speed 12949.92 samples/sec Loss 7.5963 LearningRate 0.1446 Epoch: 7 Global Step: 19440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:42,252-Speed 12866.55 samples/sec Loss 7.5253 LearningRate 0.1446 Epoch: 7 Global Step: 19450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:43,830-Speed 13008.40 samples/sec Loss 7.4213 LearningRate 0.1445 Epoch: 7 Global Step: 19460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:45,406-Speed 13074.04 samples/sec Loss 7.5506 LearningRate 0.1445 Epoch: 7 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:46,984-Speed 12983.48 samples/sec Loss 7.6906 LearningRate 0.1445 Epoch: 7 Global Step: 19480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:48,561-Speed 12996.01 samples/sec Loss 7.5603 LearningRate 0.1444 Epoch: 7 Global Step: 19490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:50,163-Speed 12794.66 samples/sec Loss 7.6069 LearningRate 0.1444 Epoch: 7 Global Step: 19500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:38:51,724-Speed 13122.70 samples/sec Loss 7.6337 LearningRate 0.1444 Epoch: 7 Global Step: 19510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:53,300-Speed 13011.73 samples/sec Loss 7.5648 LearningRate 0.1443 Epoch: 7 Global Step: 19520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:54,869-Speed 13053.64 samples/sec Loss 7.5605 LearningRate 0.1443 Epoch: 7 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:56,681-Speed 11315.62 samples/sec Loss 7.4709 LearningRate 0.1443 Epoch: 7 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:58,268-Speed 12912.25 samples/sec Loss 7.5022 LearningRate 0.1442 
Epoch: 7 Global Step: 19550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:38:59,828-Speed 13140.40 samples/sec Loss 7.6340 LearningRate 0.1442 Epoch: 7 Global Step: 19560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:01,437-Speed 12735.40 samples/sec Loss 7.4555 LearningRate 0.1442 Epoch: 7 Global Step: 19570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:02,995-Speed 13158.52 samples/sec Loss 7.5203 LearningRate 0.1441 Epoch: 7 Global Step: 19580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:04,573-Speed 12984.15 samples/sec Loss 7.5064 LearningRate 0.1441 Epoch: 7 Global Step: 19590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:06,165-Speed 12876.56 samples/sec Loss 7.5630 LearningRate 0.1441 Epoch: 7 Global Step: 19600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:07,717-Speed 13208.02 samples/sec Loss 7.4929 LearningRate 0.1440 Epoch: 7 Global Step: 19610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:09,292-Speed 13015.67 samples/sec Loss 7.6200 LearningRate 0.1440 Epoch: 7 Global Step: 19620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:10,887-Speed 12863.58 samples/sec Loss 7.5619 LearningRate 0.1439 Epoch: 7 Global Step: 19630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:12,486-Speed 12814.90 samples/sec Loss 7.5378 LearningRate 0.1439 Epoch: 7 Global Step: 19640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:14,102-Speed 12679.35 samples/sec Loss 7.5234 LearningRate 0.1439 Epoch: 7 Global Step: 19650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:15,676-Speed 13035.59 samples/sec Loss 7.5197 LearningRate 0.1438 Epoch: 7 Global Step: 19660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:17,248-Speed 13028.65 samples/sec Loss 7.5594 LearningRate 0.1438 Epoch: 7 Global Step: 19670 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 14:39:18,826-Speed 12987.21 samples/sec Loss 7.4677 LearningRate 0.1438 Epoch: 7 Global Step: 19680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:20,387-Speed 13129.26 samples/sec Loss 7.5049 LearningRate 0.1437 Epoch: 7 Global Step: 19690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:21,976-Speed 12897.62 samples/sec Loss 7.4360 LearningRate 0.1437 Epoch: 7 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:23,542-Speed 13085.32 samples/sec Loss 7.6309 LearningRate 0.1437 Epoch: 7 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:25,134-Speed 12876.20 samples/sec Loss 7.4895 LearningRate 0.1436 Epoch: 7 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:26,721-Speed 12912.53 samples/sec Loss 7.5391 LearningRate 0.1436 Epoch: 7 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:28,272-Speed 13218.33 samples/sec Loss 7.5649 LearningRate 0.1436 Epoch: 7 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:29,855-Speed 12938.66 samples/sec Loss 7.5950 LearningRate 0.1435 Epoch: 7 Global Step: 19750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:31,414-Speed 13148.38 samples/sec Loss 7.5242 LearningRate 0.1435 Epoch: 7 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:32,995-Speed 12964.96 samples/sec Loss 7.4835 LearningRate 0.1435 Epoch: 7 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:34,579-Speed 12933.41 samples/sec Loss 7.6086 LearningRate 0.1434 Epoch: 7 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:36,138-Speed 13152.85 samples/sec Loss 7.5757 LearningRate 0.1434 Epoch: 7 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
14:39:37,724-Speed 12920.83 samples/sec Loss 7.5019 LearningRate 0.1433 Epoch: 7 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:39,304-Speed 12969.18 samples/sec Loss 7.5741 LearningRate 0.1433 Epoch: 7 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:40,862-Speed 13162.21 samples/sec Loss 7.5172 LearningRate 0.1433 Epoch: 7 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:42,470-Speed 12743.17 samples/sec Loss 7.5068 LearningRate 0.1432 Epoch: 7 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:44,040-Speed 13044.19 samples/sec Loss 7.6156 LearningRate 0.1432 Epoch: 7 Global Step: 19840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:45,626-Speed 12930.01 samples/sec Loss 7.5239 LearningRate 0.1432 Epoch: 7 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:39:47,190-Speed 13100.03 samples/sec Loss 7.4853 LearningRate 0.1431 Epoch: 7 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:48,782-Speed 12870.41 samples/sec Loss 7.5334 LearningRate 0.1431 Epoch: 7 Global Step: 19870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:50,358-Speed 13003.32 samples/sec Loss 7.5595 LearningRate 0.1431 Epoch: 7 Global Step: 19880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:51,967-Speed 12743.35 samples/sec Loss 7.6085 LearningRate 0.1430 Epoch: 7 Global Step: 19890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:53,543-Speed 13002.37 samples/sec Loss 7.5817 LearningRate 0.1430 Epoch: 7 Global Step: 19900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:55,119-Speed 13002.23 samples/sec Loss 7.6019 LearningRate 0.1430 Epoch: 7 Global Step: 19910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:56,732-Speed 12706.87 samples/sec Loss 
7.5247 LearningRate 0.1429 Epoch: 7 Global Step: 19920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:58,286-Speed 13190.35 samples/sec Loss 7.4965 LearningRate 0.1429 Epoch: 7 Global Step: 19930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:39:59,863-Speed 12997.02 samples/sec Loss 7.4438 LearningRate 0.1429 Epoch: 7 Global Step: 19940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:40:01,427-Speed 13097.15 samples/sec Loss 7.5723 LearningRate 0.1428 Epoch: 7 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:40:02,978-Speed 13221.12 samples/sec Loss 7.4398 LearningRate 0.1428 Epoch: 7 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:40:04,549-Speed 13041.91 samples/sec Loss 7.5808 LearningRate 0.1427 Epoch: 7 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:40:06,135-Speed 12924.41 samples/sec Loss 7.5946 LearningRate 0.1427 Epoch: 7 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:40:07,696-Speed 13129.95 samples/sec Loss 7.4483 LearningRate 0.1427 Epoch: 7 Global Step: 19990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:40:09,283-Speed 12912.06 samples/sec Loss 7.5855 LearningRate 0.1426 Epoch: 7 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:40:31,607-[lfw][20000]XNorm: 12.454200 Training: 2022-01-14 14:40:31,608-[lfw][20000]Accuracy-Flip: 0.99483+-0.00345 Training: 2022-01-14 14:40:31,608-[lfw][20000]Accuracy-Highest: 0.99483 Training: 2022-01-14 14:40:56,935-[cfp_fp][20000]XNorm: 10.497322 Training: 2022-01-14 14:40:56,936-[cfp_fp][20000]Accuracy-Flip: 0.94386+-0.01169 Training: 2022-01-14 14:40:56,937-[cfp_fp][20000]Accuracy-Highest: 0.94386 Training: 2022-01-14 14:41:18,894-[agedb_30][20000]XNorm: 12.081866 Training: 2022-01-14 14:41:18,895-[agedb_30][20000]Accuracy-Flip: 0.95417+-0.01003 Training: 
2022-01-14 14:41:18,896-[agedb_30][20000]Accuracy-Highest: 0.95417 Training: 2022-01-14 14:41:20,488-Speed 287.62 samples/sec Loss 7.5538 LearningRate 0.1426 Epoch: 7 Global Step: 20010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:41:22,112-Speed 12625.20 samples/sec Loss 7.5066 LearningRate 0.1426 Epoch: 7 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:41:23,678-Speed 13085.96 samples/sec Loss 7.5606 LearningRate 0.1425 Epoch: 7 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:41:25,258-Speed 12970.88 samples/sec Loss 7.4688 LearningRate 0.1425 Epoch: 7 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:41:26,839-Speed 12961.38 samples/sec Loss 7.5004 LearningRate 0.1425 Epoch: 7 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:41:28,390-Speed 13233.01 samples/sec Loss 7.5807 LearningRate 0.1424 Epoch: 7 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:41:29,994-Speed 12781.31 samples/sec Loss 7.5602 LearningRate 0.1424 Epoch: 7 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:41:31,586-Speed 12875.80 samples/sec Loss 7.5211 LearningRate 0.1424 Epoch: 7 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:41:33,141-Speed 13179.36 samples/sec Loss 7.5226 LearningRate 0.1423 Epoch: 7 Global Step: 20090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:34,723-Speed 12964.32 samples/sec Loss 7.4151 LearningRate 0.1423 Epoch: 7 Global Step: 20100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:36,286-Speed 13115.58 samples/sec Loss 7.4726 LearningRate 0.1423 Epoch: 7 Global Step: 20110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:37,866-Speed 12968.32 samples/sec Loss 7.4354 LearningRate 0.1422 Epoch: 7 Global Step: 20120 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-01-14 14:41:39,451-Speed 12930.07 samples/sec Loss 7.4916 LearningRate 0.1422 Epoch: 7 Global Step: 20130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:41,040-Speed 12899.27 samples/sec Loss 7.5297 LearningRate 0.1422 Epoch: 7 Global Step: 20140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:42,622-Speed 12958.51 samples/sec Loss 7.4469 LearningRate 0.1421 Epoch: 7 Global Step: 20150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:44,193-Speed 13049.72 samples/sec Loss 7.4769 LearningRate 0.1421 Epoch: 7 Global Step: 20160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:45,774-Speed 12962.09 samples/sec Loss 7.5368 LearningRate 0.1420 Epoch: 7 Global Step: 20170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:47,372-Speed 12823.18 samples/sec Loss 7.4741 LearningRate 0.1420 Epoch: 7 Global Step: 20180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 14:41:48,960-Speed 12911.99 samples/sec Loss 7.5059 LearningRate 0.1420 Epoch: 7 Global Step: 20190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:41:50,539-Speed 12980.28 samples/sec Loss 7.4467 LearningRate 0.1419 Epoch: 7 Global Step: 20200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:41:52,133-Speed 12853.63 samples/sec Loss 7.5937 LearningRate 0.1419 Epoch: 7 Global Step: 20210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:41:53,760-Speed 12601.23 samples/sec Loss 7.4855 LearningRate 0.1419 Epoch: 7 Global Step: 20220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:41:55,244-Speed 13807.64 samples/sec Loss 7.4578 LearningRate 0.1418 Epoch: 7 Global Step: 20230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:10,520-Speed 1340.75 samples/sec Loss 6.7695 LearningRate 0.1418 Epoch: 8 Global Step: 20240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
14:42:12,149-Speed 12582.66 samples/sec Loss 6.6184 LearningRate 0.1418 Epoch: 8 Global Step: 20250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:13,786-Speed 12521.87 samples/sec Loss 6.5164 LearningRate 0.1417 Epoch: 8 Global Step: 20260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:15,367-Speed 12961.79 samples/sec Loss 6.6340 LearningRate 0.1417 Epoch: 8 Global Step: 20270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:16,939-Speed 13038.87 samples/sec Loss 6.5736 LearningRate 0.1417 Epoch: 8 Global Step: 20280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:18,534-Speed 12841.84 samples/sec Loss 6.6563 LearningRate 0.1416 Epoch: 8 Global Step: 20290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:20,111-Speed 12999.73 samples/sec Loss 6.7171 LearningRate 0.1416 Epoch: 8 Global Step: 20300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:21,683-Speed 13038.02 samples/sec Loss 6.7461 LearningRate 0.1416 Epoch: 8 Global Step: 20310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:23,263-Speed 12973.22 samples/sec Loss 6.7397 LearningRate 0.1415 Epoch: 8 Global Step: 20320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:24,868-Speed 12763.55 samples/sec Loss 6.8267 LearningRate 0.1415 Epoch: 8 Global Step: 20330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:26,460-Speed 12893.80 samples/sec Loss 6.8677 LearningRate 0.1414 Epoch: 8 Global Step: 20340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:28,025-Speed 13092.52 samples/sec Loss 6.7214 LearningRate 0.1414 Epoch: 8 Global Step: 20350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:29,609-Speed 12937.93 samples/sec Loss 6.8133 LearningRate 0.1414 Epoch: 8 Global Step: 20360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:31,215-Speed 12779.03 samples/sec Loss 6.6878 
LearningRate 0.1413 Epoch: 8 Global Step: 20370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:32,785-Speed 13073.32 samples/sec Loss 6.7659 LearningRate 0.1413 Epoch: 8 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:42:34,378-Speed 12860.23 samples/sec Loss 6.7664 LearningRate 0.1413 Epoch: 8 Global Step: 20390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:35,988-Speed 12725.88 samples/sec Loss 6.7918 LearningRate 0.1412 Epoch: 8 Global Step: 20400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:37,562-Speed 13025.36 samples/sec Loss 6.7136 LearningRate 0.1412 Epoch: 8 Global Step: 20410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:39,141-Speed 12972.03 samples/sec Loss 6.8770 LearningRate 0.1412 Epoch: 8 Global Step: 20420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:40,740-Speed 12818.11 samples/sec Loss 6.7877 LearningRate 0.1411 Epoch: 8 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:42,298-Speed 13156.28 samples/sec Loss 6.8849 LearningRate 0.1411 Epoch: 8 Global Step: 20440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:43,881-Speed 12945.42 samples/sec Loss 6.8909 LearningRate 0.1411 Epoch: 8 Global Step: 20450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:45,478-Speed 12835.18 samples/sec Loss 6.9337 LearningRate 0.1410 Epoch: 8 Global Step: 20460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:47,046-Speed 13072.13 samples/sec Loss 6.9169 LearningRate 0.1410 Epoch: 8 Global Step: 20470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:48,670-Speed 12633.24 samples/sec Loss 6.8853 LearningRate 0.1410 Epoch: 8 Global Step: 20480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:50,269-Speed 12819.09 samples/sec Loss 6.9111 LearningRate 0.1409 Epoch: 8 Global Step: 
20490 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:42:51,834-Speed 13090.13 samples/sec Loss 6.8394 LearningRate 0.1409 Epoch: 8 Global Step: 20500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:42:53,380-Speed 13261.82 samples/sec Loss 6.9399 LearningRate 0.1409 Epoch: 8 Global Step: 20510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:54,954-Speed 13019.79 samples/sec Loss 6.8413 LearningRate 0.1408 Epoch: 8 Global Step: 20520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:56,511-Speed 13164.65 samples/sec Loss 6.9566 LearningRate 0.1408 Epoch: 8 Global Step: 20530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:58,099-Speed 12902.96 samples/sec Loss 6.9275 LearningRate 0.1408 Epoch: 8 Global Step: 20540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:42:59,708-Speed 12737.28 samples/sec Loss 7.0284 LearningRate 0.1407 Epoch: 8 Global Step: 20550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:43:01,286-Speed 12984.21 samples/sec Loss 6.9563 LearningRate 0.1407 Epoch: 8 Global Step: 20560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:43:02,849-Speed 13116.55 samples/sec Loss 6.9933 LearningRate 0.1406 Epoch: 8 Global Step: 20570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:43:04,433-Speed 12940.05 samples/sec Loss 6.8894 LearningRate 0.1406 Epoch: 8 Global Step: 20580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:43:05,997-Speed 13098.24 samples/sec Loss 6.9252 LearningRate 0.1406 Epoch: 8 Global Step: 20590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:43:07,552-Speed 13178.66 samples/sec Loss 7.0614 LearningRate 0.1405 Epoch: 8 Global Step: 20600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:43:09,130-Speed 12985.04 samples/sec Loss 6.9461 LearningRate 0.1405 Epoch: 8 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-01-14 14:43:10,714-Speed 12937.16 samples/sec Loss 7.0099 LearningRate 0.1405 Epoch: 8 Global Step: 20620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:12,307-Speed 12865.65 samples/sec Loss 7.1029 LearningRate 0.1404 Epoch: 8 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:13,876-Speed 13066.65 samples/sec Loss 6.9986 LearningRate 0.1404 Epoch: 8 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:15,474-Speed 12824.98 samples/sec Loss 7.1131 LearningRate 0.1404 Epoch: 8 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:17,050-Speed 13001.18 samples/sec Loss 7.1419 LearningRate 0.1403 Epoch: 8 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:18,640-Speed 12890.51 samples/sec Loss 7.0547 LearningRate 0.1403 Epoch: 8 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:20,228-Speed 12907.29 samples/sec Loss 7.1232 LearningRate 0.1403 Epoch: 8 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:21,804-Speed 13003.69 samples/sec Loss 7.0527 LearningRate 0.1402 Epoch: 8 Global Step: 20690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:23,386-Speed 12954.44 samples/sec Loss 7.0730 LearningRate 0.1402 Epoch: 8 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:24,944-Speed 13157.41 samples/sec Loss 7.1159 LearningRate 0.1402 Epoch: 8 Global Step: 20710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:26,513-Speed 13061.94 samples/sec Loss 7.1130 LearningRate 0.1401 Epoch: 8 Global Step: 20720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:28,070-Speed 13167.96 samples/sec Loss 7.1202 LearningRate 0.1401 Epoch: 8 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:29,662-Speed 12881.83 
samples/sec Loss 7.0663 LearningRate 0.1401 Epoch: 8 Global Step: 20740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:31,270-Speed 12740.81 samples/sec Loss 7.0830 LearningRate 0.1400 Epoch: 8 Global Step: 20750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:32,848-Speed 12987.21 samples/sec Loss 7.0820 LearningRate 0.1400 Epoch: 8 Global Step: 20760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:34,419-Speed 13047.06 samples/sec Loss 7.0994 LearningRate 0.1399 Epoch: 8 Global Step: 20770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:35,973-Speed 13187.98 samples/sec Loss 7.1583 LearningRate 0.1399 Epoch: 8 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:37,562-Speed 12898.25 samples/sec Loss 7.0337 LearningRate 0.1399 Epoch: 8 Global Step: 20790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:39,134-Speed 13038.81 samples/sec Loss 7.1742 LearningRate 0.1398 Epoch: 8 Global Step: 20800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:40,718-Speed 12936.57 samples/sec Loss 7.1540 LearningRate 0.1398 Epoch: 8 Global Step: 20810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:42,316-Speed 12824.17 samples/sec Loss 7.1931 LearningRate 0.1398 Epoch: 8 Global Step: 20820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:43,873-Speed 13163.82 samples/sec Loss 7.1560 LearningRate 0.1397 Epoch: 8 Global Step: 20830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:45,486-Speed 12701.42 samples/sec Loss 7.0931 LearningRate 0.1397 Epoch: 8 Global Step: 20840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:47,077-Speed 12881.76 samples/sec Loss 7.0626 LearningRate 0.1397 Epoch: 8 Global Step: 20850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:48,648-Speed 13046.15 samples/sec Loss 7.2360 LearningRate 0.1396 Epoch: 8 
Global Step: 20860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:50,233-Speed 13033.34 samples/sec Loss 7.2085 LearningRate 0.1396 Epoch: 8 Global Step: 20870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:51,811-Speed 12994.81 samples/sec Loss 7.1593 LearningRate 0.1396 Epoch: 8 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:53,379-Speed 13069.57 samples/sec Loss 7.3063 LearningRate 0.1395 Epoch: 8 Global Step: 20890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:43:54,968-Speed 12896.70 samples/sec Loss 7.2255 LearningRate 0.1395 Epoch: 8 Global Step: 20900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:56,533-Speed 13101.98 samples/sec Loss 7.1665 LearningRate 0.1395 Epoch: 8 Global Step: 20910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:58,092-Speed 13152.34 samples/sec Loss 7.2515 LearningRate 0.1394 Epoch: 8 Global Step: 20920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:43:59,675-Speed 12945.39 samples/sec Loss 7.1477 LearningRate 0.1394 Epoch: 8 Global Step: 20930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:01,279-Speed 12781.33 samples/sec Loss 7.2013 LearningRate 0.1394 Epoch: 8 Global Step: 20940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:02,832-Speed 13192.27 samples/sec Loss 7.1335 LearningRate 0.1393 Epoch: 8 Global Step: 20950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:04,430-Speed 12827.73 samples/sec Loss 7.1699 LearningRate 0.1393 Epoch: 8 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:05,991-Speed 13130.49 samples/sec Loss 7.2383 LearningRate 0.1393 Epoch: 8 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:07,553-Speed 13118.94 samples/sec Loss 7.2073 LearningRate 0.1392 Epoch: 8 Global Step: 20980 Fp16 Grad Scale: 32768 Required: 
4 hours Training: 2022-01-14 14:44:09,145-Speed 12875.02 samples/sec Loss 7.2829 LearningRate 0.1392 Epoch: 8 Global Step: 20990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:10,713-Speed 13076.64 samples/sec Loss 7.2236 LearningRate 0.1391 Epoch: 8 Global Step: 21000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:12,278-Speed 13095.21 samples/sec Loss 7.1837 LearningRate 0.1391 Epoch: 8 Global Step: 21010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:13,846-Speed 13069.29 samples/sec Loss 7.1528 LearningRate 0.1391 Epoch: 8 Global Step: 21020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:15,430-Speed 12933.43 samples/sec Loss 7.2149 LearningRate 0.1390 Epoch: 8 Global Step: 21030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:16,996-Speed 13085.36 samples/sec Loss 7.2732 LearningRate 0.1390 Epoch: 8 Global Step: 21040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:18,569-Speed 13028.85 samples/sec Loss 7.2022 LearningRate 0.1390 Epoch: 8 Global Step: 21050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:20,117-Speed 13244.88 samples/sec Loss 7.2256 LearningRate 0.1389 Epoch: 8 Global Step: 21060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:21,687-Speed 13050.25 samples/sec Loss 7.1609 LearningRate 0.1389 Epoch: 8 Global Step: 21070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:44:23,297-Speed 12731.23 samples/sec Loss 7.2168 LearningRate 0.1389 Epoch: 8 Global Step: 21080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:24,875-Speed 12992.36 samples/sec Loss 7.1002 LearningRate 0.1388 Epoch: 8 Global Step: 21090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:26,436-Speed 13126.52 samples/sec Loss 7.1983 LearningRate 0.1388 Epoch: 8 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:28,022-Speed 
12919.66 samples/sec Loss 7.1977 LearningRate 0.1388 Epoch: 8 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:29,607-Speed 12931.88 samples/sec Loss 7.2468 LearningRate 0.1387 Epoch: 8 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:31,175-Speed 13067.89 samples/sec Loss 7.2114 LearningRate 0.1387 Epoch: 8 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:32,803-Speed 12584.68 samples/sec Loss 7.2481 LearningRate 0.1387 Epoch: 8 Global Step: 21140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:34,386-Speed 12955.30 samples/sec Loss 7.2695 LearningRate 0.1386 Epoch: 8 Global Step: 21150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:35,941-Speed 13176.35 samples/sec Loss 7.2181 LearningRate 0.1386 Epoch: 8 Global Step: 21160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:37,538-Speed 12835.42 samples/sec Loss 7.1614 LearningRate 0.1386 Epoch: 8 Global Step: 21170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:39,113-Speed 13010.41 samples/sec Loss 7.2894 LearningRate 0.1385 Epoch: 8 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:40,679-Speed 13116.07 samples/sec Loss 7.2719 LearningRate 0.1385 Epoch: 8 Global Step: 21190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:42,256-Speed 12995.18 samples/sec Loss 7.2355 LearningRate 0.1385 Epoch: 8 Global Step: 21200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:43,837-Speed 12963.67 samples/sec Loss 7.2793 LearningRate 0.1384 Epoch: 8 Global Step: 21210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:45,409-Speed 13031.27 samples/sec Loss 7.3291 LearningRate 0.1384 Epoch: 8 Global Step: 21220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:46,984-Speed 13013.22 samples/sec Loss 7.3132 LearningRate 
0.1384 Epoch: 8 Global Step: 21230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:48,577-Speed 12863.61 samples/sec Loss 7.2744 LearningRate 0.1383 Epoch: 8 Global Step: 21240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:50,153-Speed 13016.89 samples/sec Loss 7.2498 LearningRate 0.1383 Epoch: 8 Global Step: 21250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:51,745-Speed 12894.25 samples/sec Loss 7.2295 LearningRate 0.1382 Epoch: 8 Global Step: 21260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:44:53,295-Speed 13218.96 samples/sec Loss 7.3083 LearningRate 0.1382 Epoch: 8 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:54,892-Speed 12840.64 samples/sec Loss 7.2403 LearningRate 0.1382 Epoch: 8 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:56,453-Speed 13124.31 samples/sec Loss 7.2481 LearningRate 0.1381 Epoch: 8 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:58,027-Speed 13024.30 samples/sec Loss 7.1455 LearningRate 0.1381 Epoch: 8 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:44:59,597-Speed 13075.71 samples/sec Loss 7.3919 LearningRate 0.1381 Epoch: 8 Global Step: 21310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:01,187-Speed 12906.43 samples/sec Loss 7.1861 LearningRate 0.1380 Epoch: 8 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:02,759-Speed 13037.15 samples/sec Loss 7.2371 LearningRate 0.1380 Epoch: 8 Global Step: 21330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:04,343-Speed 12943.08 samples/sec Loss 7.3592 LearningRate 0.1380 Epoch: 8 Global Step: 21340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:05,931-Speed 12900.48 samples/sec Loss 7.2363 LearningRate 0.1379 Epoch: 8 Global Step: 21350 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-01-14 14:45:07,504-Speed 13028.39 samples/sec Loss 7.2946 LearningRate 0.1379 Epoch: 8 Global Step: 21360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:09,083-Speed 13000.17 samples/sec Loss 7.3337 LearningRate 0.1379 Epoch: 8 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:45:10,687-Speed 12776.72 samples/sec Loss 7.2772 LearningRate 0.1378 Epoch: 8 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:45:12,270-Speed 12946.83 samples/sec Loss 7.3263 LearningRate 0.1378 Epoch: 8 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:45:13,837-Speed 13085.70 samples/sec Loss 7.2869 LearningRate 0.1378 Epoch: 8 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:45:15,415-Speed 12985.24 samples/sec Loss 7.2975 LearningRate 0.1377 Epoch: 8 Global Step: 21410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:45:16,999-Speed 12937.73 samples/sec Loss 7.2367 LearningRate 0.1377 Epoch: 8 Global Step: 21420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:45:18,588-Speed 12894.04 samples/sec Loss 7.2551 LearningRate 0.1377 Epoch: 8 Global Step: 21430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:45:20,161-Speed 13026.92 samples/sec Loss 7.3774 LearningRate 0.1376 Epoch: 8 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:45:21,730-Speed 13065.36 samples/sec Loss 7.2989 LearningRate 0.1376 Epoch: 8 Global Step: 21450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:23,303-Speed 13030.40 samples/sec Loss 7.3192 LearningRate 0.1376 Epoch: 8 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:24,880-Speed 12995.26 samples/sec Loss 7.3314 LearningRate 0.1375 Epoch: 8 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
14:45:26,456-Speed 13007.15 samples/sec Loss 7.2764 LearningRate 0.1375 Epoch: 8 Global Step: 21480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:28,031-Speed 13011.45 samples/sec Loss 7.2716 LearningRate 0.1375 Epoch: 8 Global Step: 21490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:29,617-Speed 12921.05 samples/sec Loss 7.2426 LearningRate 0.1374 Epoch: 8 Global Step: 21500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:31,199-Speed 12948.35 samples/sec Loss 7.2521 LearningRate 0.1374 Epoch: 8 Global Step: 21510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:32,760-Speed 13132.71 samples/sec Loss 7.2706 LearningRate 0.1373 Epoch: 8 Global Step: 21520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:34,339-Speed 12978.78 samples/sec Loss 7.2397 LearningRate 0.1373 Epoch: 8 Global Step: 21530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:35,925-Speed 12918.74 samples/sec Loss 7.2150 LearningRate 0.1373 Epoch: 8 Global Step: 21540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:37,517-Speed 12873.31 samples/sec Loss 7.3961 LearningRate 0.1372 Epoch: 8 Global Step: 21550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:39,091-Speed 13027.75 samples/sec Loss 7.2685 LearningRate 0.1372 Epoch: 8 Global Step: 21560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:40,685-Speed 12852.06 samples/sec Loss 7.2102 LearningRate 0.1372 Epoch: 8 Global Step: 21570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:42,243-Speed 13153.93 samples/sec Loss 7.2684 LearningRate 0.1371 Epoch: 8 Global Step: 21580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:43,827-Speed 12981.10 samples/sec Loss 7.3421 LearningRate 0.1371 Epoch: 8 Global Step: 21590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:45,442-Speed 12686.21 samples/sec Loss 7.3514 
LearningRate 0.1371 Epoch: 8 Global Step: 21600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:47,017-Speed 13017.86 samples/sec Loss 7.2415 LearningRate 0.1370 Epoch: 8 Global Step: 21610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:45:48,583-Speed 13084.16 samples/sec Loss 7.2204 LearningRate 0.1370 Epoch: 8 Global Step: 21620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:50,156-Speed 13028.72 samples/sec Loss 7.2654 LearningRate 0.1370 Epoch: 8 Global Step: 21630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:51,735-Speed 12981.39 samples/sec Loss 7.2734 LearningRate 0.1369 Epoch: 8 Global Step: 21640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:53,326-Speed 12878.63 samples/sec Loss 7.3137 LearningRate 0.1369 Epoch: 8 Global Step: 21650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:54,874-Speed 13274.85 samples/sec Loss 7.3336 LearningRate 0.1369 Epoch: 8 Global Step: 21660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:56,455-Speed 12955.84 samples/sec Loss 7.2783 LearningRate 0.1368 Epoch: 8 Global Step: 21670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:58,019-Speed 13108.71 samples/sec Loss 7.3323 LearningRate 0.1368 Epoch: 8 Global Step: 21680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:45:59,598-Speed 13009.58 samples/sec Loss 7.2431 LearningRate 0.1368 Epoch: 8 Global Step: 21690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:01,155-Speed 13157.42 samples/sec Loss 7.3000 LearningRate 0.1367 Epoch: 8 Global Step: 21700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:02,716-Speed 13130.23 samples/sec Loss 7.2807 LearningRate 0.1367 Epoch: 8 Global Step: 21710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:04,303-Speed 12918.66 samples/sec Loss 7.3382 LearningRate 0.1367 Epoch: 8 Global Step: 21720 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:46:05,889-Speed 12922.82 samples/sec Loss 7.3557 LearningRate 0.1366 Epoch: 8 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:46:07,449-Speed 13133.44 samples/sec Loss 7.3529 LearningRate 0.1366 Epoch: 8 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:46:09,009-Speed 13140.54 samples/sec Loss 7.3602 LearningRate 0.1366 Epoch: 8 Global Step: 21750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:46:10,609-Speed 12811.37 samples/sec Loss 7.2458 LearningRate 0.1365 Epoch: 8 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:46:12,169-Speed 13141.15 samples/sec Loss 7.3121 LearningRate 0.1365 Epoch: 8 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:46:13,764-Speed 12842.32 samples/sec Loss 7.3634 LearningRate 0.1365 Epoch: 8 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:46:15,350-Speed 12930.21 samples/sec Loss 7.2738 LearningRate 0.1364 Epoch: 8 Global Step: 21790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:16,916-Speed 13079.04 samples/sec Loss 7.3238 LearningRate 0.1364 Epoch: 8 Global Step: 21800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:18,478-Speed 13119.50 samples/sec Loss 7.2555 LearningRate 0.1364 Epoch: 8 Global Step: 21810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:20,056-Speed 12987.08 samples/sec Loss 7.3152 LearningRate 0.1363 Epoch: 8 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:21,623-Speed 13083.56 samples/sec Loss 7.3284 LearningRate 0.1363 Epoch: 8 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:23,209-Speed 12914.46 samples/sec Loss 7.3833 LearningRate 0.1362 Epoch: 8 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-14 14:46:24,792-Speed 12951.14 samples/sec Loss 7.3675 LearningRate 0.1362 Epoch: 8 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:26,351-Speed 13142.62 samples/sec Loss 7.3584 LearningRate 0.1362 Epoch: 8 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:27,950-Speed 12820.40 samples/sec Loss 7.3902 LearningRate 0.1361 Epoch: 8 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:29,548-Speed 12996.04 samples/sec Loss 7.4085 LearningRate 0.1361 Epoch: 8 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:31,112-Speed 13106.93 samples/sec Loss 7.3168 LearningRate 0.1361 Epoch: 8 Global Step: 21890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:46:32,673-Speed 13132.75 samples/sec Loss 7.2832 LearningRate 0.1360 Epoch: 8 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:34,271-Speed 12870.70 samples/sec Loss 7.3324 LearningRate 0.1360 Epoch: 8 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:35,865-Speed 12850.98 samples/sec Loss 7.2235 LearningRate 0.1360 Epoch: 8 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:37,435-Speed 13053.85 samples/sec Loss 7.3015 LearningRate 0.1359 Epoch: 8 Global Step: 21930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:39,000-Speed 13092.24 samples/sec Loss 7.2108 LearningRate 0.1359 Epoch: 8 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:40,603-Speed 12791.11 samples/sec Loss 7.2361 LearningRate 0.1359 Epoch: 8 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:42,179-Speed 12995.58 samples/sec Loss 7.2383 LearningRate 0.1358 Epoch: 8 Global Step: 21960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:43,757-Speed 12983.92 samples/sec 
Loss 7.2598 LearningRate 0.1358 Epoch: 8 Global Step: 21970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:45,346-Speed 12909.55 samples/sec Loss 7.3385 LearningRate 0.1358 Epoch: 8 Global Step: 21980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:46,916-Speed 13049.48 samples/sec Loss 7.2560 LearningRate 0.1357 Epoch: 8 Global Step: 21990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:48,482-Speed 13087.57 samples/sec Loss 7.3460 LearningRate 0.1357 Epoch: 8 Global Step: 22000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:50,056-Speed 13024.66 samples/sec Loss 7.3313 LearningRate 0.1357 Epoch: 8 Global Step: 22010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:51,628-Speed 13039.37 samples/sec Loss 7.4027 LearningRate 0.1356 Epoch: 8 Global Step: 22020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:53,242-Speed 12721.02 samples/sec Loss 7.3032 LearningRate 0.1356 Epoch: 8 Global Step: 22030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:54,795-Speed 13196.46 samples/sec Loss 7.2703 LearningRate 0.1356 Epoch: 8 Global Step: 22040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:56,358-Speed 13110.20 samples/sec Loss 7.2314 LearningRate 0.1355 Epoch: 8 Global Step: 22050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:57,925-Speed 13075.56 samples/sec Loss 7.2916 LearningRate 0.1355 Epoch: 8 Global Step: 22060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:46:59,498-Speed 13039.00 samples/sec Loss 7.2851 LearningRate 0.1355 Epoch: 8 Global Step: 22070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:01,116-Speed 12667.00 samples/sec Loss 7.3486 LearningRate 0.1354 Epoch: 8 Global Step: 22080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:02,709-Speed 12867.04 samples/sec Loss 7.2642 LearningRate 0.1354 Epoch: 8 Global Step: 
22090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:04,298-Speed 12897.69 samples/sec Loss 7.2426 LearningRate 0.1354 Epoch: 8 Global Step: 22100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:05,870-Speed 13033.15 samples/sec Loss 7.3273 LearningRate 0.1353 Epoch: 8 Global Step: 22110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:07,432-Speed 13123.30 samples/sec Loss 7.3933 LearningRate 0.1353 Epoch: 8 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:09,023-Speed 12877.51 samples/sec Loss 7.3309 LearningRate 0.1353 Epoch: 8 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:10,607-Speed 12945.22 samples/sec Loss 7.3371 LearningRate 0.1352 Epoch: 8 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:12,203-Speed 12838.30 samples/sec Loss 7.2772 LearningRate 0.1352 Epoch: 8 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:13,757-Speed 13186.18 samples/sec Loss 7.2912 LearningRate 0.1352 Epoch: 8 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:15,310-Speed 13205.02 samples/sec Loss 7.4040 LearningRate 0.1351 Epoch: 8 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:16,911-Speed 12796.82 samples/sec Loss 7.2056 LearningRate 0.1351 Epoch: 8 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:18,452-Speed 13290.70 samples/sec Loss 7.2928 LearningRate 0.1350 Epoch: 8 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:20,038-Speed 12927.71 samples/sec Loss 7.2415 LearningRate 0.1350 Epoch: 8 Global Step: 22200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:21,622-Speed 12945.58 samples/sec Loss 7.3148 LearningRate 0.1350 Epoch: 8 Global Step: 22210 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-01-14 14:47:23,227-Speed 12763.95 samples/sec Loss 7.2974 LearningRate 0.1349 Epoch: 8 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:25,061-Speed 11177.12 samples/sec Loss 7.3048 LearningRate 0.1349 Epoch: 8 Global Step: 22230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:26,608-Speed 13245.41 samples/sec Loss 7.3408 LearningRate 0.1349 Epoch: 8 Global Step: 22240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:28,182-Speed 13014.22 samples/sec Loss 7.2839 LearningRate 0.1348 Epoch: 8 Global Step: 22250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:29,761-Speed 12983.96 samples/sec Loss 7.2501 LearningRate 0.1348 Epoch: 8 Global Step: 22260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:31,335-Speed 13018.34 samples/sec Loss 7.2880 LearningRate 0.1348 Epoch: 8 Global Step: 22270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:32,933-Speed 12823.82 samples/sec Loss 7.2998 LearningRate 0.1347 Epoch: 8 Global Step: 22280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:34,505-Speed 13039.21 samples/sec Loss 7.3966 LearningRate 0.1347 Epoch: 8 Global Step: 22290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:36,086-Speed 12959.40 samples/sec Loss 7.4219 LearningRate 0.1347 Epoch: 8 Global Step: 22300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:37,660-Speed 13020.09 samples/sec Loss 7.2867 LearningRate 0.1346 Epoch: 8 Global Step: 22310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:39,239-Speed 13004.86 samples/sec Loss 7.2388 LearningRate 0.1346 Epoch: 8 Global Step: 22320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:40,796-Speed 13161.15 samples/sec Loss 7.2018 LearningRate 0.1346 Epoch: 8 Global Step: 22330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:42,396-Speed 12825.55 
samples/sec Loss 7.3557 LearningRate 0.1345 Epoch: 8 Global Step: 22340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:47:43,974-Speed 12985.98 samples/sec Loss 7.2586 LearningRate 0.1345 Epoch: 8 Global Step: 22350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:45,530-Speed 13168.70 samples/sec Loss 7.2256 LearningRate 0.1345 Epoch: 8 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:47,104-Speed 13021.21 samples/sec Loss 7.3256 LearningRate 0.1344 Epoch: 8 Global Step: 22370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:48,717-Speed 12703.15 samples/sec Loss 7.3451 LearningRate 0.1344 Epoch: 8 Global Step: 22380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:50,278-Speed 13138.56 samples/sec Loss 7.3356 LearningRate 0.1344 Epoch: 8 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:51,881-Speed 12783.03 samples/sec Loss 7.2970 LearningRate 0.1343 Epoch: 8 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:53,473-Speed 12867.85 samples/sec Loss 7.2337 LearningRate 0.1343 Epoch: 8 Global Step: 22410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:55,050-Speed 13002.66 samples/sec Loss 7.3882 LearningRate 0.1343 Epoch: 8 Global Step: 22420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:56,625-Speed 13008.27 samples/sec Loss 7.2534 LearningRate 0.1342 Epoch: 8 Global Step: 22430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:58,214-Speed 12891.08 samples/sec Loss 7.3242 LearningRate 0.1342 Epoch: 8 Global Step: 22440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:47:59,818-Speed 12785.57 samples/sec Loss 7.2382 LearningRate 0.1342 Epoch: 8 Global Step: 22450 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:48:01,386-Speed 13070.99 samples/sec Loss 7.3341 LearningRate 0.1341 
Epoch: 8 Global Step: 22460 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:48:02,960-Speed 13020.58 samples/sec Loss 7.3633 LearningRate 0.1341 Epoch: 8 Global Step: 22470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:04,559-Speed 12825.06 samples/sec Loss 7.2562 LearningRate 0.1341 Epoch: 8 Global Step: 22480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:06,131-Speed 13027.04 samples/sec Loss 7.3132 LearningRate 0.1340 Epoch: 8 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:07,704-Speed 13029.10 samples/sec Loss 7.2950 LearningRate 0.1340 Epoch: 8 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:09,268-Speed 13113.40 samples/sec Loss 7.2555 LearningRate 0.1340 Epoch: 8 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:10,848-Speed 12967.22 samples/sec Loss 7.3232 LearningRate 0.1339 Epoch: 8 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:12,433-Speed 12932.18 samples/sec Loss 7.2330 LearningRate 0.1339 Epoch: 8 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:13,992-Speed 13141.53 samples/sec Loss 7.2641 LearningRate 0.1339 Epoch: 8 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:15,577-Speed 12932.97 samples/sec Loss 7.2318 LearningRate 0.1338 Epoch: 8 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:17,146-Speed 13061.72 samples/sec Loss 7.2679 LearningRate 0.1338 Epoch: 8 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:18,727-Speed 12964.55 samples/sec Loss 7.1803 LearningRate 0.1338 Epoch: 8 Global Step: 22570 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:48:20,306-Speed 12979.95 samples/sec Loss 7.2605 LearningRate 0.1337 Epoch: 8 Global Step: 22580 Fp16 Grad 
Scale: 262144 Required: 4 hours Training: 2022-01-14 14:48:21,844-Speed 13321.53 samples/sec Loss 7.2571 LearningRate 0.1337 Epoch: 8 Global Step: 22590 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:48:23,403-Speed 13142.01 samples/sec Loss 7.2444 LearningRate 0.1336 Epoch: 8 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:24,974-Speed 13049.11 samples/sec Loss 7.2996 LearningRate 0.1336 Epoch: 8 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:26,572-Speed 12823.41 samples/sec Loss 7.3603 LearningRate 0.1336 Epoch: 8 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:28,156-Speed 12934.44 samples/sec Loss 7.3256 LearningRate 0.1335 Epoch: 8 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:29,726-Speed 13058.01 samples/sec Loss 7.2730 LearningRate 0.1335 Epoch: 8 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:31,327-Speed 12793.56 samples/sec Loss 7.2037 LearningRate 0.1335 Epoch: 8 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:32,902-Speed 13018.58 samples/sec Loss 7.2836 LearningRate 0.1334 Epoch: 8 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:34,457-Speed 13180.81 samples/sec Loss 7.3087 LearningRate 0.1334 Epoch: 8 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:36,027-Speed 13048.62 samples/sec Loss 7.1850 LearningRate 0.1334 Epoch: 8 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:37,602-Speed 13009.94 samples/sec Loss 7.3292 LearningRate 0.1333 Epoch: 8 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:48:39,160-Speed 13160.23 samples/sec Loss 7.3692 LearningRate 0.1333 Epoch: 8 Global Step: 22700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-14 14:48:40,718-Speed 13152.38 samples/sec Loss 7.2069 LearningRate 0.1333 Epoch: 8 Global Step: 22710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:48:42,278-Speed 13130.59 samples/sec Loss 7.3150 LearningRate 0.1332 Epoch: 8 Global Step: 22720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:48:43,857-Speed 12985.02 samples/sec Loss 7.3342 LearningRate 0.1332 Epoch: 8 Global Step: 22730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:48:45,443-Speed 12945.92 samples/sec Loss 7.3462 LearningRate 0.1332 Epoch: 8 Global Step: 22740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:48:47,059-Speed 12674.66 samples/sec Loss 7.4035 LearningRate 0.1331 Epoch: 8 Global Step: 22750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:48:48,576-Speed 13506.19 samples/sec Loss 7.2123 LearningRate 0.1331 Epoch: 8 Global Step: 22760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:02,990-Speed 1421.06 samples/sec Loss 6.4666 LearningRate 0.1331 Epoch: 9 Global Step: 22770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:04,576-Speed 12951.55 samples/sec Loss 6.3884 LearningRate 0.1330 Epoch: 9 Global Step: 22780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:06,145-Speed 13057.75 samples/sec Loss 6.4150 LearningRate 0.1330 Epoch: 9 Global Step: 22790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:07,737-Speed 12871.83 samples/sec Loss 6.4763 LearningRate 0.1330 Epoch: 9 Global Step: 22800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:09,302-Speed 13096.91 samples/sec Loss 6.3822 LearningRate 0.1329 Epoch: 9 Global Step: 22810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:10,907-Speed 12773.67 samples/sec Loss 6.3939 LearningRate 0.1329 Epoch: 9 Global Step: 22820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:12,531-Speed 12614.27 samples/sec Loss 
6.3549 LearningRate 0.1329 Epoch: 9 Global Step: 22830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:14,106-Speed 13016.76 samples/sec Loss 6.5155 LearningRate 0.1328 Epoch: 9 Global Step: 22840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:15,685-Speed 12982.52 samples/sec Loss 6.4170 LearningRate 0.1328 Epoch: 9 Global Step: 22850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:49:17,258-Speed 13023.06 samples/sec Loss 6.4405 LearningRate 0.1328 Epoch: 9 Global Step: 22860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:18,855-Speed 12836.27 samples/sec Loss 6.4212 LearningRate 0.1327 Epoch: 9 Global Step: 22870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:20,439-Speed 12942.92 samples/sec Loss 6.5078 LearningRate 0.1327 Epoch: 9 Global Step: 22880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:22,026-Speed 12915.45 samples/sec Loss 6.4971 LearningRate 0.1327 Epoch: 9 Global Step: 22890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:23,658-Speed 12556.30 samples/sec Loss 6.5433 LearningRate 0.1326 Epoch: 9 Global Step: 22900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:25,242-Speed 12943.60 samples/sec Loss 6.5021 LearningRate 0.1326 Epoch: 9 Global Step: 22910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:26,831-Speed 12892.58 samples/sec Loss 6.5243 LearningRate 0.1326 Epoch: 9 Global Step: 22920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:28,430-Speed 12817.82 samples/sec Loss 6.5926 LearningRate 0.1325 Epoch: 9 Global Step: 22930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:30,027-Speed 12854.38 samples/sec Loss 6.5965 LearningRate 0.1325 Epoch: 9 Global Step: 22940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:31,599-Speed 13032.26 samples/sec Loss 6.5508 LearningRate 0.1325 Epoch: 9 Global Step: 22950 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:33,797-Speed 9323.82 samples/sec Loss 6.6856 LearningRate 0.1324 Epoch: 9 Global Step: 22960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:49:36,044-Speed 9123.33 samples/sec Loss 6.5734 LearningRate 0.1324 Epoch: 9 Global Step: 22970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:49:38,145-Speed 9753.93 samples/sec Loss 6.7188 LearningRate 0.1324 Epoch: 9 Global Step: 22980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:49:39,721-Speed 13000.79 samples/sec Loss 6.5490 LearningRate 0.1323 Epoch: 9 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:41,293-Speed 13031.82 samples/sec Loss 6.7236 LearningRate 0.1323 Epoch: 9 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:42,878-Speed 12937.17 samples/sec Loss 6.6789 LearningRate 0.1323 Epoch: 9 Global Step: 23010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:44,426-Speed 13234.90 samples/sec Loss 6.6750 LearningRate 0.1322 Epoch: 9 Global Step: 23020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:45,992-Speed 13085.88 samples/sec Loss 6.7151 LearningRate 0.1322 Epoch: 9 Global Step: 23030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:47,604-Speed 12716.21 samples/sec Loss 6.7417 LearningRate 0.1322 Epoch: 9 Global Step: 23040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:49,186-Speed 12953.17 samples/sec Loss 6.7836 LearningRate 0.1321 Epoch: 9 Global Step: 23050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:50,782-Speed 12843.82 samples/sec Loss 6.7882 LearningRate 0.1321 Epoch: 9 Global Step: 23060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:52,404-Speed 12633.51 samples/sec Loss 6.7295 LearningRate 0.1321 Epoch: 9 Global Step: 23070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-14 14:49:53,973-Speed 13063.84 samples/sec Loss 6.6951 LearningRate 0.1320 Epoch: 9 Global Step: 23080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:55,558-Speed 12936.88 samples/sec Loss 6.7000 LearningRate 0.1320 Epoch: 9 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:49:57,106-Speed 13239.01 samples/sec Loss 6.7631 LearningRate 0.1320 Epoch: 9 Global Step: 23100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:49:58,692-Speed 12917.53 samples/sec Loss 6.7373 LearningRate 0.1319 Epoch: 9 Global Step: 23110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:00,268-Speed 13011.46 samples/sec Loss 6.7211 LearningRate 0.1319 Epoch: 9 Global Step: 23120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:01,863-Speed 12843.68 samples/sec Loss 6.8841 LearningRate 0.1319 Epoch: 9 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:03,426-Speed 13114.99 samples/sec Loss 6.7403 LearningRate 0.1318 Epoch: 9 Global Step: 23140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:05,014-Speed 12910.64 samples/sec Loss 6.8937 LearningRate 0.1318 Epoch: 9 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:06,568-Speed 13185.31 samples/sec Loss 6.8235 LearningRate 0.1318 Epoch: 9 Global Step: 23160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:08,148-Speed 12970.68 samples/sec Loss 6.7852 LearningRate 0.1317 Epoch: 9 Global Step: 23170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:09,701-Speed 13196.80 samples/sec Loss 6.8393 LearningRate 0.1317 Epoch: 9 Global Step: 23180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:11,267-Speed 13085.24 samples/sec Loss 6.9301 LearningRate 0.1316 Epoch: 9 Global Step: 23190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:12,872-Speed 12767.13 samples/sec 
Loss 6.8545 LearningRate 0.1316 Epoch: 9 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:14,466-Speed 12865.87 samples/sec Loss 6.7167 LearningRate 0.1316 Epoch: 9 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:16,027-Speed 13129.99 samples/sec Loss 6.8352 LearningRate 0.1315 Epoch: 9 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:17,605-Speed 12982.63 samples/sec Loss 6.8627 LearningRate 0.1315 Epoch: 9 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:19,166-Speed 13134.54 samples/sec Loss 6.8179 LearningRate 0.1315 Epoch: 9 Global Step: 23240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:20,755-Speed 12891.78 samples/sec Loss 6.8781 LearningRate 0.1314 Epoch: 9 Global Step: 23250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:22,387-Speed 12560.24 samples/sec Loss 6.8937 LearningRate 0.1314 Epoch: 9 Global Step: 23260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:23,964-Speed 12997.26 samples/sec Loss 6.9154 LearningRate 0.1314 Epoch: 9 Global Step: 23270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:25,525-Speed 13124.19 samples/sec Loss 6.9180 LearningRate 0.1313 Epoch: 9 Global Step: 23280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:27,092-Speed 13082.94 samples/sec Loss 6.8641 LearningRate 0.1313 Epoch: 9 Global Step: 23290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:28,661-Speed 13056.89 samples/sec Loss 6.8859 LearningRate 0.1313 Epoch: 9 Global Step: 23300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:30,233-Speed 13046.92 samples/sec Loss 6.8931 LearningRate 0.1312 Epoch: 9 Global Step: 23310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:31,824-Speed 12875.84 samples/sec Loss 6.8791 LearningRate 0.1312 Epoch: 9 Global 
Step: 23320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:33,423-Speed 12819.08 samples/sec Loss 7.0535 LearningRate 0.1312 Epoch: 9 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:50:35,000-Speed 13000.00 samples/sec Loss 6.9625 LearningRate 0.1311 Epoch: 9 Global Step: 23340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:36,588-Speed 12900.62 samples/sec Loss 6.8562 LearningRate 0.1311 Epoch: 9 Global Step: 23350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:38,146-Speed 13152.73 samples/sec Loss 7.0540 LearningRate 0.1311 Epoch: 9 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:39,727-Speed 12971.95 samples/sec Loss 6.9380 LearningRate 0.1310 Epoch: 9 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:41,299-Speed 13045.64 samples/sec Loss 6.9293 LearningRate 0.1310 Epoch: 9 Global Step: 23380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:42,875-Speed 13003.61 samples/sec Loss 6.9663 LearningRate 0.1310 Epoch: 9 Global Step: 23390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:44,462-Speed 12921.75 samples/sec Loss 6.9224 LearningRate 0.1309 Epoch: 9 Global Step: 23400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:46,031-Speed 13055.41 samples/sec Loss 6.8078 LearningRate 0.1309 Epoch: 9 Global Step: 23410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:47,624-Speed 12867.12 samples/sec Loss 6.8679 LearningRate 0.1309 Epoch: 9 Global Step: 23420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:49,187-Speed 13119.13 samples/sec Loss 7.0517 LearningRate 0.1308 Epoch: 9 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:50,750-Speed 13112.64 samples/sec Loss 6.9847 LearningRate 0.1308 Epoch: 9 Global Step: 23440 Fp16 Grad Scale: 262144 Required: 
4 hours Training: 2022-01-14 14:50:52,326-Speed 13003.61 samples/sec Loss 6.8796 LearningRate 0.1308 Epoch: 9 Global Step: 23450 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:50:53,900-Speed 13018.91 samples/sec Loss 6.9076 LearningRate 0.1307 Epoch: 9 Global Step: 23460 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:50:55,463-Speed 13119.62 samples/sec Loss 6.8836 LearningRate 0.1307 Epoch: 9 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:57,069-Speed 12757.44 samples/sec Loss 6.8757 LearningRate 0.1307 Epoch: 9 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:50:58,645-Speed 12999.92 samples/sec Loss 6.9782 LearningRate 0.1306 Epoch: 9 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:51:00,221-Speed 13047.01 samples/sec Loss 6.9853 LearningRate 0.1306 Epoch: 9 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:51:01,827-Speed 12760.04 samples/sec Loss 7.0075 LearningRate 0.1306 Epoch: 9 Global Step: 23510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:51:03,399-Speed 13035.86 samples/sec Loss 6.8677 LearningRate 0.1305 Epoch: 9 Global Step: 23520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:04,983-Speed 12966.39 samples/sec Loss 6.9904 LearningRate 0.1305 Epoch: 9 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:06,567-Speed 12937.24 samples/sec Loss 7.1214 LearningRate 0.1305 Epoch: 9 Global Step: 23540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:08,119-Speed 13207.75 samples/sec Loss 7.0830 LearningRate 0.1304 Epoch: 9 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:09,701-Speed 12959.73 samples/sec Loss 7.0515 LearningRate 0.1304 Epoch: 9 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:11,257-Speed 
13171.96 samples/sec Loss 6.8972 LearningRate 0.1304 Epoch: 9 Global Step: 23570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:12,816-Speed 13143.17 samples/sec Loss 6.9857 LearningRate 0.1303 Epoch: 9 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:14,375-Speed 13142.40 samples/sec Loss 6.9964 LearningRate 0.1303 Epoch: 9 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:15,942-Speed 13082.83 samples/sec Loss 7.0093 LearningRate 0.1303 Epoch: 9 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:17,476-Speed 13358.09 samples/sec Loss 7.0004 LearningRate 0.1302 Epoch: 9 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:19,068-Speed 12868.35 samples/sec Loss 6.9448 LearningRate 0.1302 Epoch: 9 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:51:20,635-Speed 13085.89 samples/sec Loss 6.9908 LearningRate 0.1302 Epoch: 9 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:51:22,186-Speed 13207.40 samples/sec Loss 6.9655 LearningRate 0.1301 Epoch: 9 Global Step: 23640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:23,766-Speed 12968.72 samples/sec Loss 6.9479 LearningRate 0.1301 Epoch: 9 Global Step: 23650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:25,367-Speed 12805.35 samples/sec Loss 7.0602 LearningRate 0.1301 Epoch: 9 Global Step: 23660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:26,930-Speed 13109.77 samples/sec Loss 6.9660 LearningRate 0.1300 Epoch: 9 Global Step: 23670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:28,519-Speed 12898.90 samples/sec Loss 7.0535 LearningRate 0.1300 Epoch: 9 Global Step: 23680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:30,100-Speed 12962.60 samples/sec Loss 6.9694 LearningRate 0.1300 
Epoch: 9 Global Step: 23690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:31,694-Speed 12853.86 samples/sec Loss 7.0616 LearningRate 0.1299 Epoch: 9 Global Step: 23700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:33,297-Speed 12784.17 samples/sec Loss 7.0265 LearningRate 0.1299 Epoch: 9 Global Step: 23710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:34,873-Speed 13004.34 samples/sec Loss 7.0042 LearningRate 0.1299 Epoch: 9 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:36,425-Speed 13207.38 samples/sec Loss 7.0777 LearningRate 0.1298 Epoch: 9 Global Step: 23730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:38,011-Speed 12927.23 samples/sec Loss 7.1089 LearningRate 0.1298 Epoch: 9 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:51:39,589-Speed 12991.86 samples/sec Loss 7.0533 LearningRate 0.1298 Epoch: 9 Global Step: 23750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:41,163-Speed 13019.89 samples/sec Loss 6.9933 LearningRate 0.1297 Epoch: 9 Global Step: 23760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:42,752-Speed 12895.02 samples/sec Loss 7.0476 LearningRate 0.1297 Epoch: 9 Global Step: 23770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:44,329-Speed 12997.13 samples/sec Loss 7.0906 LearningRate 0.1297 Epoch: 9 Global Step: 23780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:45,899-Speed 13054.09 samples/sec Loss 7.0276 LearningRate 0.1296 Epoch: 9 Global Step: 23790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:51:47,479-Speed 12976.25 samples/sec Loss 7.0120 LearningRate 0.1296 Epoch: 9 Global Step: 23800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:51:49,035-Speed 13170.78 samples/sec Loss 7.1335 LearningRate 0.1296 Epoch: 9 Global Step: 23810 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 14:51:50,621-Speed 12921.02 samples/sec Loss 7.1844 LearningRate 0.1295 Epoch: 9 Global Step: 23820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:51:52,196-Speed 13013.64 samples/sec Loss 7.0972 LearningRate 0.1295 Epoch: 9 Global Step: 23830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:51:53,757-Speed 13124.21 samples/sec Loss 6.9736 LearningRate 0.1295 Epoch: 9 Global Step: 23840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:51:55,352-Speed 12856.37 samples/sec Loss 7.0421 LearningRate 0.1294 Epoch: 9 Global Step: 23850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:51:56,930-Speed 12988.60 samples/sec Loss 7.0811 LearningRate 0.1294 Epoch: 9 Global Step: 23860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:51:58,506-Speed 12999.28 samples/sec Loss 7.1291 LearningRate 0.1294 Epoch: 9 Global Step: 23870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:52:00,078-Speed 13037.23 samples/sec Loss 7.0292 LearningRate 0.1293 Epoch: 9 Global Step: 23880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:52:01,690-Speed 12722.40 samples/sec Loss 6.9896 LearningRate 0.1293 Epoch: 9 Global Step: 23890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:52:03,267-Speed 12994.85 samples/sec Loss 7.0630 LearningRate 0.1293 Epoch: 9 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:04,846-Speed 12981.48 samples/sec Loss 7.1385 LearningRate 0.1292 Epoch: 9 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:06,462-Speed 12678.47 samples/sec Loss 7.0042 LearningRate 0.1292 Epoch: 9 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:08,035-Speed 13031.14 samples/sec Loss 7.0406 LearningRate 0.1292 Epoch: 9 Global Step: 23930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
14:52:09,618-Speed 12946.97 samples/sec Loss 7.1093 LearningRate 0.1291 Epoch: 9 Global Step: 23940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:11,190-Speed 13037.69 samples/sec Loss 7.0663 LearningRate 0.1291 Epoch: 9 Global Step: 23950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:12,760-Speed 13057.10 samples/sec Loss 6.9923 LearningRate 0.1291 Epoch: 9 Global Step: 23960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:14,333-Speed 13026.16 samples/sec Loss 7.1397 LearningRate 0.1290 Epoch: 9 Global Step: 23970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:15,958-Speed 12618.21 samples/sec Loss 7.1566 LearningRate 0.1290 Epoch: 9 Global Step: 23980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:17,524-Speed 13085.80 samples/sec Loss 7.0326 LearningRate 0.1290 Epoch: 9 Global Step: 23990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:19,099-Speed 13014.01 samples/sec Loss 7.1150 LearningRate 0.1289 Epoch: 9 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:52:20,676-Speed 12998.51 samples/sec Loss 7.0586 LearningRate 0.1289 Epoch: 9 Global Step: 24010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:52:22,235-Speed 13148.64 samples/sec Loss 7.1075 LearningRate 0.1289 Epoch: 9 Global Step: 24020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:23,820-Speed 12931.08 samples/sec Loss 7.0565 LearningRate 0.1288 Epoch: 9 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:25,385-Speed 13103.35 samples/sec Loss 6.9772 LearningRate 0.1288 Epoch: 9 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:26,970-Speed 12926.65 samples/sec Loss 7.1076 LearningRate 0.1288 Epoch: 9 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:28,546-Speed 13030.11 samples/sec Loss 7.1757 
LearningRate 0.1287 Epoch: 9 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:30,118-Speed 13042.27 samples/sec Loss 7.0891 LearningRate 0.1287 Epoch: 9 Global Step: 24070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:31,670-Speed 13206.07 samples/sec Loss 7.1393 LearningRate 0.1287 Epoch: 9 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:33,238-Speed 13062.63 samples/sec Loss 7.0638 LearningRate 0.1286 Epoch: 9 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:34,796-Speed 13164.43 samples/sec Loss 7.0810 LearningRate 0.1286 Epoch: 9 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:36,345-Speed 13226.89 samples/sec Loss 7.0845 LearningRate 0.1286 Epoch: 9 Global Step: 24110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:37,927-Speed 12952.44 samples/sec Loss 7.0885 LearningRate 0.1285 Epoch: 9 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:52:39,501-Speed 13015.86 samples/sec Loss 6.9770 LearningRate 0.1285 Epoch: 9 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:52:41,083-Speed 12957.51 samples/sec Loss 6.9156 LearningRate 0.1285 Epoch: 9 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:52:42,640-Speed 13176.37 samples/sec Loss 7.0100 LearningRate 0.1284 Epoch: 9 Global Step: 24150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:44,217-Speed 12996.79 samples/sec Loss 7.1839 LearningRate 0.1284 Epoch: 9 Global Step: 24160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:45,797-Speed 12974.27 samples/sec Loss 7.0437 LearningRate 0.1284 Epoch: 9 Global Step: 24170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:47,400-Speed 12790.69 samples/sec Loss 7.1221 LearningRate 0.1283 Epoch: 9 Global Step: 24180 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:48,971-Speed 13048.26 samples/sec Loss 7.0508 LearningRate 0.1283 Epoch: 9 Global Step: 24190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:50,557-Speed 12924.47 samples/sec Loss 7.0374 LearningRate 0.1283 Epoch: 9 Global Step: 24200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:52,111-Speed 13185.92 samples/sec Loss 7.0660 LearningRate 0.1282 Epoch: 9 Global Step: 24210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:53,675-Speed 13106.84 samples/sec Loss 7.0290 LearningRate 0.1282 Epoch: 9 Global Step: 24220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:55,260-Speed 12928.19 samples/sec Loss 7.0648 LearningRate 0.1282 Epoch: 9 Global Step: 24230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:56,831-Speed 13047.71 samples/sec Loss 7.0890 LearningRate 0.1281 Epoch: 9 Global Step: 24240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:58,397-Speed 13080.15 samples/sec Loss 7.0472 LearningRate 0.1281 Epoch: 9 Global Step: 24250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:52:59,972-Speed 13018.94 samples/sec Loss 7.0852 LearningRate 0.1281 Epoch: 9 Global Step: 24260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:01,552-Speed 12969.44 samples/sec Loss 7.1465 LearningRate 0.1280 Epoch: 9 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:03,108-Speed 13173.01 samples/sec Loss 7.1060 LearningRate 0.1280 Epoch: 9 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:04,679-Speed 13052.78 samples/sec Loss 7.1133 LearningRate 0.1280 Epoch: 9 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:06,274-Speed 12846.52 samples/sec Loss 7.0178 LearningRate 0.1279 Epoch: 9 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-14 14:53:07,903-Speed 12580.78 samples/sec Loss 6.9882 LearningRate 0.1279 Epoch: 9 Global Step: 24310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:09,462-Speed 13141.69 samples/sec Loss 7.0797 LearningRate 0.1279 Epoch: 9 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:11,019-Speed 13165.62 samples/sec Loss 7.0566 LearningRate 0.1278 Epoch: 9 Global Step: 24330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:12,596-Speed 12996.98 samples/sec Loss 7.0819 LearningRate 0.1278 Epoch: 9 Global Step: 24340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:14,166-Speed 13049.49 samples/sec Loss 7.0598 LearningRate 0.1278 Epoch: 9 Global Step: 24350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:15,745-Speed 12987.46 samples/sec Loss 7.0688 LearningRate 0.1277 Epoch: 9 Global Step: 24360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:17,321-Speed 12998.75 samples/sec Loss 7.0447 LearningRate 0.1277 Epoch: 9 Global Step: 24370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:18,901-Speed 12966.73 samples/sec Loss 7.1232 LearningRate 0.1277 Epoch: 9 Global Step: 24380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:20,474-Speed 13035.46 samples/sec Loss 7.1139 LearningRate 0.1276 Epoch: 9 Global Step: 24390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:22,054-Speed 12968.98 samples/sec Loss 7.1326 LearningRate 0.1276 Epoch: 9 Global Step: 24400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:23,690-Speed 12526.10 samples/sec Loss 7.1196 LearningRate 0.1276 Epoch: 9 Global Step: 24410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:25,266-Speed 13005.88 samples/sec Loss 7.1003 LearningRate 0.1275 Epoch: 9 Global Step: 24420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:53:26,844-Speed 12984.68 samples/sec Loss 
7.0698 LearningRate 0.1275 Epoch: 9 Global Step: 24430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:28,433-Speed 12898.57 samples/sec Loss 7.0195 LearningRate 0.1275 Epoch: 9 Global Step: 24440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:29,989-Speed 13174.62 samples/sec Loss 7.0678 LearningRate 0.1274 Epoch: 9 Global Step: 24450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:31,540-Speed 13216.32 samples/sec Loss 7.0581 LearningRate 0.1274 Epoch: 9 Global Step: 24460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:33,158-Speed 12663.63 samples/sec Loss 7.0448 LearningRate 0.1274 Epoch: 9 Global Step: 24470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:34,740-Speed 12958.74 samples/sec Loss 7.1366 LearningRate 0.1273 Epoch: 9 Global Step: 24480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:36,329-Speed 12898.38 samples/sec Loss 7.1473 LearningRate 0.1273 Epoch: 9 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:37,952-Speed 12625.67 samples/sec Loss 7.0747 LearningRate 0.1273 Epoch: 9 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:39,532-Speed 12967.53 samples/sec Loss 7.1619 LearningRate 0.1272 Epoch: 9 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:41,118-Speed 12921.68 samples/sec Loss 7.0540 LearningRate 0.1272 Epoch: 9 Global Step: 24520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:42,692-Speed 13024.87 samples/sec Loss 6.9861 LearningRate 0.1272 Epoch: 9 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:53:44,253-Speed 13128.78 samples/sec Loss 7.1649 LearningRate 0.1271 Epoch: 9 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:53:45,832-Speed 12977.76 samples/sec Loss 7.0209 LearningRate 0.1271 Epoch: 9 Global Step: 
24550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:53:47,397-Speed 13093.80 samples/sec Loss 7.0677 LearningRate 0.1271 Epoch: 9 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:53:48,976-Speed 12981.73 samples/sec Loss 7.0129 LearningRate 0.1270 Epoch: 9 Global Step: 24570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:50,544-Speed 13128.22 samples/sec Loss 7.0467 LearningRate 0.1270 Epoch: 9 Global Step: 24580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:52,122-Speed 12980.70 samples/sec Loss 7.1442 LearningRate 0.1270 Epoch: 9 Global Step: 24590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:53,722-Speed 12811.98 samples/sec Loss 7.0058 LearningRate 0.1269 Epoch: 9 Global Step: 24600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:55,295-Speed 13036.18 samples/sec Loss 7.1345 LearningRate 0.1269 Epoch: 9 Global Step: 24610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:56,875-Speed 12970.99 samples/sec Loss 7.1152 LearningRate 0.1269 Epoch: 9 Global Step: 24620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:53:58,451-Speed 12997.09 samples/sec Loss 7.0232 LearningRate 0.1268 Epoch: 9 Global Step: 24630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:00,033-Speed 12960.98 samples/sec Loss 7.0404 LearningRate 0.1268 Epoch: 9 Global Step: 24640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:01,613-Speed 12967.39 samples/sec Loss 7.1201 LearningRate 0.1268 Epoch: 9 Global Step: 24650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:03,204-Speed 12877.62 samples/sec Loss 7.0499 LearningRate 0.1267 Epoch: 9 Global Step: 24660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:04,760-Speed 13178.34 samples/sec Loss 6.9639 LearningRate 0.1267 Epoch: 9 Global Step: 24670 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-14 14:54:06,327-Speed 13077.60 samples/sec Loss 7.0391 LearningRate 0.1267 Epoch: 9 Global Step: 24680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:07,909-Speed 12956.92 samples/sec Loss 7.0569 LearningRate 0.1266 Epoch: 9 Global Step: 24690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:09,518-Speed 12738.96 samples/sec Loss 7.0920 LearningRate 0.1266 Epoch: 9 Global Step: 24700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:11,066-Speed 13239.65 samples/sec Loss 7.0354 LearningRate 0.1266 Epoch: 9 Global Step: 24710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:12,650-Speed 12935.63 samples/sec Loss 7.0309 LearningRate 0.1265 Epoch: 9 Global Step: 24720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:14,207-Speed 13165.04 samples/sec Loss 7.2050 LearningRate 0.1265 Epoch: 9 Global Step: 24730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:15,772-Speed 13093.78 samples/sec Loss 7.1588 LearningRate 0.1265 Epoch: 9 Global Step: 24740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:17,327-Speed 13190.70 samples/sec Loss 7.0565 LearningRate 0.1264 Epoch: 9 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:18,899-Speed 13037.31 samples/sec Loss 6.9857 LearningRate 0.1264 Epoch: 9 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:20,499-Speed 12801.78 samples/sec Loss 7.1276 LearningRate 0.1264 Epoch: 9 Global Step: 24770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:22,063-Speed 13103.75 samples/sec Loss 7.0756 LearningRate 0.1263 Epoch: 9 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:23,632-Speed 13063.19 samples/sec Loss 7.1175 LearningRate 0.1263 Epoch: 9 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:25,206-Speed 13021.95 
samples/sec Loss 7.1179 LearningRate 0.1263 Epoch: 9 Global Step: 24800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:26,772-Speed 13084.92 samples/sec Loss 7.0679 LearningRate 0.1262 Epoch: 9 Global Step: 24810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:28,333-Speed 13130.22 samples/sec Loss 7.0732 LearningRate 0.1262 Epoch: 9 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:29,909-Speed 13008.55 samples/sec Loss 7.1032 LearningRate 0.1262 Epoch: 9 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:31,485-Speed 12997.16 samples/sec Loss 7.1314 LearningRate 0.1261 Epoch: 9 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:33,061-Speed 13010.25 samples/sec Loss 7.1319 LearningRate 0.1261 Epoch: 9 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:34,646-Speed 12932.52 samples/sec Loss 7.1307 LearningRate 0.1261 Epoch: 9 Global Step: 24860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:36,201-Speed 13179.73 samples/sec Loss 7.0857 LearningRate 0.1260 Epoch: 9 Global Step: 24870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:37,777-Speed 13011.32 samples/sec Loss 7.0710 LearningRate 0.1260 Epoch: 9 Global Step: 24880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:39,334-Speed 13161.60 samples/sec Loss 7.1486 LearningRate 0.1260 Epoch: 9 Global Step: 24890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:40,912-Speed 12988.93 samples/sec Loss 7.1217 LearningRate 0.1259 Epoch: 9 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:42,497-Speed 12930.68 samples/sec Loss 6.9787 LearningRate 0.1259 Epoch: 9 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:44,071-Speed 13016.58 samples/sec Loss 7.0990 LearningRate 0.1259 Epoch: 
9 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:45,663-Speed 12876.56 samples/sec Loss 7.1912 LearningRate 0.1258 Epoch: 9 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:47,231-Speed 13071.12 samples/sec Loss 7.1575 LearningRate 0.1258 Epoch: 9 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:48,811-Speed 12967.79 samples/sec Loss 7.1302 LearningRate 0.1258 Epoch: 9 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:54:50,393-Speed 12969.91 samples/sec Loss 7.0979 LearningRate 0.1257 Epoch: 9 Global Step: 24960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:51,982-Speed 12890.14 samples/sec Loss 7.0522 LearningRate 0.1257 Epoch: 9 Global Step: 24970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:53,601-Speed 12665.23 samples/sec Loss 7.0202 LearningRate 0.1257 Epoch: 9 Global Step: 24980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:55,158-Speed 13161.42 samples/sec Loss 7.0913 LearningRate 0.1256 Epoch: 9 Global Step: 24990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:54:56,706-Speed 13239.90 samples/sec Loss 6.9826 LearningRate 0.1256 Epoch: 9 Global Step: 25000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:55:18,922-[lfw][25000]XNorm: 12.251053 Training: 2022-01-14 14:55:18,923-[lfw][25000]Accuracy-Flip: 0.99383+-0.00373 Training: 2022-01-14 14:55:18,923-[lfw][25000]Accuracy-Highest: 0.99483 Training: 2022-01-14 14:55:44,758-[cfp_fp][25000]XNorm: 10.205331 Training: 2022-01-14 14:55:44,759-[cfp_fp][25000]Accuracy-Flip: 0.94800+-0.01100 Training: 2022-01-14 14:55:44,761-[cfp_fp][25000]Accuracy-Highest: 0.94800 Training: 2022-01-14 14:56:06,799-[agedb_30][25000]XNorm: 11.853433 Training: 2022-01-14 14:56:06,800-[agedb_30][25000]Accuracy-Flip: 0.95800+-0.00846 Training: 2022-01-14 
14:56:06,801-[agedb_30][25000]Accuracy-Highest: 0.95800 Training: 2022-01-14 14:56:08,372-Speed 285.77 samples/sec Loss 7.0341 LearningRate 0.1256 Epoch: 9 Global Step: 25010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:56:09,949-Speed 12999.27 samples/sec Loss 7.0910 LearningRate 0.1255 Epoch: 9 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:56:11,537-Speed 12906.47 samples/sec Loss 7.0389 LearningRate 0.1255 Epoch: 9 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:56:13,108-Speed 13046.57 samples/sec Loss 7.0187 LearningRate 0.1255 Epoch: 9 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:14,705-Speed 12845.43 samples/sec Loss 7.0124 LearningRate 0.1254 Epoch: 9 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:16,282-Speed 12996.67 samples/sec Loss 7.0762 LearningRate 0.1254 Epoch: 9 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:17,865-Speed 12949.39 samples/sec Loss 6.9926 LearningRate 0.1254 Epoch: 9 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:19,461-Speed 12838.22 samples/sec Loss 7.1303 LearningRate 0.1253 Epoch: 9 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:21,057-Speed 12844.32 samples/sec Loss 7.0500 LearningRate 0.1253 Epoch: 9 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:22,638-Speed 12965.63 samples/sec Loss 7.1394 LearningRate 0.1253 Epoch: 9 Global Step: 25100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:24,217-Speed 12973.37 samples/sec Loss 7.1059 LearningRate 0.1252 Epoch: 9 Global Step: 25110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:25,786-Speed 13062.05 samples/sec Loss 7.1834 LearningRate 0.1252 Epoch: 9 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-14 14:56:27,369-Speed 12951.10 samples/sec Loss 7.1028 LearningRate 0.1252 Epoch: 9 Global Step: 25130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:28,938-Speed 13061.66 samples/sec Loss 6.9812 LearningRate 0.1251 Epoch: 9 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:56:30,521-Speed 12946.65 samples/sec Loss 7.0755 LearningRate 0.1251 Epoch: 9 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:56:32,099-Speed 12993.11 samples/sec Loss 7.0966 LearningRate 0.1251 Epoch: 9 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:56:33,664-Speed 13093.93 samples/sec Loss 7.1046 LearningRate 0.1250 Epoch: 9 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:35,244-Speed 12964.76 samples/sec Loss 7.1059 LearningRate 0.1250 Epoch: 9 Global Step: 25180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:36,853-Speed 12765.30 samples/sec Loss 7.0438 LearningRate 0.1250 Epoch: 9 Global Step: 25190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:38,409-Speed 13170.03 samples/sec Loss 7.1263 LearningRate 0.1249 Epoch: 9 Global Step: 25200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:39,986-Speed 12997.94 samples/sec Loss 7.0439 LearningRate 0.1249 Epoch: 9 Global Step: 25210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:41,558-Speed 13039.81 samples/sec Loss 7.1300 LearningRate 0.1249 Epoch: 9 Global Step: 25220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:43,157-Speed 12820.91 samples/sec Loss 7.0430 LearningRate 0.1249 Epoch: 9 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:44,726-Speed 13055.17 samples/sec Loss 7.0420 LearningRate 0.1248 Epoch: 9 Global Step: 25240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:46,291-Speed 
13099.53 samples/sec Loss 7.0582 LearningRate 0.1248 Epoch: 9 Global Step: 25250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:47,903-Speed 12709.75 samples/sec Loss 7.0848 LearningRate 0.1248 Epoch: 9 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:49,453-Speed 13222.26 samples/sec Loss 6.9848 LearningRate 0.1247 Epoch: 9 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:56:51,149-Speed 12082.58 samples/sec Loss 7.0998 LearningRate 0.1247 Epoch: 9 Global Step: 25280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:56:52,674-Speed 13442.27 samples/sec Loss 7.1053 LearningRate 0.1247 Epoch: 9 Global Step: 25290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:06,572-Speed 1473.83 samples/sec Loss 6.1061 LearningRate 0.1246 Epoch: 10 Global Step: 25300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:08,149-Speed 12993.60 samples/sec Loss 6.2013 LearningRate 0.1246 Epoch: 10 Global Step: 25310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:09,740-Speed 12881.10 samples/sec Loss 6.1418 LearningRate 0.1246 Epoch: 10 Global Step: 25320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:11,342-Speed 12795.50 samples/sec Loss 6.1990 LearningRate 0.1245 Epoch: 10 Global Step: 25330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:12,965-Speed 12621.75 samples/sec Loss 6.2501 LearningRate 0.1245 Epoch: 10 Global Step: 25340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:14,563-Speed 12834.38 samples/sec Loss 6.1407 LearningRate 0.1245 Epoch: 10 Global Step: 25350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:16,134-Speed 13043.95 samples/sec Loss 6.2499 LearningRate 0.1244 Epoch: 10 Global Step: 25360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:17,692-Speed 13158.31 samples/sec Loss 6.2619 LearningRate 
0.1244 Epoch: 10 Global Step: 25370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:57:19,268-Speed 12995.02 samples/sec Loss 6.2728 LearningRate 0.1244 Epoch: 10 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:20,848-Speed 12976.03 samples/sec Loss 6.2106 LearningRate 0.1243 Epoch: 10 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:22,420-Speed 13039.69 samples/sec Loss 6.3445 LearningRate 0.1243 Epoch: 10 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:24,002-Speed 12948.06 samples/sec Loss 6.4140 LearningRate 0.1243 Epoch: 10 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:25,585-Speed 12947.28 samples/sec Loss 6.3429 LearningRate 0.1242 Epoch: 10 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:27,136-Speed 13208.43 samples/sec Loss 6.3033 LearningRate 0.1242 Epoch: 10 Global Step: 25430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:28,725-Speed 12911.28 samples/sec Loss 6.3478 LearningRate 0.1242 Epoch: 10 Global Step: 25440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:30,282-Speed 13163.82 samples/sec Loss 6.4223 LearningRate 0.1241 Epoch: 10 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:31,880-Speed 12826.14 samples/sec Loss 6.2765 LearningRate 0.1241 Epoch: 10 Global Step: 25460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:33,436-Speed 13169.76 samples/sec Loss 6.3139 LearningRate 0.1241 Epoch: 10 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:35,004-Speed 13071.53 samples/sec Loss 6.3726 LearningRate 0.1240 Epoch: 10 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:36,580-Speed 13003.29 samples/sec Loss 6.2362 LearningRate 0.1240 Epoch: 10 Global Step: 25490 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:38,138-Speed 13155.56 samples/sec Loss 6.4684 LearningRate 0.1240 Epoch: 10 Global Step: 25500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:39,700-Speed 13122.99 samples/sec Loss 6.5537 LearningRate 0.1239 Epoch: 10 Global Step: 25510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:41,265-Speed 13090.76 samples/sec Loss 6.4364 LearningRate 0.1239 Epoch: 10 Global Step: 25520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:42,859-Speed 12860.28 samples/sec Loss 6.5028 LearningRate 0.1239 Epoch: 10 Global Step: 25530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:44,428-Speed 13062.30 samples/sec Loss 6.4210 LearningRate 0.1238 Epoch: 10 Global Step: 25540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:45,990-Speed 13123.77 samples/sec Loss 6.4334 LearningRate 0.1238 Epoch: 10 Global Step: 25550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:47,572-Speed 12953.34 samples/sec Loss 6.3670 LearningRate 0.1238 Epoch: 10 Global Step: 25560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:49,148-Speed 13006.76 samples/sec Loss 6.5574 LearningRate 0.1237 Epoch: 10 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:57:50,681-Speed 13368.52 samples/sec Loss 6.4837 LearningRate 0.1237 Epoch: 10 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:52,280-Speed 12810.36 samples/sec Loss 6.5520 LearningRate 0.1237 Epoch: 10 Global Step: 25590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:53,822-Speed 13291.67 samples/sec Loss 6.6322 LearningRate 0.1236 Epoch: 10 Global Step: 25600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:55,386-Speed 13108.68 samples/sec Loss 6.5842 LearningRate 0.1236 Epoch: 10 Global Step: 25610 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-14 14:57:56,946-Speed 13135.05 samples/sec Loss 6.6341 LearningRate 0.1236 Epoch: 10 Global Step: 25620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:57:58,523-Speed 12997.81 samples/sec Loss 6.6459 LearningRate 0.1235 Epoch: 10 Global Step: 25630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:00,075-Speed 13207.80 samples/sec Loss 6.5443 LearningRate 0.1235 Epoch: 10 Global Step: 25640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:01,620-Speed 13265.34 samples/sec Loss 6.5032 LearningRate 0.1235 Epoch: 10 Global Step: 25650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:03,214-Speed 12851.18 samples/sec Loss 6.5046 LearningRate 0.1234 Epoch: 10 Global Step: 25660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:04,791-Speed 12999.32 samples/sec Loss 6.6096 LearningRate 0.1234 Epoch: 10 Global Step: 25670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:06,372-Speed 12966.22 samples/sec Loss 6.6209 LearningRate 0.1234 Epoch: 10 Global Step: 25680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:07,961-Speed 12895.34 samples/sec Loss 6.6005 LearningRate 0.1233 Epoch: 10 Global Step: 25690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:09,523-Speed 13119.92 samples/sec Loss 6.6505 LearningRate 0.1233 Epoch: 10 Global Step: 25700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:11,125-Speed 12791.81 samples/sec Loss 6.5282 LearningRate 0.1233 Epoch: 10 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:12,689-Speed 13111.64 samples/sec Loss 6.6468 LearningRate 0.1232 Epoch: 10 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:14,258-Speed 13060.33 samples/sec Loss 6.6517 LearningRate 0.1232 Epoch: 10 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 
14:58:15,812-Speed 13181.59 samples/sec Loss 6.6428 LearningRate 0.1232 Epoch: 10 Global Step: 25740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:17,377-Speed 13125.60 samples/sec Loss 6.6479 LearningRate 0.1231 Epoch: 10 Global Step: 25750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:18,960-Speed 12943.96 samples/sec Loss 6.6917 LearningRate 0.1231 Epoch: 10 Global Step: 25760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:20,528-Speed 13071.09 samples/sec Loss 6.6458 LearningRate 0.1231 Epoch: 10 Global Step: 25770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:22,132-Speed 12773.49 samples/sec Loss 6.6504 LearningRate 0.1230 Epoch: 10 Global Step: 25780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:23,685-Speed 13203.14 samples/sec Loss 6.6319 LearningRate 0.1230 Epoch: 10 Global Step: 25790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:25,305-Speed 12652.85 samples/sec Loss 6.6556 LearningRate 0.1230 Epoch: 10 Global Step: 25800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:26,875-Speed 13049.83 samples/sec Loss 6.5834 LearningRate 0.1230 Epoch: 10 Global Step: 25810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:28,433-Speed 13180.24 samples/sec Loss 6.6821 LearningRate 0.1229 Epoch: 10 Global Step: 25820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:30,018-Speed 12918.59 samples/sec Loss 6.5791 LearningRate 0.1229 Epoch: 10 Global Step: 25830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:31,625-Speed 12760.68 samples/sec Loss 6.6581 LearningRate 0.1229 Epoch: 10 Global Step: 25840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:58:33,169-Speed 13281.12 samples/sec Loss 6.7023 LearningRate 0.1228 Epoch: 10 Global Step: 25850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:34,730-Speed 13118.95 samples/sec 
Loss 6.7017 LearningRate 0.1228 Epoch: 10 Global Step: 25860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:36,285-Speed 13181.79 samples/sec Loss 6.7671 LearningRate 0.1228 Epoch: 10 Global Step: 25870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:37,929-Speed 12472.65 samples/sec Loss 6.7547 LearningRate 0.1227 Epoch: 10 Global Step: 25880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:39,511-Speed 12953.21 samples/sec Loss 6.7351 LearningRate 0.1227 Epoch: 10 Global Step: 25890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:41,079-Speed 13073.46 samples/sec Loss 6.7737 LearningRate 0.1227 Epoch: 10 Global Step: 25900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:42,653-Speed 13020.18 samples/sec Loss 6.7409 LearningRate 0.1226 Epoch: 10 Global Step: 25910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:44,220-Speed 13080.21 samples/sec Loss 6.6675 LearningRate 0.1226 Epoch: 10 Global Step: 25920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:45,781-Speed 13128.70 samples/sec Loss 6.7338 LearningRate 0.1226 Epoch: 10 Global Step: 25930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:47,413-Speed 12559.06 samples/sec Loss 6.7273 LearningRate 0.1225 Epoch: 10 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:58:48,974-Speed 13129.38 samples/sec Loss 6.7061 LearningRate 0.1225 Epoch: 10 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:50,545-Speed 13045.20 samples/sec Loss 6.7056 LearningRate 0.1225 Epoch: 10 Global Step: 25960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:52,137-Speed 12867.25 samples/sec Loss 6.7295 LearningRate 0.1224 Epoch: 10 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:53,707-Speed 13062.84 samples/sec Loss 6.8165 LearningRate 0.1224 Epoch: 
10 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:55,306-Speed 12812.26 samples/sec Loss 6.8334 LearningRate 0.1224 Epoch: 10 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:56,868-Speed 13120.70 samples/sec Loss 6.6615 LearningRate 0.1223 Epoch: 10 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:58:58,465-Speed 12839.83 samples/sec Loss 6.7700 LearningRate 0.1223 Epoch: 10 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:00,007-Speed 13291.81 samples/sec Loss 6.6972 LearningRate 0.1223 Epoch: 10 Global Step: 26020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:01,568-Speed 13124.90 samples/sec Loss 6.7692 LearningRate 0.1222 Epoch: 10 Global Step: 26030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:03,128-Speed 13133.40 samples/sec Loss 6.7804 LearningRate 0.1222 Epoch: 10 Global Step: 26040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:04,688-Speed 13133.03 samples/sec Loss 6.6946 LearningRate 0.1222 Epoch: 10 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:06,250-Speed 13124.26 samples/sec Loss 6.7186 LearningRate 0.1221 Epoch: 10 Global Step: 26060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:07,809-Speed 13143.96 samples/sec Loss 6.7973 LearningRate 0.1221 Epoch: 10 Global Step: 26070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:09,383-Speed 13021.99 samples/sec Loss 6.7557 LearningRate 0.1221 Epoch: 10 Global Step: 26080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:10,943-Speed 13137.12 samples/sec Loss 6.7755 LearningRate 0.1220 Epoch: 10 Global Step: 26090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:12,524-Speed 12963.17 samples/sec Loss 6.8249 LearningRate 0.1220 Epoch: 10 Global Step: 26100 Fp16 Grad Scale: 
32768 Required: 4 hours Training: 2022-01-14 14:59:14,127-Speed 12784.09 samples/sec Loss 6.7663 LearningRate 0.1220 Epoch: 10 Global Step: 26110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:15,689-Speed 13122.03 samples/sec Loss 6.7946 LearningRate 0.1219 Epoch: 10 Global Step: 26120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:17,253-Speed 13099.03 samples/sec Loss 6.7184 LearningRate 0.1219 Epoch: 10 Global Step: 26130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:18,827-Speed 13020.46 samples/sec Loss 6.7905 LearningRate 0.1219 Epoch: 10 Global Step: 26140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:20,407-Speed 12971.62 samples/sec Loss 6.8061 LearningRate 0.1218 Epoch: 10 Global Step: 26150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:21,986-Speed 12986.12 samples/sec Loss 6.7514 LearningRate 0.1218 Epoch: 10 Global Step: 26160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 14:59:23,552-Speed 13085.16 samples/sec Loss 6.8376 LearningRate 0.1218 Epoch: 10 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:25,131-Speed 12978.24 samples/sec Loss 6.8523 LearningRate 0.1217 Epoch: 10 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:26,695-Speed 13106.53 samples/sec Loss 6.7976 LearningRate 0.1217 Epoch: 10 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:28,283-Speed 12915.12 samples/sec Loss 6.7327 LearningRate 0.1217 Epoch: 10 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:29,824-Speed 13295.06 samples/sec Loss 6.8399 LearningRate 0.1216 Epoch: 10 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:31,389-Speed 13092.52 samples/sec Loss 6.8108 LearningRate 0.1216 Epoch: 10 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-14 14:59:32,966-Speed 12999.89 samples/sec Loss 6.8572 LearningRate 0.1216 Epoch: 10 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:34,528-Speed 13122.01 samples/sec Loss 6.7571 LearningRate 0.1216 Epoch: 10 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:36,117-Speed 12892.91 samples/sec Loss 6.7729 LearningRate 0.1215 Epoch: 10 Global Step: 26250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:37,705-Speed 12913.45 samples/sec Loss 6.8093 LearningRate 0.1215 Epoch: 10 Global Step: 26260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 14:59:39,276-Speed 13049.20 samples/sec Loss 6.8908 LearningRate 0.1215 Epoch: 10 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:40,849-Speed 13024.36 samples/sec Loss 6.8628 LearningRate 0.1214 Epoch: 10 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:42,415-Speed 13081.53 samples/sec Loss 6.8310 LearningRate 0.1214 Epoch: 10 Global Step: 26290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:44,000-Speed 12931.06 samples/sec Loss 6.7673 LearningRate 0.1214 Epoch: 10 Global Step: 26300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:45,555-Speed 13177.95 samples/sec Loss 6.8023 LearningRate 0.1213 Epoch: 10 Global Step: 26310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:47,125-Speed 13057.48 samples/sec Loss 6.8375 LearningRate 0.1213 Epoch: 10 Global Step: 26320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:48,694-Speed 13059.56 samples/sec Loss 6.8036 LearningRate 0.1213 Epoch: 10 Global Step: 26330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:50,260-Speed 13091.75 samples/sec Loss 6.8315 LearningRate 0.1212 Epoch: 10 Global Step: 26340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:51,830-Speed 
13049.58 samples/sec Loss 6.7708 LearningRate 0.1212 Epoch: 10 Global Step: 26350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:53,394-Speed 13108.36 samples/sec Loss 6.7563 LearningRate 0.1212 Epoch: 10 Global Step: 26360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:54,946-Speed 13203.29 samples/sec Loss 6.7874 LearningRate 0.1211 Epoch: 10 Global Step: 26370 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:59:56,518-Speed 13034.03 samples/sec Loss 6.8585 LearningRate 0.1211 Epoch: 10 Global Step: 26380 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 14:59:58,101-Speed 12950.35 samples/sec Loss 6.7524 LearningRate 0.1211 Epoch: 10 Global Step: 26390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 14:59:59,664-Speed 13102.96 samples/sec Loss 6.8873 LearningRate 0.1210 Epoch: 10 Global Step: 26400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:00:01,258-Speed 12859.11 samples/sec Loss 6.8596 LearningRate 0.1210 Epoch: 10 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:00:02,832-Speed 13027.32 samples/sec Loss 6.9129 LearningRate 0.1210 Epoch: 10 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:00:04,397-Speed 13089.55 samples/sec Loss 6.9045 LearningRate 0.1209 Epoch: 10 Global Step: 26430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:00:05,964-Speed 13078.27 samples/sec Loss 6.8755 LearningRate 0.1209 Epoch: 10 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:00:07,502-Speed 13329.45 samples/sec Loss 6.9694 LearningRate 0.1209 Epoch: 10 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:09,072-Speed 13049.54 samples/sec Loss 6.8160 LearningRate 0.1208 Epoch: 10 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:10,665-Speed 12866.88 samples/sec Loss 6.8170 
LearningRate 0.1208 Epoch: 10 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:12,220-Speed 13182.45 samples/sec Loss 6.9471 LearningRate 0.1208 Epoch: 10 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:13,857-Speed 12523.73 samples/sec Loss 6.8629 LearningRate 0.1207 Epoch: 10 Global Step: 26490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:15,390-Speed 13367.57 samples/sec Loss 6.8535 LearningRate 0.1207 Epoch: 10 Global Step: 26500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:16,952-Speed 13118.72 samples/sec Loss 6.9254 LearningRate 0.1207 Epoch: 10 Global Step: 26510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:18,525-Speed 13031.52 samples/sec Loss 6.8953 LearningRate 0.1206 Epoch: 10 Global Step: 26520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:20,141-Speed 12676.03 samples/sec Loss 6.8191 LearningRate 0.1206 Epoch: 10 Global Step: 26530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:21,708-Speed 13079.88 samples/sec Loss 6.8637 LearningRate 0.1206 Epoch: 10 Global Step: 26540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:23,279-Speed 13047.78 samples/sec Loss 6.8238 LearningRate 0.1205 Epoch: 10 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:00:24,826-Speed 13248.56 samples/sec Loss 6.9003 LearningRate 0.1205 Epoch: 10 Global Step: 26560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:26,409-Speed 12945.86 samples/sec Loss 6.8507 LearningRate 0.1205 Epoch: 10 Global Step: 26570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:27,977-Speed 13076.42 samples/sec Loss 6.8532 LearningRate 0.1205 Epoch: 10 Global Step: 26580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:29,537-Speed 13132.45 samples/sec Loss 6.8845 LearningRate 0.1204 Epoch: 10 Global Step: 
26590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:31,123-Speed 12921.27 samples/sec Loss 6.8528 LearningRate 0.1204 Epoch: 10 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:32,699-Speed 13003.54 samples/sec Loss 6.8449 LearningRate 0.1204 Epoch: 10 Global Step: 26610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:34,250-Speed 13219.43 samples/sec Loss 6.8589 LearningRate 0.1203 Epoch: 10 Global Step: 26620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:35,834-Speed 12937.67 samples/sec Loss 6.8303 LearningRate 0.1203 Epoch: 10 Global Step: 26630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:37,411-Speed 12989.17 samples/sec Loss 6.8297 LearningRate 0.1203 Epoch: 10 Global Step: 26640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:38,992-Speed 12966.54 samples/sec Loss 6.8120 LearningRate 0.1202 Epoch: 10 Global Step: 26650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:40,551-Speed 13147.86 samples/sec Loss 6.8497 LearningRate 0.1202 Epoch: 10 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:00:42,141-Speed 12890.18 samples/sec Loss 6.8498 LearningRate 0.1202 Epoch: 10 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:00:43,700-Speed 13151.51 samples/sec Loss 6.9448 LearningRate 0.1201 Epoch: 10 Global Step: 26680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:45,271-Speed 13041.67 samples/sec Loss 6.8292 LearningRate 0.1201 Epoch: 10 Global Step: 26690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:46,844-Speed 13022.22 samples/sec Loss 6.8356 LearningRate 0.1201 Epoch: 10 Global Step: 26700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:48,444-Speed 12817.74 samples/sec Loss 6.8670 LearningRate 0.1200 Epoch: 10 Global Step: 26710 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-01-14 15:00:49,988-Speed 13270.37 samples/sec Loss 6.9729 LearningRate 0.1200 Epoch: 10 Global Step: 26720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:51,549-Speed 13130.17 samples/sec Loss 6.8953 LearningRate 0.1200 Epoch: 10 Global Step: 26730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:53,164-Speed 12690.08 samples/sec Loss 6.9164 LearningRate 0.1199 Epoch: 10 Global Step: 26740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:54,709-Speed 13261.09 samples/sec Loss 6.9914 LearningRate 0.1199 Epoch: 10 Global Step: 26750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:56,277-Speed 13068.69 samples/sec Loss 6.9141 LearningRate 0.1199 Epoch: 10 Global Step: 26760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:57,858-Speed 12964.26 samples/sec Loss 6.9076 LearningRate 0.1198 Epoch: 10 Global Step: 26770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:00:59,411-Speed 13202.15 samples/sec Loss 6.9050 LearningRate 0.1198 Epoch: 10 Global Step: 26780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:00,956-Speed 13258.10 samples/sec Loss 6.9305 LearningRate 0.1198 Epoch: 10 Global Step: 26790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:02,592-Speed 12527.36 samples/sec Loss 6.8804 LearningRate 0.1197 Epoch: 10 Global Step: 26800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:04,125-Speed 13378.78 samples/sec Loss 6.8371 LearningRate 0.1197 Epoch: 10 Global Step: 26810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:05,692-Speed 13078.99 samples/sec Loss 6.9055 LearningRate 0.1197 Epoch: 10 Global Step: 26820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:07,282-Speed 12887.70 samples/sec Loss 6.9810 LearningRate 0.1196 Epoch: 10 Global Step: 26830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 
15:01:08,838-Speed 13174.40 samples/sec Loss 6.9737 LearningRate 0.1196 Epoch: 10 Global Step: 26840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:10,407-Speed 13057.59 samples/sec Loss 6.8066 LearningRate 0.1196 Epoch: 10 Global Step: 26850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:11,972-Speed 13099.28 samples/sec Loss 6.8616 LearningRate 0.1195 Epoch: 10 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:13,552-Speed 12975.25 samples/sec Loss 6.8419 LearningRate 0.1195 Epoch: 10 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:15,075-Speed 13447.78 samples/sec Loss 6.8319 LearningRate 0.1195 Epoch: 10 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:16,660-Speed 12931.64 samples/sec Loss 6.9176 LearningRate 0.1195 Epoch: 10 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:18,208-Speed 13237.27 samples/sec Loss 6.9106 LearningRate 0.1194 Epoch: 10 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:19,762-Speed 13187.20 samples/sec Loss 6.8128 LearningRate 0.1194 Epoch: 10 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:21,347-Speed 12933.01 samples/sec Loss 6.9702 LearningRate 0.1194 Epoch: 10 Global Step: 26920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:22,905-Speed 13158.26 samples/sec Loss 6.8922 LearningRate 0.1193 Epoch: 10 Global Step: 26930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:24,461-Speed 13166.48 samples/sec Loss 6.9227 LearningRate 0.1193 Epoch: 10 Global Step: 26940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:26,025-Speed 13105.93 samples/sec Loss 6.9891 LearningRate 0.1193 Epoch: 10 Global Step: 26950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:27,589-Speed 13104.06 samples/sec 
Loss 6.8658 LearningRate 0.1192 Epoch: 10 Global Step: 26960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:29,148-Speed 13147.92 samples/sec Loss 6.9209 LearningRate 0.1192 Epoch: 10 Global Step: 26970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:30,701-Speed 13189.20 samples/sec Loss 6.9563 LearningRate 0.1192 Epoch: 10 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:32,302-Speed 12909.12 samples/sec Loss 6.8711 LearningRate 0.1191 Epoch: 10 Global Step: 26990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:33,879-Speed 12993.08 samples/sec Loss 6.9906 LearningRate 0.1191 Epoch: 10 Global Step: 27000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:35,460-Speed 12963.66 samples/sec Loss 6.9758 LearningRate 0.1191 Epoch: 10 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:37,050-Speed 12887.47 samples/sec Loss 6.8084 LearningRate 0.1190 Epoch: 10 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:38,624-Speed 13028.27 samples/sec Loss 6.8569 LearningRate 0.1190 Epoch: 10 Global Step: 27030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:40,221-Speed 12832.68 samples/sec Loss 6.8785 LearningRate 0.1190 Epoch: 10 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:41,792-Speed 13045.87 samples/sec Loss 6.8759 LearningRate 0.1189 Epoch: 10 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:43,358-Speed 13162.69 samples/sec Loss 6.8117 LearningRate 0.1189 Epoch: 10 Global Step: 27060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:44,937-Speed 12975.84 samples/sec Loss 6.8182 LearningRate 0.1189 Epoch: 10 Global Step: 27070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:46,516-Speed 12979.91 samples/sec Loss 6.7906 LearningRate 0.1188 Epoch: 
10 Global Step: 27080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:48,082-Speed 13090.89 samples/sec Loss 6.8895 LearningRate 0.1188 Epoch: 10 Global Step: 27090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:49,658-Speed 13005.36 samples/sec Loss 6.8977 LearningRate 0.1188 Epoch: 10 Global Step: 27100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:51,263-Speed 12766.18 samples/sec Loss 6.9517 LearningRate 0.1187 Epoch: 10 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:01:52,817-Speed 13190.86 samples/sec Loss 6.9065 LearningRate 0.1187 Epoch: 10 Global Step: 27120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:54,392-Speed 13005.95 samples/sec Loss 6.8609 LearningRate 0.1187 Epoch: 10 Global Step: 27130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:55,966-Speed 13019.11 samples/sec Loss 6.9053 LearningRate 0.1186 Epoch: 10 Global Step: 27140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:57,522-Speed 13171.10 samples/sec Loss 6.8648 LearningRate 0.1186 Epoch: 10 Global Step: 27150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:01:59,075-Speed 13196.49 samples/sec Loss 6.8871 LearningRate 0.1186 Epoch: 10 Global Step: 27160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:00,647-Speed 13037.61 samples/sec Loss 6.9824 LearningRate 0.1186 Epoch: 10 Global Step: 27170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:02,254-Speed 12745.69 samples/sec Loss 6.8826 LearningRate 0.1185 Epoch: 10 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:03,808-Speed 13197.81 samples/sec Loss 6.9410 LearningRate 0.1185 Epoch: 10 Global Step: 27190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:05,372-Speed 13096.70 samples/sec Loss 6.9477 LearningRate 0.1185 Epoch: 10 Global Step: 27200 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-01-14 15:02:06,942-Speed 13055.26 samples/sec Loss 6.9243 LearningRate 0.1184 Epoch: 10 Global Step: 27210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:08,537-Speed 12884.20 samples/sec Loss 6.8307 LearningRate 0.1184 Epoch: 10 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:10,122-Speed 12936.99 samples/sec Loss 6.8843 LearningRate 0.1184 Epoch: 10 Global Step: 27230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:11,697-Speed 13008.03 samples/sec Loss 6.9281 LearningRate 0.1183 Epoch: 10 Global Step: 27240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:13,247-Speed 13223.64 samples/sec Loss 6.8827 LearningRate 0.1183 Epoch: 10 Global Step: 27250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:14,831-Speed 12939.81 samples/sec Loss 6.9044 LearningRate 0.1183 Epoch: 10 Global Step: 27260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:16,420-Speed 12891.76 samples/sec Loss 6.8216 LearningRate 0.1182 Epoch: 10 Global Step: 27270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:17,995-Speed 13018.83 samples/sec Loss 6.8538 LearningRate 0.1182 Epoch: 10 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:19,548-Speed 13202.41 samples/sec Loss 6.9155 LearningRate 0.1182 Epoch: 10 Global Step: 27290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:21,134-Speed 12913.53 samples/sec Loss 6.8790 LearningRate 0.1181 Epoch: 10 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:22,717-Speed 12954.71 samples/sec Loss 6.9330 LearningRate 0.1181 Epoch: 10 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:24,266-Speed 13231.48 samples/sec Loss 6.8987 LearningRate 0.1181 Epoch: 10 Global Step: 27320 Fp16 Grad Scale: 262144 Required: 4 hours 
Training: 2022-01-14 15:02:25,785-Speed 13487.93 samples/sec Loss 6.8263 LearningRate 0.1180 Epoch: 10 Global Step: 27330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:27,350-Speed 13092.28 samples/sec Loss 6.9061 LearningRate 0.1180 Epoch: 10 Global Step: 27340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:28,949-Speed 12818.84 samples/sec Loss 6.8689 LearningRate 0.1180 Epoch: 10 Global Step: 27350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:30,496-Speed 13243.78 samples/sec Loss 6.9048 LearningRate 0.1179 Epoch: 10 Global Step: 27360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:32,061-Speed 13098.01 samples/sec Loss 6.8288 LearningRate 0.1179 Epoch: 10 Global Step: 27370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:33,661-Speed 12810.94 samples/sec Loss 6.9930 LearningRate 0.1179 Epoch: 10 Global Step: 27380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:35,217-Speed 13169.40 samples/sec Loss 6.9270 LearningRate 0.1178 Epoch: 10 Global Step: 27390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:36,766-Speed 13227.88 samples/sec Loss 6.9004 LearningRate 0.1178 Epoch: 10 Global Step: 27400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:38,348-Speed 12959.48 samples/sec Loss 6.9167 LearningRate 0.1178 Epoch: 10 Global Step: 27410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:39,896-Speed 13238.83 samples/sec Loss 6.8714 LearningRate 0.1178 Epoch: 10 Global Step: 27420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 15:02:41,460-Speed 13102.45 samples/sec Loss 6.9159 LearningRate 0.1177 Epoch: 10 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:43,041-Speed 12964.63 samples/sec Loss 6.8835 LearningRate 0.1177 Epoch: 10 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:44,612-Speed 
13046.06 samples/sec Loss 6.9190 LearningRate 0.1177 Epoch: 10 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:46,160-Speed 13230.35 samples/sec Loss 6.8159 LearningRate 0.1176 Epoch: 10 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:47,722-Speed 13120.60 samples/sec Loss 6.9176 LearningRate 0.1176 Epoch: 10 Global Step: 27470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:49,281-Speed 13148.40 samples/sec Loss 6.9805 LearningRate 0.1176 Epoch: 10 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:50,830-Speed 13224.42 samples/sec Loss 6.9441 LearningRate 0.1175 Epoch: 10 Global Step: 27490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:52,383-Speed 13193.44 samples/sec Loss 6.9026 LearningRate 0.1175 Epoch: 10 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:53,994-Speed 12726.31 samples/sec Loss 6.9350 LearningRate 0.1175 Epoch: 10 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:55,549-Speed 13209.19 samples/sec Loss 6.9091 LearningRate 0.1174 Epoch: 10 Global Step: 27520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:02:57,112-Speed 13115.36 samples/sec Loss 6.8647 LearningRate 0.1174 Epoch: 10 Global Step: 27530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:02:58,698-Speed 12927.86 samples/sec Loss 6.8749 LearningRate 0.1174 Epoch: 10 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:00,264-Speed 13081.86 samples/sec Loss 6.9159 LearningRate 0.1173 Epoch: 10 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:01,831-Speed 13081.77 samples/sec Loss 6.9411 LearningRate 0.1173 Epoch: 10 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:03,403-Speed 13036.46 samples/sec Loss 6.7961 
LearningRate 0.1173 Epoch: 10 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:04,953-Speed 13223.76 samples/sec Loss 6.9547 LearningRate 0.1172 Epoch: 10 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:06,513-Speed 13138.36 samples/sec Loss 6.8493 LearningRate 0.1172 Epoch: 10 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:08,073-Speed 13138.77 samples/sec Loss 6.8714 LearningRate 0.1172 Epoch: 10 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:09,667-Speed 12856.74 samples/sec Loss 6.9593 LearningRate 0.1171 Epoch: 10 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:11,221-Speed 13183.85 samples/sec Loss 6.8008 LearningRate 0.1171 Epoch: 10 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:12,775-Speed 13188.71 samples/sec Loss 6.8811 LearningRate 0.1171 Epoch: 10 Global Step: 27630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 15:03:14,320-Speed 13264.59 samples/sec Loss 6.8782 LearningRate 0.1171 Epoch: 10 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:15,901-Speed 12964.69 samples/sec Loss 6.9082 LearningRate 0.1170 Epoch: 10 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:17,461-Speed 13130.15 samples/sec Loss 6.8684 LearningRate 0.1170 Epoch: 10 Global Step: 27660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:19,016-Speed 13190.71 samples/sec Loss 6.7901 LearningRate 0.1170 Epoch: 10 Global Step: 27670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:20,615-Speed 12807.35 samples/sec Loss 6.8161 LearningRate 0.1169 Epoch: 10 Global Step: 27680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:22,190-Speed 13018.05 samples/sec Loss 6.8400 LearningRate 0.1169 Epoch: 10 
Global Step: 27690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:23,761-Speed 13041.34 samples/sec Loss 6.8695 LearningRate 0.1169 Epoch: 10 Global Step: 27700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:25,343-Speed 12953.53 samples/sec Loss 6.8388 LearningRate 0.1168 Epoch: 10 Global Step: 27710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:26,903-Speed 13136.74 samples/sec Loss 6.9239 LearningRate 0.1168 Epoch: 10 Global Step: 27720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:28,480-Speed 12996.52 samples/sec Loss 6.9225 LearningRate 0.1168 Epoch: 10 Global Step: 27730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:30,045-Speed 13099.79 samples/sec Loss 6.8391 LearningRate 0.1167 Epoch: 10 Global Step: 27740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:31,607-Speed 13119.45 samples/sec Loss 6.8179 LearningRate 0.1167 Epoch: 10 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:33,159-Speed 13205.49 samples/sec Loss 6.9821 LearningRate 0.1167 Epoch: 10 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:34,761-Speed 12822.96 samples/sec Loss 6.9266 LearningRate 0.1166 Epoch: 10 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:03:36,301-Speed 13310.34 samples/sec Loss 6.8721 LearningRate 0.1166 Epoch: 10 Global Step: 27780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:37,871-Speed 13048.32 samples/sec Loss 6.9524 LearningRate 0.1166 Epoch: 10 Global Step: 27790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:39,448-Speed 12998.99 samples/sec Loss 6.9304 LearningRate 0.1165 Epoch: 10 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:41,025-Speed 12996.89 samples/sec Loss 6.8628 LearningRate 0.1165 Epoch: 10 Global Step: 27810 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 15:03:56,283-Speed 1342.40 samples/sec Loss 6.7888 LearningRate 0.1165 Epoch: 11 Global Step: 27820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:57,937-Speed 12395.58 samples/sec Loss 6.1116 LearningRate 0.1164 Epoch: 11 Global Step: 27830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:03:59,513-Speed 13005.89 samples/sec Loss 6.0061 LearningRate 0.1164 Epoch: 11 Global Step: 27840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:01,122-Speed 12737.19 samples/sec Loss 6.0181 LearningRate 0.1164 Epoch: 11 Global Step: 27850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:02,684-Speed 13123.68 samples/sec Loss 6.0400 LearningRate 0.1164 Epoch: 11 Global Step: 27860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:04,313-Speed 12588.11 samples/sec Loss 5.9212 LearningRate 0.1163 Epoch: 11 Global Step: 27870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:05,897-Speed 12931.29 samples/sec Loss 6.0224 LearningRate 0.1163 Epoch: 11 Global Step: 27880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:04:07,431-Speed 13358.28 samples/sec Loss 6.0707 LearningRate 0.1163 Epoch: 11 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:08,998-Speed 13079.06 samples/sec Loss 6.0539 LearningRate 0.1162 Epoch: 11 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:10,568-Speed 13077.32 samples/sec Loss 6.0531 LearningRate 0.1162 Epoch: 11 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:12,161-Speed 12866.52 samples/sec Loss 6.1862 LearningRate 0.1162 Epoch: 11 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:13,783-Speed 12638.10 samples/sec Loss 6.0326 LearningRate 0.1161 Epoch: 11 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
15:04:15,345-Speed 13122.09 samples/sec Loss 6.0838 LearningRate 0.1161 Epoch: 11 Global Step: 27940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:16,920-Speed 13010.80 samples/sec Loss 6.1298 LearningRate 0.1161 Epoch: 11 Global Step: 27950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:18,492-Speed 13040.52 samples/sec Loss 6.1501 LearningRate 0.1160 Epoch: 11 Global Step: 27960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:20,072-Speed 12965.66 samples/sec Loss 6.1338 LearningRate 0.1160 Epoch: 11 Global Step: 27970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:21,657-Speed 12931.67 samples/sec Loss 6.2543 LearningRate 0.1160 Epoch: 11 Global Step: 27980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:23,244-Speed 12913.15 samples/sec Loss 6.1624 LearningRate 0.1159 Epoch: 11 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:04:24,846-Speed 12794.17 samples/sec Loss 6.2044 LearningRate 0.1159 Epoch: 11 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:04:26,417-Speed 13047.14 samples/sec Loss 6.2500 LearningRate 0.1159 Epoch: 11 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:04:27,991-Speed 13018.93 samples/sec Loss 6.2347 LearningRate 0.1158 Epoch: 11 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:04:29,607-Speed 12681.12 samples/sec Loss 6.2354 LearningRate 0.1158 Epoch: 11 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:31,163-Speed 13176.15 samples/sec Loss 6.2706 LearningRate 0.1158 Epoch: 11 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:32,721-Speed 13148.08 samples/sec Loss 6.2464 LearningRate 0.1157 Epoch: 11 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:34,318-Speed 12836.99 samples/sec 
Loss 6.2489 LearningRate 0.1157 Epoch: 11 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:35,890-Speed 13033.76 samples/sec Loss 6.2868 LearningRate 0.1157 Epoch: 11 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:37,457-Speed 13075.36 samples/sec Loss 6.2104 LearningRate 0.1157 Epoch: 11 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:39,012-Speed 13184.66 samples/sec Loss 6.2755 LearningRate 0.1156 Epoch: 11 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:40,568-Speed 13166.48 samples/sec Loss 6.3268 LearningRate 0.1156 Epoch: 11 Global Step: 28100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:42,130-Speed 13121.08 samples/sec Loss 6.3832 LearningRate 0.1156 Epoch: 11 Global Step: 28110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:43,729-Speed 12821.26 samples/sec Loss 6.2587 LearningRate 0.1155 Epoch: 11 Global Step: 28120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:45,297-Speed 13066.01 samples/sec Loss 6.2733 LearningRate 0.1155 Epoch: 11 Global Step: 28130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:46,866-Speed 13065.75 samples/sec Loss 6.3136 LearningRate 0.1155 Epoch: 11 Global Step: 28140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:48,474-Speed 12742.05 samples/sec Loss 6.3204 LearningRate 0.1154 Epoch: 11 Global Step: 28150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:50,024-Speed 13228.83 samples/sec Loss 6.3791 LearningRate 0.1154 Epoch: 11 Global Step: 28160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:51,601-Speed 13011.46 samples/sec Loss 6.2640 LearningRate 0.1154 Epoch: 11 Global Step: 28170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:53,195-Speed 12866.67 samples/sec Loss 6.2742 LearningRate 0.1153 Epoch: 11 
Global Step: 28180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:54,732-Speed 13331.58 samples/sec Loss 6.4519 LearningRate 0.1153 Epoch: 11 Global Step: 28190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:56,302-Speed 13045.87 samples/sec Loss 6.3137 LearningRate 0.1153 Epoch: 11 Global Step: 28200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:57,900-Speed 12830.58 samples/sec Loss 6.3469 LearningRate 0.1152 Epoch: 11 Global Step: 28210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:04:59,479-Speed 12982.86 samples/sec Loss 6.3190 LearningRate 0.1152 Epoch: 11 Global Step: 28220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:01,050-Speed 13038.63 samples/sec Loss 6.5194 LearningRate 0.1152 Epoch: 11 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:02,633-Speed 12942.46 samples/sec Loss 6.4206 LearningRate 0.1151 Epoch: 11 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:04,196-Speed 13117.26 samples/sec Loss 6.3269 LearningRate 0.1151 Epoch: 11 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:05,826-Speed 12574.67 samples/sec Loss 6.4585 LearningRate 0.1151 Epoch: 11 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:07,408-Speed 12953.09 samples/sec Loss 6.4722 LearningRate 0.1151 Epoch: 11 Global Step: 28270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:09,002-Speed 12858.93 samples/sec Loss 6.4230 LearningRate 0.1150 Epoch: 11 Global Step: 28280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:10,569-Speed 13081.22 samples/sec Loss 6.4939 LearningRate 0.1150 Epoch: 11 Global Step: 28290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:12,154-Speed 12933.98 samples/sec Loss 6.5499 LearningRate 0.1150 Epoch: 11 Global Step: 28300 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-01-14 15:05:13,745-Speed 12884.18 samples/sec Loss 6.3940 LearningRate 0.1149 Epoch: 11 Global Step: 28310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:15,322-Speed 12996.43 samples/sec Loss 6.4285 LearningRate 0.1149 Epoch: 11 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:16,887-Speed 13109.94 samples/sec Loss 6.4566 LearningRate 0.1149 Epoch: 11 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:18,462-Speed 13024.34 samples/sec Loss 6.4886 LearningRate 0.1148 Epoch: 11 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:20,029-Speed 13072.42 samples/sec Loss 6.3887 LearningRate 0.1148 Epoch: 11 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:21,628-Speed 12818.07 samples/sec Loss 6.4579 LearningRate 0.1148 Epoch: 11 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:23,202-Speed 13022.18 samples/sec Loss 6.4768 LearningRate 0.1147 Epoch: 11 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:24,797-Speed 12848.75 samples/sec Loss 6.4452 LearningRate 0.1147 Epoch: 11 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:26,366-Speed 13061.92 samples/sec Loss 6.4927 LearningRate 0.1147 Epoch: 11 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:27,979-Speed 12702.22 samples/sec Loss 6.4970 LearningRate 0.1146 Epoch: 11 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:29,558-Speed 12988.75 samples/sec Loss 6.5915 LearningRate 0.1146 Epoch: 11 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:31,142-Speed 12933.94 samples/sec Loss 6.5706 LearningRate 0.1146 Epoch: 11 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-01-14 15:05:32,717-Speed 13010.61 samples/sec Loss 6.4581 LearningRate 0.1145 Epoch: 11 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:34,299-Speed 12958.49 samples/sec Loss 6.5800 LearningRate 0.1145 Epoch: 11 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:35,852-Speed 13195.16 samples/sec Loss 6.5481 LearningRate 0.1145 Epoch: 11 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:37,439-Speed 12913.42 samples/sec Loss 6.4930 LearningRate 0.1145 Epoch: 11 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:39,001-Speed 13126.70 samples/sec Loss 6.5427 LearningRate 0.1144 Epoch: 11 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:40,583-Speed 12955.59 samples/sec Loss 6.4683 LearningRate 0.1144 Epoch: 11 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:42,167-Speed 12943.19 samples/sec Loss 6.4321 LearningRate 0.1144 Epoch: 11 Global Step: 28490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:43,733-Speed 13090.54 samples/sec Loss 6.5195 LearningRate 0.1143 Epoch: 11 Global Step: 28500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:45,324-Speed 12873.66 samples/sec Loss 6.5445 LearningRate 0.1143 Epoch: 11 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:46,908-Speed 12939.98 samples/sec Loss 6.5628 LearningRate 0.1143 Epoch: 11 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:48,481-Speed 13032.27 samples/sec Loss 6.5492 LearningRate 0.1142 Epoch: 11 Global Step: 28530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:50,073-Speed 12866.40 samples/sec Loss 6.5063 LearningRate 0.1142 Epoch: 11 Global Step: 28540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:05:51,634-Speed 13133.69 
samples/sec Loss 6.5942 LearningRate 0.1142 Epoch: 11 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:53,211-Speed 13000.99 samples/sec Loss 6.5633 LearningRate 0.1141 Epoch: 11 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:54,788-Speed 12991.67 samples/sec Loss 6.5165 LearningRate 0.1141 Epoch: 11 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:56,358-Speed 13052.84 samples/sec Loss 6.5432 LearningRate 0.1141 Epoch: 11 Global Step: 28580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:57,922-Speed 13103.78 samples/sec Loss 6.5950 LearningRate 0.1140 Epoch: 11 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:05:59,480-Speed 13151.12 samples/sec Loss 6.5549 LearningRate 0.1140 Epoch: 11 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:01,035-Speed 13183.91 samples/sec Loss 6.5583 LearningRate 0.1140 Epoch: 11 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:02,601-Speed 13090.65 samples/sec Loss 6.5265 LearningRate 0.1140 Epoch: 11 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:04,154-Speed 13203.37 samples/sec Loss 6.5695 LearningRate 0.1139 Epoch: 11 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:05,732-Speed 12983.56 samples/sec Loss 6.4992 LearningRate 0.1139 Epoch: 11 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:07,307-Speed 13015.78 samples/sec Loss 6.5744 LearningRate 0.1139 Epoch: 11 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:08,881-Speed 13015.74 samples/sec Loss 6.5411 LearningRate 0.1138 Epoch: 11 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:10,463-Speed 12957.13 samples/sec Loss 6.5609 LearningRate 
0.1138 Epoch: 11 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:12,038-Speed 13008.56 samples/sec Loss 6.7207 LearningRate 0.1138 Epoch: 11 Global Step: 28680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:13,616-Speed 12995.67 samples/sec Loss 6.5276 LearningRate 0.1137 Epoch: 11 Global Step: 28690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:15,201-Speed 12928.12 samples/sec Loss 6.6204 LearningRate 0.1137 Epoch: 11 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:16,760-Speed 13151.17 samples/sec Loss 6.5551 LearningRate 0.1137 Epoch: 11 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:18,329-Speed 13057.80 samples/sec Loss 6.6398 LearningRate 0.1136 Epoch: 11 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:19,920-Speed 12884.67 samples/sec Loss 6.6535 LearningRate 0.1136 Epoch: 11 Global Step: 28730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:21,500-Speed 12971.41 samples/sec Loss 6.7194 LearningRate 0.1136 Epoch: 11 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:23,066-Speed 13080.86 samples/sec Loss 6.6689 LearningRate 0.1135 Epoch: 11 Global Step: 28750 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 15:06:24,646-Speed 12980.66 samples/sec Loss 6.6441 LearningRate 0.1135 Epoch: 11 Global Step: 28760 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-14 15:06:26,197-Speed 13214.25 samples/sec Loss 6.5719 LearningRate 0.1135 Epoch: 11 Global Step: 28770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:27,774-Speed 12989.62 samples/sec Loss 6.5694 LearningRate 0.1134 Epoch: 11 Global Step: 28780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:29,394-Speed 12659.09 samples/sec Loss 6.6264 LearningRate 0.1134 Epoch: 11 Global Step: 
28790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:30,973-Speed 12981.38 samples/sec Loss 6.6600 LearningRate 0.1134 Epoch: 11 Global Step: 28800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:32,556-Speed 12944.25 samples/sec Loss 6.5995 LearningRate 0.1134 Epoch: 11 Global Step: 28810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:34,129-Speed 13036.70 samples/sec Loss 6.5918 LearningRate 0.1133 Epoch: 11 Global Step: 28820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:35,702-Speed 13026.89 samples/sec Loss 6.6982 LearningRate 0.1133 Epoch: 11 Global Step: 28830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:37,275-Speed 13023.98 samples/sec Loss 6.6762 LearningRate 0.1133 Epoch: 11 Global Step: 28840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:38,864-Speed 12900.08 samples/sec Loss 6.5799 LearningRate 0.1132 Epoch: 11 Global Step: 28850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:40,432-Speed 13074.44 samples/sec Loss 6.7126 LearningRate 0.1132 Epoch: 11 Global Step: 28860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:42,018-Speed 12917.31 samples/sec Loss 6.6860 LearningRate 0.1132 Epoch: 11 Global Step: 28870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:43,570-Speed 13214.01 samples/sec Loss 6.6208 LearningRate 0.1131 Epoch: 11 Global Step: 28880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:45,155-Speed 12930.84 samples/sec Loss 6.6894 LearningRate 0.1131 Epoch: 11 Global Step: 28890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:46,735-Speed 12967.52 samples/sec Loss 6.6810 LearningRate 0.1131 Epoch: 11 Global Step: 28900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:06:48,270-Speed 13360.42 samples/sec Loss 6.6676 LearningRate 0.1130 Epoch: 11 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-01-14 15:06:49,865-Speed 12848.09 samples/sec Loss 6.7177 LearningRate 0.1130 Epoch: 11 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:51,461-Speed 12844.73 samples/sec Loss 6.6479 LearningRate 0.1130 Epoch: 11 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:53,027-Speed 13079.28 samples/sec Loss 6.6538 LearningRate 0.1129 Epoch: 11 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:54,605-Speed 12999.09 samples/sec Loss 6.7434 LearningRate 0.1129 Epoch: 11 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:56,153-Speed 13238.60 samples/sec Loss 6.6451 LearningRate 0.1129 Epoch: 11 Global Step: 28960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:57,749-Speed 12834.82 samples/sec Loss 6.7497 LearningRate 0.1129 Epoch: 11 Global Step: 28970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:06:59,305-Speed 13173.83 samples/sec Loss 6.6924 LearningRate 0.1128 Epoch: 11 Global Step: 28980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:00,885-Speed 12976.33 samples/sec Loss 6.6900 LearningRate 0.1128 Epoch: 11 Global Step: 28990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:02,496-Speed 12718.64 samples/sec Loss 6.7556 LearningRate 0.1128 Epoch: 11 Global Step: 29000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:04,045-Speed 13233.90 samples/sec Loss 6.7856 LearningRate 0.1127 Epoch: 11 Global Step: 29010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:05,633-Speed 12906.41 samples/sec Loss 6.7332 LearningRate 0.1127 Epoch: 11 Global Step: 29020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:07,185-Speed 13200.40 samples/sec Loss 6.7125 LearningRate 0.1127 Epoch: 11 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 
15:07:08,751-Speed 13097.04 samples/sec Loss 6.7445 LearningRate 0.1126 Epoch: 11 Global Step: 29040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:10,323-Speed 13032.58 samples/sec Loss 6.7039 LearningRate 0.1126 Epoch: 11 Global Step: 29050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:11,891-Speed 13070.15 samples/sec Loss 6.6925 LearningRate 0.1126 Epoch: 11 Global Step: 29060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:13,464-Speed 13028.65 samples/sec Loss 6.7727 LearningRate 0.1125 Epoch: 11 Global Step: 29070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:15,063-Speed 12819.28 samples/sec Loss 6.6709 LearningRate 0.1125 Epoch: 11 Global Step: 29080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:16,634-Speed 13041.83 samples/sec Loss 6.6413 LearningRate 0.1125 Epoch: 11 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:18,198-Speed 13109.87 samples/sec Loss 6.7084 LearningRate 0.1124 Epoch: 11 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:19,759-Speed 13149.99 samples/sec Loss 6.6279 LearningRate 0.1124 Epoch: 11 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:21,360-Speed 12806.38 samples/sec Loss 6.6438 LearningRate 0.1124 Epoch: 11 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:22,914-Speed 13184.95 samples/sec Loss 6.6458 LearningRate 0.1124 Epoch: 11 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:24,483-Speed 13062.99 samples/sec Loss 6.7132 LearningRate 0.1123 Epoch: 11 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:26,077-Speed 12857.63 samples/sec Loss 6.6810 LearningRate 0.1123 Epoch: 11 Global Step: 29150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:27,671-Speed 12854.89 
samples/sec Loss 6.6339 LearningRate 0.1123 Epoch: 11 Global Step: 29160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:29,232-Speed 13129.91 samples/sec Loss 6.6849 LearningRate 0.1122 Epoch: 11 Global Step: 29170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:30,808-Speed 13000.22 samples/sec Loss 6.6990 LearningRate 0.1122 Epoch: 11 Global Step: 29180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:32,385-Speed 13003.02 samples/sec Loss 6.7540 LearningRate 0.1122 Epoch: 11 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:33,953-Speed 13071.90 samples/sec Loss 6.6875 LearningRate 0.1121 Epoch: 11 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:35,537-Speed 12938.81 samples/sec Loss 6.6970 LearningRate 0.1121 Epoch: 11 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:37,127-Speed 12884.37 samples/sec Loss 6.7065 LearningRate 0.1121 Epoch: 11 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:38,683-Speed 13177.09 samples/sec Loss 6.6973 LearningRate 0.1120 Epoch: 11 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:40,250-Speed 13103.04 samples/sec Loss 6.6831 LearningRate 0.1120 Epoch: 11 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:41,846-Speed 12836.02 samples/sec Loss 6.6753 LearningRate 0.1120 Epoch: 11 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:07:43,398-Speed 13208.03 samples/sec Loss 6.7874 LearningRate 0.1119 Epoch: 11 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:44,995-Speed 12834.45 samples/sec Loss 6.6596 LearningRate 0.1119 Epoch: 11 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:46,558-Speed 13101.45 samples/sec Loss 6.7283 LearningRate 
0.1119 Epoch: 11 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:48,135-Speed 12995.94 samples/sec Loss 6.8110 LearningRate 0.1119 Epoch: 11 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:49,702-Speed 13083.35 samples/sec Loss 6.6238 LearningRate 0.1118 Epoch: 11 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:51,270-Speed 13072.21 samples/sec Loss 6.7125 LearningRate 0.1118 Epoch: 11 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:52,822-Speed 13197.98 samples/sec Loss 6.7333 LearningRate 0.1118 Epoch: 11 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:54,386-Speed 13114.21 samples/sec Loss 6.8655 LearningRate 0.1117 Epoch: 11 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:55,976-Speed 12887.42 samples/sec Loss 6.6758 LearningRate 0.1117 Epoch: 11 Global Step: 29340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:57,567-Speed 12881.20 samples/sec Loss 6.7672 LearningRate 0.1117 Epoch: 11 Global Step: 29350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:07:59,149-Speed 12958.51 samples/sec Loss 6.7899 LearningRate 0.1116 Epoch: 11 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:00,717-Speed 13067.18 samples/sec Loss 6.6287 LearningRate 0.1116 Epoch: 11 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:02,289-Speed 13041.50 samples/sec Loss 6.6184 LearningRate 0.1116 Epoch: 11 Global Step: 29380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:03,866-Speed 13001.13 samples/sec Loss 6.6707 LearningRate 0.1115 Epoch: 11 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:05,456-Speed 12886.41 samples/sec Loss 6.6947 LearningRate 0.1115 Epoch: 11 Global Step: 29400 
Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:07,021-Speed 13094.92 samples/sec Loss 6.6842 LearningRate 0.1115 Epoch: 11 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:08,587-Speed 13088.24 samples/sec Loss 6.6473 LearningRate 0.1115 Epoch: 11 Global Step: 29420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:10,180-Speed 12868.56 samples/sec Loss 6.7191 LearningRate 0.1114 Epoch: 11 Global Step: 29430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:11,771-Speed 12881.81 samples/sec Loss 6.5841 LearningRate 0.1114 Epoch: 11 Global Step: 29440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:13,347-Speed 13005.77 samples/sec Loss 6.6800 LearningRate 0.1114 Epoch: 11 Global Step: 29450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:14,954-Speed 12752.92 samples/sec Loss 6.7170 LearningRate 0.1113 Epoch: 11 Global Step: 29460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:16,525-Speed 13048.63 samples/sec Loss 6.6998 LearningRate 0.1113 Epoch: 11 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:18,100-Speed 13011.26 samples/sec Loss 6.6627 LearningRate 0.1113 Epoch: 11 Global Step: 29480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:19,699-Speed 12822.89 samples/sec Loss 6.6680 LearningRate 0.1112 Epoch: 11 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:21,282-Speed 12938.71 samples/sec Loss 6.6983 LearningRate 0.1112 Epoch: 11 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:22,885-Speed 12786.40 samples/sec Loss 6.7279 LearningRate 0.1112 Epoch: 11 Global Step: 29510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:24,466-Speed 12962.58 samples/sec Loss 6.7057 LearningRate 0.1111 Epoch: 11 Global Step: 29520 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-01-14 15:08:26,036-Speed 13050.60 samples/sec Loss 6.6906 LearningRate 0.1111 Epoch: 11 Global Step: 29530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:27,629-Speed 12873.05 samples/sec Loss 6.8519 LearningRate 0.1111 Epoch: 11 Global Step: 29540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:29,207-Speed 12989.13 samples/sec Loss 6.6339 LearningRate 0.1110 Epoch: 11 Global Step: 29550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:30,771-Speed 13102.87 samples/sec Loss 6.7621 LearningRate 0.1110 Epoch: 11 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:32,338-Speed 13078.40 samples/sec Loss 6.6276 LearningRate 0.1110 Epoch: 11 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:33,929-Speed 12891.80 samples/sec Loss 6.6762 LearningRate 0.1110 Epoch: 11 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:35,530-Speed 12795.50 samples/sec Loss 6.7059 LearningRate 0.1109 Epoch: 11 Global Step: 29590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:37,091-Speed 13126.01 samples/sec Loss 6.6552 LearningRate 0.1109 Epoch: 11 Global Step: 29600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:38,680-Speed 12902.10 samples/sec Loss 6.6558 LearningRate 0.1109 Epoch: 11 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:40,260-Speed 12972.04 samples/sec Loss 6.7087 LearningRate 0.1108 Epoch: 11 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:41,830-Speed 13052.10 samples/sec Loss 6.7899 LearningRate 0.1108 Epoch: 11 Global Step: 29630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:43,407-Speed 13000.05 samples/sec Loss 6.7949 LearningRate 0.1108 Epoch: 11 Global Step: 29640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:08:44,994-Speed 12914.85 samples/sec Loss 6.7287 LearningRate 0.1107 Epoch: 11 Global Step: 29650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:46,551-Speed 13161.47 samples/sec Loss 6.7882 LearningRate 0.1107 Epoch: 11 Global Step: 29660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:48,130-Speed 12979.95 samples/sec Loss 6.7836 LearningRate 0.1107 Epoch: 11 Global Step: 29670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:49,711-Speed 12965.82 samples/sec Loss 6.7451 LearningRate 0.1106 Epoch: 11 Global Step: 29680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:51,265-Speed 13186.23 samples/sec Loss 6.8047 LearningRate 0.1106 Epoch: 11 Global Step: 29690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:52,858-Speed 12857.69 samples/sec Loss 6.6923 LearningRate 0.1106 Epoch: 11 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:54,437-Speed 12988.52 samples/sec Loss 6.6275 LearningRate 0.1106 Epoch: 11 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:08:56,014-Speed 12988.05 samples/sec Loss 6.7334 LearningRate 0.1105 Epoch: 11 Global Step: 29720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:57,605-Speed 12880.94 samples/sec Loss 6.7905 LearningRate 0.1105 Epoch: 11 Global Step: 29730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:08:59,186-Speed 12979.70 samples/sec Loss 6.7296 LearningRate 0.1105 Epoch: 11 Global Step: 29740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:00,760-Speed 13024.41 samples/sec Loss 6.6130 LearningRate 0.1104 Epoch: 11 Global Step: 29750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:02,362-Speed 12789.62 samples/sec Loss 6.6800 LearningRate 0.1104 Epoch: 11 Global Step: 29760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:03,913-Speed 13215.12 samples/sec 
Loss 6.6296 LearningRate 0.1104 Epoch: 11 Global Step: 29770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:05,481-Speed 13063.13 samples/sec Loss 6.7991 LearningRate 0.1103 Epoch: 11 Global Step: 29780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:07,048-Speed 13080.92 samples/sec Loss 6.7764 LearningRate 0.1103 Epoch: 11 Global Step: 29790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:08,645-Speed 12834.61 samples/sec Loss 6.7010 LearningRate 0.1103 Epoch: 11 Global Step: 29800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:10,226-Speed 12961.49 samples/sec Loss 6.6547 LearningRate 0.1102 Epoch: 11 Global Step: 29810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:11,828-Speed 12881.51 samples/sec Loss 6.7372 LearningRate 0.1102 Epoch: 11 Global Step: 29820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:13,405-Speed 12999.20 samples/sec Loss 6.7076 LearningRate 0.1102 Epoch: 11 Global Step: 29830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:09:14,979-Speed 13018.81 samples/sec Loss 6.7520 LearningRate 0.1102 Epoch: 11 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:16,582-Speed 12787.44 samples/sec Loss 6.7471 LearningRate 0.1101 Epoch: 11 Global Step: 29850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:18,182-Speed 12803.42 samples/sec Loss 6.5682 LearningRate 0.1101 Epoch: 11 Global Step: 29860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:19,786-Speed 12781.01 samples/sec Loss 6.6890 LearningRate 0.1101 Epoch: 11 Global Step: 29870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:21,341-Speed 13180.40 samples/sec Loss 6.7067 LearningRate 0.1100 Epoch: 11 Global Step: 29880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:22,929-Speed 12903.90 samples/sec Loss 6.5998 LearningRate 0.1100 Epoch: 11 
Global Step: 29890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:24,510-Speed 12967.94 samples/sec Loss 6.8165 LearningRate 0.1100 Epoch: 11 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:26,070-Speed 13131.30 samples/sec Loss 6.6742 LearningRate 0.1099 Epoch: 11 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:27,667-Speed 12832.10 samples/sec Loss 6.7229 LearningRate 0.1099 Epoch: 11 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:29,235-Speed 13075.58 samples/sec Loss 6.6471 LearningRate 0.1099 Epoch: 11 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:30,797-Speed 13120.74 samples/sec Loss 6.6466 LearningRate 0.1098 Epoch: 11 Global Step: 29940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:09:32,386-Speed 12894.31 samples/sec Loss 6.7604 LearningRate 0.1098 Epoch: 11 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:33,965-Speed 13007.76 samples/sec Loss 6.7688 LearningRate 0.1098 Epoch: 11 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:35,558-Speed 12861.07 samples/sec Loss 6.6303 LearningRate 0.1097 Epoch: 11 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:37,157-Speed 12816.51 samples/sec Loss 6.7374 LearningRate 0.1097 Epoch: 11 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:38,738-Speed 12970.24 samples/sec Loss 6.7401 LearningRate 0.1097 Epoch: 11 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:09:40,315-Speed 12990.56 samples/sec Loss 6.7620 LearningRate 0.1097 Epoch: 11 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:10:03,667-[lfw][30000]XNorm: 11.839088 Training: 2022-01-14 15:10:03,668-[lfw][30000]Accuracy-Flip: 0.99583+-0.00281 
Training: 2022-01-14 15:10:03,668-[lfw][30000]Accuracy-Highest: 0.99583 Training: 2022-01-14 15:10:29,905-[cfp_fp][30000]XNorm: 9.904480 Training: 2022-01-14 15:10:29,906-[cfp_fp][30000]Accuracy-Flip: 0.94829+-0.00873 Training: 2022-01-14 15:10:29,907-[cfp_fp][30000]Accuracy-Highest: 0.94829 Training: 2022-01-14 15:10:51,812-[agedb_30][30000]XNorm: 11.485990 Training: 2022-01-14 15:10:51,813-[agedb_30][30000]Accuracy-Flip: 0.95533+-0.00903 Training: 2022-01-14 15:10:51,814-[agedb_30][30000]Accuracy-Highest: 0.95800 Training: 2022-01-14 15:10:53,363-Speed 280.37 samples/sec Loss 6.7418 LearningRate 0.1096 Epoch: 11 Global Step: 30010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:10:54,965-Speed 12793.51 samples/sec Loss 6.7209 LearningRate 0.1096 Epoch: 11 Global Step: 30020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:10:56,543-Speed 12990.54 samples/sec Loss 6.7195 LearningRate 0.1096 Epoch: 11 Global Step: 30030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:10:58,127-Speed 12948.51 samples/sec Loss 6.7127 LearningRate 0.1095 Epoch: 11 Global Step: 30040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:10:59,703-Speed 13006.58 samples/sec Loss 6.6106 LearningRate 0.1095 Epoch: 11 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:11:01,312-Speed 12731.73 samples/sec Loss 6.8119 LearningRate 0.1095 Epoch: 11 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:11:02,930-Speed 12672.52 samples/sec Loss 6.7749 LearningRate 0.1094 Epoch: 11 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 15:11:04,479-Speed 13230.10 samples/sec Loss 6.7436 LearningRate 0.1094 Epoch: 11 Global Step: 30080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:11:06,036-Speed 13161.32 samples/sec Loss 6.6650 LearningRate 0.1094 Epoch: 11 Global Step: 30090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-14 15:11:07,617-Speed 12960.79 samples/sec Loss 6.6044 LearningRate 0.1093 Epoch: 11 Global Step: 30100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:11:09,201-Speed 12943.85 samples/sec Loss 6.6397 LearningRate 0.1093 Epoch: 11 Global Step: 30110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:11:10,759-Speed 13150.43 samples/sec Loss 6.7205 LearningRate 0.1093 Epoch: 11 Global Step: 30120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:11:12,339-Speed 12982.43 samples/sec Loss 6.6144 LearningRate 0.1093 Epoch: 11 Global Step: 30130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 15:11:13,878-Speed 13318.10 samples/sec Loss 6.7227 LearningRate 0.1092 Epoch: 11 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:15,447-Speed 13059.96 samples/sec Loss 6.7785 LearningRate 0.1092 Epoch: 11 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:17,019-Speed 13030.97 samples/sec Loss 6.6967 LearningRate 0.1092 Epoch: 11 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:18,606-Speed 12924.24 samples/sec Loss 6.7498 LearningRate 0.1091 Epoch: 11 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:20,169-Speed 13108.78 samples/sec Loss 6.7048 LearningRate 0.1091 Epoch: 11 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:11:21,751-Speed 12977.57 samples/sec Loss 6.7776 LearningRate 0.1091 Epoch: 11 Global Step: 30190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:23,322-Speed 13046.52 samples/sec Loss 6.6406 LearningRate 0.1090 Epoch: 11 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:24,898-Speed 13000.98 samples/sec Loss 6.7303 LearningRate 0.1090 Epoch: 11 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:26,451-Speed 13195.88 
samples/sec Loss 6.7709 LearningRate 0.1090 Epoch: 11 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:28,041-Speed 12895.22 samples/sec Loss 6.7255 LearningRate 0.1089 Epoch: 11 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:29,617-Speed 12997.10 samples/sec Loss 6.7142 LearningRate 0.1089 Epoch: 11 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:31,211-Speed 12862.34 samples/sec Loss 6.6375 LearningRate 0.1089 Epoch: 11 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:32,784-Speed 13027.08 samples/sec Loss 6.7071 LearningRate 0.1089 Epoch: 11 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:34,337-Speed 13195.46 samples/sec Loss 6.6061 LearningRate 0.1088 Epoch: 11 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:35,921-Speed 12934.42 samples/sec Loss 6.6915 LearningRate 0.1088 Epoch: 11 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:37,513-Speed 12879.85 samples/sec Loss 6.6979 LearningRate 0.1088 Epoch: 11 Global Step: 30290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:11:39,089-Speed 13011.13 samples/sec Loss 6.7437 LearningRate 0.1087 Epoch: 11 Global Step: 30300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:11:40,658-Speed 13062.10 samples/sec Loss 6.7212 LearningRate 0.1087 Epoch: 11 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:42,226-Speed 13068.98 samples/sec Loss 6.7967 LearningRate 0.1087 Epoch: 11 Global Step: 30320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:43,803-Speed 12994.80 samples/sec Loss 6.7763 LearningRate 0.1086 Epoch: 11 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:11:45,391-Speed 12908.96 samples/sec Loss 6.6700 LearningRate 
0.1086 Epoch: 11 Global Step: 30340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:00,325-Speed 1371.53 samples/sec Loss 6.5687 LearningRate 0.1086 Epoch: 12 Global Step: 30350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:02,145-Speed 11265.52 samples/sec Loss 5.7248 LearningRate 0.1086 Epoch: 12 Global Step: 30360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:03,723-Speed 12984.05 samples/sec Loss 5.7553 LearningRate 0.1085 Epoch: 12 Global Step: 30370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:05,328-Speed 12765.49 samples/sec Loss 5.7504 LearningRate 0.1085 Epoch: 12 Global Step: 30380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:06,902-Speed 13014.89 samples/sec Loss 5.8362 LearningRate 0.1085 Epoch: 12 Global Step: 30390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:08,480-Speed 12995.80 samples/sec Loss 5.9038 LearningRate 0.1084 Epoch: 12 Global Step: 30400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:10,036-Speed 13174.10 samples/sec Loss 5.8355 LearningRate 0.1084 Epoch: 12 Global Step: 30410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:11,611-Speed 13004.67 samples/sec Loss 5.8872 LearningRate 0.1084 Epoch: 12 Global Step: 30420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:13,189-Speed 12983.53 samples/sec Loss 5.9650 LearningRate 0.1083 Epoch: 12 Global Step: 30430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:14,779-Speed 12891.17 samples/sec Loss 5.8820 LearningRate 0.1083 Epoch: 12 Global Step: 30440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:16,342-Speed 13110.15 samples/sec Loss 5.8845 LearningRate 0.1083 Epoch: 12 Global Step: 30450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:17,903-Speed 13134.97 samples/sec Loss 5.9732 LearningRate 0.1082 Epoch: 12 Global Step: 30460 Fp16 
Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:19,529-Speed 12598.69 samples/sec Loss 5.9524 LearningRate 0.1082 Epoch: 12 Global Step: 30470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:21,092-Speed 13130.82 samples/sec Loss 5.9422 LearningRate 0.1082 Epoch: 12 Global Step: 30480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:12:22,666-Speed 13023.86 samples/sec Loss 5.9957 LearningRate 0.1082 Epoch: 12 Global Step: 30490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:24,263-Speed 12831.47 samples/sec Loss 5.9431 LearningRate 0.1081 Epoch: 12 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:25,819-Speed 13167.08 samples/sec Loss 6.0515 LearningRate 0.1081 Epoch: 12 Global Step: 30510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:27,394-Speed 13020.51 samples/sec Loss 6.0421 LearningRate 0.1081 Epoch: 12 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:28,983-Speed 12893.33 samples/sec Loss 5.9909 LearningRate 0.1080 Epoch: 12 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:30,562-Speed 13004.45 samples/sec Loss 6.0248 LearningRate 0.1080 Epoch: 12 Global Step: 30540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:32,176-Speed 12694.60 samples/sec Loss 6.0943 LearningRate 0.1080 Epoch: 12 Global Step: 30550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:33,786-Speed 12726.76 samples/sec Loss 5.9628 LearningRate 0.1079 Epoch: 12 Global Step: 30560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:35,377-Speed 12885.85 samples/sec Loss 6.0662 LearningRate 0.1079 Epoch: 12 Global Step: 30570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:12:36,943-Speed 13091.29 samples/sec Loss 6.0723 LearningRate 0.1079 Epoch: 12 Global Step: 30580 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-01-14 15:12:38,525-Speed 12951.33 samples/sec Loss 6.1338 LearningRate 0.1078 Epoch: 12 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:40,106-Speed 12962.54 samples/sec Loss 6.1894 LearningRate 0.1078 Epoch: 12 Global Step: 30600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:41,679-Speed 13024.88 samples/sec Loss 6.1024 LearningRate 0.1078 Epoch: 12 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:43,282-Speed 12789.16 samples/sec Loss 6.0184 LearningRate 0.1078 Epoch: 12 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:44,825-Speed 13279.06 samples/sec Loss 6.0844 LearningRate 0.1077 Epoch: 12 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:46,396-Speed 13046.36 samples/sec Loss 6.2159 LearningRate 0.1077 Epoch: 12 Global Step: 30640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:47,951-Speed 13175.19 samples/sec Loss 6.1101 LearningRate 0.1077 Epoch: 12 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:49,602-Speed 12414.80 samples/sec Loss 6.0530 LearningRate 0.1076 Epoch: 12 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:51,200-Speed 12822.04 samples/sec Loss 6.1456 LearningRate 0.1076 Epoch: 12 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:52,843-Speed 12475.00 samples/sec Loss 6.2066 LearningRate 0.1076 Epoch: 12 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:54,395-Speed 13205.69 samples/sec Loss 6.2021 LearningRate 0.1075 Epoch: 12 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:55,956-Speed 13130.58 samples/sec Loss 6.2447 LearningRate 0.1075 Epoch: 12 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 
15:12:57,507-Speed 13215.71 samples/sec Loss 6.1232 LearningRate 0.1075 Epoch: 12 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:12:59,072-Speed 13093.22 samples/sec Loss 6.2177 LearningRate 0.1074 Epoch: 12 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:00,633-Speed 13129.35 samples/sec Loss 6.2415 LearningRate 0.1074 Epoch: 12 Global Step: 30730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:02,197-Speed 13099.84 samples/sec Loss 6.1770 LearningRate 0.1074 Epoch: 12 Global Step: 30740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:03,753-Speed 13166.99 samples/sec Loss 6.1995 LearningRate 0.1074 Epoch: 12 Global Step: 30750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:05,322-Speed 13061.09 samples/sec Loss 6.3176 LearningRate 0.1073 Epoch: 12 Global Step: 30760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:06,883-Speed 13138.65 samples/sec Loss 6.2433 LearningRate 0.1073 Epoch: 12 Global Step: 30770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:08,464-Speed 12964.73 samples/sec Loss 6.2344 LearningRate 0.1073 Epoch: 12 Global Step: 30780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:10,036-Speed 13036.82 samples/sec Loss 6.2281 LearningRate 0.1072 Epoch: 12 Global Step: 30790 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 15:13:11,600-Speed 13098.74 samples/sec Loss 6.3451 LearningRate 0.1072 Epoch: 12 Global Step: 30800 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 15:13:13,180-Speed 12974.94 samples/sec Loss 6.2375 LearningRate 0.1072 Epoch: 12 Global Step: 30810 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 15:13:14,743-Speed 13118.72 samples/sec Loss 6.2226 LearningRate 0.1071 Epoch: 12 Global Step: 30820 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 15:13:16,307-Speed 13095.79 
samples/sec Loss 6.3364 LearningRate 0.1071 Epoch: 12 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:17,900-Speed 12866.38 samples/sec Loss 6.3778 LearningRate 0.1071 Epoch: 12 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:19,469-Speed 13063.88 samples/sec Loss 6.3443 LearningRate 0.1071 Epoch: 12 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:21,070-Speed 12801.11 samples/sec Loss 6.2725 LearningRate 0.1070 Epoch: 12 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:22,642-Speed 13040.79 samples/sec Loss 6.3110 LearningRate 0.1070 Epoch: 12 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:24,203-Speed 13123.00 samples/sec Loss 6.2165 LearningRate 0.1070 Epoch: 12 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:25,799-Speed 12847.74 samples/sec Loss 6.3086 LearningRate 0.1069 Epoch: 12 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:27,365-Speed 13088.97 samples/sec Loss 6.2648 LearningRate 0.1069 Epoch: 12 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:28,944-Speed 12973.29 samples/sec Loss 6.3846 LearningRate 0.1069 Epoch: 12 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:30,482-Speed 13328.18 samples/sec Loss 6.3308 LearningRate 0.1068 Epoch: 12 Global Step: 30920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:32,046-Speed 13101.60 samples/sec Loss 6.3825 LearningRate 0.1068 Epoch: 12 Global Step: 30930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:33,630-Speed 12942.38 samples/sec Loss 6.4311 LearningRate 0.1068 Epoch: 12 Global Step: 30940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:35,172-Speed 13290.13 samples/sec Loss 6.4416 
LearningRate 0.1067 Epoch: 12 Global Step: 30950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:36,751-Speed 12985.87 samples/sec Loss 6.3380 LearningRate 0.1067 Epoch: 12 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:38,316-Speed 13086.34 samples/sec Loss 6.4038 LearningRate 0.1067 Epoch: 12 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:39,876-Speed 13141.90 samples/sec Loss 6.4191 LearningRate 0.1067 Epoch: 12 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:41,447-Speed 13045.96 samples/sec Loss 6.4394 LearningRate 0.1066 Epoch: 12 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:43,011-Speed 13104.46 samples/sec Loss 6.3312 LearningRate 0.1066 Epoch: 12 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:44,575-Speed 13104.06 samples/sec Loss 6.3806 LearningRate 0.1066 Epoch: 12 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:46,140-Speed 13092.63 samples/sec Loss 6.3846 LearningRate 0.1065 Epoch: 12 Global Step: 31020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:47,701-Speed 13138.34 samples/sec Loss 6.2734 LearningRate 0.1065 Epoch: 12 Global Step: 31030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:49,272-Speed 13041.21 samples/sec Loss 6.4102 LearningRate 0.1065 Epoch: 12 Global Step: 31040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:13:50,848-Speed 13001.51 samples/sec Loss 6.3414 LearningRate 0.1064 Epoch: 12 Global Step: 31050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:52,435-Speed 12914.57 samples/sec Loss 6.3567 LearningRate 0.1064 Epoch: 12 Global Step: 31060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:54,024-Speed 12896.78 samples/sec Loss 6.4209 LearningRate 0.1064 Epoch: 12 Global 
Step: 31070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:55,600-Speed 13000.25 samples/sec Loss 6.3589 LearningRate 0.1064 Epoch: 12 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:57,177-Speed 12999.46 samples/sec Loss 6.3515 LearningRate 0.1063 Epoch: 12 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:13:58,741-Speed 13104.79 samples/sec Loss 6.4142 LearningRate 0.1063 Epoch: 12 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:00,299-Speed 13153.69 samples/sec Loss 6.3677 LearningRate 0.1063 Epoch: 12 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:01,870-Speed 13039.14 samples/sec Loss 6.3960 LearningRate 0.1062 Epoch: 12 Global Step: 31120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:03,508-Speed 12514.30 samples/sec Loss 6.4881 LearningRate 0.1062 Epoch: 12 Global Step: 31130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:05,065-Speed 13159.53 samples/sec Loss 6.3751 LearningRate 0.1062 Epoch: 12 Global Step: 31140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:06,630-Speed 13097.41 samples/sec Loss 6.4424 LearningRate 0.1061 Epoch: 12 Global Step: 31150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:08,200-Speed 13052.85 samples/sec Loss 6.4443 LearningRate 0.1061 Epoch: 12 Global Step: 31160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:09,771-Speed 13046.07 samples/sec Loss 6.4606 LearningRate 0.1061 Epoch: 12 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:11,332-Speed 13130.05 samples/sec Loss 6.4685 LearningRate 0.1061 Epoch: 12 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:12,923-Speed 12883.82 samples/sec Loss 6.4803 LearningRate 0.1060 Epoch: 12 Global Step: 31190 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 15:14:14,497-Speed 13086.29 samples/sec Loss 6.3753 LearningRate 0.1060 Epoch: 12 Global Step: 31200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:16,048-Speed 13219.48 samples/sec Loss 6.4274 LearningRate 0.1060 Epoch: 12 Global Step: 31210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:17,620-Speed 13067.42 samples/sec Loss 6.3590 LearningRate 0.1059 Epoch: 12 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:19,197-Speed 12994.70 samples/sec Loss 6.4815 LearningRate 0.1059 Epoch: 12 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:20,732-Speed 13347.27 samples/sec Loss 6.4736 LearningRate 0.1059 Epoch: 12 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:22,296-Speed 13102.98 samples/sec Loss 6.5115 LearningRate 0.1058 Epoch: 12 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:23,872-Speed 13002.17 samples/sec Loss 6.4532 LearningRate 0.1058 Epoch: 12 Global Step: 31260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:25,437-Speed 13098.65 samples/sec Loss 6.3975 LearningRate 0.1058 Epoch: 12 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:27,012-Speed 13011.38 samples/sec Loss 6.4850 LearningRate 0.1057 Epoch: 12 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:28,589-Speed 12993.90 samples/sec Loss 6.4420 LearningRate 0.1057 Epoch: 12 Global Step: 31290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:30,169-Speed 12970.86 samples/sec Loss 6.5062 LearningRate 0.1057 Epoch: 12 Global Step: 31300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:31,728-Speed 13143.79 samples/sec Loss 6.5037 LearningRate 0.1057 Epoch: 12 Global Step: 31310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:14:33,283-Speed 13182.19 samples/sec Loss 6.4761 LearningRate 0.1056 Epoch: 12 Global Step: 31320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:34,851-Speed 13069.25 samples/sec Loss 6.4544 LearningRate 0.1056 Epoch: 12 Global Step: 31330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:36,411-Speed 13134.28 samples/sec Loss 6.3538 LearningRate 0.1056 Epoch: 12 Global Step: 31340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:38,012-Speed 12808.07 samples/sec Loss 6.4454 LearningRate 0.1055 Epoch: 12 Global Step: 31350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:39,575-Speed 13110.57 samples/sec Loss 6.5495 LearningRate 0.1055 Epoch: 12 Global Step: 31360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:41,161-Speed 12948.27 samples/sec Loss 6.5627 LearningRate 0.1055 Epoch: 12 Global Step: 31370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:42,762-Speed 12807.27 samples/sec Loss 6.4643 LearningRate 0.1054 Epoch: 12 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:44,332-Speed 13046.89 samples/sec Loss 6.5317 LearningRate 0.1054 Epoch: 12 Global Step: 31390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:45,903-Speed 13044.43 samples/sec Loss 6.5129 LearningRate 0.1054 Epoch: 12 Global Step: 31400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:47,495-Speed 12881.58 samples/sec Loss 6.4679 LearningRate 0.1054 Epoch: 12 Global Step: 31410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:49,080-Speed 12931.68 samples/sec Loss 6.4420 LearningRate 0.1053 Epoch: 12 Global Step: 31420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:50,641-Speed 13121.50 samples/sec Loss 6.4725 LearningRate 0.1053 Epoch: 12 Global Step: 31430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:52,226-Speed 12934.01 
samples/sec Loss 6.5110 LearningRate 0.1053 Epoch: 12 Global Step: 31440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:53,803-Speed 12994.15 samples/sec Loss 6.4885 LearningRate 0.1052 Epoch: 12 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:14:55,360-Speed 13162.94 samples/sec Loss 6.5309 LearningRate 0.1052 Epoch: 12 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:56,944-Speed 12938.99 samples/sec Loss 6.4413 LearningRate 0.1052 Epoch: 12 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:14:58,500-Speed 13169.75 samples/sec Loss 6.5477 LearningRate 0.1051 Epoch: 12 Global Step: 31480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:00,073-Speed 13032.45 samples/sec Loss 6.4596 LearningRate 0.1051 Epoch: 12 Global Step: 31490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:01,643-Speed 13068.28 samples/sec Loss 6.5127 LearningRate 0.1051 Epoch: 12 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:03,209-Speed 13085.44 samples/sec Loss 6.5506 LearningRate 0.1051 Epoch: 12 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:04,775-Speed 13088.06 samples/sec Loss 6.5041 LearningRate 0.1050 Epoch: 12 Global Step: 31520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:06,352-Speed 12996.78 samples/sec Loss 6.5234 LearningRate 0.1050 Epoch: 12 Global Step: 31530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:07,924-Speed 13039.24 samples/sec Loss 6.5431 LearningRate 0.1050 Epoch: 12 Global Step: 31540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:09,487-Speed 13112.66 samples/sec Loss 6.5137 LearningRate 0.1049 Epoch: 12 Global Step: 31550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:11,049-Speed 13113.72 samples/sec Loss 6.5702 LearningRate 
0.1049 Epoch: 12 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:12,653-Speed 12783.96 samples/sec Loss 6.4887 LearningRate 0.1049 Epoch: 12 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:14,219-Speed 13079.09 samples/sec Loss 6.4770 LearningRate 0.1048 Epoch: 12 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:15,797-Speed 12988.81 samples/sec Loss 6.5033 LearningRate 0.1048 Epoch: 12 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:17,377-Speed 12975.45 samples/sec Loss 6.5431 LearningRate 0.1048 Epoch: 12 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:18,936-Speed 13136.15 samples/sec Loss 6.5612 LearningRate 0.1048 Epoch: 12 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:20,481-Speed 13267.54 samples/sec Loss 6.5189 LearningRate 0.1047 Epoch: 12 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:22,041-Speed 13140.14 samples/sec Loss 6.4778 LearningRate 0.1047 Epoch: 12 Global Step: 31630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:23,604-Speed 13112.70 samples/sec Loss 6.4598 LearningRate 0.1047 Epoch: 12 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:25,165-Speed 13124.37 samples/sec Loss 6.5494 LearningRate 0.1046 Epoch: 12 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:26,725-Speed 13140.92 samples/sec Loss 6.4907 LearningRate 0.1046 Epoch: 12 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:28,271-Speed 13255.37 samples/sec Loss 6.4725 LearningRate 0.1046 Epoch: 12 Global Step: 31670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:29,850-Speed 12977.29 samples/sec Loss 6.5323 LearningRate 0.1045 Epoch: 12 Global Step: 31680 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:31,435-Speed 12929.30 samples/sec Loss 6.4671 LearningRate 0.1045 Epoch: 12 Global Step: 31690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:32,995-Speed 13140.71 samples/sec Loss 6.4838 LearningRate 0.1045 Epoch: 12 Global Step: 31700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:34,562-Speed 13070.23 samples/sec Loss 6.5979 LearningRate 0.1045 Epoch: 12 Global Step: 31710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:36,138-Speed 13005.39 samples/sec Loss 6.5002 LearningRate 0.1044 Epoch: 12 Global Step: 31720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:37,701-Speed 13120.59 samples/sec Loss 6.6175 LearningRate 0.1044 Epoch: 12 Global Step: 31730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:39,276-Speed 13009.73 samples/sec Loss 6.5703 LearningRate 0.1044 Epoch: 12 Global Step: 31740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:40,823-Speed 13244.86 samples/sec Loss 6.5786 LearningRate 0.1043 Epoch: 12 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:42,409-Speed 12923.71 samples/sec Loss 6.5206 LearningRate 0.1043 Epoch: 12 Global Step: 31760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:43,969-Speed 13142.17 samples/sec Loss 6.5405 LearningRate 0.1043 Epoch: 12 Global Step: 31770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:45,559-Speed 12892.03 samples/sec Loss 6.5697 LearningRate 0.1042 Epoch: 12 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:47,109-Speed 13220.46 samples/sec Loss 6.4812 LearningRate 0.1042 Epoch: 12 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:48,678-Speed 13061.18 samples/sec Loss 6.5395 LearningRate 0.1042 Epoch: 12 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-01-14 15:15:50,217-Speed 13319.51 samples/sec Loss 6.4707 LearningRate 0.1041 Epoch: 12 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:15:51,782-Speed 13094.00 samples/sec Loss 6.5237 LearningRate 0.1041 Epoch: 12 Global Step: 31820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:53,325-Speed 13281.35 samples/sec Loss 6.5173 LearningRate 0.1041 Epoch: 12 Global Step: 31830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:54,886-Speed 13128.21 samples/sec Loss 6.4644 LearningRate 0.1041 Epoch: 12 Global Step: 31840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:56,445-Speed 13145.78 samples/sec Loss 6.5333 LearningRate 0.1040 Epoch: 12 Global Step: 31850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:58,007-Speed 13148.57 samples/sec Loss 6.4912 LearningRate 0.1040 Epoch: 12 Global Step: 31860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:15:59,601-Speed 12856.55 samples/sec Loss 6.4468 LearningRate 0.1040 Epoch: 12 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:01,145-Speed 13281.16 samples/sec Loss 6.5462 LearningRate 0.1039 Epoch: 12 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:02,711-Speed 13084.44 samples/sec Loss 6.4349 LearningRate 0.1039 Epoch: 12 Global Step: 31890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:04,301-Speed 12894.80 samples/sec Loss 6.6395 LearningRate 0.1039 Epoch: 12 Global Step: 31900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:05,856-Speed 13176.92 samples/sec Loss 6.6095 LearningRate 0.1038 Epoch: 12 Global Step: 31910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:07,446-Speed 12894.01 samples/sec Loss 6.5290 LearningRate 0.1038 Epoch: 12 Global Step: 31920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 
15:16:09,003-Speed 13159.50 samples/sec Loss 6.5960 LearningRate 0.1038 Epoch: 12 Global Step: 31930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:10,561-Speed 13152.19 samples/sec Loss 6.5795 LearningRate 0.1038 Epoch: 12 Global Step: 31940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:12,105-Speed 13291.62 samples/sec Loss 6.5770 LearningRate 0.1037 Epoch: 12 Global Step: 31950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:13,660-Speed 13175.29 samples/sec Loss 6.5566 LearningRate 0.1037 Epoch: 12 Global Step: 31960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:15,270-Speed 12724.17 samples/sec Loss 6.5093 LearningRate 0.1037 Epoch: 12 Global Step: 31970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:16,833-Speed 13110.95 samples/sec Loss 6.5220 LearningRate 0.1036 Epoch: 12 Global Step: 31980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:18,371-Speed 13334.97 samples/sec Loss 6.5307 LearningRate 0.1036 Epoch: 12 Global Step: 31990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:19,925-Speed 13185.75 samples/sec Loss 6.4853 LearningRate 0.1036 Epoch: 12 Global Step: 32000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:21,503-Speed 12988.07 samples/sec Loss 6.5597 LearningRate 0.1035 Epoch: 12 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:23,075-Speed 13039.77 samples/sec Loss 6.5863 LearningRate 0.1035 Epoch: 12 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:24,640-Speed 13089.94 samples/sec Loss 6.5776 LearningRate 0.1035 Epoch: 12 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:26,212-Speed 13040.43 samples/sec Loss 6.5069 LearningRate 0.1035 Epoch: 12 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:27,770-Speed 13158.28 
samples/sec Loss 6.5552 LearningRate 0.1034 Epoch: 12 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:29,339-Speed 13060.76 samples/sec Loss 6.4640 LearningRate 0.1034 Epoch: 12 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:30,909-Speed 13050.35 samples/sec Loss 6.5932 LearningRate 0.1034 Epoch: 12 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:32,502-Speed 12884.00 samples/sec Loss 6.5282 LearningRate 0.1033 Epoch: 12 Global Step: 32080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:34,044-Speed 13287.00 samples/sec Loss 6.5748 LearningRate 0.1033 Epoch: 12 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:16:35,606-Speed 13123.94 samples/sec Loss 6.4942 LearningRate 0.1033 Epoch: 12 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:37,198-Speed 12892.45 samples/sec Loss 6.5171 LearningRate 0.1033 Epoch: 12 Global Step: 32110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:38,758-Speed 13139.54 samples/sec Loss 6.4871 LearningRate 0.1032 Epoch: 12 Global Step: 32120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:40,337-Speed 12977.19 samples/sec Loss 6.5120 LearningRate 0.1032 Epoch: 12 Global Step: 32130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:41,899-Speed 13115.28 samples/sec Loss 6.5326 LearningRate 0.1032 Epoch: 12 Global Step: 32140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:43,514-Speed 12691.65 samples/sec Loss 6.4975 LearningRate 0.1031 Epoch: 12 Global Step: 32150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:45,095-Speed 12957.80 samples/sec Loss 6.5719 LearningRate 0.1031 Epoch: 12 Global Step: 32160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:46,649-Speed 13190.23 samples/sec Loss 6.5261 LearningRate 
0.1031 Epoch: 12 Global Step: 32170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:48,208-Speed 13148.84 samples/sec Loss 6.6207 LearningRate 0.1030 Epoch: 12 Global Step: 32180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:49,806-Speed 12825.23 samples/sec Loss 6.5074 LearningRate 0.1030 Epoch: 12 Global Step: 32190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:51,378-Speed 13039.22 samples/sec Loss 6.5472 LearningRate 0.1030 Epoch: 12 Global Step: 32200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:52,956-Speed 12990.36 samples/sec Loss 6.5414 LearningRate 0.1030 Epoch: 12 Global Step: 32210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:54,543-Speed 12910.30 samples/sec Loss 6.5718 LearningRate 0.1029 Epoch: 12 Global Step: 32220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:56,108-Speed 13095.16 samples/sec Loss 6.4495 LearningRate 0.1029 Epoch: 12 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:57,673-Speed 13098.25 samples/sec Loss 6.5442 LearningRate 0.1029 Epoch: 12 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:16:59,211-Speed 13363.96 samples/sec Loss 6.5298 LearningRate 0.1028 Epoch: 12 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:00,804-Speed 12867.03 samples/sec Loss 6.5495 LearningRate 0.1028 Epoch: 12 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:02,391-Speed 12911.59 samples/sec Loss 6.5388 LearningRate 0.1028 Epoch: 12 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:03,958-Speed 13083.33 samples/sec Loss 6.5338 LearningRate 0.1027 Epoch: 12 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:05,494-Speed 13339.88 samples/sec Loss 6.5699 LearningRate 0.1027 Epoch: 12 Global Step: 32290 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:07,074-Speed 12973.49 samples/sec Loss 6.5474 LearningRate 0.1027 Epoch: 12 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:08,633-Speed 13140.34 samples/sec Loss 6.5213 LearningRate 0.1027 Epoch: 12 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:10,198-Speed 13099.91 samples/sec Loss 6.5907 LearningRate 0.1026 Epoch: 12 Global Step: 32320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:11,799-Speed 12797.15 samples/sec Loss 6.4839 LearningRate 0.1026 Epoch: 12 Global Step: 32330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:13,344-Speed 13268.38 samples/sec Loss 6.5656 LearningRate 0.1026 Epoch: 12 Global Step: 32340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:14,962-Speed 12663.91 samples/sec Loss 6.4865 LearningRate 0.1025 Epoch: 12 Global Step: 32350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:16,515-Speed 13196.63 samples/sec Loss 6.5376 LearningRate 0.1025 Epoch: 12 Global Step: 32360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:18,068-Speed 13201.44 samples/sec Loss 6.6440 LearningRate 0.1025 Epoch: 12 Global Step: 32370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:19,641-Speed 13020.91 samples/sec Loss 6.5638 LearningRate 0.1024 Epoch: 12 Global Step: 32380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:21,201-Speed 13136.72 samples/sec Loss 6.4796 LearningRate 0.1024 Epoch: 12 Global Step: 32390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:22,761-Speed 13145.99 samples/sec Loss 6.5880 LearningRate 0.1024 Epoch: 12 Global Step: 32400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:24,334-Speed 13023.42 samples/sec Loss 6.6653 LearningRate 0.1024 Epoch: 12 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-01-14 15:17:25,899-Speed 13093.04 samples/sec Loss 6.5428 LearningRate 0.1023 Epoch: 12 Global Step: 32420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:27,512-Speed 12711.60 samples/sec Loss 6.5012 LearningRate 0.1023 Epoch: 12 Global Step: 32430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:29,079-Speed 13077.58 samples/sec Loss 6.5190 LearningRate 0.1023 Epoch: 12 Global Step: 32440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:30,702-Speed 12621.76 samples/sec Loss 6.5884 LearningRate 0.1022 Epoch: 12 Global Step: 32450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:32,287-Speed 12930.50 samples/sec Loss 6.4563 LearningRate 0.1022 Epoch: 12 Global Step: 32460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:33,862-Speed 13014.79 samples/sec Loss 6.5306 LearningRate 0.1022 Epoch: 12 Global Step: 32470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:35,433-Speed 13047.38 samples/sec Loss 6.5435 LearningRate 0.1021 Epoch: 12 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:36,991-Speed 13154.49 samples/sec Loss 6.5903 LearningRate 0.1021 Epoch: 12 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:38,544-Speed 13198.91 samples/sec Loss 6.4303 LearningRate 0.1021 Epoch: 12 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:40,132-Speed 12907.79 samples/sec Loss 6.5918 LearningRate 0.1021 Epoch: 12 Global Step: 32510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:41,697-Speed 13093.55 samples/sec Loss 6.6213 LearningRate 0.1020 Epoch: 12 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:43,297-Speed 12808.90 samples/sec Loss 6.5457 LearningRate 0.1020 Epoch: 12 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:17:44,876-Speed 12979.30 samples/sec Loss 6.5401 LearningRate 0.1020 Epoch: 12 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:46,465-Speed 12896.00 samples/sec Loss 6.4685 LearningRate 0.1019 Epoch: 12 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:48,079-Speed 12703.98 samples/sec Loss 6.4568 LearningRate 0.1019 Epoch: 12 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:49,656-Speed 12992.53 samples/sec Loss 6.5271 LearningRate 0.1019 Epoch: 12 Global Step: 32570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:17:51,222-Speed 13082.97 samples/sec Loss 6.6046 LearningRate 0.1018 Epoch: 12 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:52,810-Speed 12914.16 samples/sec Loss 6.5276 LearningRate 0.1018 Epoch: 12 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:54,374-Speed 13101.17 samples/sec Loss 6.5755 LearningRate 0.1018 Epoch: 12 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:55,917-Speed 13282.81 samples/sec Loss 6.6369 LearningRate 0.1018 Epoch: 12 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:57,462-Speed 13266.57 samples/sec Loss 6.4802 LearningRate 0.1017 Epoch: 12 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:17:59,008-Speed 13256.33 samples/sec Loss 6.4433 LearningRate 0.1017 Epoch: 12 Global Step: 32630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:00,569-Speed 13126.53 samples/sec Loss 6.6073 LearningRate 0.1017 Epoch: 12 Global Step: 32640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:02,156-Speed 12911.20 samples/sec Loss 6.5819 LearningRate 0.1016 Epoch: 12 Global Step: 32650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:03,717-Speed 13130.49 
samples/sec Loss 6.5620 LearningRate 0.1016 Epoch: 12 Global Step: 32660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:05,266-Speed 13229.55 samples/sec Loss 6.5950 LearningRate 0.1016 Epoch: 12 Global Step: 32670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:06,819-Speed 13193.78 samples/sec Loss 6.5021 LearningRate 0.1016 Epoch: 12 Global Step: 32680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:08,397-Speed 12992.60 samples/sec Loss 6.5782 LearningRate 0.1015 Epoch: 12 Global Step: 32690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:09,970-Speed 13030.22 samples/sec Loss 6.5450 LearningRate 0.1015 Epoch: 12 Global Step: 32700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:11,541-Speed 13039.98 samples/sec Loss 6.5323 LearningRate 0.1015 Epoch: 12 Global Step: 32710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:13,085-Speed 13278.30 samples/sec Loss 6.4740 LearningRate 0.1014 Epoch: 12 Global Step: 32720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:14,662-Speed 12992.15 samples/sec Loss 6.5070 LearningRate 0.1014 Epoch: 12 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:18:16,209-Speed 13249.18 samples/sec Loss 6.5092 LearningRate 0.1014 Epoch: 12 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:18:17,763-Speed 13193.86 samples/sec Loss 6.5674 LearningRate 0.1013 Epoch: 12 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:18:19,313-Speed 13212.45 samples/sec Loss 6.5608 LearningRate 0.1013 Epoch: 12 Global Step: 32760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:20,871-Speed 13155.86 samples/sec Loss 6.6400 LearningRate 0.1013 Epoch: 12 Global Step: 32770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:22,436-Speed 13093.37 samples/sec Loss 6.4721 LearningRate 
0.1013 Epoch: 12 Global Step: 32780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:23,990-Speed 13213.02 samples/sec Loss 6.5043 LearningRate 0.1012 Epoch: 12 Global Step: 32790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:25,542-Speed 13197.39 samples/sec Loss 6.5968 LearningRate 0.1012 Epoch: 12 Global Step: 32800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:27,106-Speed 13114.42 samples/sec Loss 6.5718 LearningRate 0.1012 Epoch: 12 Global Step: 32810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:28,671-Speed 13084.34 samples/sec Loss 6.6205 LearningRate 0.1011 Epoch: 12 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:30,244-Speed 13050.29 samples/sec Loss 6.5227 LearningRate 0.1011 Epoch: 12 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:31,833-Speed 12899.38 samples/sec Loss 6.6115 LearningRate 0.1011 Epoch: 12 Global Step: 32840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:33,387-Speed 13195.19 samples/sec Loss 6.5440 LearningRate 0.1010 Epoch: 12 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:34,949-Speed 13119.06 samples/sec Loss 6.4730 LearningRate 0.1010 Epoch: 12 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:36,518-Speed 13057.04 samples/sec Loss 6.6420 LearningRate 0.1010 Epoch: 12 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:51,276-Speed 1387.93 samples/sec Loss 6.2980 LearningRate 0.1010 Epoch: 13 Global Step: 32880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:52,856-Speed 12979.08 samples/sec Loss 5.7061 LearningRate 0.1009 Epoch: 13 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:54,447-Speed 12881.23 samples/sec Loss 5.6224 LearningRate 0.1009 Epoch: 13 Global Step: 32900 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:56,036-Speed 12890.20 samples/sec Loss 5.6811 LearningRate 0.1009 Epoch: 13 Global Step: 32910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:57,707-Speed 12274.59 samples/sec Loss 5.6034 LearningRate 0.1008 Epoch: 13 Global Step: 32920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:18:59,291-Speed 12999.25 samples/sec Loss 5.7309 LearningRate 0.1008 Epoch: 13 Global Step: 32930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:00,941-Speed 12414.86 samples/sec Loss 5.6888 LearningRate 0.1008 Epoch: 13 Global Step: 32940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:02,507-Speed 13093.81 samples/sec Loss 5.6974 LearningRate 0.1008 Epoch: 13 Global Step: 32950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:04,074-Speed 13074.90 samples/sec Loss 5.7639 LearningRate 0.1007 Epoch: 13 Global Step: 32960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:05,710-Speed 12524.79 samples/sec Loss 5.7512 LearningRate 0.1007 Epoch: 13 Global Step: 32970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:07,256-Speed 13263.65 samples/sec Loss 5.7975 LearningRate 0.1007 Epoch: 13 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:08,844-Speed 12906.93 samples/sec Loss 5.7852 LearningRate 0.1006 Epoch: 13 Global Step: 32990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:10,434-Speed 12886.46 samples/sec Loss 5.8017 LearningRate 0.1006 Epoch: 13 Global Step: 33000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:12,041-Speed 12749.44 samples/sec Loss 5.7579 LearningRate 0.1006 Epoch: 13 Global Step: 33010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:13,619-Speed 12990.12 samples/sec Loss 5.8266 LearningRate 0.1005 Epoch: 13 Global Step: 33020 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-01-14 15:19:15,220-Speed 12801.09 samples/sec Loss 5.8248 LearningRate 0.1005 Epoch: 13 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:16,789-Speed 13059.56 samples/sec Loss 5.7351 LearningRate 0.1005 Epoch: 13 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:18,363-Speed 13020.08 samples/sec Loss 5.8688 LearningRate 0.1005 Epoch: 13 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:19,943-Speed 12967.16 samples/sec Loss 5.9541 LearningRate 0.1004 Epoch: 13 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:19:21,531-Speed 12909.10 samples/sec Loss 5.8665 LearningRate 0.1004 Epoch: 13 Global Step: 33070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:23,114-Speed 12945.75 samples/sec Loss 5.8482 LearningRate 0.1004 Epoch: 13 Global Step: 33080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:24,702-Speed 12905.23 samples/sec Loss 5.9068 LearningRate 0.1003 Epoch: 13 Global Step: 33090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:26,298-Speed 12851.94 samples/sec Loss 5.9610 LearningRate 0.1003 Epoch: 13 Global Step: 33100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:27,886-Speed 12910.24 samples/sec Loss 5.8117 LearningRate 0.1003 Epoch: 13 Global Step: 33110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:29,470-Speed 12936.61 samples/sec Loss 5.9399 LearningRate 0.1003 Epoch: 13 Global Step: 33120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:31,052-Speed 12958.62 samples/sec Loss 5.8380 LearningRate 0.1002 Epoch: 13 Global Step: 33130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:32,619-Speed 13112.74 samples/sec Loss 5.9818 LearningRate 0.1002 Epoch: 13 Global Step: 33140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:34,194-Speed 
13006.86 samples/sec Loss 5.8698 LearningRate 0.1002 Epoch: 13 Global Step: 33150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:35,764-Speed 13054.37 samples/sec Loss 5.9859 LearningRate 0.1001 Epoch: 13 Global Step: 33160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:37,345-Speed 12964.31 samples/sec Loss 5.9330 LearningRate 0.1001 Epoch: 13 Global Step: 33170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:38,916-Speed 13050.01 samples/sec Loss 5.9471 LearningRate 0.1001 Epoch: 13 Global Step: 33180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:40,538-Speed 12632.90 samples/sec Loss 6.0511 LearningRate 0.1000 Epoch: 13 Global Step: 33190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:42,130-Speed 12879.28 samples/sec Loss 6.0651 LearningRate 0.1000 Epoch: 13 Global Step: 33200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:43,716-Speed 12919.53 samples/sec Loss 5.9823 LearningRate 0.1000 Epoch: 13 Global Step: 33210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:45,291-Speed 13011.55 samples/sec Loss 5.9402 LearningRate 0.1000 Epoch: 13 Global Step: 33220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:46,869-Speed 12987.58 samples/sec Loss 6.0274 LearningRate 0.0999 Epoch: 13 Global Step: 33230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:48,468-Speed 12820.84 samples/sec Loss 6.0624 LearningRate 0.0999 Epoch: 13 Global Step: 33240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:50,064-Speed 12844.32 samples/sec Loss 6.0041 LearningRate 0.0999 Epoch: 13 Global Step: 33250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:51,644-Speed 12967.37 samples/sec Loss 6.0545 LearningRate 0.0998 Epoch: 13 Global Step: 33260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:53,202-Speed 13153.82 samples/sec Loss 5.9548 
LearningRate 0.0998 Epoch: 13 Global Step: 33270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:54,790-Speed 12901.71 samples/sec Loss 6.0006 LearningRate 0.0998 Epoch: 13 Global Step: 33280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:19:56,339-Speed 13227.09 samples/sec Loss 6.0309 LearningRate 0.0998 Epoch: 13 Global Step: 33290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:57,911-Speed 13045.75 samples/sec Loss 6.0644 LearningRate 0.0997 Epoch: 13 Global Step: 33300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:19:59,491-Speed 12975.06 samples/sec Loss 6.1770 LearningRate 0.0997 Epoch: 13 Global Step: 33310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:20:01,069-Speed 12982.79 samples/sec Loss 6.1309 LearningRate 0.0997 Epoch: 13 Global Step: 33320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:20:02,668-Speed 12829.21 samples/sec Loss 6.0781 LearningRate 0.0996 Epoch: 13 Global Step: 33330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:20:04,246-Speed 12980.44 samples/sec Loss 5.9491 LearningRate 0.0996 Epoch: 13 Global Step: 33340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:20:05,846-Speed 12809.28 samples/sec Loss 6.0693 LearningRate 0.0996 Epoch: 13 Global Step: 33350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:20:07,426-Speed 12971.03 samples/sec Loss 6.1693 LearningRate 0.0995 Epoch: 13 Global Step: 33360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:20:08,988-Speed 13153.79 samples/sec Loss 6.0825 LearningRate 0.0995 Epoch: 13 Global Step: 33370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:20:10,574-Speed 12917.25 samples/sec Loss 6.1077 LearningRate 0.0995 Epoch: 13 Global Step: 33380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:20:12,178-Speed 12775.43 samples/sec Loss 6.0288 LearningRate 0.0995 Epoch: 13 Global Step: 
33390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:13,748-Speed 13058.07 samples/sec Loss 6.1135 LearningRate 0.0994 Epoch: 13 Global Step: 33400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:15,319-Speed 13043.71 samples/sec Loss 6.1467 LearningRate 0.0994 Epoch: 13 Global Step: 33410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:16,895-Speed 13003.53 samples/sec Loss 6.1279 LearningRate 0.0994 Epoch: 13 Global Step: 33420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:18,491-Speed 12848.25 samples/sec Loss 6.1659 LearningRate 0.0993 Epoch: 13 Global Step: 33430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:20,076-Speed 12925.67 samples/sec Loss 6.1606 LearningRate 0.0993 Epoch: 13 Global Step: 33440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:21,683-Speed 12749.15 samples/sec Loss 6.1028 LearningRate 0.0993 Epoch: 13 Global Step: 33450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:23,254-Speed 13053.80 samples/sec Loss 6.1850 LearningRate 0.0993 Epoch: 13 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:24,838-Speed 12933.55 samples/sec Loss 6.2409 LearningRate 0.0992 Epoch: 13 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:26,439-Speed 12805.34 samples/sec Loss 6.1420 LearningRate 0.0992 Epoch: 13 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:28,009-Speed 13057.40 samples/sec Loss 6.1376 LearningRate 0.0992 Epoch: 13 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:29,595-Speed 12944.81 samples/sec Loss 6.1983 LearningRate 0.0991 Epoch: 13 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:31,162-Speed 13072.95 samples/sec Loss 6.2127 LearningRate 0.0991 Epoch: 13 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 3 
hours Training: 2022-01-14 15:20:32,739-Speed 13004.33 samples/sec Loss 6.1677 LearningRate 0.0991 Epoch: 13 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:34,307-Speed 13070.03 samples/sec Loss 6.1987 LearningRate 0.0990 Epoch: 13 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:35,888-Speed 12958.17 samples/sec Loss 6.2102 LearningRate 0.0990 Epoch: 13 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:37,515-Speed 12600.01 samples/sec Loss 6.2941 LearningRate 0.0990 Epoch: 13 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:39,105-Speed 12885.66 samples/sec Loss 6.1921 LearningRate 0.0990 Epoch: 13 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:40,666-Speed 13136.58 samples/sec Loss 6.1505 LearningRate 0.0989 Epoch: 13 Global Step: 33570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:42,238-Speed 13032.41 samples/sec Loss 6.2084 LearningRate 0.0989 Epoch: 13 Global Step: 33580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:43,817-Speed 12994.15 samples/sec Loss 6.2146 LearningRate 0.0989 Epoch: 13 Global Step: 33590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:45,409-Speed 12870.32 samples/sec Loss 6.2577 LearningRate 0.0988 Epoch: 13 Global Step: 33600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:46,996-Speed 12914.37 samples/sec Loss 6.2520 LearningRate 0.0988 Epoch: 13 Global Step: 33610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:48,581-Speed 12955.18 samples/sec Loss 6.2571 LearningRate 0.0988 Epoch: 13 Global Step: 33620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:50,211-Speed 12570.29 samples/sec Loss 6.1190 LearningRate 0.0988 Epoch: 13 Global Step: 33630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:20:51,777-Speed 13091.54 samples/sec Loss 6.2314 LearningRate 0.0987 Epoch: 13 Global Step: 33640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:53,354-Speed 12996.75 samples/sec Loss 6.1446 LearningRate 0.0987 Epoch: 13 Global Step: 33650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:54,934-Speed 12970.79 samples/sec Loss 6.3025 LearningRate 0.0987 Epoch: 13 Global Step: 33660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:20:56,541-Speed 12748.54 samples/sec Loss 6.3110 LearningRate 0.0986 Epoch: 13 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:58,102-Speed 13134.04 samples/sec Loss 6.2066 LearningRate 0.0986 Epoch: 13 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:20:59,673-Speed 13044.69 samples/sec Loss 6.3134 LearningRate 0.0986 Epoch: 13 Global Step: 33690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:01,260-Speed 12910.40 samples/sec Loss 6.3235 LearningRate 0.0986 Epoch: 13 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:02,847-Speed 12919.45 samples/sec Loss 6.2776 LearningRate 0.0985 Epoch: 13 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:04,441-Speed 12882.07 samples/sec Loss 6.2540 LearningRate 0.0985 Epoch: 13 Global Step: 33720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:06,027-Speed 12922.91 samples/sec Loss 6.2310 LearningRate 0.0985 Epoch: 13 Global Step: 33730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:07,627-Speed 12808.52 samples/sec Loss 6.3728 LearningRate 0.0984 Epoch: 13 Global Step: 33740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:09,238-Speed 12724.60 samples/sec Loss 6.3508 LearningRate 0.0984 Epoch: 13 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:10,823-Speed 12928.46 samples/sec 
Loss 6.2701 LearningRate 0.0984 Epoch: 13 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:12,425-Speed 12790.41 samples/sec Loss 6.2117 LearningRate 0.0983 Epoch: 13 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:14,031-Speed 12767.42 samples/sec Loss 6.3288 LearningRate 0.0983 Epoch: 13 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:15,599-Speed 13064.60 samples/sec Loss 6.2202 LearningRate 0.0983 Epoch: 13 Global Step: 33790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:17,148-Speed 13233.62 samples/sec Loss 6.3093 LearningRate 0.0983 Epoch: 13 Global Step: 33800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:18,750-Speed 12792.19 samples/sec Loss 6.2686 LearningRate 0.0982 Epoch: 13 Global Step: 33810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:20,333-Speed 12950.73 samples/sec Loss 6.3059 LearningRate 0.0982 Epoch: 13 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:21,924-Speed 12885.94 samples/sec Loss 6.4232 LearningRate 0.0982 Epoch: 13 Global Step: 33830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:23,497-Speed 13027.52 samples/sec Loss 6.3069 LearningRate 0.0981 Epoch: 13 Global Step: 33840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:25,080-Speed 12941.89 samples/sec Loss 6.4116 LearningRate 0.0981 Epoch: 13 Global Step: 33850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:26,648-Speed 13072.45 samples/sec Loss 6.2885 LearningRate 0.0981 Epoch: 13 Global Step: 33860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:28,246-Speed 12830.32 samples/sec Loss 6.3575 LearningRate 0.0981 Epoch: 13 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:29,853-Speed 12749.12 samples/sec Loss 6.2547 LearningRate 0.0980 
Epoch: 13 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:31,433-Speed 12994.92 samples/sec Loss 6.2849 LearningRate 0.0980 Epoch: 13 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:33,009-Speed 13005.24 samples/sec Loss 6.2361 LearningRate 0.0980 Epoch: 13 Global Step: 33900 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 15:21:34,573-Speed 13108.16 samples/sec Loss 6.2922 LearningRate 0.0979 Epoch: 13 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:36,131-Speed 13156.49 samples/sec Loss 6.2553 LearningRate 0.0979 Epoch: 13 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:37,700-Speed 13061.38 samples/sec Loss 6.3270 LearningRate 0.0979 Epoch: 13 Global Step: 33930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:39,293-Speed 12870.31 samples/sec Loss 6.3017 LearningRate 0.0979 Epoch: 13 Global Step: 33940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:40,858-Speed 13093.90 samples/sec Loss 6.3068 LearningRate 0.0978 Epoch: 13 Global Step: 33950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:42,475-Speed 12669.36 samples/sec Loss 6.3538 LearningRate 0.0978 Epoch: 13 Global Step: 33960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:44,052-Speed 12997.52 samples/sec Loss 6.3990 LearningRate 0.0978 Epoch: 13 Global Step: 33970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:45,616-Speed 13122.70 samples/sec Loss 6.3615 LearningRate 0.0977 Epoch: 13 Global Step: 33980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:47,196-Speed 12969.33 samples/sec Loss 6.3661 LearningRate 0.0977 Epoch: 13 Global Step: 33990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:48,796-Speed 12815.80 samples/sec Loss 6.3306 LearningRate 0.0977 Epoch: 13 Global Step: 34000 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:50,369-Speed 13029.36 samples/sec Loss 6.3523 LearningRate 0.0976 Epoch: 13 Global Step: 34010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:51,920-Speed 13214.79 samples/sec Loss 6.3138 LearningRate 0.0976 Epoch: 13 Global Step: 34020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:21:53,515-Speed 12855.06 samples/sec Loss 6.5012 LearningRate 0.0976 Epoch: 13 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:55,078-Speed 13114.43 samples/sec Loss 6.3162 LearningRate 0.0976 Epoch: 13 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:56,636-Speed 13152.35 samples/sec Loss 6.3560 LearningRate 0.0975 Epoch: 13 Global Step: 34050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:58,216-Speed 12976.27 samples/sec Loss 6.3680 LearningRate 0.0975 Epoch: 13 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:21:59,793-Speed 12991.02 samples/sec Loss 6.2919 LearningRate 0.0975 Epoch: 13 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:22:01,394-Speed 12803.62 samples/sec Loss 6.3331 LearningRate 0.0974 Epoch: 13 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:22:02,972-Speed 12986.97 samples/sec Loss 6.4532 LearningRate 0.0974 Epoch: 13 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:22:04,575-Speed 12791.27 samples/sec Loss 6.3822 LearningRate 0.0974 Epoch: 13 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:22:06,163-Speed 12896.69 samples/sec Loss 6.2177 LearningRate 0.0974 Epoch: 13 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:22:07,747-Speed 12971.44 samples/sec Loss 6.3603 LearningRate 0.0973 Epoch: 13 Global Step: 34120 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-01-14 15:22:09,303-Speed 13171.98 samples/sec Loss 6.3401 LearningRate 0.0973 Epoch: 13 Global Step: 34130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:10,870-Speed 13078.21 samples/sec Loss 6.3248 LearningRate 0.0973 Epoch: 13 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:12,444-Speed 13018.92 samples/sec Loss 6.4170 LearningRate 0.0972 Epoch: 13 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:14,047-Speed 12790.23 samples/sec Loss 6.4287 LearningRate 0.0972 Epoch: 13 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:15,632-Speed 12963.74 samples/sec Loss 6.3790 LearningRate 0.0972 Epoch: 13 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:17,197-Speed 13095.87 samples/sec Loss 6.3886 LearningRate 0.0972 Epoch: 13 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:18,805-Speed 12740.83 samples/sec Loss 6.3501 LearningRate 0.0971 Epoch: 13 Global Step: 34190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:20,370-Speed 13097.30 samples/sec Loss 6.4361 LearningRate 0.0971 Epoch: 13 Global Step: 34200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:21,942-Speed 13039.59 samples/sec Loss 6.3072 LearningRate 0.0971 Epoch: 13 Global Step: 34210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:23,524-Speed 12954.35 samples/sec Loss 6.3557 LearningRate 0.0970 Epoch: 13 Global Step: 34220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:25,079-Speed 13185.65 samples/sec Loss 6.4194 LearningRate 0.0970 Epoch: 13 Global Step: 34230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:26,632-Speed 13193.99 samples/sec Loss 6.3236 LearningRate 0.0970 Epoch: 13 Global Step: 34240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:28,230-Speed 
12824.62 samples/sec Loss 6.3118 LearningRate 0.0969 Epoch: 13 Global Step: 34250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:29,823-Speed 12864.20 samples/sec Loss 6.3504 LearningRate 0.0969 Epoch: 13 Global Step: 34260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:31,398-Speed 13007.17 samples/sec Loss 6.2229 LearningRate 0.0969 Epoch: 13 Global Step: 34270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:32,982-Speed 12945.38 samples/sec Loss 6.3081 LearningRate 0.0969 Epoch: 13 Global Step: 34280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:34,567-Speed 12930.78 samples/sec Loss 6.3169 LearningRate 0.0968 Epoch: 13 Global Step: 34290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:36,151-Speed 12941.87 samples/sec Loss 6.3233 LearningRate 0.0968 Epoch: 13 Global Step: 34300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:37,729-Speed 12983.01 samples/sec Loss 6.3667 LearningRate 0.0968 Epoch: 13 Global Step: 34310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:39,332-Speed 12787.92 samples/sec Loss 6.3813 LearningRate 0.0967 Epoch: 13 Global Step: 34320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:22:40,917-Speed 12929.50 samples/sec Loss 6.3948 LearningRate 0.0967 Epoch: 13 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:42,491-Speed 13018.89 samples/sec Loss 6.4999 LearningRate 0.0967 Epoch: 13 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:44,077-Speed 12924.99 samples/sec Loss 6.3957 LearningRate 0.0967 Epoch: 13 Global Step: 34350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:45,653-Speed 13008.57 samples/sec Loss 6.4822 LearningRate 0.0966 Epoch: 13 Global Step: 34360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:47,251-Speed 12824.58 samples/sec Loss 6.3783 
LearningRate 0.0966 Epoch: 13 Global Step: 34370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:48,818-Speed 13077.60 samples/sec Loss 6.4201 LearningRate 0.0966 Epoch: 13 Global Step: 34380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:50,404-Speed 12921.63 samples/sec Loss 6.3733 LearningRate 0.0965 Epoch: 13 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:52,010-Speed 12792.57 samples/sec Loss 6.3573 LearningRate 0.0965 Epoch: 13 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:53,578-Speed 13075.88 samples/sec Loss 6.4724 LearningRate 0.0965 Epoch: 13 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:55,198-Speed 12655.24 samples/sec Loss 6.3562 LearningRate 0.0965 Epoch: 13 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:22:56,781-Speed 12938.73 samples/sec Loss 6.4063 LearningRate 0.0964 Epoch: 13 Global Step: 34430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:22:58,357-Speed 13006.90 samples/sec Loss 6.4510 LearningRate 0.0964 Epoch: 13 Global Step: 34440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:22:59,949-Speed 12876.39 samples/sec Loss 6.3817 LearningRate 0.0964 Epoch: 13 Global Step: 34450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:23:01,523-Speed 13022.29 samples/sec Loss 6.3755 LearningRate 0.0963 Epoch: 13 Global Step: 34460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:23:03,109-Speed 12922.98 samples/sec Loss 6.3720 LearningRate 0.0963 Epoch: 13 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:23:04,686-Speed 12990.42 samples/sec Loss 6.4831 LearningRate 0.0963 Epoch: 13 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:23:06,259-Speed 13032.04 samples/sec Loss 6.3982 LearningRate 0.0963 Epoch: 13 Global 
Step: 34490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:07,826-Speed 13077.31 samples/sec Loss 6.4623 LearningRate 0.0962 Epoch: 13 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:09,402-Speed 13003.95 samples/sec Loss 6.4131 LearningRate 0.0962 Epoch: 13 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:10,973-Speed 13039.92 samples/sec Loss 6.4931 LearningRate 0.0962 Epoch: 13 Global Step: 34520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:12,570-Speed 12841.27 samples/sec Loss 6.3796 LearningRate 0.0961 Epoch: 13 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:14,146-Speed 13002.10 samples/sec Loss 6.4103 LearningRate 0.0961 Epoch: 13 Global Step: 34540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:15,723-Speed 12996.00 samples/sec Loss 6.2907 LearningRate 0.0961 Epoch: 13 Global Step: 34550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:17,318-Speed 12847.80 samples/sec Loss 6.4177 LearningRate 0.0961 Epoch: 13 Global Step: 34560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:18,890-Speed 13044.88 samples/sec Loss 6.3912 LearningRate 0.0960 Epoch: 13 Global Step: 34570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:20,469-Speed 12976.97 samples/sec Loss 6.3694 LearningRate 0.0960 Epoch: 13 Global Step: 34580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:22,069-Speed 12807.13 samples/sec Loss 6.3549 LearningRate 0.0960 Epoch: 13 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:23:23,622-Speed 13198.64 samples/sec Loss 6.4518 LearningRate 0.0959 Epoch: 13 Global Step: 34600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:25,220-Speed 12826.26 samples/sec Loss 6.2967 LearningRate 0.0959 Epoch: 13 Global Step: 34610 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 15:23:26,855-Speed 12528.76 samples/sec Loss 6.4870 LearningRate 0.0959 Epoch: 13 Global Step: 34620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:28,428-Speed 13027.09 samples/sec Loss 6.4278 LearningRate 0.0959 Epoch: 13 Global Step: 34630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:30,013-Speed 12932.72 samples/sec Loss 6.4269 LearningRate 0.0958 Epoch: 13 Global Step: 34640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:31,617-Speed 12778.43 samples/sec Loss 6.3954 LearningRate 0.0958 Epoch: 13 Global Step: 34650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:33,187-Speed 13060.44 samples/sec Loss 6.4150 LearningRate 0.0958 Epoch: 13 Global Step: 34660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:34,774-Speed 12911.59 samples/sec Loss 6.4109 LearningRate 0.0957 Epoch: 13 Global Step: 34670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:36,362-Speed 12896.81 samples/sec Loss 6.4090 LearningRate 0.0957 Epoch: 13 Global Step: 34680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:37,957-Speed 12856.44 samples/sec Loss 6.4809 LearningRate 0.0957 Epoch: 13 Global Step: 34690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:39,530-Speed 13033.18 samples/sec Loss 6.4331 LearningRate 0.0957 Epoch: 13 Global Step: 34700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:41,132-Speed 12794.04 samples/sec Loss 6.4645 LearningRate 0.0956 Epoch: 13 Global Step: 34710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:23:42,681-Speed 13241.94 samples/sec Loss 6.4289 LearningRate 0.0956 Epoch: 13 Global Step: 34720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:44,256-Speed 13008.38 samples/sec Loss 6.4424 LearningRate 0.0956 Epoch: 13 Global Step: 34730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:23:45,852-Speed 12838.14 samples/sec Loss 6.3470 LearningRate 0.0955 Epoch: 13 Global Step: 34740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:47,456-Speed 12779.28 samples/sec Loss 6.4101 LearningRate 0.0955 Epoch: 13 Global Step: 34750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:49,037-Speed 12960.86 samples/sec Loss 6.3774 LearningRate 0.0955 Epoch: 13 Global Step: 34760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:50,620-Speed 12950.15 samples/sec Loss 6.2852 LearningRate 0.0954 Epoch: 13 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:52,225-Speed 12767.75 samples/sec Loss 6.3136 LearningRate 0.0954 Epoch: 13 Global Step: 34780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:53,796-Speed 13056.84 samples/sec Loss 6.4179 LearningRate 0.0954 Epoch: 13 Global Step: 34790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:55,378-Speed 12953.51 samples/sec Loss 6.4159 LearningRate 0.0954 Epoch: 13 Global Step: 34800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:56,980-Speed 12794.48 samples/sec Loss 6.4697 LearningRate 0.0953 Epoch: 13 Global Step: 34810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:23:58,535-Speed 13181.46 samples/sec Loss 6.3730 LearningRate 0.0953 Epoch: 13 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:24:00,142-Speed 12750.53 samples/sec Loss 6.3974 LearningRate 0.0953 Epoch: 13 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:24:01,750-Speed 12739.18 samples/sec Loss 6.4131 LearningRate 0.0952 Epoch: 13 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:24:03,326-Speed 13007.93 samples/sec Loss 6.3057 LearningRate 0.0952 Epoch: 13 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:04,904-Speed 12987.06 samples/sec 
Loss 6.4090 LearningRate 0.0952 Epoch: 13 Global Step: 34860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:06,476-Speed 13036.66 samples/sec Loss 6.3648 LearningRate 0.0952 Epoch: 13 Global Step: 34870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:08,047-Speed 13047.93 samples/sec Loss 6.4656 LearningRate 0.0951 Epoch: 13 Global Step: 34880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:09,626-Speed 12981.43 samples/sec Loss 6.3622 LearningRate 0.0951 Epoch: 13 Global Step: 34890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:11,205-Speed 12970.11 samples/sec Loss 6.3473 LearningRate 0.0951 Epoch: 13 Global Step: 34900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:12,781-Speed 13027.36 samples/sec Loss 6.3667 LearningRate 0.0950 Epoch: 13 Global Step: 34910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:14,348-Speed 13072.32 samples/sec Loss 6.3360 LearningRate 0.0950 Epoch: 13 Global Step: 34920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:15,980-Speed 12559.85 samples/sec Loss 6.4597 LearningRate 0.0950 Epoch: 13 Global Step: 34930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:17,547-Speed 13079.55 samples/sec Loss 6.4258 LearningRate 0.0950 Epoch: 13 Global Step: 34940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:24:19,113-Speed 13090.81 samples/sec Loss 6.3504 LearningRate 0.0949 Epoch: 13 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:24:20,682-Speed 13065.06 samples/sec Loss 6.3078 LearningRate 0.0949 Epoch: 13 Global Step: 34960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:24:22,259-Speed 13011.95 samples/sec Loss 6.3113 LearningRate 0.0949 Epoch: 13 Global Step: 34970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:24:23,837-Speed 12989.83 samples/sec Loss 6.4326 LearningRate 0.0948 Epoch: 13 
Global Step: 34980 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-01-14 15:24:25,410-Speed 13029.11 samples/sec Loss 6.4348 LearningRate 0.0948 Epoch: 13 Global Step: 34990 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-01-14 15:24:26,988-Speed 12983.64 samples/sec Loss 6.3849 LearningRate 0.0948 Epoch: 13 Global Step: 35000 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-01-14 15:24:50,377-[lfw][35000]XNorm: 11.195635
Training: 2022-01-14 15:24:50,378-[lfw][35000]Accuracy-Flip: 0.99450+-0.00325
Training: 2022-01-14 15:24:50,378-[lfw][35000]Accuracy-Highest: 0.99583
Training: 2022-01-14 15:25:16,553-[cfp_fp][35000]XNorm: 9.451408
Training: 2022-01-14 15:25:16,554-[cfp_fp][35000]Accuracy-Flip: 0.95514+-0.00961
Training: 2022-01-14 15:25:16,555-[cfp_fp][35000]Accuracy-Highest: 0.95514
Training: 2022-01-14 15:25:39,982-[agedb_30][35000]XNorm: 10.894569
Training: 2022-01-14 15:25:39,983-[agedb_30][35000]Accuracy-Flip: 0.95733+-0.00824
Training: 2022-01-14 15:25:39,984-[agedb_30][35000]Accuracy-Highest: 0.95800
Training: 2022-01-14 15:25:41,548-Speed 274.68 samples/sec Loss 6.3776 LearningRate 0.0948 Epoch: 13 Global Step: 35010 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-01-14 15:25:43,153-Speed 12768.55 samples/sec Loss 6.3392 LearningRate 0.0947 Epoch: 13 Global Step: 35020 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-01-14 15:25:44,709-Speed 13166.48 samples/sec Loss 6.3804 LearningRate 0.0947 Epoch: 13 Global Step: 35030 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-01-14 15:25:46,292-Speed 12946.03 samples/sec Loss 6.3874 LearningRate 0.0947 Epoch: 13 Global Step: 35040 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-01-14 15:25:47,911-Speed 12662.55 samples/sec Loss 6.4749 LearningRate 0.0946 Epoch: 13 Global Step: 35050 Fp16 Grad Scale: 32768 Required: 3 hours
Training: 2022-01-14 15:25:49,475-Speed 13099.54 samples/sec Loss 6.3554 LearningRate 0.0946 Epoch: 13 Global Step: 35060 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-01-14 15:25:51,027-Speed 13206.90 samples/sec Loss 6.2861 LearningRate 0.0946 Epoch: 13 Global Step: 35070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:25:52,610-Speed 12947.20 samples/sec Loss 6.3865 LearningRate 0.0946 Epoch: 13 Global Step: 35080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:25:54,209-Speed 12817.83 samples/sec Loss 6.4075 LearningRate 0.0945 Epoch: 13 Global Step: 35090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:25:55,766-Speed 13158.15 samples/sec Loss 6.4648 LearningRate 0.0945 Epoch: 13 Global Step: 35100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:25:57,361-Speed 12854.57 samples/sec Loss 6.4338 LearningRate 0.0945 Epoch: 13 Global Step: 35110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:25:58,944-Speed 12943.72 samples/sec Loss 6.4378 LearningRate 0.0944 Epoch: 13 Global Step: 35120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:00,485-Speed 13298.71 samples/sec Loss 6.4425 LearningRate 0.0944 Epoch: 13 Global Step: 35130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:02,071-Speed 12917.74 samples/sec Loss 6.4247 LearningRate 0.0944 Epoch: 13 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:03,650-Speed 12979.51 samples/sec Loss 6.2930 LearningRate 0.0944 Epoch: 13 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:05,268-Speed 12667.74 samples/sec Loss 6.3925 LearningRate 0.0943 Epoch: 13 Global Step: 35160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:06,849-Speed 12962.89 samples/sec Loss 6.2936 LearningRate 0.0943 Epoch: 13 Global Step: 35170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:08,410-Speed 13126.98 samples/sec Loss 6.4427 LearningRate 0.0943 Epoch: 13 Global Step: 35180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-14 15:26:09,990-Speed 12972.11 samples/sec Loss 6.3333 LearningRate 0.0942 Epoch: 13 Global Step: 35190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:11,566-Speed 13007.75 samples/sec Loss 6.3821 LearningRate 0.0942 Epoch: 13 Global Step: 35200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:13,147-Speed 12966.35 samples/sec Loss 6.4513 LearningRate 0.0942 Epoch: 13 Global Step: 35210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:14,761-Speed 12695.60 samples/sec Loss 6.3445 LearningRate 0.0942 Epoch: 13 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:16,325-Speed 13100.38 samples/sec Loss 6.3320 LearningRate 0.0941 Epoch: 13 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:17,906-Speed 12984.82 samples/sec Loss 6.3331 LearningRate 0.0941 Epoch: 13 Global Step: 35240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:19,509-Speed 12779.77 samples/sec Loss 6.3194 LearningRate 0.0941 Epoch: 13 Global Step: 35250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:21,097-Speed 12913.39 samples/sec Loss 6.4015 LearningRate 0.0940 Epoch: 13 Global Step: 35260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:22,664-Speed 13072.89 samples/sec Loss 6.2859 LearningRate 0.0940 Epoch: 13 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:24,229-Speed 13091.23 samples/sec Loss 6.3681 LearningRate 0.0940 Epoch: 13 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:25,817-Speed 12906.12 samples/sec Loss 6.3898 LearningRate 0.0940 Epoch: 13 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:27,420-Speed 12787.30 samples/sec Loss 6.4586 LearningRate 0.0939 Epoch: 13 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:29,021-Speed 12795.18 
samples/sec Loss 6.4386 LearningRate 0.0939 Epoch: 13 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:26:30,608-Speed 12910.62 samples/sec Loss 6.3598 LearningRate 0.0939 Epoch: 13 Global Step: 35320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:32,200-Speed 12875.94 samples/sec Loss 6.3292 LearningRate 0.0938 Epoch: 13 Global Step: 35330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:33,815-Speed 12684.83 samples/sec Loss 6.3938 LearningRate 0.0938 Epoch: 13 Global Step: 35340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:35,412-Speed 12832.67 samples/sec Loss 6.3080 LearningRate 0.0938 Epoch: 13 Global Step: 35350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:36,972-Speed 13136.59 samples/sec Loss 6.4739 LearningRate 0.0938 Epoch: 13 Global Step: 35360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:38,544-Speed 13031.49 samples/sec Loss 6.4243 LearningRate 0.0937 Epoch: 13 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:40,127-Speed 12950.59 samples/sec Loss 6.3661 LearningRate 0.0937 Epoch: 13 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:41,724-Speed 12831.11 samples/sec Loss 6.4591 LearningRate 0.0937 Epoch: 13 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:43,376-Speed 12400.74 samples/sec Loss 6.4131 LearningRate 0.0936 Epoch: 13 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:26:59,290-Speed 1287.07 samples/sec Loss 6.0473 LearningRate 0.0936 Epoch: 14 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:00,921-Speed 12568.56 samples/sec Loss 5.4882 LearningRate 0.0936 Epoch: 14 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:02,545-Speed 12621.67 samples/sec Loss 5.5521 
LearningRate 0.0936 Epoch: 14 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:04,124-Speed 12977.80 samples/sec Loss 5.5551 LearningRate 0.0935 Epoch: 14 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:05,713-Speed 12896.22 samples/sec Loss 5.5112 LearningRate 0.0935 Epoch: 14 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:07,316-Speed 12785.29 samples/sec Loss 5.6348 LearningRate 0.0935 Epoch: 14 Global Step: 35460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:08,872-Speed 13166.87 samples/sec Loss 5.5836 LearningRate 0.0934 Epoch: 14 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:10,447-Speed 13008.21 samples/sec Loss 5.6421 LearningRate 0.0934 Epoch: 14 Global Step: 35480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:12,016-Speed 13060.07 samples/sec Loss 5.5228 LearningRate 0.0934 Epoch: 14 Global Step: 35490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:13,615-Speed 12822.41 samples/sec Loss 5.5595 LearningRate 0.0934 Epoch: 14 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:15,180-Speed 13110.11 samples/sec Loss 5.6468 LearningRate 0.0933 Epoch: 14 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:16,748-Speed 13074.68 samples/sec Loss 5.6277 LearningRate 0.0933 Epoch: 14 Global Step: 35520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:18,323-Speed 13006.78 samples/sec Loss 5.6300 LearningRate 0.0933 Epoch: 14 Global Step: 35530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:19,903-Speed 12973.95 samples/sec Loss 5.5802 LearningRate 0.0932 Epoch: 14 Global Step: 35540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:21,493-Speed 12882.62 samples/sec Loss 5.7080 LearningRate 0.0932 Epoch: 14 Global 
Step: 35550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:23,068-Speed 13014.19 samples/sec Loss 5.6100 LearningRate 0.0932 Epoch: 14 Global Step: 35560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:24,672-Speed 12773.49 samples/sec Loss 5.7049 LearningRate 0.0932 Epoch: 14 Global Step: 35570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:26,264-Speed 12872.60 samples/sec Loss 5.7628 LearningRate 0.0931 Epoch: 14 Global Step: 35580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:27,840-Speed 13004.17 samples/sec Loss 5.6989 LearningRate 0.0931 Epoch: 14 Global Step: 35590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:29,421-Speed 12963.79 samples/sec Loss 5.5925 LearningRate 0.0931 Epoch: 14 Global Step: 35600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:31,011-Speed 12886.16 samples/sec Loss 5.7323 LearningRate 0.0930 Epoch: 14 Global Step: 35610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:27:32,577-Speed 13089.22 samples/sec Loss 5.7376 LearningRate 0.0930 Epoch: 14 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:34,171-Speed 12849.30 samples/sec Loss 5.7000 LearningRate 0.0930 Epoch: 14 Global Step: 35630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:35,747-Speed 13004.39 samples/sec Loss 5.7413 LearningRate 0.0930 Epoch: 14 Global Step: 35640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:37,317-Speed 13055.84 samples/sec Loss 5.8198 LearningRate 0.0929 Epoch: 14 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:38,902-Speed 12929.53 samples/sec Loss 5.6759 LearningRate 0.0929 Epoch: 14 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:40,477-Speed 13011.76 samples/sec Loss 5.7737 LearningRate 0.0929 Epoch: 14 Global Step: 35670 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 15:27:42,054-Speed 12993.37 samples/sec Loss 5.7778 LearningRate 0.0929 Epoch: 14 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:43,638-Speed 12930.16 samples/sec Loss 5.8778 LearningRate 0.0928 Epoch: 14 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:45,212-Speed 13038.67 samples/sec Loss 5.7870 LearningRate 0.0928 Epoch: 14 Global Step: 35700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:46,809-Speed 12828.54 samples/sec Loss 5.8356 LearningRate 0.0928 Epoch: 14 Global Step: 35710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:27:48,386-Speed 12999.88 samples/sec Loss 5.9015 LearningRate 0.0927 Epoch: 14 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:49,982-Speed 12841.90 samples/sec Loss 5.8264 LearningRate 0.0927 Epoch: 14 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:51,547-Speed 13092.85 samples/sec Loss 5.8735 LearningRate 0.0927 Epoch: 14 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:53,108-Speed 13129.69 samples/sec Loss 5.8296 LearningRate 0.0927 Epoch: 14 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:54,681-Speed 13021.77 samples/sec Loss 5.8424 LearningRate 0.0926 Epoch: 14 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:56,252-Speed 13043.96 samples/sec Loss 5.7171 LearningRate 0.0926 Epoch: 14 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:57,835-Speed 12946.66 samples/sec Loss 5.8555 LearningRate 0.0926 Epoch: 14 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:27:59,416-Speed 12959.83 samples/sec Loss 5.8603 LearningRate 0.0925 Epoch: 14 Global Step: 35790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-14 15:28:00,965-Speed 13232.83 samples/sec Loss 5.9443 LearningRate 0.0925 Epoch: 14 Global Step: 35800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:02,553-Speed 12905.75 samples/sec Loss 5.9337 LearningRate 0.0925 Epoch: 14 Global Step: 35810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:04,126-Speed 13022.88 samples/sec Loss 5.9181 LearningRate 0.0925 Epoch: 14 Global Step: 35820 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 15:28:05,683-Speed 13165.91 samples/sec Loss 5.9111 LearningRate 0.0924 Epoch: 14 Global Step: 35830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:07,275-Speed 12868.40 samples/sec Loss 5.9520 LearningRate 0.0924 Epoch: 14 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:08,863-Speed 12907.22 samples/sec Loss 5.9793 LearningRate 0.0924 Epoch: 14 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:10,428-Speed 13099.27 samples/sec Loss 5.9264 LearningRate 0.0923 Epoch: 14 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:12,005-Speed 12991.42 samples/sec Loss 6.0203 LearningRate 0.0923 Epoch: 14 Global Step: 35870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:13,575-Speed 13046.13 samples/sec Loss 5.9397 LearningRate 0.0923 Epoch: 14 Global Step: 35880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:15,136-Speed 13132.82 samples/sec Loss 5.8716 LearningRate 0.0923 Epoch: 14 Global Step: 35890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:16,721-Speed 12928.26 samples/sec Loss 6.0401 LearningRate 0.0922 Epoch: 14 Global Step: 35900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:18,319-Speed 12821.21 samples/sec Loss 5.9591 LearningRate 0.0922 Epoch: 14 Global Step: 35910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:19,895-Speed 13004.92 
samples/sec Loss 5.9747 LearningRate 0.0922 Epoch: 14 Global Step: 35920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:21,465-Speed 13051.39 samples/sec Loss 6.1099 LearningRate 0.0921 Epoch: 14 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:23,065-Speed 12804.07 samples/sec Loss 6.0534 LearningRate 0.0921 Epoch: 14 Global Step: 35940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:24,658-Speed 12859.38 samples/sec Loss 5.9680 LearningRate 0.0921 Epoch: 14 Global Step: 35950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:26,236-Speed 12986.72 samples/sec Loss 6.0450 LearningRate 0.0921 Epoch: 14 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:27,814-Speed 12989.12 samples/sec Loss 5.9828 LearningRate 0.0920 Epoch: 14 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:29,381-Speed 13074.54 samples/sec Loss 6.0295 LearningRate 0.0920 Epoch: 14 Global Step: 35980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:30,942-Speed 13128.70 samples/sec Loss 5.9372 LearningRate 0.0920 Epoch: 14 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:32,519-Speed 12993.06 samples/sec Loss 6.0276 LearningRate 0.0919 Epoch: 14 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:34,101-Speed 12951.52 samples/sec Loss 6.0312 LearningRate 0.0919 Epoch: 14 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:35,667-Speed 13083.23 samples/sec Loss 6.0202 LearningRate 0.0919 Epoch: 14 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:37,236-Speed 13058.74 samples/sec Loss 5.9672 LearningRate 0.0919 Epoch: 14 Global Step: 36030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:28:38,808-Speed 13032.07 samples/sec Loss 5.9527 LearningRate 
0.0918 Epoch: 14 Global Step: 36040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:40,378-Speed 13050.73 samples/sec Loss 5.9708 LearningRate 0.0918 Epoch: 14 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:41,954-Speed 13006.34 samples/sec Loss 6.0187 LearningRate 0.0918 Epoch: 14 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:43,517-Speed 13109.76 samples/sec Loss 6.0245 LearningRate 0.0917 Epoch: 14 Global Step: 36070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:45,097-Speed 12972.10 samples/sec Loss 5.9750 LearningRate 0.0917 Epoch: 14 Global Step: 36080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:46,654-Speed 13151.05 samples/sec Loss 6.0478 LearningRate 0.0917 Epoch: 14 Global Step: 36090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:48,210-Speed 13170.28 samples/sec Loss 6.0834 LearningRate 0.0917 Epoch: 14 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:49,778-Speed 13065.66 samples/sec Loss 6.0015 LearningRate 0.0916 Epoch: 14 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:51,334-Speed 13178.11 samples/sec Loss 6.0616 LearningRate 0.0916 Epoch: 14 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:52,906-Speed 13030.48 samples/sec Loss 6.0780 LearningRate 0.0916 Epoch: 14 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:28:54,448-Speed 13287.30 samples/sec Loss 6.0798 LearningRate 0.0916 Epoch: 14 Global Step: 36140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:28:56,007-Speed 13144.70 samples/sec Loss 6.1298 LearningRate 0.0915 Epoch: 14 Global Step: 36150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:28:57,592-Speed 12927.49 samples/sec Loss 5.9430 LearningRate 0.0915 Epoch: 14 Global Step: 
36160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:28:59,201-Speed 12734.13 samples/sec Loss 6.0486 LearningRate 0.0915 Epoch: 14 Global Step: 36170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:00,779-Speed 12992.19 samples/sec Loss 6.0868 LearningRate 0.0914 Epoch: 14 Global Step: 36180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:02,343-Speed 13100.66 samples/sec Loss 6.0726 LearningRate 0.0914 Epoch: 14 Global Step: 36190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:03,921-Speed 12990.25 samples/sec Loss 6.0493 LearningRate 0.0914 Epoch: 14 Global Step: 36200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:05,486-Speed 13088.61 samples/sec Loss 6.1066 LearningRate 0.0914 Epoch: 14 Global Step: 36210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:07,039-Speed 13200.62 samples/sec Loss 6.1610 LearningRate 0.0913 Epoch: 14 Global Step: 36220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:08,630-Speed 12879.57 samples/sec Loss 6.1068 LearningRate 0.0913 Epoch: 14 Global Step: 36230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:10,196-Speed 13087.40 samples/sec Loss 6.0476 LearningRate 0.0913 Epoch: 14 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:11,758-Speed 13117.93 samples/sec Loss 6.0453 LearningRate 0.0912 Epoch: 14 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:13,338-Speed 12961.13 samples/sec Loss 6.1457 LearningRate 0.0912 Epoch: 14 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:14,889-Speed 13220.92 samples/sec Loss 6.0939 LearningRate 0.0912 Epoch: 14 Global Step: 36270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:16,445-Speed 13165.57 samples/sec Loss 6.1213 LearningRate 0.0912 Epoch: 14 Global Step: 36280 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-01-14 15:29:18,016-Speed 13037.11 samples/sec Loss 6.1114 LearningRate 0.0911 Epoch: 14 Global Step: 36290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:19,589-Speed 13028.04 samples/sec Loss 6.1403 LearningRate 0.0911 Epoch: 14 Global Step: 36300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:21,157-Speed 13106.27 samples/sec Loss 6.0818 LearningRate 0.0911 Epoch: 14 Global Step: 36310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:22,733-Speed 13003.38 samples/sec Loss 6.0933 LearningRate 0.0910 Epoch: 14 Global Step: 36320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:24,294-Speed 13124.74 samples/sec Loss 6.2199 LearningRate 0.0910 Epoch: 14 Global Step: 36330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:25,878-Speed 12940.63 samples/sec Loss 6.0344 LearningRate 0.0910 Epoch: 14 Global Step: 36340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:27,437-Speed 13134.29 samples/sec Loss 6.0514 LearningRate 0.0910 Epoch: 14 Global Step: 36350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:29,019-Speed 12954.59 samples/sec Loss 6.1442 LearningRate 0.0909 Epoch: 14 Global Step: 36360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:30,574-Speed 13178.25 samples/sec Loss 6.1174 LearningRate 0.0909 Epoch: 14 Global Step: 36370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:32,151-Speed 12991.61 samples/sec Loss 6.1488 LearningRate 0.0909 Epoch: 14 Global Step: 36380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:33,747-Speed 12836.20 samples/sec Loss 6.1155 LearningRate 0.0908 Epoch: 14 Global Step: 36390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:35,305-Speed 13157.03 samples/sec Loss 6.1220 LearningRate 0.0908 Epoch: 14 Global Step: 36400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
15:29:36,894-Speed 12897.87 samples/sec Loss 6.1234 LearningRate 0.0908 Epoch: 14 Global Step: 36410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:38,471-Speed 12986.46 samples/sec Loss 6.0988 LearningRate 0.0908 Epoch: 14 Global Step: 36420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:29:40,045-Speed 13023.12 samples/sec Loss 6.1433 LearningRate 0.0907 Epoch: 14 Global Step: 36430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:41,620-Speed 13008.02 samples/sec Loss 6.2504 LearningRate 0.0907 Epoch: 14 Global Step: 36440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:43,164-Speed 13270.44 samples/sec Loss 6.2036 LearningRate 0.0907 Epoch: 14 Global Step: 36450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:44,730-Speed 13082.45 samples/sec Loss 6.2086 LearningRate 0.0907 Epoch: 14 Global Step: 36460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:46,299-Speed 13063.13 samples/sec Loss 6.1658 LearningRate 0.0906 Epoch: 14 Global Step: 36470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:47,893-Speed 12854.05 samples/sec Loss 6.1566 LearningRate 0.0906 Epoch: 14 Global Step: 36480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:49,486-Speed 12861.61 samples/sec Loss 6.2119 LearningRate 0.0906 Epoch: 14 Global Step: 36490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:51,048-Speed 13121.22 samples/sec Loss 6.2333 LearningRate 0.0905 Epoch: 14 Global Step: 36500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:52,627-Speed 12976.75 samples/sec Loss 6.2143 LearningRate 0.0905 Epoch: 14 Global Step: 36510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:54,205-Speed 12985.30 samples/sec Loss 6.0949 LearningRate 0.0905 Epoch: 14 Global Step: 36520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:29:55,770-Speed 13094.40 samples/sec 
Loss 6.1853 LearningRate 0.0905 Epoch: 14 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:29:57,343-Speed 13029.27 samples/sec Loss 6.2004 LearningRate 0.0904 Epoch: 14 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:29:58,898-Speed 13172.31 samples/sec Loss 6.2695 LearningRate 0.0904 Epoch: 14 Global Step: 36550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:00,475-Speed 13001.29 samples/sec Loss 6.2128 LearningRate 0.0904 Epoch: 14 Global Step: 36560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:02,050-Speed 13011.29 samples/sec Loss 6.2509 LearningRate 0.0903 Epoch: 14 Global Step: 36570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:03,636-Speed 12916.85 samples/sec Loss 6.1042 LearningRate 0.0903 Epoch: 14 Global Step: 36580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:05,197-Speed 13129.46 samples/sec Loss 6.2043 LearningRate 0.0903 Epoch: 14 Global Step: 36590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:06,753-Speed 13168.72 samples/sec Loss 6.1333 LearningRate 0.0903 Epoch: 14 Global Step: 36600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:08,332-Speed 12975.26 samples/sec Loss 6.2019 LearningRate 0.0902 Epoch: 14 Global Step: 36610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:09,891-Speed 13145.01 samples/sec Loss 6.2210 LearningRate 0.0902 Epoch: 14 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:11,461-Speed 13058.32 samples/sec Loss 6.1376 LearningRate 0.0902 Epoch: 14 Global Step: 36630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:13,028-Speed 13067.26 samples/sec Loss 6.1548 LearningRate 0.0901 Epoch: 14 Global Step: 36640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:14,591-Speed 13116.02 samples/sec Loss 6.1803 LearningRate 0.0901 Epoch: 14 
Global Step: 36650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:30:16,171-Speed 12970.31 samples/sec Loss 6.1756 LearningRate 0.0901 Epoch: 14 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:30:17,733-Speed 13112.29 samples/sec Loss 6.1303 LearningRate 0.0901 Epoch: 14 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:30:19,287-Speed 13187.81 samples/sec Loss 6.1531 LearningRate 0.0900 Epoch: 14 Global Step: 36680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:20,838-Speed 13216.99 samples/sec Loss 6.2353 LearningRate 0.0900 Epoch: 14 Global Step: 36690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:22,397-Speed 13138.48 samples/sec Loss 6.1565 LearningRate 0.0900 Epoch: 14 Global Step: 36700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:23,953-Speed 13165.97 samples/sec Loss 6.1968 LearningRate 0.0900 Epoch: 14 Global Step: 36710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:25,527-Speed 13025.97 samples/sec Loss 6.1871 LearningRate 0.0899 Epoch: 14 Global Step: 36720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:27,090-Speed 13111.65 samples/sec Loss 6.1857 LearningRate 0.0899 Epoch: 14 Global Step: 36730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:28,654-Speed 13095.95 samples/sec Loss 6.2700 LearningRate 0.0899 Epoch: 14 Global Step: 36740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:30,246-Speed 12877.62 samples/sec Loss 6.0762 LearningRate 0.0898 Epoch: 14 Global Step: 36750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:31,823-Speed 12990.85 samples/sec Loss 6.2202 LearningRate 0.0898 Epoch: 14 Global Step: 36760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:33,406-Speed 12938.67 samples/sec Loss 6.2418 LearningRate 0.0898 Epoch: 14 Global Step: 36770 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-01-14 15:30:34,979-Speed 13030.60 samples/sec Loss 6.2849 LearningRate 0.0898 Epoch: 14 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:30:36,567-Speed 12905.66 samples/sec Loss 6.2349 LearningRate 0.0897 Epoch: 14 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:30:38,119-Speed 13198.78 samples/sec Loss 6.2351 LearningRate 0.0897 Epoch: 14 Global Step: 36800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:39,701-Speed 12955.13 samples/sec Loss 6.1547 LearningRate 0.0897 Epoch: 14 Global Step: 36810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:41,274-Speed 13033.37 samples/sec Loss 6.2942 LearningRate 0.0896 Epoch: 14 Global Step: 36820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:42,864-Speed 12886.26 samples/sec Loss 6.2572 LearningRate 0.0896 Epoch: 14 Global Step: 36830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:44,435-Speed 13045.37 samples/sec Loss 6.1434 LearningRate 0.0896 Epoch: 14 Global Step: 36840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:30:45,977-Speed 13288.95 samples/sec Loss 6.1830 LearningRate 0.0896 Epoch: 14 Global Step: 36850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:30:47,534-Speed 13156.87 samples/sec Loss 6.1936 LearningRate 0.0895 Epoch: 14 Global Step: 36860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:30:49,114-Speed 12967.77 samples/sec Loss 6.2988 LearningRate 0.0895 Epoch: 14 Global Step: 36870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:30:50,701-Speed 12915.89 samples/sec Loss 6.2619 LearningRate 0.0895 Epoch: 14 Global Step: 36880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:30:52,277-Speed 13007.18 samples/sec Loss 6.3390 LearningRate 0.0895 Epoch: 14 Global Step: 36890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-14 15:30:53,865-Speed 12899.50 samples/sec Loss 6.1692 LearningRate 0.0894 Epoch: 14 Global Step: 36900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:30:55,469-Speed 12777.09 samples/sec Loss 6.1626 LearningRate 0.0894 Epoch: 14 Global Step: 36910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:30:57,024-Speed 13179.96 samples/sec Loss 6.2700 LearningRate 0.0894 Epoch: 14 Global Step: 36920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:30:58,610-Speed 12914.87 samples/sec Loss 6.2498 LearningRate 0.0893 Epoch: 14 Global Step: 36930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:00,162-Speed 13208.21 samples/sec Loss 6.2726 LearningRate 0.0893 Epoch: 14 Global Step: 36940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:01,733-Speed 13041.80 samples/sec Loss 6.2084 LearningRate 0.0893 Epoch: 14 Global Step: 36950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:03,320-Speed 12911.04 samples/sec Loss 6.1884 LearningRate 0.0893 Epoch: 14 Global Step: 36960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:04,906-Speed 12914.70 samples/sec Loss 6.2504 LearningRate 0.0892 Epoch: 14 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:06,476-Speed 13068.47 samples/sec Loss 6.2378 LearningRate 0.0892 Epoch: 14 Global Step: 36980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:08,027-Speed 13209.28 samples/sec Loss 6.2059 LearningRate 0.0892 Epoch: 14 Global Step: 36990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:09,627-Speed 12804.29 samples/sec Loss 6.2450 LearningRate 0.0891 Epoch: 14 Global Step: 37000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:11,194-Speed 13079.63 samples/sec Loss 6.2908 LearningRate 0.0891 Epoch: 14 Global Step: 37010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:12,779-Speed 12925.30 
samples/sec Loss 6.2541 LearningRate 0.0891 Epoch: 14 Global Step: 37020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:14,385-Speed 12755.00 samples/sec Loss 6.2598 LearningRate 0.0891 Epoch: 14 Global Step: 37030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:15,968-Speed 12953.88 samples/sec Loss 6.3121 LearningRate 0.0890 Epoch: 14 Global Step: 37040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:17,555-Speed 12907.08 samples/sec Loss 6.2763 LearningRate 0.0890 Epoch: 14 Global Step: 37050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:19,140-Speed 12932.89 samples/sec Loss 6.2560 LearningRate 0.0890 Epoch: 14 Global Step: 37060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:20,717-Speed 12993.38 samples/sec Loss 6.2461 LearningRate 0.0889 Epoch: 14 Global Step: 37070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:22,286-Speed 13058.06 samples/sec Loss 6.2843 LearningRate 0.0889 Epoch: 14 Global Step: 37080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:23,854-Speed 13064.62 samples/sec Loss 6.1952 LearningRate 0.0889 Epoch: 14 Global Step: 37090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:25,438-Speed 12944.58 samples/sec Loss 6.2670 LearningRate 0.0889 Epoch: 14 Global Step: 37100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:27,008-Speed 13050.77 samples/sec Loss 6.2822 LearningRate 0.0888 Epoch: 14 Global Step: 37110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:28,566-Speed 13144.43 samples/sec Loss 6.2608 LearningRate 0.0888 Epoch: 14 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:30,136-Speed 13062.30 samples/sec Loss 6.3455 LearningRate 0.0888 Epoch: 14 Global Step: 37130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:31,700-Speed 13096.77 samples/sec Loss 6.2597 LearningRate 0.0888 
Epoch: 14 Global Step: 37140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:33,293-Speed 12864.68 samples/sec Loss 6.2201 LearningRate 0.0887 Epoch: 14 Global Step: 37150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:34,838-Speed 13259.07 samples/sec Loss 6.2892 LearningRate 0.0887 Epoch: 14 Global Step: 37160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:36,407-Speed 13059.02 samples/sec Loss 6.2703 LearningRate 0.0887 Epoch: 14 Global Step: 37170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:37,971-Speed 13099.55 samples/sec Loss 6.3012 LearningRate 0.0886 Epoch: 14 Global Step: 37180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:39,569-Speed 12822.31 samples/sec Loss 6.2144 LearningRate 0.0886 Epoch: 14 Global Step: 37190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:41,138-Speed 13073.71 samples/sec Loss 6.2314 LearningRate 0.0886 Epoch: 14 Global Step: 37200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:42,719-Speed 12964.22 samples/sec Loss 6.3028 LearningRate 0.0886 Epoch: 14 Global Step: 37210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:44,288-Speed 13061.22 samples/sec Loss 6.2360 LearningRate 0.0885 Epoch: 14 Global Step: 37220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:31:45,862-Speed 13016.66 samples/sec Loss 6.1852 LearningRate 0.0885 Epoch: 14 Global Step: 37230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:47,448-Speed 12916.35 samples/sec Loss 6.3233 LearningRate 0.0885 Epoch: 14 Global Step: 37240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:49,006-Speed 13151.24 samples/sec Loss 6.2190 LearningRate 0.0885 Epoch: 14 Global Step: 37250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:50,577-Speed 13043.62 samples/sec Loss 6.2272 LearningRate 0.0884 Epoch: 14 Global Step: 37260 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:52,142-Speed 13094.97 samples/sec Loss 6.2188 LearningRate 0.0884 Epoch: 14 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:53,728-Speed 12915.82 samples/sec Loss 6.2415 LearningRate 0.0884 Epoch: 14 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:55,315-Speed 12919.88 samples/sec Loss 6.2601 LearningRate 0.0883 Epoch: 14 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:56,887-Speed 13028.50 samples/sec Loss 6.3051 LearningRate 0.0883 Epoch: 14 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:31:58,461-Speed 13015.20 samples/sec Loss 6.2143 LearningRate 0.0883 Epoch: 14 Global Step: 37310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:00,041-Speed 12970.60 samples/sec Loss 6.1576 LearningRate 0.0883 Epoch: 14 Global Step: 37320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:01,580-Speed 13314.89 samples/sec Loss 6.3321 LearningRate 0.0882 Epoch: 14 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:03,158-Speed 12984.51 samples/sec Loss 6.2189 LearningRate 0.0882 Epoch: 14 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:04,715-Speed 13161.14 samples/sec Loss 6.1666 LearningRate 0.0882 Epoch: 14 Global Step: 37350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:06,282-Speed 13076.93 samples/sec Loss 6.3070 LearningRate 0.0881 Epoch: 14 Global Step: 37360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:07,862-Speed 12967.13 samples/sec Loss 6.2624 LearningRate 0.0881 Epoch: 14 Global Step: 37370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:09,404-Speed 13292.45 samples/sec Loss 6.1562 LearningRate 0.0881 Epoch: 14 Global Step: 37380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-14 15:32:10,976-Speed 13036.29 samples/sec Loss 6.2752 LearningRate 0.0881 Epoch: 14 Global Step: 37390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:12,559-Speed 12947.25 samples/sec Loss 6.1700 LearningRate 0.0880 Epoch: 14 Global Step: 37400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:14,151-Speed 12869.55 samples/sec Loss 6.2986 LearningRate 0.0880 Epoch: 14 Global Step: 37410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:15,731-Speed 12966.86 samples/sec Loss 6.1921 LearningRate 0.0880 Epoch: 14 Global Step: 37420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:17,303-Speed 13035.51 samples/sec Loss 6.2589 LearningRate 0.0880 Epoch: 14 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:18,887-Speed 12937.06 samples/sec Loss 6.2362 LearningRate 0.0879 Epoch: 14 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:20,437-Speed 13216.62 samples/sec Loss 6.1563 LearningRate 0.0879 Epoch: 14 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:22,020-Speed 12945.67 samples/sec Loss 6.1651 LearningRate 0.0879 Epoch: 14 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:23,597-Speed 12991.13 samples/sec Loss 6.3682 LearningRate 0.0878 Epoch: 14 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:25,176-Speed 12976.64 samples/sec Loss 6.1276 LearningRate 0.0878 Epoch: 14 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:26,772-Speed 12844.89 samples/sec Loss 6.3127 LearningRate 0.0878 Epoch: 14 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:28,322-Speed 13218.16 samples/sec Loss 6.2658 LearningRate 0.0878 Epoch: 14 Global Step: 37500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:29,909-Speed 
12917.00 samples/sec Loss 6.1885 LearningRate 0.0877 Epoch: 14 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:31,491-Speed 12952.54 samples/sec Loss 6.2529 LearningRate 0.0877 Epoch: 14 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:33,055-Speed 13104.68 samples/sec Loss 6.2020 LearningRate 0.0877 Epoch: 14 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:34,661-Speed 12759.09 samples/sec Loss 6.2565 LearningRate 0.0877 Epoch: 14 Global Step: 37540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:36,261-Speed 12803.63 samples/sec Loss 6.2225 LearningRate 0.0876 Epoch: 14 Global Step: 37550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:37,854-Speed 12866.14 samples/sec Loss 6.2226 LearningRate 0.0876 Epoch: 14 Global Step: 37560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:39,406-Speed 13193.41 samples/sec Loss 6.2621 LearningRate 0.0876 Epoch: 14 Global Step: 37570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:40,970-Speed 13103.69 samples/sec Loss 6.2326 LearningRate 0.0875 Epoch: 14 Global Step: 37580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:42,539-Speed 13059.28 samples/sec Loss 6.2814 LearningRate 0.0875 Epoch: 14 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:44,128-Speed 12892.50 samples/sec Loss 6.2163 LearningRate 0.0875 Epoch: 14 Global Step: 37600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:45,700-Speed 13039.59 samples/sec Loss 6.1788 LearningRate 0.0875 Epoch: 14 Global Step: 37610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:47,288-Speed 12904.15 samples/sec Loss 6.2367 LearningRate 0.0874 Epoch: 14 Global Step: 37620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:32:48,857-Speed 13061.27 samples/sec Loss 6.2577 
LearningRate 0.0874 Epoch: 14 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:50,466-Speed 12740.08 samples/sec Loss 6.2433 LearningRate 0.0874 Epoch: 14 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:52,033-Speed 13069.47 samples/sec Loss 6.2945 LearningRate 0.0873 Epoch: 14 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:53,601-Speed 13072.90 samples/sec Loss 6.2980 LearningRate 0.0873 Epoch: 14 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:55,159-Speed 13149.42 samples/sec Loss 6.3283 LearningRate 0.0873 Epoch: 14 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:56,725-Speed 13084.79 samples/sec Loss 6.3309 LearningRate 0.0873 Epoch: 14 Global Step: 37680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:58,304-Speed 12982.25 samples/sec Loss 6.2825 LearningRate 0.0872 Epoch: 14 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:32:59,880-Speed 12997.72 samples/sec Loss 6.2611 LearningRate 0.0872 Epoch: 14 Global Step: 37700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:01,440-Speed 13136.40 samples/sec Loss 6.2094 LearningRate 0.0872 Epoch: 14 Global Step: 37710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:03,004-Speed 13103.48 samples/sec Loss 6.3065 LearningRate 0.0872 Epoch: 14 Global Step: 37720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:04,559-Speed 13175.74 samples/sec Loss 6.2683 LearningRate 0.0871 Epoch: 14 Global Step: 37730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:06,125-Speed 13086.06 samples/sec Loss 6.1930 LearningRate 0.0871 Epoch: 14 Global Step: 37740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:07,687-Speed 13117.96 samples/sec Loss 6.2843 LearningRate 0.0871 Epoch: 14 Global Step: 
37750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:09,261-Speed 13018.70 samples/sec Loss 6.2402 LearningRate 0.0870 Epoch: 14 Global Step: 37760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:10,831-Speed 13047.13 samples/sec Loss 6.1778 LearningRate 0.0870 Epoch: 14 Global Step: 37770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:12,433-Speed 12794.71 samples/sec Loss 6.1579 LearningRate 0.0870 Epoch: 14 Global Step: 37780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:14,012-Speed 12975.81 samples/sec Loss 6.2833 LearningRate 0.0870 Epoch: 14 Global Step: 37790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:15,609-Speed 12830.83 samples/sec Loss 6.1668 LearningRate 0.0869 Epoch: 14 Global Step: 37800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:17,197-Speed 12898.44 samples/sec Loss 6.2307 LearningRate 0.0869 Epoch: 14 Global Step: 37810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:18,788-Speed 12885.06 samples/sec Loss 6.3218 LearningRate 0.0869 Epoch: 14 Global Step: 37820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:20,395-Speed 12747.00 samples/sec Loss 6.1388 LearningRate 0.0869 Epoch: 14 Global Step: 37830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:33:21,963-Speed 13071.83 samples/sec Loss 6.3217 LearningRate 0.0868 Epoch: 14 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:33:23,512-Speed 13224.70 samples/sec Loss 6.3082 LearningRate 0.0868 Epoch: 14 Global Step: 37850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:33:25,092-Speed 12963.55 samples/sec Loss 6.2619 LearningRate 0.0868 Epoch: 14 Global Step: 37860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:33:26,671-Speed 12984.83 samples/sec Loss 6.2539 LearningRate 0.0867 Epoch: 14 Global Step: 37870 Fp16 Grad Scale: 131072 Required: 
3 hours Training: 2022-01-14 15:33:28,233-Speed 13114.70 samples/sec Loss 6.2357 LearningRate 0.0867 Epoch: 14 Global Step: 37880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:33:29,800-Speed 13075.24 samples/sec Loss 6.3179 LearningRate 0.0867 Epoch: 14 Global Step: 37890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:33:31,384-Speed 12939.08 samples/sec Loss 6.2867 LearningRate 0.0867 Epoch: 14 Global Step: 37900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:33:32,948-Speed 13101.91 samples/sec Loss 6.2614 LearningRate 0.0866 Epoch: 14 Global Step: 37910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:34,644-Speed 12079.88 samples/sec Loss 6.2142 LearningRate 0.0866 Epoch: 14 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:36,137-Speed 13719.16 samples/sec Loss 6.2243 LearningRate 0.0866 Epoch: 14 Global Step: 37930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:51,889-Speed 1300.34 samples/sec Loss 5.7953 LearningRate 0.0866 Epoch: 15 Global Step: 37940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:53,467-Speed 13036.42 samples/sec Loss 5.4048 LearningRate 0.0865 Epoch: 15 Global Step: 37950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:55,059-Speed 12871.52 samples/sec Loss 5.4363 LearningRate 0.0865 Epoch: 15 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:56,677-Speed 12661.11 samples/sec Loss 5.3364 LearningRate 0.0865 Epoch: 15 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:58,237-Speed 13132.02 samples/sec Loss 5.3346 LearningRate 0.0864 Epoch: 15 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:33:59,810-Speed 13028.76 samples/sec Loss 5.3504 LearningRate 0.0864 Epoch: 15 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:34:01,387-Speed 13010.47 samples/sec Loss 5.3175 LearningRate 0.0864 Epoch: 15 Global Step: 38000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:02,945-Speed 13155.22 samples/sec Loss 5.3703 LearningRate 0.0864 Epoch: 15 Global Step: 38010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:04,531-Speed 12918.51 samples/sec Loss 5.3841 LearningRate 0.0863 Epoch: 15 Global Step: 38020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:06,099-Speed 13067.01 samples/sec Loss 5.4936 LearningRate 0.0863 Epoch: 15 Global Step: 38030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:07,673-Speed 13056.57 samples/sec Loss 5.4137 LearningRate 0.0863 Epoch: 15 Global Step: 38040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:09,256-Speed 12940.69 samples/sec Loss 5.4694 LearningRate 0.0863 Epoch: 15 Global Step: 38050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:10,806-Speed 13226.32 samples/sec Loss 5.5659 LearningRate 0.0862 Epoch: 15 Global Step: 38060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:12,376-Speed 13050.44 samples/sec Loss 5.4350 LearningRate 0.0862 Epoch: 15 Global Step: 38070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:13,949-Speed 13019.92 samples/sec Loss 5.4373 LearningRate 0.0862 Epoch: 15 Global Step: 38080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:15,523-Speed 13026.66 samples/sec Loss 5.4662 LearningRate 0.0861 Epoch: 15 Global Step: 38090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:17,100-Speed 12993.20 samples/sec Loss 5.5079 LearningRate 0.0861 Epoch: 15 Global Step: 38100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:18,658-Speed 13150.21 samples/sec Loss 5.5677 LearningRate 0.0861 Epoch: 15 Global Step: 38110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:20,235-Speed 12987.62 
samples/sec Loss 5.5441 LearningRate 0.0861 Epoch: 15 Global Step: 38120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:21,795-Speed 13138.72 samples/sec Loss 5.5411 LearningRate 0.0860 Epoch: 15 Global Step: 38130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:23,357-Speed 13117.42 samples/sec Loss 5.4956 LearningRate 0.0860 Epoch: 15 Global Step: 38140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:24,925-Speed 13067.07 samples/sec Loss 5.5869 LearningRate 0.0860 Epoch: 15 Global Step: 38150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:26,492-Speed 13079.29 samples/sec Loss 5.5524 LearningRate 0.0860 Epoch: 15 Global Step: 38160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:28,079-Speed 12917.35 samples/sec Loss 5.6161 LearningRate 0.0859 Epoch: 15 Global Step: 38170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:29,639-Speed 13134.39 samples/sec Loss 5.6137 LearningRate 0.0859 Epoch: 15 Global Step: 38180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:31,231-Speed 12867.96 samples/sec Loss 5.5867 LearningRate 0.0859 Epoch: 15 Global Step: 38190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:32,800-Speed 13062.89 samples/sec Loss 5.6033 LearningRate 0.0858 Epoch: 15 Global Step: 38200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:34,362-Speed 13119.25 samples/sec Loss 5.5912 LearningRate 0.0858 Epoch: 15 Global Step: 38210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:35,939-Speed 12997.17 samples/sec Loss 5.6711 LearningRate 0.0858 Epoch: 15 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:37,506-Speed 13073.30 samples/sec Loss 5.7012 LearningRate 0.0858 Epoch: 15 Global Step: 38230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:39,079-Speed 13026.09 samples/sec Loss 5.7320 LearningRate 0.0857 
Epoch: 15 Global Step: 38240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:40,651-Speed 13033.61 samples/sec Loss 5.7471 LearningRate 0.0857 Epoch: 15 Global Step: 38250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:42,235-Speed 12941.63 samples/sec Loss 5.7025 LearningRate 0.0857 Epoch: 15 Global Step: 38260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:43,789-Speed 13179.97 samples/sec Loss 5.7684 LearningRate 0.0857 Epoch: 15 Global Step: 38270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:45,363-Speed 13022.83 samples/sec Loss 5.7221 LearningRate 0.0856 Epoch: 15 Global Step: 38280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:46,936-Speed 13032.12 samples/sec Loss 5.6683 LearningRate 0.0856 Epoch: 15 Global Step: 38290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:34:48,499-Speed 13106.06 samples/sec Loss 5.7199 LearningRate 0.0856 Epoch: 15 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:50,058-Speed 13143.10 samples/sec Loss 5.7814 LearningRate 0.0855 Epoch: 15 Global Step: 38310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:51,662-Speed 12777.85 samples/sec Loss 5.7069 LearningRate 0.0855 Epoch: 15 Global Step: 38320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:53,213-Speed 13212.01 samples/sec Loss 5.7469 LearningRate 0.0855 Epoch: 15 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:34:54,758-Speed 13263.62 samples/sec Loss 5.6701 LearningRate 0.0855 Epoch: 15 Global Step: 38340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:56,306-Speed 13243.08 samples/sec Loss 5.7733 LearningRate 0.0854 Epoch: 15 Global Step: 38350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:57,844-Speed 13318.56 samples/sec Loss 5.7366 LearningRate 0.0854 Epoch: 15 Global Step: 38360 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-01-14 15:34:59,426-Speed 12955.75 samples/sec Loss 5.7725 LearningRate 0.0854 Epoch: 15 Global Step: 38370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:01,001-Speed 13012.67 samples/sec Loss 5.8114 LearningRate 0.0854 Epoch: 15 Global Step: 38380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:02,567-Speed 13084.92 samples/sec Loss 5.7541 LearningRate 0.0853 Epoch: 15 Global Step: 38390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:04,123-Speed 13164.61 samples/sec Loss 5.8480 LearningRate 0.0853 Epoch: 15 Global Step: 38400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:05,696-Speed 13032.28 samples/sec Loss 5.8535 LearningRate 0.0853 Epoch: 15 Global Step: 38410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:07,262-Speed 13083.28 samples/sec Loss 5.7658 LearningRate 0.0852 Epoch: 15 Global Step: 38420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:08,814-Speed 13199.95 samples/sec Loss 5.8160 LearningRate 0.0852 Epoch: 15 Global Step: 38430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:10,383-Speed 13056.69 samples/sec Loss 5.8037 LearningRate 0.0852 Epoch: 15 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:11,932-Speed 13231.12 samples/sec Loss 5.8408 LearningRate 0.0852 Epoch: 15 Global Step: 38450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:13,482-Speed 13223.32 samples/sec Loss 5.9078 LearningRate 0.0851 Epoch: 15 Global Step: 38460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:15,027-Speed 13256.13 samples/sec Loss 5.8316 LearningRate 0.0851 Epoch: 15 Global Step: 38470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:16,604-Speed 13000.29 samples/sec Loss 5.8456 LearningRate 0.0851 Epoch: 15 Global Step: 38480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-14 15:35:18,176-Speed 13033.37 samples/sec Loss 5.8548 LearningRate 0.0851 Epoch: 15 Global Step: 38490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:19,780-Speed 12774.64 samples/sec Loss 5.8837 LearningRate 0.0850 Epoch: 15 Global Step: 38500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:21,340-Speed 13137.97 samples/sec Loss 5.9180 LearningRate 0.0850 Epoch: 15 Global Step: 38510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:22,935-Speed 12846.28 samples/sec Loss 5.8616 LearningRate 0.0850 Epoch: 15 Global Step: 38520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:24,509-Speed 13020.45 samples/sec Loss 6.0055 LearningRate 0.0849 Epoch: 15 Global Step: 38530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:26,076-Speed 13081.06 samples/sec Loss 5.8807 LearningRate 0.0849 Epoch: 15 Global Step: 38540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:27,636-Speed 13128.62 samples/sec Loss 5.8379 LearningRate 0.0849 Epoch: 15 Global Step: 38550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:29,217-Speed 12964.76 samples/sec Loss 5.9057 LearningRate 0.0849 Epoch: 15 Global Step: 38560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:35:30,768-Speed 13211.45 samples/sec Loss 5.8046 LearningRate 0.0848 Epoch: 15 Global Step: 38570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:32,326-Speed 13148.33 samples/sec Loss 5.8652 LearningRate 0.0848 Epoch: 15 Global Step: 38580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:33,936-Speed 12727.45 samples/sec Loss 5.8387 LearningRate 0.0848 Epoch: 15 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:35,480-Speed 13276.20 samples/sec Loss 5.8438 LearningRate 0.0848 Epoch: 15 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:37,029-Speed 13228.01 
samples/sec Loss 5.9238 LearningRate 0.0847 Epoch: 15 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:38,593-Speed 13093.63 samples/sec Loss 5.9054 LearningRate 0.0847 Epoch: 15 Global Step: 38620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:40,160-Speed 13077.41 samples/sec Loss 5.9225 LearningRate 0.0847 Epoch: 15 Global Step: 38630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:41,730-Speed 13058.17 samples/sec Loss 5.8391 LearningRate 0.0846 Epoch: 15 Global Step: 38640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:43,285-Speed 13173.59 samples/sec Loss 5.9549 LearningRate 0.0846 Epoch: 15 Global Step: 38650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:44,828-Speed 13281.38 samples/sec Loss 5.8712 LearningRate 0.0846 Epoch: 15 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:46,406-Speed 12992.01 samples/sec Loss 5.9591 LearningRate 0.0846 Epoch: 15 Global Step: 38670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:35:47,940-Speed 13351.43 samples/sec Loss 6.0194 LearningRate 0.0845 Epoch: 15 Global Step: 38680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:35:49,501-Speed 13123.61 samples/sec Loss 5.9156 LearningRate 0.0845 Epoch: 15 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:35:51,073-Speed 13041.54 samples/sec Loss 5.9113 LearningRate 0.0845 Epoch: 15 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:35:52,653-Speed 12961.70 samples/sec Loss 5.9266 LearningRate 0.0845 Epoch: 15 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:35:54,196-Speed 13286.59 samples/sec Loss 5.9328 LearningRate 0.0844 Epoch: 15 Global Step: 38720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:55,774-Speed 12986.22 samples/sec Loss 5.9445 LearningRate 
0.0844 Epoch: 15 Global Step: 38730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:57,337-Speed 13105.97 samples/sec Loss 5.9429 LearningRate 0.0844 Epoch: 15 Global Step: 38740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:35:58,901-Speed 13097.03 samples/sec Loss 5.9342 LearningRate 0.0843 Epoch: 15 Global Step: 38750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:00,465-Speed 13102.99 samples/sec Loss 6.0001 LearningRate 0.0843 Epoch: 15 Global Step: 38760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:02,024-Speed 13150.22 samples/sec Loss 5.9659 LearningRate 0.0843 Epoch: 15 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:03,568-Speed 13270.95 samples/sec Loss 5.8818 LearningRate 0.0843 Epoch: 15 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:05,142-Speed 13017.13 samples/sec Loss 5.9564 LearningRate 0.0842 Epoch: 15 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:06,730-Speed 12913.46 samples/sec Loss 5.8999 LearningRate 0.0842 Epoch: 15 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:08,304-Speed 13015.33 samples/sec Loss 5.9250 LearningRate 0.0842 Epoch: 15 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:09,857-Speed 13192.54 samples/sec Loss 6.0186 LearningRate 0.0842 Epoch: 15 Global Step: 38820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:11,420-Speed 13113.49 samples/sec Loss 6.0272 LearningRate 0.0841 Epoch: 15 Global Step: 38830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:12,991-Speed 13043.95 samples/sec Loss 5.9462 LearningRate 0.0841 Epoch: 15 Global Step: 38840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:14,559-Speed 13065.42 samples/sec Loss 5.8939 LearningRate 0.0841 Epoch: 15 Global Step: 38850 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:16,078-Speed 13490.82 samples/sec Loss 5.9739 LearningRate 0.0841 Epoch: 15 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:17,634-Speed 13172.13 samples/sec Loss 5.9843 LearningRate 0.0840 Epoch: 15 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:19,223-Speed 12892.55 samples/sec Loss 6.0665 LearningRate 0.0840 Epoch: 15 Global Step: 38880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:20,800-Speed 12993.53 samples/sec Loss 6.0263 LearningRate 0.0840 Epoch: 15 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:22,370-Speed 13049.51 samples/sec Loss 5.8933 LearningRate 0.0839 Epoch: 15 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:23,933-Speed 13116.41 samples/sec Loss 5.8687 LearningRate 0.0839 Epoch: 15 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:25,494-Speed 13128.20 samples/sec Loss 6.0632 LearningRate 0.0839 Epoch: 15 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:27,057-Speed 13106.76 samples/sec Loss 6.0495 LearningRate 0.0839 Epoch: 15 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:28,626-Speed 13065.03 samples/sec Loss 5.9500 LearningRate 0.0838 Epoch: 15 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:30,210-Speed 12932.72 samples/sec Loss 5.9674 LearningRate 0.0838 Epoch: 15 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:31,772-Speed 13119.77 samples/sec Loss 5.9884 LearningRate 0.0838 Epoch: 15 Global Step: 38960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:33,342-Speed 13051.66 samples/sec Loss 6.0093 LearningRate 0.0838 Epoch: 15 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-01-14 15:36:34,882-Speed 13301.90 samples/sec Loss 6.0455 LearningRate 0.0837 Epoch: 15 Global Step: 38980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:36,431-Speed 13236.15 samples/sec Loss 6.0484 LearningRate 0.0837 Epoch: 15 Global Step: 38990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:38,007-Speed 13000.79 samples/sec Loss 6.0429 LearningRate 0.0837 Epoch: 15 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:39,553-Speed 13255.73 samples/sec Loss 6.0065 LearningRate 0.0836 Epoch: 15 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:41,123-Speed 13048.34 samples/sec Loss 5.9543 LearningRate 0.0836 Epoch: 15 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:42,676-Speed 13198.19 samples/sec Loss 6.1061 LearningRate 0.0836 Epoch: 15 Global Step: 39030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:44,237-Speed 13124.62 samples/sec Loss 6.0391 LearningRate 0.0836 Epoch: 15 Global Step: 39040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:45,806-Speed 13068.18 samples/sec Loss 6.0186 LearningRate 0.0835 Epoch: 15 Global Step: 39050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:47,385-Speed 12973.02 samples/sec Loss 5.9107 LearningRate 0.0835 Epoch: 15 Global Step: 39060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:48,929-Speed 13271.98 samples/sec Loss 6.1070 LearningRate 0.0835 Epoch: 15 Global Step: 39070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:50,490-Speed 13131.43 samples/sec Loss 6.0298 LearningRate 0.0835 Epoch: 15 Global Step: 39080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:52,046-Speed 13172.38 samples/sec Loss 6.0850 LearningRate 0.0834 Epoch: 15 Global Step: 39090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:53,586-Speed 
13299.49 samples/sec Loss 6.0429 LearningRate 0.0834 Epoch: 15 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:55,139-Speed 13196.48 samples/sec Loss 6.0545 LearningRate 0.0834 Epoch: 15 Global Step: 39110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:36:56,699-Speed 13140.88 samples/sec Loss 6.1434 LearningRate 0.0834 Epoch: 15 Global Step: 39120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:58,231-Speed 13373.41 samples/sec Loss 6.0415 LearningRate 0.0833 Epoch: 15 Global Step: 39130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:36:59,800-Speed 13059.73 samples/sec Loss 6.1127 LearningRate 0.0833 Epoch: 15 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:01,359-Speed 13139.89 samples/sec Loss 6.0446 LearningRate 0.0833 Epoch: 15 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:02,937-Speed 12987.54 samples/sec Loss 6.0481 LearningRate 0.0832 Epoch: 15 Global Step: 39160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:04,502-Speed 13095.29 samples/sec Loss 6.1368 LearningRate 0.0832 Epoch: 15 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:06,064-Speed 13125.00 samples/sec Loss 6.0477 LearningRate 0.0832 Epoch: 15 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:07,614-Speed 13216.48 samples/sec Loss 6.0091 LearningRate 0.0832 Epoch: 15 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:09,183-Speed 13063.48 samples/sec Loss 6.1917 LearningRate 0.0831 Epoch: 15 Global Step: 39200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:10,763-Speed 12976.17 samples/sec Loss 5.9847 LearningRate 0.0831 Epoch: 15 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:12,323-Speed 13133.56 samples/sec Loss 6.0136 
LearningRate 0.0831 Epoch: 15 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:13,897-Speed 13013.15 samples/sec Loss 6.0380 LearningRate 0.0831 Epoch: 15 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:15,458-Speed 13128.21 samples/sec Loss 6.0840 LearningRate 0.0830 Epoch: 15 Global Step: 39240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:17,010-Speed 13202.73 samples/sec Loss 6.0004 LearningRate 0.0830 Epoch: 15 Global Step: 39250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:18,582-Speed 13033.57 samples/sec Loss 6.0347 LearningRate 0.0830 Epoch: 15 Global Step: 39260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:20,143-Speed 13127.55 samples/sec Loss 6.0294 LearningRate 0.0829 Epoch: 15 Global Step: 39270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:21,697-Speed 13187.90 samples/sec Loss 6.0000 LearningRate 0.0829 Epoch: 15 Global Step: 39280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:23,267-Speed 13051.36 samples/sec Loss 6.0095 LearningRate 0.0829 Epoch: 15 Global Step: 39290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:24,858-Speed 12873.38 samples/sec Loss 5.9560 LearningRate 0.0829 Epoch: 15 Global Step: 39300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:26,429-Speed 13048.70 samples/sec Loss 6.0302 LearningRate 0.0828 Epoch: 15 Global Step: 39310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:27,975-Speed 13251.32 samples/sec Loss 6.0618 LearningRate 0.0828 Epoch: 15 Global Step: 39320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:29,543-Speed 13073.19 samples/sec Loss 6.0534 LearningRate 0.0828 Epoch: 15 Global Step: 39330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:31,117-Speed 13024.09 samples/sec Loss 6.0641 LearningRate 0.0828 Epoch: 15 Global Step: 
39340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:32,687-Speed 13044.72 samples/sec Loss 6.0700 LearningRate 0.0827 Epoch: 15 Global Step: 39350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:34,251-Speed 13101.48 samples/sec Loss 6.0741 LearningRate 0.0827 Epoch: 15 Global Step: 39360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:35,837-Speed 12922.62 samples/sec Loss 6.1166 LearningRate 0.0827 Epoch: 15 Global Step: 39370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:37,416-Speed 12975.33 samples/sec Loss 6.0116 LearningRate 0.0827 Epoch: 15 Global Step: 39380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:38,980-Speed 13099.14 samples/sec Loss 6.0511 LearningRate 0.0826 Epoch: 15 Global Step: 39390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:40,547-Speed 13079.11 samples/sec Loss 6.0627 LearningRate 0.0826 Epoch: 15 Global Step: 39400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:42,092-Speed 13263.56 samples/sec Loss 6.0595 LearningRate 0.0826 Epoch: 15 Global Step: 39410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:43,667-Speed 13012.73 samples/sec Loss 6.0315 LearningRate 0.0825 Epoch: 15 Global Step: 39420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:45,227-Speed 13133.34 samples/sec Loss 6.0501 LearningRate 0.0825 Epoch: 15 Global Step: 39430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:46,790-Speed 13110.87 samples/sec Loss 6.1319 LearningRate 0.0825 Epoch: 15 Global Step: 39440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:48,377-Speed 12907.84 samples/sec Loss 6.0454 LearningRate 0.0825 Epoch: 15 Global Step: 39450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:37:49,960-Speed 12945.51 samples/sec Loss 6.0989 LearningRate 0.0824 Epoch: 15 Global Step: 39460 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-01-14 15:37:51,526-Speed 13093.94 samples/sec Loss 6.1153 LearningRate 0.0824 Epoch: 15 Global Step: 39470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:53,098-Speed 13034.18 samples/sec Loss 6.0818 LearningRate 0.0824 Epoch: 15 Global Step: 39480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:54,682-Speed 12937.15 samples/sec Loss 6.1897 LearningRate 0.0824 Epoch: 15 Global Step: 39490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:56,231-Speed 13230.53 samples/sec Loss 6.0475 LearningRate 0.0823 Epoch: 15 Global Step: 39500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:57,802-Speed 13034.20 samples/sec Loss 6.0525 LearningRate 0.0823 Epoch: 15 Global Step: 39510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:37:59,369-Speed 13080.89 samples/sec Loss 6.0840 LearningRate 0.0823 Epoch: 15 Global Step: 39520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:00,930-Speed 13124.99 samples/sec Loss 5.9714 LearningRate 0.0823 Epoch: 15 Global Step: 39530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:02,481-Speed 13211.35 samples/sec Loss 6.1231 LearningRate 0.0822 Epoch: 15 Global Step: 39540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:04,030-Speed 13232.95 samples/sec Loss 6.0371 LearningRate 0.0822 Epoch: 15 Global Step: 39550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:05,608-Speed 12981.78 samples/sec Loss 6.1020 LearningRate 0.0822 Epoch: 15 Global Step: 39560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:07,183-Speed 13010.61 samples/sec Loss 6.2461 LearningRate 0.0821 Epoch: 15 Global Step: 39570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:08,734-Speed 13213.18 samples/sec Loss 6.0641 LearningRate 0.0821 Epoch: 15 Global Step: 39580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:38:10,293-Speed 13141.81 samples/sec Loss 6.1310 LearningRate 0.0821 Epoch: 15 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:11,856-Speed 13110.71 samples/sec Loss 6.0122 LearningRate 0.0821 Epoch: 15 Global Step: 39600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:13,417-Speed 13125.63 samples/sec Loss 6.1767 LearningRate 0.0820 Epoch: 15 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:14,989-Speed 13036.49 samples/sec Loss 6.0655 LearningRate 0.0820 Epoch: 15 Global Step: 39620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:16,551-Speed 13121.56 samples/sec Loss 6.0486 LearningRate 0.0820 Epoch: 15 Global Step: 39630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:18,112-Speed 13130.10 samples/sec Loss 6.0678 LearningRate 0.0820 Epoch: 15 Global Step: 39640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:19,646-Speed 13354.11 samples/sec Loss 6.1182 LearningRate 0.0819 Epoch: 15 Global Step: 39650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:21,211-Speed 13097.97 samples/sec Loss 6.2202 LearningRate 0.0819 Epoch: 15 Global Step: 39660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:22,799-Speed 12897.77 samples/sec Loss 6.1966 LearningRate 0.0819 Epoch: 15 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:38:24,338-Speed 13315.06 samples/sec Loss 6.1311 LearningRate 0.0819 Epoch: 15 Global Step: 39680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:25,885-Speed 13248.96 samples/sec Loss 6.1505 LearningRate 0.0818 Epoch: 15 Global Step: 39690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:27,440-Speed 13174.10 samples/sec Loss 5.9869 LearningRate 0.0818 Epoch: 15 Global Step: 39700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:29,005-Speed 13096.83 samples/sec 
Loss 6.0004 LearningRate 0.0818 Epoch: 15 Global Step: 39710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:30,556-Speed 13207.01 samples/sec Loss 6.1557 LearningRate 0.0817 Epoch: 15 Global Step: 39720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:32,116-Speed 13139.39 samples/sec Loss 6.0660 LearningRate 0.0817 Epoch: 15 Global Step: 39730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:33,663-Speed 13246.34 samples/sec Loss 6.0629 LearningRate 0.0817 Epoch: 15 Global Step: 39740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:35,230-Speed 13073.34 samples/sec Loss 6.1034 LearningRate 0.0817 Epoch: 15 Global Step: 39750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:36,784-Speed 13195.81 samples/sec Loss 6.0985 LearningRate 0.0816 Epoch: 15 Global Step: 39760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:38,350-Speed 13082.21 samples/sec Loss 6.1289 LearningRate 0.0816 Epoch: 15 Global Step: 39770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:39,917-Speed 13072.78 samples/sec Loss 6.1778 LearningRate 0.0816 Epoch: 15 Global Step: 39780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:41,471-Speed 13185.29 samples/sec Loss 6.0932 LearningRate 0.0816 Epoch: 15 Global Step: 39790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:43,036-Speed 13098.73 samples/sec Loss 6.0397 LearningRate 0.0815 Epoch: 15 Global Step: 39800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:44,593-Speed 13155.46 samples/sec Loss 6.1014 LearningRate 0.0815 Epoch: 15 Global Step: 39810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:46,163-Speed 13053.64 samples/sec Loss 6.0530 LearningRate 0.0815 Epoch: 15 Global Step: 39820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:47,733-Speed 13053.89 samples/sec Loss 6.1457 LearningRate 0.0815 Epoch: 15 
Global Step: 39830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:49,301-Speed 13062.27 samples/sec Loss 6.1684 LearningRate 0.0814 Epoch: 15 Global Step: 39840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:50,859-Speed 13154.08 samples/sec Loss 6.1269 LearningRate 0.0814 Epoch: 15 Global Step: 39850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:38:52,416-Speed 13164.87 samples/sec Loss 6.1527 LearningRate 0.0814 Epoch: 15 Global Step: 39860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:53,974-Speed 13143.41 samples/sec Loss 6.1536 LearningRate 0.0813 Epoch: 15 Global Step: 39870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:55,548-Speed 13021.71 samples/sec Loss 6.1701 LearningRate 0.0813 Epoch: 15 Global Step: 39880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:57,117-Speed 13060.11 samples/sec Loss 6.0832 LearningRate 0.0813 Epoch: 15 Global Step: 39890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:38:58,678-Speed 13132.60 samples/sec Loss 6.1586 LearningRate 0.0813 Epoch: 15 Global Step: 39900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:39:00,233-Speed 13171.88 samples/sec Loss 6.1745 LearningRate 0.0812 Epoch: 15 Global Step: 39910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:39:01,788-Speed 13185.09 samples/sec Loss 6.0120 LearningRate 0.0812 Epoch: 15 Global Step: 39920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:39:03,342-Speed 13183.18 samples/sec Loss 6.1324 LearningRate 0.0812 Epoch: 15 Global Step: 39930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:39:04,910-Speed 13062.73 samples/sec Loss 5.9740 LearningRate 0.0812 Epoch: 15 Global Step: 39940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:39:06,476-Speed 13087.49 samples/sec Loss 6.0491 LearningRate 0.0811 Epoch: 15 Global Step: 39950 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 15:39:08,041-Speed 13096.55 samples/sec Loss 6.0687 LearningRate 0.0811 Epoch: 15 Global Step: 39960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:39:09,610-Speed 13053.50 samples/sec Loss 6.1123 LearningRate 0.0811 Epoch: 15 Global Step: 39970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:39:11,192-Speed 12958.20 samples/sec Loss 6.1473 LearningRate 0.0811 Epoch: 15 Global Step: 39980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:39:12,777-Speed 12923.86 samples/sec Loss 6.1319 LearningRate 0.0810 Epoch: 15 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:39:14,349-Speed 13038.07 samples/sec Loss 6.2048 LearningRate 0.0810 Epoch: 15 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:39:36,460-[lfw][40000]XNorm: 10.595803 Training: 2022-01-14 15:39:36,461-[lfw][40000]Accuracy-Flip: 0.99417+-0.00344 Training: 2022-01-14 15:39:36,461-[lfw][40000]Accuracy-Highest: 0.99583 Training: 2022-01-14 15:40:02,584-[cfp_fp][40000]XNorm: 8.936859 Training: 2022-01-14 15:40:02,585-[cfp_fp][40000]Accuracy-Flip: 0.95886+-0.00975 Training: 2022-01-14 15:40:02,586-[cfp_fp][40000]Accuracy-Highest: 0.95886 Training: 2022-01-14 15:40:25,100-[agedb_30][40000]XNorm: 10.232083 Training: 2022-01-14 15:40:25,101-[agedb_30][40000]Accuracy-Flip: 0.96567+-0.00831 Training: 2022-01-14 15:40:25,102-[agedb_30][40000]Accuracy-Highest: 0.96567 Training: 2022-01-14 15:40:26,644-Speed 283.29 samples/sec Loss 6.1560 LearningRate 0.0810 Epoch: 15 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:40:28,172-Speed 13410.31 samples/sec Loss 6.1166 LearningRate 0.0809 Epoch: 15 Global Step: 40020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:29,726-Speed 13183.75 samples/sec Loss 6.1081 LearningRate 0.0809 Epoch: 15 Global Step: 40030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-14 15:40:31,306-Speed 12967.76 samples/sec Loss 6.0064 LearningRate 0.0809 Epoch: 15 Global Step: 40040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:32,883-Speed 12989.56 samples/sec Loss 6.0918 LearningRate 0.0809 Epoch: 15 Global Step: 40050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:34,435-Speed 13206.68 samples/sec Loss 6.1410 LearningRate 0.0808 Epoch: 15 Global Step: 40060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:35,996-Speed 13131.22 samples/sec Loss 6.1335 LearningRate 0.0808 Epoch: 15 Global Step: 40070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:37,562-Speed 13082.10 samples/sec Loss 5.9851 LearningRate 0.0808 Epoch: 15 Global Step: 40080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:39,130-Speed 13065.98 samples/sec Loss 6.1228 LearningRate 0.0808 Epoch: 15 Global Step: 40090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:40,692-Speed 13121.36 samples/sec Loss 6.0839 LearningRate 0.0807 Epoch: 15 Global Step: 40100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:42,253-Speed 13127.54 samples/sec Loss 6.1024 LearningRate 0.0807 Epoch: 15 Global Step: 40110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:43,820-Speed 13074.10 samples/sec Loss 6.1701 LearningRate 0.0807 Epoch: 15 Global Step: 40120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:40:45,381-Speed 13128.77 samples/sec Loss 6.1293 LearningRate 0.0807 Epoch: 15 Global Step: 40130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:40:46,942-Speed 13129.92 samples/sec Loss 6.1468 LearningRate 0.0806 Epoch: 15 Global Step: 40140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:40:48,500-Speed 13149.92 samples/sec Loss 6.1333 LearningRate 0.0806 Epoch: 15 Global Step: 40150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:40:50,066-Speed 13081.16 
samples/sec Loss 6.1138 LearningRate 0.0806 Epoch: 15 Global Step: 40160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:51,630-Speed 13105.23 samples/sec Loss 6.1182 LearningRate 0.0806 Epoch: 15 Global Step: 40170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:53,201-Speed 13043.02 samples/sec Loss 6.1177 LearningRate 0.0805 Epoch: 15 Global Step: 40180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:54,772-Speed 13042.44 samples/sec Loss 6.0378 LearningRate 0.0805 Epoch: 15 Global Step: 40190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:56,342-Speed 13056.56 samples/sec Loss 6.1209 LearningRate 0.0805 Epoch: 15 Global Step: 40200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:57,902-Speed 13132.61 samples/sec Loss 6.1562 LearningRate 0.0804 Epoch: 15 Global Step: 40210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:40:59,470-Speed 13069.47 samples/sec Loss 6.1150 LearningRate 0.0804 Epoch: 15 Global Step: 40220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:01,034-Speed 13103.91 samples/sec Loss 6.0713 LearningRate 0.0804 Epoch: 15 Global Step: 40230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:02,614-Speed 12966.50 samples/sec Loss 6.1407 LearningRate 0.0804 Epoch: 15 Global Step: 40240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:04,191-Speed 12995.30 samples/sec Loss 6.0547 LearningRate 0.0803 Epoch: 15 Global Step: 40250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:05,777-Speed 12922.57 samples/sec Loss 6.2064 LearningRate 0.0803 Epoch: 15 Global Step: 40260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:07,345-Speed 13067.94 samples/sec Loss 6.1350 LearningRate 0.0803 Epoch: 15 Global Step: 40270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:08,914-Speed 13056.76 samples/sec Loss 6.1939 LearningRate 0.0803 
Epoch: 15 Global Step: 40280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:10,478-Speed 13107.84 samples/sec Loss 6.0879 LearningRate 0.0802 Epoch: 15 Global Step: 40290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:12,047-Speed 13052.93 samples/sec Loss 6.1257 LearningRate 0.0802 Epoch: 15 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:13,632-Speed 12932.71 samples/sec Loss 6.1273 LearningRate 0.0802 Epoch: 15 Global Step: 40310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:15,203-Speed 13039.85 samples/sec Loss 6.0518 LearningRate 0.0802 Epoch: 15 Global Step: 40320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:16,763-Speed 13140.94 samples/sec Loss 6.0612 LearningRate 0.0801 Epoch: 15 Global Step: 40330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:18,323-Speed 13133.56 samples/sec Loss 6.1136 LearningRate 0.0801 Epoch: 15 Global Step: 40340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:19,893-Speed 13059.04 samples/sec Loss 6.0876 LearningRate 0.0801 Epoch: 15 Global Step: 40350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:21,459-Speed 13082.08 samples/sec Loss 6.1314 LearningRate 0.0801 Epoch: 15 Global Step: 40360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:23,032-Speed 13024.82 samples/sec Loss 6.1698 LearningRate 0.0800 Epoch: 15 Global Step: 40370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:24,614-Speed 12949.17 samples/sec Loss 6.1310 LearningRate 0.0800 Epoch: 15 Global Step: 40380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:26,168-Speed 13203.51 samples/sec Loss 6.1987 LearningRate 0.0800 Epoch: 15 Global Step: 40390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:27,743-Speed 13011.55 samples/sec Loss 6.1017 LearningRate 0.0799 Epoch: 15 Global Step: 40400 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:29,307-Speed 13098.93 samples/sec Loss 6.1173 LearningRate 0.0799 Epoch: 15 Global Step: 40410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:30,870-Speed 13110.28 samples/sec Loss 6.1353 LearningRate 0.0799 Epoch: 15 Global Step: 40420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:41:32,476-Speed 12755.62 samples/sec Loss 6.1184 LearningRate 0.0799 Epoch: 15 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:34,053-Speed 12989.96 samples/sec Loss 6.0834 LearningRate 0.0798 Epoch: 15 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:35,690-Speed 12522.45 samples/sec Loss 6.0837 LearningRate 0.0798 Epoch: 15 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:37,161-Speed 13931.65 samples/sec Loss 6.1610 LearningRate 0.0798 Epoch: 15 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:51,017-Speed 1478.25 samples/sec Loss 5.5692 LearningRate 0.0798 Epoch: 16 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:52,637-Speed 12652.72 samples/sec Loss 5.2246 LearningRate 0.0797 Epoch: 16 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:54,243-Speed 12756.46 samples/sec Loss 5.2926 LearningRate 0.0797 Epoch: 16 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:55,798-Speed 13179.31 samples/sec Loss 5.2013 LearningRate 0.0797 Epoch: 16 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:57,358-Speed 13130.58 samples/sec Loss 5.2705 LearningRate 0.0797 Epoch: 16 Global Step: 40510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:41:58,929-Speed 13045.86 samples/sec Loss 5.2332 LearningRate 0.0796 Epoch: 16 Global Step: 40520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-14 15:42:00,480-Speed 13210.73 samples/sec Loss 5.2575 LearningRate 0.0796 Epoch: 16 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:42:02,053-Speed 13030.00 samples/sec Loss 5.3235 LearningRate 0.0796 Epoch: 16 Global Step: 40540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:03,614-Speed 13119.98 samples/sec Loss 5.2506 LearningRate 0.0796 Epoch: 16 Global Step: 40550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:05,195-Speed 12960.02 samples/sec Loss 5.3660 LearningRate 0.0795 Epoch: 16 Global Step: 40560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:06,748-Speed 13194.37 samples/sec Loss 5.2483 LearningRate 0.0795 Epoch: 16 Global Step: 40570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:08,322-Speed 13021.44 samples/sec Loss 5.2557 LearningRate 0.0795 Epoch: 16 Global Step: 40580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:09,917-Speed 12844.31 samples/sec Loss 5.4027 LearningRate 0.0794 Epoch: 16 Global Step: 40590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:11,488-Speed 13045.61 samples/sec Loss 5.3446 LearningRate 0.0794 Epoch: 16 Global Step: 40600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:13,074-Speed 12926.25 samples/sec Loss 5.3672 LearningRate 0.0794 Epoch: 16 Global Step: 40610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:14,625-Speed 13204.76 samples/sec Loss 5.3081 LearningRate 0.0794 Epoch: 16 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:16,207-Speed 12956.69 samples/sec Loss 5.4361 LearningRate 0.0793 Epoch: 16 Global Step: 40630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:17,775-Speed 13066.30 samples/sec Loss 5.3079 LearningRate 0.0793 Epoch: 16 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:42:19,351-Speed 13003.37 
samples/sec Loss 5.4869 LearningRate 0.0793 Epoch: 16 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:42:20,931-Speed 12966.19 samples/sec Loss 5.4579 LearningRate 0.0793 Epoch: 16 Global Step: 40660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:22,526-Speed 12850.81 samples/sec Loss 5.3872 LearningRate 0.0792 Epoch: 16 Global Step: 40670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:24,075-Speed 13227.71 samples/sec Loss 5.5345 LearningRate 0.0792 Epoch: 16 Global Step: 40680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:25,645-Speed 13055.93 samples/sec Loss 5.4875 LearningRate 0.0792 Epoch: 16 Global Step: 40690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:27,219-Speed 13014.50 samples/sec Loss 5.3873 LearningRate 0.0792 Epoch: 16 Global Step: 40700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:28,782-Speed 13108.93 samples/sec Loss 5.4758 LearningRate 0.0791 Epoch: 16 Global Step: 40710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:30,344-Speed 13123.19 samples/sec Loss 5.4822 LearningRate 0.0791 Epoch: 16 Global Step: 40720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:31,925-Speed 12962.96 samples/sec Loss 5.5341 LearningRate 0.0791 Epoch: 16 Global Step: 40730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:33,509-Speed 12937.08 samples/sec Loss 5.4511 LearningRate 0.0791 Epoch: 16 Global Step: 40740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:35,078-Speed 13058.98 samples/sec Loss 5.5804 LearningRate 0.0790 Epoch: 16 Global Step: 40750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:36,668-Speed 12891.88 samples/sec Loss 5.4905 LearningRate 0.0790 Epoch: 16 Global Step: 40760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:38,246-Speed 12977.08 samples/sec Loss 5.4633 LearningRate 
0.0790 Epoch: 16 Global Step: 40770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:39,831-Speed 12932.81 samples/sec Loss 5.5972 LearningRate 0.0789 Epoch: 16 Global Step: 40780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:41,419-Speed 12902.10 samples/sec Loss 5.5656 LearningRate 0.0789 Epoch: 16 Global Step: 40790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:42,995-Speed 12997.11 samples/sec Loss 5.5631 LearningRate 0.0789 Epoch: 16 Global Step: 40800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:44,579-Speed 12940.90 samples/sec Loss 5.5515 LearningRate 0.0789 Epoch: 16 Global Step: 40810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:42:46,149-Speed 13054.41 samples/sec Loss 5.6206 LearningRate 0.0788 Epoch: 16 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:47,723-Speed 13014.64 samples/sec Loss 5.6134 LearningRate 0.0788 Epoch: 16 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:49,308-Speed 12924.95 samples/sec Loss 5.6210 LearningRate 0.0788 Epoch: 16 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:50,909-Speed 12802.94 samples/sec Loss 5.6451 LearningRate 0.0788 Epoch: 16 Global Step: 40850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:52,477-Speed 13064.02 samples/sec Loss 5.6387 LearningRate 0.0787 Epoch: 16 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:54,041-Speed 13106.45 samples/sec Loss 5.6512 LearningRate 0.0787 Epoch: 16 Global Step: 40870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:55,609-Speed 13071.45 samples/sec Loss 5.7068 LearningRate 0.0787 Epoch: 16 Global Step: 40880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:57,160-Speed 13212.31 samples/sec Loss 5.6462 LearningRate 0.0787 Epoch: 16 Global Step: 40890 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:42:58,717-Speed 13156.58 samples/sec Loss 5.6205 LearningRate 0.0786 Epoch: 16 Global Step: 40900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:00,281-Speed 13108.63 samples/sec Loss 5.5689 LearningRate 0.0786 Epoch: 16 Global Step: 40910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:01,840-Speed 13137.67 samples/sec Loss 5.6949 LearningRate 0.0786 Epoch: 16 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:43:03,409-Speed 13062.98 samples/sec Loss 5.6634 LearningRate 0.0786 Epoch: 16 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:43:04,991-Speed 12954.39 samples/sec Loss 5.7161 LearningRate 0.0785 Epoch: 16 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:43:06,549-Speed 13149.83 samples/sec Loss 5.7236 LearningRate 0.0785 Epoch: 16 Global Step: 40950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:08,154-Speed 12765.05 samples/sec Loss 5.6632 LearningRate 0.0785 Epoch: 16 Global Step: 40960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:09,724-Speed 13051.86 samples/sec Loss 5.6976 LearningRate 0.0785 Epoch: 16 Global Step: 40970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:11,305-Speed 12964.65 samples/sec Loss 5.6969 LearningRate 0.0784 Epoch: 16 Global Step: 40980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:12,876-Speed 13035.20 samples/sec Loss 5.7162 LearningRate 0.0784 Epoch: 16 Global Step: 40990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:14,465-Speed 12896.88 samples/sec Loss 5.7191 LearningRate 0.0784 Epoch: 16 Global Step: 41000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:16,026-Speed 13134.51 samples/sec Loss 5.7157 LearningRate 0.0783 Epoch: 16 Global Step: 41010 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-01-14 15:43:17,627-Speed 12794.18 samples/sec Loss 5.7882 LearningRate 0.0783 Epoch: 16 Global Step: 41020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:19,186-Speed 13163.78 samples/sec Loss 5.6211 LearningRate 0.0783 Epoch: 16 Global Step: 41030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:20,780-Speed 12862.92 samples/sec Loss 5.7862 LearningRate 0.0783 Epoch: 16 Global Step: 41040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:22,360-Speed 12961.86 samples/sec Loss 5.7321 LearningRate 0.0782 Epoch: 16 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:43:23,929-Speed 13059.49 samples/sec Loss 5.6794 LearningRate 0.0782 Epoch: 16 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:43:25,518-Speed 12896.41 samples/sec Loss 5.6817 LearningRate 0.0782 Epoch: 16 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:43:27,052-Speed 13362.74 samples/sec Loss 5.7753 LearningRate 0.0782 Epoch: 16 Global Step: 41080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:28,620-Speed 13058.92 samples/sec Loss 5.7906 LearningRate 0.0781 Epoch: 16 Global Step: 41090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:30,213-Speed 12871.17 samples/sec Loss 5.7584 LearningRate 0.0781 Epoch: 16 Global Step: 41100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:31,793-Speed 12961.81 samples/sec Loss 5.7043 LearningRate 0.0781 Epoch: 16 Global Step: 41110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:33,380-Speed 12908.81 samples/sec Loss 5.7761 LearningRate 0.0781 Epoch: 16 Global Step: 41120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:34,943-Speed 13111.41 samples/sec Loss 5.8174 LearningRate 0.0780 Epoch: 16 Global Step: 41130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:36,526-Speed 
12952.42 samples/sec Loss 5.8147 LearningRate 0.0780 Epoch: 16 Global Step: 41140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:38,085-Speed 13140.92 samples/sec Loss 5.7749 LearningRate 0.0780 Epoch: 16 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:39,645-Speed 13132.40 samples/sec Loss 5.8517 LearningRate 0.0780 Epoch: 16 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:41,237-Speed 12873.39 samples/sec Loss 5.7685 LearningRate 0.0779 Epoch: 16 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:42,789-Speed 13205.17 samples/sec Loss 5.7102 LearningRate 0.0779 Epoch: 16 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:44,373-Speed 12927.54 samples/sec Loss 5.7232 LearningRate 0.0779 Epoch: 16 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:45,938-Speed 13116.01 samples/sec Loss 5.7826 LearningRate 0.0779 Epoch: 16 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:47,510-Speed 13031.05 samples/sec Loss 5.7608 LearningRate 0.0778 Epoch: 16 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:43:49,070-Speed 13136.76 samples/sec Loss 5.7819 LearningRate 0.0778 Epoch: 16 Global Step: 41220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:43:50,637-Speed 13073.78 samples/sec Loss 5.7688 LearningRate 0.0778 Epoch: 16 Global Step: 41230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:43:52,224-Speed 12910.72 samples/sec Loss 5.6841 LearningRate 0.0778 Epoch: 16 Global Step: 41240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:43:53,797-Speed 13031.05 samples/sec Loss 5.7714 LearningRate 0.0777 Epoch: 16 Global Step: 41250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:43:55,372-Speed 13008.56 samples/sec Loss 5.7924 
LearningRate 0.0777 Epoch: 16 Global Step: 41260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:43:56,950-Speed 12984.60 samples/sec Loss 5.8485 LearningRate 0.0777 Epoch: 16 Global Step: 41270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:43:58,514-Speed 13099.52 samples/sec Loss 5.8905 LearningRate 0.0776 Epoch: 16 Global Step: 41280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:00,114-Speed 12814.94 samples/sec Loss 5.7922 LearningRate 0.0776 Epoch: 16 Global Step: 41290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:01,704-Speed 12883.89 samples/sec Loss 5.8431 LearningRate 0.0776 Epoch: 16 Global Step: 41300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:03,278-Speed 13017.11 samples/sec Loss 5.8540 LearningRate 0.0776 Epoch: 16 Global Step: 41310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:04,851-Speed 13027.65 samples/sec Loss 5.7952 LearningRate 0.0775 Epoch: 16 Global Step: 41320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:06,436-Speed 12927.33 samples/sec Loss 5.8157 LearningRate 0.0775 Epoch: 16 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:08,010-Speed 13017.20 samples/sec Loss 5.7864 LearningRate 0.0775 Epoch: 16 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:09,611-Speed 12794.91 samples/sec Loss 5.7965 LearningRate 0.0775 Epoch: 16 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:11,177-Speed 13092.02 samples/sec Loss 5.8046 LearningRate 0.0774 Epoch: 16 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:12,775-Speed 12817.24 samples/sec Loss 5.8338 LearningRate 0.0774 Epoch: 16 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:14,369-Speed 12888.40 samples/sec Loss 5.7210 LearningRate 0.0774 Epoch: 16 Global Step: 
41380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:15,901-Speed 13376.53 samples/sec Loss 5.8347 LearningRate 0.0774 Epoch: 16 Global Step: 41390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:17,473-Speed 13039.15 samples/sec Loss 5.8150 LearningRate 0.0773 Epoch: 16 Global Step: 41400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:19,031-Speed 13145.53 samples/sec Loss 5.8246 LearningRate 0.0773 Epoch: 16 Global Step: 41410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:20,612-Speed 12967.90 samples/sec Loss 5.8180 LearningRate 0.0773 Epoch: 16 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:44:22,175-Speed 13110.99 samples/sec Loss 5.8652 LearningRate 0.0773 Epoch: 16 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:44:23,754-Speed 12974.74 samples/sec Loss 5.9544 LearningRate 0.0772 Epoch: 16 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:44:25,298-Speed 13273.14 samples/sec Loss 5.8539 LearningRate 0.0772 Epoch: 16 Global Step: 41450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:26,874-Speed 12999.77 samples/sec Loss 5.8897 LearningRate 0.0772 Epoch: 16 Global Step: 41460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:28,467-Speed 12862.95 samples/sec Loss 5.8480 LearningRate 0.0772 Epoch: 16 Global Step: 41470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:30,025-Speed 13152.08 samples/sec Loss 5.8508 LearningRate 0.0771 Epoch: 16 Global Step: 41480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:31,583-Speed 13156.69 samples/sec Loss 5.9671 LearningRate 0.0771 Epoch: 16 Global Step: 41490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:33,158-Speed 13006.28 samples/sec Loss 5.8542 LearningRate 0.0771 Epoch: 16 Global Step: 41500 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-14 15:44:34,727-Speed 13055.78 samples/sec Loss 5.8963 LearningRate 0.0771 Epoch: 16 Global Step: 41510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:36,297-Speed 13056.22 samples/sec Loss 5.9135 LearningRate 0.0770 Epoch: 16 Global Step: 41520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:37,876-Speed 12979.64 samples/sec Loss 5.8412 LearningRate 0.0770 Epoch: 16 Global Step: 41530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:39,438-Speed 13116.35 samples/sec Loss 5.8261 LearningRate 0.0770 Epoch: 16 Global Step: 41540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:41,015-Speed 12998.59 samples/sec Loss 5.9113 LearningRate 0.0769 Epoch: 16 Global Step: 41550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:42,592-Speed 12986.56 samples/sec Loss 5.8771 LearningRate 0.0769 Epoch: 16 Global Step: 41560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:44,173-Speed 12965.60 samples/sec Loss 5.8356 LearningRate 0.0769 Epoch: 16 Global Step: 41570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:45,740-Speed 13074.48 samples/sec Loss 5.8312 LearningRate 0.0769 Epoch: 16 Global Step: 41580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:47,317-Speed 12988.40 samples/sec Loss 5.8507 LearningRate 0.0768 Epoch: 16 Global Step: 41590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:48,900-Speed 12948.20 samples/sec Loss 5.9183 LearningRate 0.0768 Epoch: 16 Global Step: 41600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:50,461-Speed 13128.57 samples/sec Loss 5.9880 LearningRate 0.0768 Epoch: 16 Global Step: 41610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:44:52,042-Speed 12960.66 samples/sec Loss 5.8474 LearningRate 0.0768 Epoch: 16 Global Step: 41620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
15:44:53,581-Speed 13312.00 samples/sec Loss 5.7955 LearningRate 0.0767 Epoch: 16 Global Step: 41630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:55,157-Speed 12997.02 samples/sec Loss 5.8854 LearningRate 0.0767 Epoch: 16 Global Step: 41640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:56,709-Speed 13204.93 samples/sec Loss 5.8897 LearningRate 0.0767 Epoch: 16 Global Step: 41650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:58,276-Speed 13081.20 samples/sec Loss 5.8654 LearningRate 0.0767 Epoch: 16 Global Step: 41660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:44:59,834-Speed 13145.98 samples/sec Loss 5.8904 LearningRate 0.0766 Epoch: 16 Global Step: 41670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:45:01,400-Speed 13090.18 samples/sec Loss 5.9056 LearningRate 0.0766 Epoch: 16 Global Step: 41680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:45:02,958-Speed 13149.04 samples/sec Loss 5.8908 LearningRate 0.0766 Epoch: 16 Global Step: 41690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:45:04,532-Speed 13017.95 samples/sec Loss 5.8617 LearningRate 0.0766 Epoch: 16 Global Step: 41700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:45:06,132-Speed 12810.27 samples/sec Loss 5.8705 LearningRate 0.0765 Epoch: 16 Global Step: 41710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:45:07,693-Speed 13126.87 samples/sec Loss 5.9574 LearningRate 0.0765 Epoch: 16 Global Step: 41720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:09,265-Speed 13030.01 samples/sec Loss 5.8627 LearningRate 0.0765 Epoch: 16 Global Step: 41730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:10,822-Speed 13165.97 samples/sec Loss 5.9791 LearningRate 0.0765 Epoch: 16 Global Step: 41740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:12,386-Speed 13093.53 samples/sec 
Loss 5.8678 LearningRate 0.0764 Epoch: 16 Global Step: 41750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:13,965-Speed 12976.47 samples/sec Loss 5.8783 LearningRate 0.0764 Epoch: 16 Global Step: 41760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:15,570-Speed 12775.68 samples/sec Loss 5.8112 LearningRate 0.0764 Epoch: 16 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:17,143-Speed 13029.70 samples/sec Loss 5.9267 LearningRate 0.0764 Epoch: 16 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:18,724-Speed 12981.19 samples/sec Loss 5.9807 LearningRate 0.0763 Epoch: 16 Global Step: 41790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:20,293-Speed 13058.04 samples/sec Loss 6.0350 LearningRate 0.0763 Epoch: 16 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:21,851-Speed 13151.00 samples/sec Loss 5.8651 LearningRate 0.0763 Epoch: 16 Global Step: 41810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:23,418-Speed 13075.12 samples/sec Loss 5.8454 LearningRate 0.0763 Epoch: 16 Global Step: 41820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:24,996-Speed 12982.01 samples/sec Loss 5.9678 LearningRate 0.0762 Epoch: 16 Global Step: 41830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:26,562-Speed 13089.36 samples/sec Loss 5.9568 LearningRate 0.0762 Epoch: 16 Global Step: 41840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:28,133-Speed 13045.25 samples/sec Loss 5.7805 LearningRate 0.0762 Epoch: 16 Global Step: 41850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:29,707-Speed 13013.90 samples/sec Loss 5.9730 LearningRate 0.0762 Epoch: 16 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:31,270-Speed 13108.37 samples/sec Loss 5.9242 LearningRate 0.0761 Epoch: 
16 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:32,839-Speed 13062.44 samples/sec Loss 5.9300 LearningRate 0.0761 Epoch: 16 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:34,421-Speed 12952.07 samples/sec Loss 5.9550 LearningRate 0.0761 Epoch: 16 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:36,003-Speed 12957.35 samples/sec Loss 5.9890 LearningRate 0.0760 Epoch: 16 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:37,559-Speed 13169.22 samples/sec Loss 5.8510 LearningRate 0.0760 Epoch: 16 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:39,152-Speed 12859.68 samples/sec Loss 5.9400 LearningRate 0.0760 Epoch: 16 Global Step: 41920 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 15:45:40,722-Speed 13051.92 samples/sec Loss 5.8845 LearningRate 0.0760 Epoch: 16 Global Step: 41930 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 15:45:42,304-Speed 12951.17 samples/sec Loss 5.9455 LearningRate 0.0759 Epoch: 16 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:43,853-Speed 13229.88 samples/sec Loss 5.9670 LearningRate 0.0759 Epoch: 16 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:45,409-Speed 13169.46 samples/sec Loss 5.9240 LearningRate 0.0759 Epoch: 16 Global Step: 41960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:46,985-Speed 13003.60 samples/sec Loss 5.9370 LearningRate 0.0759 Epoch: 16 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:48,551-Speed 13082.79 samples/sec Loss 5.9063 LearningRate 0.0758 Epoch: 16 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:50,136-Speed 12925.58 samples/sec Loss 5.9427 LearningRate 0.0758 Epoch: 16 Global Step: 41990 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:51,733-Speed 12836.24 samples/sec Loss 6.0217 LearningRate 0.0758 Epoch: 16 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:45:53,281-Speed 13253.39 samples/sec Loss 6.0305 LearningRate 0.0758 Epoch: 16 Global Step: 42010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:54,860-Speed 12972.07 samples/sec Loss 5.9583 LearningRate 0.0757 Epoch: 16 Global Step: 42020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:56,427-Speed 13081.87 samples/sec Loss 6.0735 LearningRate 0.0757 Epoch: 16 Global Step: 42030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:58,011-Speed 12937.08 samples/sec Loss 5.9278 LearningRate 0.0757 Epoch: 16 Global Step: 42040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:45:59,580-Speed 13052.51 samples/sec Loss 5.9679 LearningRate 0.0757 Epoch: 16 Global Step: 42050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:01,155-Speed 13017.61 samples/sec Loss 5.8985 LearningRate 0.0756 Epoch: 16 Global Step: 42060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:02,746-Speed 12878.71 samples/sec Loss 5.9583 LearningRate 0.0756 Epoch: 16 Global Step: 42070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:04,337-Speed 12897.15 samples/sec Loss 5.9992 LearningRate 0.0756 Epoch: 16 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:05,892-Speed 13180.39 samples/sec Loss 5.8953 LearningRate 0.0756 Epoch: 16 Global Step: 42090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:07,479-Speed 12915.15 samples/sec Loss 6.0301 LearningRate 0.0755 Epoch: 16 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:09,051-Speed 13026.19 samples/sec Loss 5.8948 LearningRate 0.0755 Epoch: 16 Global Step: 42110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-14 15:46:10,628-Speed 13002.67 samples/sec Loss 6.0118 LearningRate 0.0755 Epoch: 16 Global Step: 42120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:12,220-Speed 12870.00 samples/sec Loss 5.9347 LearningRate 0.0755 Epoch: 16 Global Step: 42130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:13,800-Speed 12963.21 samples/sec Loss 5.9118 LearningRate 0.0754 Epoch: 16 Global Step: 42140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:15,368-Speed 13089.67 samples/sec Loss 5.8427 LearningRate 0.0754 Epoch: 16 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:16,937-Speed 13053.96 samples/sec Loss 5.9083 LearningRate 0.0754 Epoch: 16 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:18,510-Speed 13029.59 samples/sec Loss 6.0344 LearningRate 0.0754 Epoch: 16 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:20,076-Speed 13083.15 samples/sec Loss 5.9170 LearningRate 0.0753 Epoch: 16 Global Step: 42180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:21,673-Speed 12831.47 samples/sec Loss 5.9538 LearningRate 0.0753 Epoch: 16 Global Step: 42190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:23,227-Speed 13186.25 samples/sec Loss 5.9543 LearningRate 0.0753 Epoch: 16 Global Step: 42200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:24,795-Speed 13070.57 samples/sec Loss 5.9911 LearningRate 0.0753 Epoch: 16 Global Step: 42210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:26,345-Speed 13217.01 samples/sec Loss 5.9906 LearningRate 0.0752 Epoch: 16 Global Step: 42220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:27,938-Speed 12866.78 samples/sec Loss 6.0413 LearningRate 0.0752 Epoch: 16 Global Step: 42230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:29,513-Speed 
13007.94 samples/sec Loss 5.9623 LearningRate 0.0752 Epoch: 16 Global Step: 42240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:31,053-Speed 13307.28 samples/sec Loss 5.8909 LearningRate 0.0752 Epoch: 16 Global Step: 42250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:32,631-Speed 12983.37 samples/sec Loss 5.9545 LearningRate 0.0751 Epoch: 16 Global Step: 42260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:34,181-Speed 13221.14 samples/sec Loss 6.0301 LearningRate 0.0751 Epoch: 16 Global Step: 42270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:35,771-Speed 12887.89 samples/sec Loss 5.9619 LearningRate 0.0751 Epoch: 16 Global Step: 42280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:37,333-Speed 13117.85 samples/sec Loss 5.9553 LearningRate 0.0751 Epoch: 16 Global Step: 42290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:46:38,877-Speed 13269.75 samples/sec Loss 5.9839 LearningRate 0.0750 Epoch: 16 Global Step: 42300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:40,474-Speed 12835.86 samples/sec Loss 5.9777 LearningRate 0.0750 Epoch: 16 Global Step: 42310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:42,047-Speed 13023.94 samples/sec Loss 6.0162 LearningRate 0.0750 Epoch: 16 Global Step: 42320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:43,617-Speed 13053.95 samples/sec Loss 5.8806 LearningRate 0.0749 Epoch: 16 Global Step: 42330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:45,213-Speed 12838.52 samples/sec Loss 5.9420 LearningRate 0.0749 Epoch: 16 Global Step: 42340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:46,803-Speed 12888.45 samples/sec Loss 5.8686 LearningRate 0.0749 Epoch: 16 Global Step: 42350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:48,382-Speed 12976.76 samples/sec Loss 5.9449 
LearningRate 0.0749 Epoch: 16 Global Step: 42360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:49,954-Speed 13036.94 samples/sec Loss 5.9238 LearningRate 0.0748 Epoch: 16 Global Step: 42370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:51,514-Speed 13138.52 samples/sec Loss 5.9760 LearningRate 0.0748 Epoch: 16 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:53,072-Speed 13152.09 samples/sec Loss 5.9310 LearningRate 0.0748 Epoch: 16 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:46:54,657-Speed 12929.31 samples/sec Loss 5.9259 LearningRate 0.0748 Epoch: 16 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:56,243-Speed 12920.95 samples/sec Loss 5.9876 LearningRate 0.0747 Epoch: 16 Global Step: 42410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:57,803-Speed 13145.03 samples/sec Loss 6.0172 LearningRate 0.0747 Epoch: 16 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:46:59,373-Speed 13056.17 samples/sec Loss 5.9457 LearningRate 0.0747 Epoch: 16 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:47:00,932-Speed 13144.01 samples/sec Loss 5.9490 LearningRate 0.0747 Epoch: 16 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:47:02,506-Speed 13013.34 samples/sec Loss 5.9693 LearningRate 0.0746 Epoch: 16 Global Step: 42450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:47:04,070-Speed 13101.00 samples/sec Loss 6.0264 LearningRate 0.0746 Epoch: 16 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:47:05,622-Speed 13205.78 samples/sec Loss 6.0145 LearningRate 0.0746 Epoch: 16 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:47:07,187-Speed 13097.17 samples/sec Loss 6.0102 LearningRate 0.0746 Epoch: 16 
Global Step: 42480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:08,763-Speed 12992.80 samples/sec Loss 5.9743 LearningRate 0.0745 Epoch: 16 Global Step: 42490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:10,343-Speed 12975.10 samples/sec Loss 5.9702 LearningRate 0.0745 Epoch: 16 Global Step: 42500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:11,911-Speed 13066.74 samples/sec Loss 5.9962 LearningRate 0.0745 Epoch: 16 Global Step: 42510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:13,504-Speed 12868.00 samples/sec Loss 5.8920 LearningRate 0.0745 Epoch: 16 Global Step: 42520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:15,099-Speed 12846.57 samples/sec Loss 5.8806 LearningRate 0.0744 Epoch: 16 Global Step: 42530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:16,669-Speed 13049.05 samples/sec Loss 5.9085 LearningRate 0.0744 Epoch: 16 Global Step: 42540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:18,253-Speed 12937.02 samples/sec Loss 5.9913 LearningRate 0.0744 Epoch: 16 Global Step: 42550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:19,794-Speed 13294.84 samples/sec Loss 5.9143 LearningRate 0.0744 Epoch: 16 Global Step: 42560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:21,387-Speed 12863.55 samples/sec Loss 6.0373 LearningRate 0.0743 Epoch: 16 Global Step: 42570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:22,962-Speed 13013.24 samples/sec Loss 6.0192 LearningRate 0.0743 Epoch: 16 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:47:24,535-Speed 13026.65 samples/sec Loss 6.0583 LearningRate 0.0743 Epoch: 16 Global Step: 42590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:26,106-Speed 13045.90 samples/sec Loss 5.8878 LearningRate 0.0743 Epoch: 16 Global Step: 42600 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 15:47:27,688-Speed 12944.33 samples/sec Loss 5.8754 LearningRate 0.0742 Epoch: 16 Global Step: 42610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:29,253-Speed 13095.73 samples/sec Loss 5.9824 LearningRate 0.0742 Epoch: 16 Global Step: 42620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:30,859-Speed 12758.08 samples/sec Loss 5.9258 LearningRate 0.0742 Epoch: 16 Global Step: 42630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:32,409-Speed 13218.82 samples/sec Loss 5.9677 LearningRate 0.0742 Epoch: 16 Global Step: 42640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:33,991-Speed 12954.84 samples/sec Loss 5.9421 LearningRate 0.0741 Epoch: 16 Global Step: 42650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:35,556-Speed 13098.56 samples/sec Loss 5.9603 LearningRate 0.0741 Epoch: 16 Global Step: 42660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:37,124-Speed 13064.94 samples/sec Loss 5.9088 LearningRate 0.0741 Epoch: 16 Global Step: 42670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:38,702-Speed 12980.69 samples/sec Loss 5.9288 LearningRate 0.0741 Epoch: 16 Global Step: 42680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:40,272-Speed 13053.17 samples/sec Loss 5.9841 LearningRate 0.0740 Epoch: 16 Global Step: 42690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:41,813-Speed 13296.27 samples/sec Loss 6.0672 LearningRate 0.0740 Epoch: 16 Global Step: 42700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:43,364-Speed 13215.55 samples/sec Loss 5.9165 LearningRate 0.0740 Epoch: 16 Global Step: 42710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:47:44,937-Speed 13022.34 samples/sec Loss 6.0393 LearningRate 0.0740 Epoch: 16 Global Step: 42720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:47:46,522-Speed 12930.01 samples/sec Loss 5.9613 LearningRate 0.0739 Epoch: 16 Global Step: 42730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:48,093-Speed 13047.16 samples/sec Loss 6.0426 LearningRate 0.0739 Epoch: 16 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:49,643-Speed 13214.67 samples/sec Loss 6.0756 LearningRate 0.0739 Epoch: 16 Global Step: 42750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:51,224-Speed 12959.49 samples/sec Loss 6.0120 LearningRate 0.0739 Epoch: 16 Global Step: 42760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:52,810-Speed 12921.05 samples/sec Loss 6.0170 LearningRate 0.0738 Epoch: 16 Global Step: 42770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:54,399-Speed 12897.47 samples/sec Loss 6.0243 LearningRate 0.0738 Epoch: 16 Global Step: 42780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:55,940-Speed 13302.15 samples/sec Loss 5.9527 LearningRate 0.0738 Epoch: 16 Global Step: 42790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:57,518-Speed 12978.26 samples/sec Loss 5.9181 LearningRate 0.0738 Epoch: 16 Global Step: 42800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:47:59,085-Speed 13081.20 samples/sec Loss 5.9307 LearningRate 0.0737 Epoch: 16 Global Step: 42810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:00,655-Speed 13059.41 samples/sec Loss 6.0135 LearningRate 0.0737 Epoch: 16 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:48:02,207-Speed 13204.32 samples/sec Loss 6.0089 LearningRate 0.0737 Epoch: 16 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:48:03,772-Speed 13095.49 samples/sec Loss 5.9324 LearningRate 0.0737 Epoch: 16 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:05,331-Speed 13139.33 samples/sec 
Loss 5.8920 LearningRate 0.0736 Epoch: 16 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:06,920-Speed 12897.33 samples/sec Loss 6.0074 LearningRate 0.0736 Epoch: 16 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:08,466-Speed 13251.24 samples/sec Loss 5.9014 LearningRate 0.0736 Epoch: 16 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:10,041-Speed 13012.69 samples/sec Loss 5.9305 LearningRate 0.0736 Epoch: 16 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:11,614-Speed 13030.29 samples/sec Loss 5.9043 LearningRate 0.0735 Epoch: 16 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:13,211-Speed 12825.80 samples/sec Loss 6.0076 LearningRate 0.0735 Epoch: 16 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:14,773-Speed 13120.54 samples/sec Loss 5.9866 LearningRate 0.0735 Epoch: 16 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:16,335-Speed 13121.94 samples/sec Loss 6.0359 LearningRate 0.0735 Epoch: 16 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:17,908-Speed 13023.13 samples/sec Loss 6.0728 LearningRate 0.0734 Epoch: 16 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:19,491-Speed 12939.95 samples/sec Loss 5.9557 LearningRate 0.0734 Epoch: 16 Global Step: 42940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:48:21,086-Speed 12849.04 samples/sec Loss 5.9607 LearningRate 0.0734 Epoch: 16 Global Step: 42950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:48:22,673-Speed 12911.61 samples/sec Loss 6.0646 LearningRate 0.0734 Epoch: 16 Global Step: 42960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:48:24,231-Speed 13151.36 samples/sec Loss 6.0027 LearningRate 0.0733 Epoch: 
16 Global Step: 42970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:48:25,899-Speed 12286.78 samples/sec Loss 5.9733 LearningRate 0.0733 Epoch: 16 Global Step: 42980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:48:27,410-Speed 13555.74 samples/sec Loss 6.0383 LearningRate 0.0733 Epoch: 16 Global Step: 42990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:48:43,058-Speed 1308.95 samples/sec Loss 5.3816 LearningRate 0.0733 Epoch: 17 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:44,643-Speed 12947.15 samples/sec Loss 5.0659 LearningRate 0.0732 Epoch: 17 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:46,221-Speed 12985.24 samples/sec Loss 5.0978 LearningRate 0.0732 Epoch: 17 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:47,808-Speed 12917.30 samples/sec Loss 5.2074 LearningRate 0.0732 Epoch: 17 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:49,414-Speed 12757.41 samples/sec Loss 5.1787 LearningRate 0.0732 Epoch: 17 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:51,000-Speed 12917.13 samples/sec Loss 5.1610 LearningRate 0.0731 Epoch: 17 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:52,587-Speed 12912.94 samples/sec Loss 5.1614 LearningRate 0.0731 Epoch: 17 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:54,166-Speed 12976.90 samples/sec Loss 5.1867 LearningRate 0.0731 Epoch: 17 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:55,755-Speed 12899.04 samples/sec Loss 5.2406 LearningRate 0.0730 Epoch: 17 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:48:57,345-Speed 12891.08 samples/sec Loss 5.1078 LearningRate 0.0730 Epoch: 17 Global Step: 43090 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-01-14 15:48:58,917-Speed 13032.09 samples/sec Loss 5.1913 LearningRate 0.0730 Epoch: 17 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:00,509-Speed 12876.06 samples/sec Loss 5.1286 LearningRate 0.0730 Epoch: 17 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:02,084-Speed 13009.86 samples/sec Loss 5.2316 LearningRate 0.0729 Epoch: 17 Global Step: 43120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:03,644-Speed 13161.24 samples/sec Loss 5.2691 LearningRate 0.0729 Epoch: 17 Global Step: 43130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:05,241-Speed 12829.90 samples/sec Loss 5.1610 LearningRate 0.0729 Epoch: 17 Global Step: 43140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:06,804-Speed 13112.49 samples/sec Loss 5.2875 LearningRate 0.0729 Epoch: 17 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:08,394-Speed 12884.56 samples/sec Loss 5.2736 LearningRate 0.0728 Epoch: 17 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:09,949-Speed 13177.32 samples/sec Loss 5.3032 LearningRate 0.0728 Epoch: 17 Global Step: 43170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:11,542-Speed 12868.34 samples/sec Loss 5.2469 LearningRate 0.0728 Epoch: 17 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:13,120-Speed 12984.05 samples/sec Loss 5.2364 LearningRate 0.0728 Epoch: 17 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:14,693-Speed 13028.56 samples/sec Loss 5.2676 LearningRate 0.0727 Epoch: 17 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:16,261-Speed 13066.11 samples/sec Loss 5.2992 LearningRate 0.0727 Epoch: 17 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-14 15:49:17,825-Speed 13106.71 samples/sec Loss 5.3023 LearningRate 0.0727 Epoch: 17 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:19,425-Speed 12805.09 samples/sec Loss 5.3331 LearningRate 0.0727 Epoch: 17 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:21,009-Speed 12936.84 samples/sec Loss 5.3103 LearningRate 0.0726 Epoch: 17 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:22,588-Speed 12976.88 samples/sec Loss 5.3366 LearningRate 0.0726 Epoch: 17 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:24,171-Speed 12943.64 samples/sec Loss 5.3703 LearningRate 0.0726 Epoch: 17 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:25,732-Speed 13123.81 samples/sec Loss 5.4343 LearningRate 0.0726 Epoch: 17 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:27,307-Speed 13010.70 samples/sec Loss 5.3886 LearningRate 0.0725 Epoch: 17 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:28,870-Speed 13110.46 samples/sec Loss 5.4700 LearningRate 0.0725 Epoch: 17 Global Step: 43290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:30,451-Speed 12957.21 samples/sec Loss 5.3657 LearningRate 0.0725 Epoch: 17 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:32,035-Speed 12941.45 samples/sec Loss 5.3842 LearningRate 0.0725 Epoch: 17 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:33,620-Speed 12931.32 samples/sec Loss 5.4236 LearningRate 0.0724 Epoch: 17 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:35,192-Speed 13032.27 samples/sec Loss 5.3533 LearningRate 0.0724 Epoch: 17 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:36,751-Speed 
13139.62 samples/sec Loss 5.4958 LearningRate 0.0724 Epoch: 17 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:38,330-Speed 12982.30 samples/sec Loss 5.4961 LearningRate 0.0724 Epoch: 17 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:39,902-Speed 13033.40 samples/sec Loss 5.4709 LearningRate 0.0723 Epoch: 17 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:41,459-Speed 13163.04 samples/sec Loss 5.4593 LearningRate 0.0723 Epoch: 17 Global Step: 43370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:43,027-Speed 13066.70 samples/sec Loss 5.4823 LearningRate 0.0723 Epoch: 17 Global Step: 43380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:44,601-Speed 13016.16 samples/sec Loss 5.5100 LearningRate 0.0723 Epoch: 17 Global Step: 43390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:46,178-Speed 12990.91 samples/sec Loss 5.4763 LearningRate 0.0722 Epoch: 17 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:47,741-Speed 13112.40 samples/sec Loss 5.4433 LearningRate 0.0722 Epoch: 17 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:49:49,308-Speed 13078.06 samples/sec Loss 5.5134 LearningRate 0.0722 Epoch: 17 Global Step: 43420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:50,882-Speed 13016.84 samples/sec Loss 5.5352 LearningRate 0.0722 Epoch: 17 Global Step: 43430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:52,465-Speed 12942.12 samples/sec Loss 5.4756 LearningRate 0.0721 Epoch: 17 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:54,046-Speed 12957.39 samples/sec Loss 5.4643 LearningRate 0.0721 Epoch: 17 Global Step: 43450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:55,634-Speed 12905.57 samples/sec Loss 5.4650 
LearningRate 0.0721 Epoch: 17 Global Step: 43460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:57,201-Speed 13081.67 samples/sec Loss 5.4694 LearningRate 0.0721 Epoch: 17 Global Step: 43470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:49:58,773-Speed 13032.41 samples/sec Loss 5.4773 LearningRate 0.0720 Epoch: 17 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:00,356-Speed 12946.17 samples/sec Loss 5.5528 LearningRate 0.0720 Epoch: 17 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:01,909-Speed 13192.00 samples/sec Loss 5.5551 LearningRate 0.0720 Epoch: 17 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:03,507-Speed 12826.24 samples/sec Loss 5.5383 LearningRate 0.0720 Epoch: 17 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:05,087-Speed 12965.89 samples/sec Loss 5.5091 LearningRate 0.0719 Epoch: 17 Global Step: 43520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:06,680-Speed 12867.19 samples/sec Loss 5.5733 LearningRate 0.0719 Epoch: 17 Global Step: 43530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:08,259-Speed 12973.23 samples/sec Loss 5.4867 LearningRate 0.0719 Epoch: 17 Global Step: 43540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:09,833-Speed 13012.06 samples/sec Loss 5.4655 LearningRate 0.0719 Epoch: 17 Global Step: 43550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:11,419-Speed 12927.66 samples/sec Loss 5.5623 LearningRate 0.0718 Epoch: 17 Global Step: 43560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:12,994-Speed 13007.96 samples/sec Loss 5.6020 LearningRate 0.0718 Epoch: 17 Global Step: 43570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:14,593-Speed 12816.00 samples/sec Loss 5.5874 LearningRate 0.0718 Epoch: 17 Global 
Step: 43580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:16,129-Speed 13341.38 samples/sec Loss 5.6044 LearningRate 0.0718 Epoch: 17 Global Step: 43590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:17,681-Speed 13199.79 samples/sec Loss 5.5115 LearningRate 0.0717 Epoch: 17 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:19,249-Speed 13070.66 samples/sec Loss 5.5729 LearningRate 0.0717 Epoch: 17 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:20,815-Speed 13086.67 samples/sec Loss 5.5525 LearningRate 0.0717 Epoch: 17 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:22,393-Speed 12980.74 samples/sec Loss 5.5721 LearningRate 0.0717 Epoch: 17 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:23,977-Speed 12937.49 samples/sec Loss 5.6417 LearningRate 0.0716 Epoch: 17 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:25,551-Speed 13013.34 samples/sec Loss 5.6400 LearningRate 0.0716 Epoch: 17 Global Step: 43650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:27,127-Speed 13008.99 samples/sec Loss 5.6031 LearningRate 0.0716 Epoch: 17 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:28,720-Speed 12860.81 samples/sec Loss 5.6356 LearningRate 0.0716 Epoch: 17 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:30,307-Speed 12909.60 samples/sec Loss 5.5136 LearningRate 0.0715 Epoch: 17 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:31,876-Speed 13064.78 samples/sec Loss 5.6508 LearningRate 0.0715 Epoch: 17 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:33,450-Speed 13016.91 samples/sec Loss 5.6063 LearningRate 0.0715 Epoch: 17 Global Step: 43700 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-01-14 15:50:35,017-Speed 13075.05 samples/sec Loss 5.6236 LearningRate 0.0715 Epoch: 17 Global Step: 43710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:36,570-Speed 13200.61 samples/sec Loss 5.6227 LearningRate 0.0714 Epoch: 17 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:38,155-Speed 12930.46 samples/sec Loss 5.6566 LearningRate 0.0714 Epoch: 17 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:39,739-Speed 12932.52 samples/sec Loss 5.6827 LearningRate 0.0714 Epoch: 17 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:41,309-Speed 13049.88 samples/sec Loss 5.6317 LearningRate 0.0714 Epoch: 17 Global Step: 43750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:42,873-Speed 13101.11 samples/sec Loss 5.6566 LearningRate 0.0713 Epoch: 17 Global Step: 43760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:44,459-Speed 12919.06 samples/sec Loss 5.7120 LearningRate 0.0713 Epoch: 17 Global Step: 43770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:46,045-Speed 12924.37 samples/sec Loss 5.6919 LearningRate 0.0713 Epoch: 17 Global Step: 43780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:47,657-Speed 12708.25 samples/sec Loss 5.6491 LearningRate 0.0713 Epoch: 17 Global Step: 43790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:49,236-Speed 12973.90 samples/sec Loss 5.7778 LearningRate 0.0712 Epoch: 17 Global Step: 43800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:50,807-Speed 13051.75 samples/sec Loss 5.6951 LearningRate 0.0712 Epoch: 17 Global Step: 43810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:52,386-Speed 12969.97 samples/sec Loss 5.7462 LearningRate 0.0712 Epoch: 17 Global Step: 43820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 
15:50:53,983-Speed 12829.27 samples/sec Loss 5.6899 LearningRate 0.0712 Epoch: 17 Global Step: 43830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:55,559-Speed 13006.65 samples/sec Loss 5.7123 LearningRate 0.0712 Epoch: 17 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:50:57,116-Speed 13182.96 samples/sec Loss 5.7752 LearningRate 0.0711 Epoch: 17 Global Step: 43850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:50:58,677-Speed 13119.81 samples/sec Loss 5.7693 LearningRate 0.0711 Epoch: 17 Global Step: 43860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:00,267-Speed 12891.26 samples/sec Loss 5.7538 LearningRate 0.0711 Epoch: 17 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:01,850-Speed 12946.53 samples/sec Loss 5.7597 LearningRate 0.0711 Epoch: 17 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:03,413-Speed 13106.58 samples/sec Loss 5.6458 LearningRate 0.0710 Epoch: 17 Global Step: 43890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:05,005-Speed 12872.02 samples/sec Loss 5.6148 LearningRate 0.0710 Epoch: 17 Global Step: 43900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:06,576-Speed 13046.85 samples/sec Loss 5.6773 LearningRate 0.0710 Epoch: 17 Global Step: 43910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:08,134-Speed 13145.60 samples/sec Loss 5.6936 LearningRate 0.0710 Epoch: 17 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:09,692-Speed 13156.00 samples/sec Loss 5.7219 LearningRate 0.0709 Epoch: 17 Global Step: 43930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:11,260-Speed 13068.15 samples/sec Loss 5.7869 LearningRate 0.0709 Epoch: 17 Global Step: 43940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:12,845-Speed 12924.61 samples/sec 
Loss 5.7165 LearningRate 0.0709 Epoch: 17 Global Step: 43950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:14,431-Speed 12920.73 samples/sec Loss 5.7125 LearningRate 0.0709 Epoch: 17 Global Step: 43960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:15,973-Speed 13292.74 samples/sec Loss 5.7048 LearningRate 0.0708 Epoch: 17 Global Step: 43970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:17,529-Speed 13172.83 samples/sec Loss 5.6538 LearningRate 0.0708 Epoch: 17 Global Step: 43980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:19,126-Speed 12830.50 samples/sec Loss 5.6985 LearningRate 0.0708 Epoch: 17 Global Step: 43990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:20,681-Speed 13175.56 samples/sec Loss 5.7081 LearningRate 0.0708 Epoch: 17 Global Step: 44000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:22,283-Speed 12787.50 samples/sec Loss 5.7599 LearningRate 0.0707 Epoch: 17 Global Step: 44010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:23,859-Speed 13000.65 samples/sec Loss 5.7113 LearningRate 0.0707 Epoch: 17 Global Step: 44020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:51:25,431-Speed 13038.95 samples/sec Loss 5.7134 LearningRate 0.0707 Epoch: 17 Global Step: 44030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:26,998-Speed 13077.12 samples/sec Loss 5.7087 LearningRate 0.0707 Epoch: 17 Global Step: 44040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:28,593-Speed 12844.12 samples/sec Loss 5.7783 LearningRate 0.0706 Epoch: 17 Global Step: 44050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:30,170-Speed 12996.50 samples/sec Loss 5.6735 LearningRate 0.0706 Epoch: 17 Global Step: 44060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:31,747-Speed 12992.80 samples/sec Loss 5.7337 LearningRate 0.0706 Epoch: 17 
Global Step: 44070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:33,324-Speed 12996.55 samples/sec Loss 5.6368 LearningRate 0.0706 Epoch: 17 Global Step: 44080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:34,916-Speed 12871.54 samples/sec Loss 5.7241 LearningRate 0.0705 Epoch: 17 Global Step: 44090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:36,487-Speed 13047.81 samples/sec Loss 5.7032 LearningRate 0.0705 Epoch: 17 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:38,038-Speed 13205.01 samples/sec Loss 5.7264 LearningRate 0.0705 Epoch: 17 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:39,627-Speed 12894.97 samples/sec Loss 5.7413 LearningRate 0.0705 Epoch: 17 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:41,215-Speed 12929.70 samples/sec Loss 5.7847 LearningRate 0.0704 Epoch: 17 Global Step: 44130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:51:42,800-Speed 12927.19 samples/sec Loss 5.7334 LearningRate 0.0704 Epoch: 17 Global Step: 44140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:51:44,392-Speed 12872.13 samples/sec Loss 5.7689 LearningRate 0.0704 Epoch: 17 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:45,977-Speed 12925.94 samples/sec Loss 5.6828 LearningRate 0.0704 Epoch: 17 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:47,558-Speed 12961.36 samples/sec Loss 5.7390 LearningRate 0.0703 Epoch: 17 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:49,138-Speed 12963.25 samples/sec Loss 5.7777 LearningRate 0.0703 Epoch: 17 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:50,695-Speed 13165.71 samples/sec Loss 5.7558 LearningRate 0.0703 Epoch: 17 Global Step: 44190 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 15:51:52,278-Speed 12940.41 samples/sec Loss 5.7571 LearningRate 0.0703 Epoch: 17 Global Step: 44200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:53,845-Speed 13073.17 samples/sec Loss 5.8151 LearningRate 0.0702 Epoch: 17 Global Step: 44210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:55,443-Speed 12827.65 samples/sec Loss 5.7791 LearningRate 0.0702 Epoch: 17 Global Step: 44220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:57,017-Speed 13017.33 samples/sec Loss 5.8093 LearningRate 0.0702 Epoch: 17 Global Step: 44230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:51:58,590-Speed 13032.51 samples/sec Loss 5.8510 LearningRate 0.0702 Epoch: 17 Global Step: 44240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:00,179-Speed 12886.83 samples/sec Loss 5.7818 LearningRate 0.0701 Epoch: 17 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:52:01,760-Speed 12971.43 samples/sec Loss 5.7517 LearningRate 0.0701 Epoch: 17 Global Step: 44260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:03,323-Speed 13109.71 samples/sec Loss 5.7107 LearningRate 0.0701 Epoch: 17 Global Step: 44270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:04,913-Speed 12882.87 samples/sec Loss 5.7476 LearningRate 0.0701 Epoch: 17 Global Step: 44280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:06,498-Speed 12925.65 samples/sec Loss 5.7774 LearningRate 0.0700 Epoch: 17 Global Step: 44290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:08,071-Speed 13028.94 samples/sec Loss 5.7344 LearningRate 0.0700 Epoch: 17 Global Step: 44300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:09,657-Speed 12920.06 samples/sec Loss 5.7774 LearningRate 0.0700 Epoch: 17 Global Step: 44310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:52:11,218-Speed 13125.44 samples/sec Loss 5.8139 LearningRate 0.0700 Epoch: 17 Global Step: 44320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:12,803-Speed 12925.98 samples/sec Loss 5.7447 LearningRate 0.0699 Epoch: 17 Global Step: 44330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:14,387-Speed 12939.54 samples/sec Loss 5.7647 LearningRate 0.0699 Epoch: 17 Global Step: 44340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:15,960-Speed 13024.33 samples/sec Loss 5.7089 LearningRate 0.0699 Epoch: 17 Global Step: 44350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:17,533-Speed 13026.35 samples/sec Loss 5.7798 LearningRate 0.0699 Epoch: 17 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:52:19,124-Speed 12884.99 samples/sec Loss 5.8589 LearningRate 0.0698 Epoch: 17 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:52:20,681-Speed 13157.41 samples/sec Loss 5.7944 LearningRate 0.0698 Epoch: 17 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:22,271-Speed 12894.94 samples/sec Loss 5.7337 LearningRate 0.0698 Epoch: 17 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:23,861-Speed 12884.45 samples/sec Loss 5.8340 LearningRate 0.0698 Epoch: 17 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:25,442-Speed 12963.27 samples/sec Loss 5.8151 LearningRate 0.0697 Epoch: 17 Global Step: 44410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:27,021-Speed 12979.85 samples/sec Loss 5.8339 LearningRate 0.0697 Epoch: 17 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:28,598-Speed 12986.63 samples/sec Loss 5.6949 LearningRate 0.0697 Epoch: 17 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:30,174-Speed 13004.42 samples/sec 
Loss 5.8628 LearningRate 0.0697 Epoch: 17 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:31,743-Speed 13063.99 samples/sec Loss 5.6875 LearningRate 0.0696 Epoch: 17 Global Step: 44450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:33,315-Speed 13029.89 samples/sec Loss 5.8836 LearningRate 0.0696 Epoch: 17 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:34,890-Speed 13010.75 samples/sec Loss 5.7729 LearningRate 0.0696 Epoch: 17 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:36,454-Speed 13101.39 samples/sec Loss 5.8351 LearningRate 0.0696 Epoch: 17 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:38,056-Speed 12786.02 samples/sec Loss 5.9314 LearningRate 0.0695 Epoch: 17 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:39,665-Speed 12737.01 samples/sec Loss 5.7774 LearningRate 0.0695 Epoch: 17 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:41,251-Speed 12930.45 samples/sec Loss 5.8503 LearningRate 0.0695 Epoch: 17 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:42,819-Speed 13072.49 samples/sec Loss 5.8470 LearningRate 0.0695 Epoch: 17 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:44,399-Speed 12963.56 samples/sec Loss 5.7880 LearningRate 0.0694 Epoch: 17 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:45,973-Speed 13024.36 samples/sec Loss 5.7790 LearningRate 0.0694 Epoch: 17 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:47,528-Speed 13178.50 samples/sec Loss 5.9032 LearningRate 0.0694 Epoch: 17 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:49,101-Speed 13022.15 samples/sec Loss 5.8291 LearningRate 0.0694 Epoch: 17 
Global Step: 44560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:50,692-Speed 12880.68 samples/sec Loss 5.8421 LearningRate 0.0693 Epoch: 17 Global Step: 44570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:52:52,281-Speed 12898.59 samples/sec Loss 5.6898 LearningRate 0.0693 Epoch: 17 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:52:53,856-Speed 13003.37 samples/sec Loss 5.8560 LearningRate 0.0693 Epoch: 17 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:52:55,429-Speed 13028.45 samples/sec Loss 5.8243 LearningRate 0.0693 Epoch: 17 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:52:56,987-Speed 13152.60 samples/sec Loss 5.8171 LearningRate 0.0693 Epoch: 17 Global Step: 44610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:52:58,529-Speed 13286.57 samples/sec Loss 5.7358 LearningRate 0.0692 Epoch: 17 Global Step: 44620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:00,111-Speed 12951.11 samples/sec Loss 5.8598 LearningRate 0.0692 Epoch: 17 Global Step: 44630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:01,675-Speed 13108.36 samples/sec Loss 5.9024 LearningRate 0.0692 Epoch: 17 Global Step: 44640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:03,268-Speed 12855.70 samples/sec Loss 5.7776 LearningRate 0.0692 Epoch: 17 Global Step: 44650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:04,822-Speed 13187.22 samples/sec Loss 5.7664 LearningRate 0.0691 Epoch: 17 Global Step: 44660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:06,412-Speed 12893.08 samples/sec Loss 5.7917 LearningRate 0.0691 Epoch: 17 Global Step: 44670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:07,978-Speed 13081.61 samples/sec Loss 5.8403 LearningRate 0.0691 Epoch: 17 Global Step: 44680 Fp16 Grad Scale: 
32768 Required: 3 hours Training: 2022-01-14 15:53:09,548-Speed 13049.06 samples/sec Loss 5.8777 LearningRate 0.0691 Epoch: 17 Global Step: 44690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:11,133-Speed 12930.64 samples/sec Loss 5.7877 LearningRate 0.0690 Epoch: 17 Global Step: 44700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:12,725-Speed 12870.83 samples/sec Loss 5.7176 LearningRate 0.0690 Epoch: 17 Global Step: 44710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:14,271-Speed 13252.81 samples/sec Loss 5.7888 LearningRate 0.0690 Epoch: 17 Global Step: 44720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:15,827-Speed 13167.20 samples/sec Loss 5.8266 LearningRate 0.0690 Epoch: 17 Global Step: 44730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:17,394-Speed 13076.28 samples/sec Loss 5.7995 LearningRate 0.0689 Epoch: 17 Global Step: 44740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:18,959-Speed 13094.25 samples/sec Loss 5.7874 LearningRate 0.0689 Epoch: 17 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:20,524-Speed 13094.68 samples/sec Loss 5.8235 LearningRate 0.0689 Epoch: 17 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:22,121-Speed 12827.85 samples/sec Loss 5.8088 LearningRate 0.0689 Epoch: 17 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:23,688-Speed 13076.85 samples/sec Loss 5.7537 LearningRate 0.0688 Epoch: 17 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:25,266-Speed 12988.61 samples/sec Loss 5.8111 LearningRate 0.0688 Epoch: 17 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:26,868-Speed 12789.97 samples/sec Loss 5.7592 LearningRate 0.0688 Epoch: 17 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-14 15:53:28,438-Speed 13049.10 samples/sec Loss 5.7983 LearningRate 0.0688 Epoch: 17 Global Step: 44810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:53:29,996-Speed 13149.87 samples/sec Loss 5.7795 LearningRate 0.0687 Epoch: 17 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:31,565-Speed 13065.96 samples/sec Loss 5.7511 LearningRate 0.0687 Epoch: 17 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:33,135-Speed 13057.11 samples/sec Loss 5.8510 LearningRate 0.0687 Epoch: 17 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:34,710-Speed 13005.34 samples/sec Loss 5.9139 LearningRate 0.0687 Epoch: 17 Global Step: 44850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:36,279-Speed 13061.05 samples/sec Loss 5.8282 LearningRate 0.0686 Epoch: 17 Global Step: 44860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:37,876-Speed 12828.85 samples/sec Loss 5.8171 LearningRate 0.0686 Epoch: 17 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:39,442-Speed 13089.06 samples/sec Loss 5.8299 LearningRate 0.0686 Epoch: 17 Global Step: 44880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:41,018-Speed 13000.78 samples/sec Loss 5.7694 LearningRate 0.0686 Epoch: 17 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:42,600-Speed 12947.97 samples/sec Loss 5.7915 LearningRate 0.0685 Epoch: 17 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:44,195-Speed 12849.80 samples/sec Loss 5.8335 LearningRate 0.0685 Epoch: 17 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:45,770-Speed 13009.42 samples/sec Loss 5.8816 LearningRate 0.0685 Epoch: 17 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:47,348-Speed 12984.46 
samples/sec Loss 5.8306 LearningRate 0.0685 Epoch: 17 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:48,934-Speed 12916.33 samples/sec Loss 5.8084 LearningRate 0.0684 Epoch: 17 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:50,517-Speed 12948.36 samples/sec Loss 5.7745 LearningRate 0.0684 Epoch: 17 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:52,087-Speed 13051.14 samples/sec Loss 5.9483 LearningRate 0.0684 Epoch: 17 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:53,664-Speed 12998.51 samples/sec Loss 5.8759 LearningRate 0.0684 Epoch: 17 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:53:55,247-Speed 12942.49 samples/sec Loss 5.8288 LearningRate 0.0683 Epoch: 17 Global Step: 44980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:56,813-Speed 13089.84 samples/sec Loss 5.8015 LearningRate 0.0683 Epoch: 17 Global Step: 44990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:53:58,398-Speed 12926.19 samples/sec Loss 5.8266 LearningRate 0.0683 Epoch: 17 Global Step: 45000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:54:20,784-[lfw][45000]XNorm: 10.071829 Training: 2022-01-14 15:54:20,785-[lfw][45000]Accuracy-Flip: 0.99550+-0.00342 Training: 2022-01-14 15:54:20,785-[lfw][45000]Accuracy-Highest: 0.99583 Training: 2022-01-14 15:54:46,810-[cfp_fp][45000]XNorm: 8.531704 Training: 2022-01-14 15:54:46,811-[cfp_fp][45000]Accuracy-Flip: 0.95871+-0.01151 Training: 2022-01-14 15:54:46,811-[cfp_fp][45000]Accuracy-Highest: 0.95886 Training: 2022-01-14 15:55:08,863-[agedb_30][45000]XNorm: 9.816380 Training: 2022-01-14 15:55:08,864-[agedb_30][45000]Accuracy-Flip: 0.96200+-0.00927 Training: 2022-01-14 15:55:08,864-[agedb_30][45000]Accuracy-Highest: 0.96567 Training: 2022-01-14 15:55:10,460-Speed 284.20 samples/sec Loss 5.8827 LearningRate 
0.0683 Epoch: 17 Global Step: 45010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:55:12,030-Speed 13070.97 samples/sec Loss 5.7659 LearningRate 0.0683 Epoch: 17 Global Step: 45020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:55:13,596-Speed 13079.44 samples/sec Loss 5.7879 LearningRate 0.0682 Epoch: 17 Global Step: 45030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:55:15,173-Speed 12994.16 samples/sec Loss 5.8341 LearningRate 0.0682 Epoch: 17 Global Step: 45040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:55:16,751-Speed 12987.83 samples/sec Loss 5.9206 LearningRate 0.0682 Epoch: 17 Global Step: 45050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:55:18,363-Speed 12710.63 samples/sec Loss 5.8619 LearningRate 0.0682 Epoch: 17 Global Step: 45060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:55:19,952-Speed 12908.80 samples/sec Loss 5.8594 LearningRate 0.0681 Epoch: 17 Global Step: 45070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:55:21,535-Speed 12936.85 samples/sec Loss 5.8585 LearningRate 0.0681 Epoch: 17 Global Step: 45080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:23,093-Speed 13153.61 samples/sec Loss 5.7544 LearningRate 0.0681 Epoch: 17 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:24,677-Speed 12942.64 samples/sec Loss 5.8336 LearningRate 0.0681 Epoch: 17 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:26,259-Speed 12951.26 samples/sec Loss 5.8588 LearningRate 0.0680 Epoch: 17 Global Step: 45110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:27,838-Speed 12980.04 samples/sec Loss 5.8734 LearningRate 0.0680 Epoch: 17 Global Step: 45120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:29,409-Speed 13042.65 samples/sec Loss 5.8536 LearningRate 0.0680 Epoch: 17 Global Step: 45130 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:30,980-Speed 13042.61 samples/sec Loss 5.8513 LearningRate 0.0680 Epoch: 17 Global Step: 45140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:32,543-Speed 13112.30 samples/sec Loss 5.7920 LearningRate 0.0679 Epoch: 17 Global Step: 45150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:34,115-Speed 13034.05 samples/sec Loss 5.9221 LearningRate 0.0679 Epoch: 17 Global Step: 45160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:35,688-Speed 13023.02 samples/sec Loss 5.8237 LearningRate 0.0679 Epoch: 17 Global Step: 45170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:37,265-Speed 12992.86 samples/sec Loss 5.8059 LearningRate 0.0679 Epoch: 17 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:38,843-Speed 12988.37 samples/sec Loss 5.8291 LearningRate 0.0678 Epoch: 17 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:40,429-Speed 12917.84 samples/sec Loss 5.8533 LearningRate 0.0678 Epoch: 17 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:41,999-Speed 13052.55 samples/sec Loss 5.8660 LearningRate 0.0678 Epoch: 17 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:43,596-Speed 12835.18 samples/sec Loss 5.8389 LearningRate 0.0678 Epoch: 17 Global Step: 45220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:45,172-Speed 12998.65 samples/sec Loss 5.8442 LearningRate 0.0677 Epoch: 17 Global Step: 45230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:46,737-Speed 13094.77 samples/sec Loss 5.9557 LearningRate 0.0677 Epoch: 17 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:48,306-Speed 13056.96 samples/sec Loss 5.9651 LearningRate 0.0677 Epoch: 17 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-01-14 15:55:49,898-Speed 12879.10 samples/sec Loss 5.8999 LearningRate 0.0677 Epoch: 17 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:51,477-Speed 12974.83 samples/sec Loss 5.8505 LearningRate 0.0676 Epoch: 17 Global Step: 45270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:53,024-Speed 13243.91 samples/sec Loss 5.9379 LearningRate 0.0676 Epoch: 17 Global Step: 45280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:54,602-Speed 12989.80 samples/sec Loss 5.8322 LearningRate 0.0676 Epoch: 17 Global Step: 45290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:56,197-Speed 12841.84 samples/sec Loss 5.9489 LearningRate 0.0676 Epoch: 17 Global Step: 45300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:55:57,768-Speed 13038.13 samples/sec Loss 5.8997 LearningRate 0.0675 Epoch: 17 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:55:59,320-Speed 13206.74 samples/sec Loss 5.8242 LearningRate 0.0675 Epoch: 17 Global Step: 45320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:00,937-Speed 12673.89 samples/sec Loss 5.7861 LearningRate 0.0675 Epoch: 17 Global Step: 45330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:02,512-Speed 13011.45 samples/sec Loss 5.9313 LearningRate 0.0675 Epoch: 17 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:04,105-Speed 12879.04 samples/sec Loss 5.8388 LearningRate 0.0675 Epoch: 17 Global Step: 45350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:05,656-Speed 13209.04 samples/sec Loss 5.8047 LearningRate 0.0674 Epoch: 17 Global Step: 45360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:07,238-Speed 12951.78 samples/sec Loss 5.8378 LearningRate 0.0674 Epoch: 17 Global Step: 45370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
15:56:08,809-Speed 13060.64 samples/sec Loss 5.8443 LearningRate 0.0674 Epoch: 17 Global Step: 45380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:10,403-Speed 12852.14 samples/sec Loss 5.8104 LearningRate 0.0674 Epoch: 17 Global Step: 45390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:11,985-Speed 12952.95 samples/sec Loss 5.9215 LearningRate 0.0673 Epoch: 17 Global Step: 45400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:13,572-Speed 12912.34 samples/sec Loss 5.8196 LearningRate 0.0673 Epoch: 17 Global Step: 45410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:15,142-Speed 13052.82 samples/sec Loss 5.7961 LearningRate 0.0673 Epoch: 17 Global Step: 45420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:16,743-Speed 12794.11 samples/sec Loss 5.9326 LearningRate 0.0673 Epoch: 17 Global Step: 45430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:18,311-Speed 13067.91 samples/sec Loss 5.8909 LearningRate 0.0672 Epoch: 17 Global Step: 45440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:19,898-Speed 12914.26 samples/sec Loss 5.8301 LearningRate 0.0672 Epoch: 17 Global Step: 45450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:21,472-Speed 13022.66 samples/sec Loss 5.7648 LearningRate 0.0672 Epoch: 17 Global Step: 45460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:23,049-Speed 12995.07 samples/sec Loss 5.8979 LearningRate 0.0672 Epoch: 17 Global Step: 45470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:24,644-Speed 12849.88 samples/sec Loss 5.8188 LearningRate 0.0671 Epoch: 17 Global Step: 45480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:26,191-Speed 13247.77 samples/sec Loss 5.7636 LearningRate 0.0671 Epoch: 17 Global Step: 45490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:27,765-Speed 13014.34 samples/sec 
Loss 5.8616 LearningRate 0.0671 Epoch: 17 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:29,388-Speed 12623.61 samples/sec Loss 5.8902 LearningRate 0.0671 Epoch: 17 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:30,874-Speed 13791.86 samples/sec Loss 5.8265 LearningRate 0.0670 Epoch: 17 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:45,361-Speed 1413.83 samples/sec Loss 5.1111 LearningRate 0.0670 Epoch: 18 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:46,987-Speed 12605.41 samples/sec Loss 4.9405 LearningRate 0.0670 Epoch: 18 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:48,549-Speed 13118.48 samples/sec Loss 5.0153 LearningRate 0.0670 Epoch: 18 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:50,123-Speed 13024.02 samples/sec Loss 4.9324 LearningRate 0.0669 Epoch: 18 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:51,771-Speed 12425.48 samples/sec Loss 4.9782 LearningRate 0.0669 Epoch: 18 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:56:53,314-Speed 13292.28 samples/sec Loss 5.0368 LearningRate 0.0669 Epoch: 18 Global Step: 45580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:54,908-Speed 12849.85 samples/sec Loss 5.1517 LearningRate 0.0669 Epoch: 18 Global Step: 45590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:56,479-Speed 13046.24 samples/sec Loss 5.0535 LearningRate 0.0668 Epoch: 18 Global Step: 45600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:58,059-Speed 12967.52 samples/sec Loss 4.9688 LearningRate 0.0668 Epoch: 18 Global Step: 45610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:56:59,638-Speed 12978.95 samples/sec Loss 5.1084 LearningRate 0.0668 Epoch: 18 
Global Step: 45620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:57:01,201-Speed 13110.48 samples/sec Loss 5.0347 LearningRate 0.0668 Epoch: 18 Global Step: 45630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:57:02,766-Speed 13098.19 samples/sec Loss 5.1815 LearningRate 0.0668 Epoch: 18 Global Step: 45640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:57:04,354-Speed 12899.79 samples/sec Loss 5.0936 LearningRate 0.0667 Epoch: 18 Global Step: 45650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:57:05,922-Speed 13073.17 samples/sec Loss 5.1412 LearningRate 0.0667 Epoch: 18 Global Step: 45660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:57:07,498-Speed 13003.40 samples/sec Loss 5.1582 LearningRate 0.0667 Epoch: 18 Global Step: 45670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:57:09,060-Speed 13113.73 samples/sec Loss 5.1795 LearningRate 0.0667 Epoch: 18 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:10,636-Speed 13004.22 samples/sec Loss 5.1117 LearningRate 0.0666 Epoch: 18 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:12,210-Speed 13019.55 samples/sec Loss 5.1758 LearningRate 0.0666 Epoch: 18 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:13,788-Speed 12990.40 samples/sec Loss 5.2188 LearningRate 0.0666 Epoch: 18 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:15,349-Speed 13149.92 samples/sec Loss 5.1339 LearningRate 0.0666 Epoch: 18 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:16,909-Speed 13132.23 samples/sec Loss 5.2014 LearningRate 0.0665 Epoch: 18 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:18,471-Speed 13119.49 samples/sec Loss 5.1634 LearningRate 0.0665 Epoch: 18 Global Step: 45740 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 15:57:20,045-Speed 13023.51 samples/sec Loss 5.1638 LearningRate 0.0665 Epoch: 18 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:21,622-Speed 12995.11 samples/sec Loss 5.0954 LearningRate 0.0665 Epoch: 18 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:23,182-Speed 13133.85 samples/sec Loss 5.2656 LearningRate 0.0664 Epoch: 18 Global Step: 45770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:24,755-Speed 13025.52 samples/sec Loss 5.2462 LearningRate 0.0664 Epoch: 18 Global Step: 45780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:57:26,326-Speed 13043.77 samples/sec Loss 5.2395 LearningRate 0.0664 Epoch: 18 Global Step: 45790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:27,877-Speed 13210.17 samples/sec Loss 5.1912 LearningRate 0.0664 Epoch: 18 Global Step: 45800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:29,472-Speed 12851.51 samples/sec Loss 5.2703 LearningRate 0.0663 Epoch: 18 Global Step: 45810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:31,057-Speed 12927.39 samples/sec Loss 5.2303 LearningRate 0.0663 Epoch: 18 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:32,677-Speed 12652.13 samples/sec Loss 5.3392 LearningRate 0.0663 Epoch: 18 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:34,226-Speed 13232.54 samples/sec Loss 5.1859 LearningRate 0.0663 Epoch: 18 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:35,799-Speed 13019.73 samples/sec Loss 5.2836 LearningRate 0.0662 Epoch: 18 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:37,392-Speed 12864.86 samples/sec Loss 5.2484 LearningRate 0.0662 Epoch: 18 Global Step: 45860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:57:38,971-Speed 12982.01 samples/sec Loss 5.2917 LearningRate 0.0662 Epoch: 18 Global Step: 45870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:40,553-Speed 12954.83 samples/sec Loss 5.1823 LearningRate 0.0662 Epoch: 18 Global Step: 45880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:42,127-Speed 13019.70 samples/sec Loss 5.3307 LearningRate 0.0662 Epoch: 18 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:57:43,697-Speed 13051.61 samples/sec Loss 5.3043 LearningRate 0.0661 Epoch: 18 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:57:45,277-Speed 12969.99 samples/sec Loss 5.4037 LearningRate 0.0661 Epoch: 18 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:57:46,847-Speed 13045.51 samples/sec Loss 5.3453 LearningRate 0.0661 Epoch: 18 Global Step: 45920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:48,401-Speed 13190.18 samples/sec Loss 5.3529 LearningRate 0.0661 Epoch: 18 Global Step: 45930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:49,977-Speed 13001.25 samples/sec Loss 5.3423 LearningRate 0.0660 Epoch: 18 Global Step: 45940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:51,591-Speed 12700.80 samples/sec Loss 5.2915 LearningRate 0.0660 Epoch: 18 Global Step: 45950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:53,161-Speed 13054.50 samples/sec Loss 5.3279 LearningRate 0.0660 Epoch: 18 Global Step: 45960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:54,725-Speed 13104.24 samples/sec Loss 5.2896 LearningRate 0.0660 Epoch: 18 Global Step: 45970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:56,300-Speed 13037.78 samples/sec Loss 5.3235 LearningRate 0.0659 Epoch: 18 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:57,883-Speed 12949.08 samples/sec 
Loss 5.3895 LearningRate 0.0659 Epoch: 18 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:57:59,452-Speed 13057.39 samples/sec Loss 5.3619 LearningRate 0.0659 Epoch: 18 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:01,034-Speed 12959.86 samples/sec Loss 5.3738 LearningRate 0.0659 Epoch: 18 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:02,593-Speed 13137.68 samples/sec Loss 5.3105 LearningRate 0.0658 Epoch: 18 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:58:04,161-Speed 13071.40 samples/sec Loss 5.4640 LearningRate 0.0658 Epoch: 18 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:58:05,699-Speed 13322.38 samples/sec Loss 5.4497 LearningRate 0.0658 Epoch: 18 Global Step: 46040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:07,265-Speed 13087.05 samples/sec Loss 5.3188 LearningRate 0.0658 Epoch: 18 Global Step: 46050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:08,864-Speed 12818.29 samples/sec Loss 5.3307 LearningRate 0.0657 Epoch: 18 Global Step: 46060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:10,439-Speed 13007.63 samples/sec Loss 5.4415 LearningRate 0.0657 Epoch: 18 Global Step: 46070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:12,020-Speed 12958.57 samples/sec Loss 5.3711 LearningRate 0.0657 Epoch: 18 Global Step: 46080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:13,597-Speed 12994.66 samples/sec Loss 5.4235 LearningRate 0.0657 Epoch: 18 Global Step: 46090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:15,164-Speed 13077.51 samples/sec Loss 5.5001 LearningRate 0.0657 Epoch: 18 Global Step: 46100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:16,731-Speed 13075.27 samples/sec Loss 5.4060 LearningRate 0.0656 Epoch: 18 
Global Step: 46110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:18,301-Speed 13060.49 samples/sec Loss 5.4336 LearningRate 0.0656 Epoch: 18 Global Step: 46120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:19,881-Speed 12973.25 samples/sec Loss 5.4921 LearningRate 0.0656 Epoch: 18 Global Step: 46130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:58:21,461-Speed 12970.22 samples/sec Loss 5.5050 LearningRate 0.0656 Epoch: 18 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:23,038-Speed 12989.09 samples/sec Loss 5.4058 LearningRate 0.0655 Epoch: 18 Global Step: 46150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:24,616-Speed 12990.20 samples/sec Loss 5.3611 LearningRate 0.0655 Epoch: 18 Global Step: 46160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:26,172-Speed 13164.44 samples/sec Loss 5.5571 LearningRate 0.0655 Epoch: 18 Global Step: 46170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:27,773-Speed 12828.65 samples/sec Loss 5.4712 LearningRate 0.0655 Epoch: 18 Global Step: 46180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:29,352-Speed 12979.84 samples/sec Loss 5.4954 LearningRate 0.0654 Epoch: 18 Global Step: 46190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:30,918-Speed 13088.07 samples/sec Loss 5.4619 LearningRate 0.0654 Epoch: 18 Global Step: 46200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:32,501-Speed 12944.06 samples/sec Loss 5.5678 LearningRate 0.0654 Epoch: 18 Global Step: 46210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:34,077-Speed 12999.23 samples/sec Loss 5.4487 LearningRate 0.0654 Epoch: 18 Global Step: 46220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:35,647-Speed 13051.77 samples/sec Loss 5.5360 LearningRate 0.0653 Epoch: 18 Global Step: 46230 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 15:58:37,219-Speed 13034.68 samples/sec Loss 5.4658 LearningRate 0.0653 Epoch: 18 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:58:38,799-Speed 12978.13 samples/sec Loss 5.5762 LearningRate 0.0653 Epoch: 18 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:58:40,369-Speed 13042.86 samples/sec Loss 5.5319 LearningRate 0.0653 Epoch: 18 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:58:41,954-Speed 12933.62 samples/sec Loss 5.5441 LearningRate 0.0652 Epoch: 18 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:43,507-Speed 13195.19 samples/sec Loss 5.4699 LearningRate 0.0652 Epoch: 18 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:45,067-Speed 13131.72 samples/sec Loss 5.5187 LearningRate 0.0652 Epoch: 18 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:46,629-Speed 13120.66 samples/sec Loss 5.6116 LearningRate 0.0652 Epoch: 18 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:48,218-Speed 12900.18 samples/sec Loss 5.5199 LearningRate 0.0652 Epoch: 18 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:49,787-Speed 13056.46 samples/sec Loss 5.4826 LearningRate 0.0651 Epoch: 18 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:51,339-Speed 13208.30 samples/sec Loss 5.5021 LearningRate 0.0651 Epoch: 18 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:52,890-Speed 13207.66 samples/sec Loss 5.6350 LearningRate 0.0651 Epoch: 18 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:54,462-Speed 13036.80 samples/sec Loss 5.4600 LearningRate 0.0651 Epoch: 18 Global Step: 46350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
15:58:56,043-Speed 12959.41 samples/sec Loss 5.5450 LearningRate 0.0650 Epoch: 18 Global Step: 46360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:57,593-Speed 13219.91 samples/sec Loss 5.5902 LearningRate 0.0650 Epoch: 18 Global Step: 46370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:58:59,133-Speed 13315.15 samples/sec Loss 5.5411 LearningRate 0.0650 Epoch: 18 Global Step: 46380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:00,718-Speed 12922.07 samples/sec Loss 5.6140 LearningRate 0.0650 Epoch: 18 Global Step: 46390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:02,303-Speed 12951.14 samples/sec Loss 5.5944 LearningRate 0.0649 Epoch: 18 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:03,879-Speed 13000.12 samples/sec Loss 5.5865 LearningRate 0.0649 Epoch: 18 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:05,464-Speed 12965.31 samples/sec Loss 5.6156 LearningRate 0.0649 Epoch: 18 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:07,039-Speed 13008.49 samples/sec Loss 5.5175 LearningRate 0.0649 Epoch: 18 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:08,631-Speed 12872.28 samples/sec Loss 5.5946 LearningRate 0.0648 Epoch: 18 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:10,192-Speed 13129.79 samples/sec Loss 5.4335 LearningRate 0.0648 Epoch: 18 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:11,772-Speed 12973.50 samples/sec Loss 5.5117 LearningRate 0.0648 Epoch: 18 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:13,329-Speed 13155.34 samples/sec Loss 5.5010 LearningRate 0.0648 Epoch: 18 Global Step: 46470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:14,893-Speed 13102.50 samples/sec 
Loss 5.5082 LearningRate 0.0647 Epoch: 18 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:16,495-Speed 12794.05 samples/sec Loss 5.6063 LearningRate 0.0647 Epoch: 18 Global Step: 46490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:18,048-Speed 13188.27 samples/sec Loss 5.5466 LearningRate 0.0647 Epoch: 18 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:19,654-Speed 12760.18 samples/sec Loss 5.5536 LearningRate 0.0647 Epoch: 18 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:21,233-Speed 12981.30 samples/sec Loss 5.6388 LearningRate 0.0647 Epoch: 18 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:22,801-Speed 13076.00 samples/sec Loss 5.6423 LearningRate 0.0646 Epoch: 18 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:24,360-Speed 13146.41 samples/sec Loss 5.6276 LearningRate 0.0646 Epoch: 18 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:25,912-Speed 13204.96 samples/sec Loss 5.5745 LearningRate 0.0646 Epoch: 18 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:27,468-Speed 13160.99 samples/sec Loss 5.5572 LearningRate 0.0646 Epoch: 18 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:29,045-Speed 12998.68 samples/sec Loss 5.5647 LearningRate 0.0645 Epoch: 18 Global Step: 46570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:30,621-Speed 12996.67 samples/sec Loss 5.6008 LearningRate 0.0645 Epoch: 18 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:32,203-Speed 12956.79 samples/sec Loss 5.5591 LearningRate 0.0645 Epoch: 18 Global Step: 46590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:33,781-Speed 12988.29 samples/sec Loss 5.6368 LearningRate 0.0645 Epoch: 18 
Global Step: 46600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:35,345-Speed 13099.68 samples/sec Loss 5.6564 LearningRate 0.0644 Epoch: 18 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:36,912-Speed 13075.28 samples/sec Loss 5.5742 LearningRate 0.0644 Epoch: 18 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:38,481-Speed 13064.20 samples/sec Loss 5.5122 LearningRate 0.0644 Epoch: 18 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:40,062-Speed 12960.16 samples/sec Loss 5.6658 LearningRate 0.0644 Epoch: 18 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:41,615-Speed 13189.51 samples/sec Loss 5.5961 LearningRate 0.0643 Epoch: 18 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 15:59:43,162-Speed 13248.26 samples/sec Loss 5.6464 LearningRate 0.0643 Epoch: 18 Global Step: 46660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:44,736-Speed 13015.21 samples/sec Loss 5.6640 LearningRate 0.0643 Epoch: 18 Global Step: 46670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:46,307-Speed 13047.92 samples/sec Loss 5.6996 LearningRate 0.0643 Epoch: 18 Global Step: 46680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 15:59:47,868-Speed 13122.18 samples/sec Loss 5.5837 LearningRate 0.0643 Epoch: 18 Global Step: 46690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:59:49,464-Speed 12838.77 samples/sec Loss 5.5696 LearningRate 0.0642 Epoch: 18 Global Step: 46700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:59:51,020-Speed 13171.83 samples/sec Loss 5.6955 LearningRate 0.0642 Epoch: 18 Global Step: 46710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:59:52,596-Speed 13002.39 samples/sec Loss 5.6338 LearningRate 0.0642 Epoch: 18 Global Step: 46720 Fp16 Grad Scale: 
32768 Required: 3 hours Training: 2022-01-14 15:59:54,156-Speed 13139.69 samples/sec Loss 5.6205 LearningRate 0.0642 Epoch: 18 Global Step: 46730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:59:55,745-Speed 12897.49 samples/sec Loss 5.5746 LearningRate 0.0641 Epoch: 18 Global Step: 46740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:59:57,308-Speed 13106.08 samples/sec Loss 5.6022 LearningRate 0.0641 Epoch: 18 Global Step: 46750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 15:59:58,877-Speed 13060.23 samples/sec Loss 5.7334 LearningRate 0.0641 Epoch: 18 Global Step: 46760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:00,438-Speed 13123.23 samples/sec Loss 5.7501 LearningRate 0.0641 Epoch: 18 Global Step: 46770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:02,012-Speed 13019.13 samples/sec Loss 5.6291 LearningRate 0.0640 Epoch: 18 Global Step: 46780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:03,587-Speed 13007.27 samples/sec Loss 5.6670 LearningRate 0.0640 Epoch: 18 Global Step: 46790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:05,165-Speed 12994.08 samples/sec Loss 5.6849 LearningRate 0.0640 Epoch: 18 Global Step: 46800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:06,739-Speed 13015.65 samples/sec Loss 5.5923 LearningRate 0.0640 Epoch: 18 Global Step: 46810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:08,297-Speed 13145.24 samples/sec Loss 5.6134 LearningRate 0.0639 Epoch: 18 Global Step: 46820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:09,863-Speed 13091.44 samples/sec Loss 5.6473 LearningRate 0.0639 Epoch: 18 Global Step: 46830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:11,436-Speed 13019.62 samples/sec Loss 5.5949 LearningRate 0.0639 Epoch: 18 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-14 16:00:12,981-Speed 13268.46 samples/sec Loss 5.5916 LearningRate 0.0639 Epoch: 18 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:14,545-Speed 13104.72 samples/sec Loss 5.7632 LearningRate 0.0639 Epoch: 18 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:16,121-Speed 13000.00 samples/sec Loss 5.6532 LearningRate 0.0638 Epoch: 18 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:17,696-Speed 13010.18 samples/sec Loss 5.6691 LearningRate 0.0638 Epoch: 18 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:19,280-Speed 12934.13 samples/sec Loss 5.6988 LearningRate 0.0638 Epoch: 18 Global Step: 46890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:00:20,857-Speed 12991.82 samples/sec Loss 5.6381 LearningRate 0.0638 Epoch: 18 Global Step: 46900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:00:22,437-Speed 12973.88 samples/sec Loss 5.7728 LearningRate 0.0637 Epoch: 18 Global Step: 46910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:00:24,034-Speed 12829.51 samples/sec Loss 5.6274 LearningRate 0.0637 Epoch: 18 Global Step: 46920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:00:25,585-Speed 13212.01 samples/sec Loss 5.6309 LearningRate 0.0637 Epoch: 18 Global Step: 46930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:00:27,151-Speed 13078.88 samples/sec Loss 5.6834 LearningRate 0.0637 Epoch: 18 Global Step: 46940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:00:28,707-Speed 13172.40 samples/sec Loss 5.6951 LearningRate 0.0636 Epoch: 18 Global Step: 46950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:00:30,310-Speed 12784.82 samples/sec Loss 5.7592 LearningRate 0.0636 Epoch: 18 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:31,879-Speed 
13052.80 samples/sec Loss 5.6257 LearningRate 0.0636 Epoch: 18 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:33,445-Speed 13088.66 samples/sec Loss 5.6117 LearningRate 0.0636 Epoch: 18 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:35,026-Speed 12961.09 samples/sec Loss 5.7003 LearningRate 0.0635 Epoch: 18 Global Step: 46990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:36,579-Speed 13193.57 samples/sec Loss 5.6805 LearningRate 0.0635 Epoch: 18 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:38,153-Speed 13022.35 samples/sec Loss 5.6621 LearningRate 0.0635 Epoch: 18 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:39,742-Speed 12927.40 samples/sec Loss 5.7381 LearningRate 0.0635 Epoch: 18 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:41,338-Speed 12839.57 samples/sec Loss 5.6643 LearningRate 0.0635 Epoch: 18 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:42,930-Speed 12870.61 samples/sec Loss 5.6347 LearningRate 0.0634 Epoch: 18 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:44,521-Speed 12911.19 samples/sec Loss 5.6581 LearningRate 0.0634 Epoch: 18 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:46,125-Speed 12771.45 samples/sec Loss 5.7117 LearningRate 0.0634 Epoch: 18 Global Step: 47060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:00:47,690-Speed 13095.29 samples/sec Loss 5.6773 LearningRate 0.0634 Epoch: 18 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:00:49,281-Speed 12882.46 samples/sec Loss 5.6128 LearningRate 0.0633 Epoch: 18 Global Step: 47080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:50,875-Speed 12852.41 samples/sec Loss 5.7367 
LearningRate 0.0633 Epoch: 18 Global Step: 47090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:52,451-Speed 13000.59 samples/sec Loss 5.6063 LearningRate 0.0633 Epoch: 18 Global Step: 47100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:54,032-Speed 12968.13 samples/sec Loss 5.7441 LearningRate 0.0633 Epoch: 18 Global Step: 47110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:55,616-Speed 12934.65 samples/sec Loss 5.6380 LearningRate 0.0632 Epoch: 18 Global Step: 47120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:57,210-Speed 12853.69 samples/sec Loss 5.6295 LearningRate 0.0632 Epoch: 18 Global Step: 47130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:00:58,779-Speed 13056.46 samples/sec Loss 5.7340 LearningRate 0.0632 Epoch: 18 Global Step: 47140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:01:00,391-Speed 12714.56 samples/sec Loss 5.7229 LearningRate 0.0632 Epoch: 18 Global Step: 47150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:01:01,951-Speed 13132.86 samples/sec Loss 5.6608 LearningRate 0.0631 Epoch: 18 Global Step: 47160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:01:03,534-Speed 12944.17 samples/sec Loss 5.6241 LearningRate 0.0631 Epoch: 18 Global Step: 47170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:01:05,118-Speed 12942.32 samples/sec Loss 5.7725 LearningRate 0.0631 Epoch: 18 Global Step: 47180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:06,702-Speed 12934.86 samples/sec Loss 5.7117 LearningRate 0.0631 Epoch: 18 Global Step: 47190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:08,272-Speed 13051.66 samples/sec Loss 5.6939 LearningRate 0.0631 Epoch: 18 Global Step: 47200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:09,849-Speed 12995.73 samples/sec Loss 5.6198 LearningRate 0.0630 Epoch: 18 Global Step: 
47210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:11,426-Speed 12993.65 samples/sec Loss 5.6426 LearningRate 0.0630 Epoch: 18 Global Step: 47220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:12,997-Speed 13039.16 samples/sec Loss 5.5906 LearningRate 0.0630 Epoch: 18 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:14,574-Speed 13000.70 samples/sec Loss 5.6350 LearningRate 0.0630 Epoch: 18 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:16,146-Speed 13027.41 samples/sec Loss 5.6749 LearningRate 0.0629 Epoch: 18 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:17,719-Speed 13029.84 samples/sec Loss 5.6894 LearningRate 0.0629 Epoch: 18 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:19,284-Speed 13092.14 samples/sec Loss 5.6752 LearningRate 0.0629 Epoch: 18 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:20,877-Speed 12863.45 samples/sec Loss 5.6697 LearningRate 0.0629 Epoch: 18 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:01:22,448-Speed 13047.83 samples/sec Loss 5.6602 LearningRate 0.0628 Epoch: 18 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:01:24,011-Speed 13108.88 samples/sec Loss 5.7264 LearningRate 0.0628 Epoch: 18 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:01:25,580-Speed 13061.69 samples/sec Loss 5.6659 LearningRate 0.0628 Epoch: 18 Global Step: 47310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:01:27,147-Speed 13074.75 samples/sec Loss 5.6487 LearningRate 0.0628 Epoch: 18 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:01:28,697-Speed 13216.22 samples/sec Loss 5.6948 LearningRate 0.0628 Epoch: 18 Global Step: 47330 Fp16 Grad Scale: 65536 Required: 
3 hours Training: 2022-01-14 16:01:30,277-Speed 12977.10 samples/sec Loss 5.7041 LearningRate 0.0627 Epoch: 18 Global Step: 47340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:31,852-Speed 13007.60 samples/sec Loss 5.6517 LearningRate 0.0627 Epoch: 18 Global Step: 47350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:33,411-Speed 13141.32 samples/sec Loss 5.6618 LearningRate 0.0627 Epoch: 18 Global Step: 47360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:34,987-Speed 13006.99 samples/sec Loss 5.6671 LearningRate 0.0627 Epoch: 18 Global Step: 47370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:36,578-Speed 12871.92 samples/sec Loss 5.6528 LearningRate 0.0626 Epoch: 18 Global Step: 47380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:38,171-Speed 12862.64 samples/sec Loss 5.7383 LearningRate 0.0626 Epoch: 18 Global Step: 47390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:39,735-Speed 13104.01 samples/sec Loss 5.6131 LearningRate 0.0626 Epoch: 18 Global Step: 47400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:41,316-Speed 12961.46 samples/sec Loss 5.7164 LearningRate 0.0626 Epoch: 18 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:42,908-Speed 12875.76 samples/sec Loss 5.6902 LearningRate 0.0625 Epoch: 18 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:44,473-Speed 13088.59 samples/sec Loss 5.6272 LearningRate 0.0625 Epoch: 18 Global Step: 47430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:01:46,028-Speed 13181.31 samples/sec Loss 5.7399 LearningRate 0.0625 Epoch: 18 Global Step: 47440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:01:47,594-Speed 13086.10 samples/sec Loss 5.6923 LearningRate 0.0625 Epoch: 18 Global Step: 47450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 
16:01:49,156-Speed 13120.14 samples/sec Loss 5.7362 LearningRate 0.0624 Epoch: 18 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:50,723-Speed 13072.29 samples/sec Loss 5.5937 LearningRate 0.0624 Epoch: 18 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:52,306-Speed 12941.98 samples/sec Loss 5.6902 LearningRate 0.0624 Epoch: 18 Global Step: 47480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:53,893-Speed 12906.57 samples/sec Loss 5.7255 LearningRate 0.0624 Epoch: 18 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:55,515-Speed 12638.27 samples/sec Loss 5.6747 LearningRate 0.0624 Epoch: 18 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:57,128-Speed 12702.75 samples/sec Loss 5.6739 LearningRate 0.0623 Epoch: 18 Global Step: 47510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:01:58,710-Speed 12951.67 samples/sec Loss 5.7161 LearningRate 0.0623 Epoch: 18 Global Step: 47520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:00,287-Speed 12999.47 samples/sec Loss 5.7137 LearningRate 0.0623 Epoch: 18 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:01,830-Speed 13277.25 samples/sec Loss 5.7109 LearningRate 0.0623 Epoch: 18 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:03,379-Speed 13228.57 samples/sec Loss 5.6345 LearningRate 0.0622 Epoch: 18 Global Step: 47550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:04,953-Speed 13021.89 samples/sec Loss 5.8242 LearningRate 0.0622 Epoch: 18 Global Step: 47560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:06,546-Speed 12861.22 samples/sec Loss 5.7501 LearningRate 0.0622 Epoch: 18 Global Step: 47570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:08,107-Speed 13124.00 samples/sec 
Loss 5.7641 LearningRate 0.0622 Epoch: 18 Global Step: 47580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:09,683-Speed 13005.00 samples/sec Loss 5.7389 LearningRate 0.0621 Epoch: 18 Global Step: 47590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:11,258-Speed 13012.81 samples/sec Loss 5.7646 LearningRate 0.0621 Epoch: 18 Global Step: 47600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:12,831-Speed 13022.18 samples/sec Loss 5.7159 LearningRate 0.0621 Epoch: 18 Global Step: 47610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:14,400-Speed 13064.25 samples/sec Loss 5.7896 LearningRate 0.0621 Epoch: 18 Global Step: 47620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:15,967-Speed 13072.96 samples/sec Loss 5.6737 LearningRate 0.0621 Epoch: 18 Global Step: 47630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:17,544-Speed 12992.50 samples/sec Loss 5.7324 LearningRate 0.0620 Epoch: 18 Global Step: 47640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:02:19,092-Speed 13240.60 samples/sec Loss 5.6520 LearningRate 0.0620 Epoch: 18 Global Step: 47650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:20,665-Speed 13019.71 samples/sec Loss 5.5736 LearningRate 0.0620 Epoch: 18 Global Step: 47660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:22,245-Speed 12973.89 samples/sec Loss 5.7370 LearningRate 0.0620 Epoch: 18 Global Step: 47670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:23,808-Speed 13105.97 samples/sec Loss 5.7532 LearningRate 0.0619 Epoch: 18 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:25,375-Speed 13079.51 samples/sec Loss 5.6924 LearningRate 0.0619 Epoch: 18 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:26,953-Speed 12985.28 samples/sec Loss 5.6419 LearningRate 0.0619 Epoch: 18 
Global Step: 47700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:28,528-Speed 13009.95 samples/sec Loss 5.6115 LearningRate 0.0619 Epoch: 18 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:30,096-Speed 13068.79 samples/sec Loss 5.7228 LearningRate 0.0618 Epoch: 18 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:31,670-Speed 13017.51 samples/sec Loss 5.7937 LearningRate 0.0618 Epoch: 18 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:33,248-Speed 12987.81 samples/sec Loss 5.7172 LearningRate 0.0618 Epoch: 18 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:34,806-Speed 13152.91 samples/sec Loss 5.7609 LearningRate 0.0618 Epoch: 18 Global Step: 47750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:02:36,367-Speed 13123.60 samples/sec Loss 5.7272 LearningRate 0.0618 Epoch: 18 Global Step: 47760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:37,935-Speed 13065.36 samples/sec Loss 5.7771 LearningRate 0.0617 Epoch: 18 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:39,504-Speed 13100.58 samples/sec Loss 5.5737 LearningRate 0.0617 Epoch: 18 Global Step: 47780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:41,079-Speed 13008.58 samples/sec Loss 5.6410 LearningRate 0.0617 Epoch: 18 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:42,667-Speed 12909.71 samples/sec Loss 5.6807 LearningRate 0.0617 Epoch: 18 Global Step: 47800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:44,237-Speed 13052.85 samples/sec Loss 5.6906 LearningRate 0.0616 Epoch: 18 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:45,821-Speed 12933.99 samples/sec Loss 5.7109 LearningRate 0.0616 Epoch: 18 Global Step: 47820 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 16:02:47,401-Speed 12967.70 samples/sec Loss 5.6993 LearningRate 0.0616 Epoch: 18 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:48,965-Speed 13102.27 samples/sec Loss 5.6966 LearningRate 0.0616 Epoch: 18 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:50,530-Speed 13089.78 samples/sec Loss 5.7320 LearningRate 0.0615 Epoch: 18 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:52,105-Speed 13012.11 samples/sec Loss 5.6774 LearningRate 0.0615 Epoch: 18 Global Step: 47860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:02:53,657-Speed 13200.05 samples/sec Loss 5.6530 LearningRate 0.0615 Epoch: 18 Global Step: 47870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:02:55,220-Speed 13112.24 samples/sec Loss 5.7496 LearningRate 0.0615 Epoch: 18 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:56,798-Speed 12981.96 samples/sec Loss 5.7121 LearningRate 0.0615 Epoch: 18 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:58,381-Speed 12949.27 samples/sec Loss 5.6639 LearningRate 0.0614 Epoch: 18 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:02:59,954-Speed 13022.77 samples/sec Loss 5.7397 LearningRate 0.0614 Epoch: 18 Global Step: 47910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:01,516-Speed 13124.06 samples/sec Loss 5.7074 LearningRate 0.0614 Epoch: 18 Global Step: 47920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:03,078-Speed 13115.18 samples/sec Loss 5.6998 LearningRate 0.0614 Epoch: 18 Global Step: 47930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:04,660-Speed 12959.86 samples/sec Loss 5.7955 LearningRate 0.0613 Epoch: 18 Global Step: 47940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
16:03:06,226-Speed 13083.11 samples/sec Loss 5.7432 LearningRate 0.0613 Epoch: 18 Global Step: 47950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:07,787-Speed 13124.69 samples/sec Loss 5.7921 LearningRate 0.0613 Epoch: 18 Global Step: 47960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:09,340-Speed 13195.71 samples/sec Loss 5.7499 LearningRate 0.0613 Epoch: 18 Global Step: 47970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:10,925-Speed 12923.27 samples/sec Loss 5.7052 LearningRate 0.0612 Epoch: 18 Global Step: 47980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:12,501-Speed 13000.36 samples/sec Loss 5.7096 LearningRate 0.0612 Epoch: 18 Global Step: 47990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:14,078-Speed 12993.94 samples/sec Loss 5.6618 LearningRate 0.0612 Epoch: 18 Global Step: 48000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:15,647-Speed 13067.06 samples/sec Loss 5.6379 LearningRate 0.0612 Epoch: 18 Global Step: 48010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:03:17,208-Speed 13123.04 samples/sec Loss 5.6858 LearningRate 0.0612 Epoch: 18 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:18,792-Speed 12931.49 samples/sec Loss 5.7521 LearningRate 0.0611 Epoch: 18 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:20,432-Speed 12494.85 samples/sec Loss 5.6707 LearningRate 0.0611 Epoch: 18 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:21,943-Speed 13559.01 samples/sec Loss 5.6969 LearningRate 0.0611 Epoch: 18 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:36,049-Speed 1451.98 samples/sec Loss 5.0233 LearningRate 0.0611 Epoch: 19 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:37,642-Speed 12867.99 samples/sec Loss 
4.9485 LearningRate 0.0610 Epoch: 19 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:39,216-Speed 13022.10 samples/sec Loss 4.8586 LearningRate 0.0610 Epoch: 19 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:40,808-Speed 12875.21 samples/sec Loss 5.0016 LearningRate 0.0610 Epoch: 19 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:42,418-Speed 12721.15 samples/sec Loss 4.9036 LearningRate 0.0610 Epoch: 19 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:44,001-Speed 12951.87 samples/sec Loss 4.7991 LearningRate 0.0609 Epoch: 19 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:45,553-Speed 13197.45 samples/sec Loss 4.8325 LearningRate 0.0609 Epoch: 19 Global Step: 48120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:47,122-Speed 13062.15 samples/sec Loss 4.9711 LearningRate 0.0609 Epoch: 19 Global Step: 48130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:48,695-Speed 13025.30 samples/sec Loss 4.9365 LearningRate 0.0609 Epoch: 19 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:50,289-Speed 12853.38 samples/sec Loss 4.8894 LearningRate 0.0609 Epoch: 19 Global Step: 48150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:51,881-Speed 12871.41 samples/sec Loss 4.9459 LearningRate 0.0608 Epoch: 19 Global Step: 48160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:53,434-Speed 13192.83 samples/sec Loss 4.9014 LearningRate 0.0608 Epoch: 19 Global Step: 48170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:55,026-Speed 12878.57 samples/sec Loss 4.9615 LearningRate 0.0608 Epoch: 19 Global Step: 48180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:56,625-Speed 12814.57 samples/sec Loss 4.9051 LearningRate 0.0608 Epoch: 19 Global 
Step: 48190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:58,185-Speed 13137.44 samples/sec Loss 4.9973 LearningRate 0.0607 Epoch: 19 Global Step: 48200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:03:59,773-Speed 12908.68 samples/sec Loss 5.0134 LearningRate 0.0607 Epoch: 19 Global Step: 48210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:01,355-Speed 12951.35 samples/sec Loss 4.9920 LearningRate 0.0607 Epoch: 19 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:02,962-Speed 12752.62 samples/sec Loss 4.9714 LearningRate 0.0607 Epoch: 19 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:04,544-Speed 12952.05 samples/sec Loss 5.1304 LearningRate 0.0606 Epoch: 19 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:06,122-Speed 12986.45 samples/sec Loss 5.0424 LearningRate 0.0606 Epoch: 19 Global Step: 48250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:07,663-Speed 13291.39 samples/sec Loss 4.9905 LearningRate 0.0606 Epoch: 19 Global Step: 48260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:09,243-Speed 12967.41 samples/sec Loss 5.1198 LearningRate 0.0606 Epoch: 19 Global Step: 48270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:10,818-Speed 13014.18 samples/sec Loss 5.1742 LearningRate 0.0606 Epoch: 19 Global Step: 48280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:12,412-Speed 12854.00 samples/sec Loss 5.1201 LearningRate 0.0605 Epoch: 19 Global Step: 48290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:13,991-Speed 12974.68 samples/sec Loss 5.0780 LearningRate 0.0605 Epoch: 19 Global Step: 48300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:15,582-Speed 12879.37 samples/sec Loss 5.1759 LearningRate 0.0605 Epoch: 19 Global Step: 48310 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 16:04:17,149-Speed 13073.49 samples/sec Loss 5.1530 LearningRate 0.0605 Epoch: 19 Global Step: 48320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:18,718-Speed 13057.66 samples/sec Loss 5.1206 LearningRate 0.0604 Epoch: 19 Global Step: 48330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:20,280-Speed 13119.33 samples/sec Loss 5.1225 LearningRate 0.0604 Epoch: 19 Global Step: 48340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:21,879-Speed 12825.01 samples/sec Loss 5.0666 LearningRate 0.0604 Epoch: 19 Global Step: 48350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:23,435-Speed 13163.34 samples/sec Loss 5.1576 LearningRate 0.0604 Epoch: 19 Global Step: 48360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:25,016-Speed 12961.45 samples/sec Loss 5.1669 LearningRate 0.0603 Epoch: 19 Global Step: 48370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:26,592-Speed 13004.79 samples/sec Loss 5.1624 LearningRate 0.0603 Epoch: 19 Global Step: 48380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:28,152-Speed 13131.30 samples/sec Loss 5.1624 LearningRate 0.0603 Epoch: 19 Global Step: 48390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:29,717-Speed 13094.63 samples/sec Loss 5.1525 LearningRate 0.0603 Epoch: 19 Global Step: 48400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:31,323-Speed 12755.05 samples/sec Loss 5.1904 LearningRate 0.0603 Epoch: 19 Global Step: 48410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:32,921-Speed 12827.40 samples/sec Loss 5.1424 LearningRate 0.0602 Epoch: 19 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:34,472-Speed 13211.23 samples/sec Loss 5.2460 LearningRate 0.0602 Epoch: 19 Global Step: 48430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
16:04:36,049-Speed 12996.77 samples/sec Loss 5.1509 LearningRate 0.0602 Epoch: 19 Global Step: 48440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:37,626-Speed 13008.23 samples/sec Loss 5.2340 LearningRate 0.0602 Epoch: 19 Global Step: 48450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:39,197-Speed 13041.23 samples/sec Loss 5.2731 LearningRate 0.0601 Epoch: 19 Global Step: 48460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:40,753-Speed 13178.91 samples/sec Loss 5.2443 LearningRate 0.0601 Epoch: 19 Global Step: 48470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:42,328-Speed 13012.81 samples/sec Loss 5.1649 LearningRate 0.0601 Epoch: 19 Global Step: 48480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:43,907-Speed 12974.63 samples/sec Loss 5.2013 LearningRate 0.0601 Epoch: 19 Global Step: 48490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:45,462-Speed 13185.18 samples/sec Loss 5.2556 LearningRate 0.0601 Epoch: 19 Global Step: 48500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:47,038-Speed 13003.05 samples/sec Loss 5.2058 LearningRate 0.0600 Epoch: 19 Global Step: 48510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:48,603-Speed 13085.90 samples/sec Loss 5.2692 LearningRate 0.0600 Epoch: 19 Global Step: 48520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:04:50,170-Speed 13083.74 samples/sec Loss 5.1667 LearningRate 0.0600 Epoch: 19 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:51,757-Speed 12905.48 samples/sec Loss 5.3233 LearningRate 0.0600 Epoch: 19 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:53,308-Speed 13216.52 samples/sec Loss 5.2605 LearningRate 0.0599 Epoch: 19 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:54,882-Speed 13019.76 samples/sec 
Loss 5.3901 LearningRate 0.0599 Epoch: 19 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:56,458-Speed 13002.25 samples/sec Loss 5.3663 LearningRate 0.0599 Epoch: 19 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:58,021-Speed 13112.96 samples/sec Loss 5.2557 LearningRate 0.0599 Epoch: 19 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:04:59,585-Speed 13102.27 samples/sec Loss 5.2174 LearningRate 0.0598 Epoch: 19 Global Step: 48590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:01,198-Speed 12702.64 samples/sec Loss 5.3629 LearningRate 0.0598 Epoch: 19 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:02,762-Speed 13097.59 samples/sec Loss 5.3414 LearningRate 0.0598 Epoch: 19 Global Step: 48610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:04,334-Speed 13044.56 samples/sec Loss 5.2934 LearningRate 0.0598 Epoch: 19 Global Step: 48620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:05,918-Speed 12935.79 samples/sec Loss 5.2375 LearningRate 0.0598 Epoch: 19 Global Step: 48630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:07,462-Speed 13278.26 samples/sec Loss 5.2323 LearningRate 0.0597 Epoch: 19 Global Step: 48640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:09,044-Speed 12950.10 samples/sec Loss 5.3333 LearningRate 0.0597 Epoch: 19 Global Step: 48650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:10,616-Speed 13036.51 samples/sec Loss 5.3455 LearningRate 0.0597 Epoch: 19 Global Step: 48660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:12,191-Speed 13012.17 samples/sec Loss 5.3125 LearningRate 0.0597 Epoch: 19 Global Step: 48670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:13,759-Speed 13066.40 samples/sec Loss 5.3877 LearningRate 0.0596 Epoch: 19 
Global Step: 48680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:15,327-Speed 13065.09 samples/sec Loss 5.3517 LearningRate 0.0596 Epoch: 19 Global Step: 48690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:16,879-Speed 13202.49 samples/sec Loss 5.3199 LearningRate 0.0596 Epoch: 19 Global Step: 48700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:18,447-Speed 13075.10 samples/sec Loss 5.2922 LearningRate 0.0596 Epoch: 19 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:20,014-Speed 13077.43 samples/sec Loss 5.3584 LearningRate 0.0596 Epoch: 19 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:21,583-Speed 13082.16 samples/sec Loss 5.2966 LearningRate 0.0595 Epoch: 19 Global Step: 48730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:23,209-Speed 12603.10 samples/sec Loss 5.3616 LearningRate 0.0595 Epoch: 19 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:24,767-Speed 13155.70 samples/sec Loss 5.3214 LearningRate 0.0595 Epoch: 19 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:26,329-Speed 13110.47 samples/sec Loss 5.3953 LearningRate 0.0595 Epoch: 19 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:27,892-Speed 13115.22 samples/sec Loss 5.3441 LearningRate 0.0594 Epoch: 19 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:29,505-Speed 12704.01 samples/sec Loss 5.3094 LearningRate 0.0594 Epoch: 19 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:31,101-Speed 12840.21 samples/sec Loss 5.3889 LearningRate 0.0594 Epoch: 19 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:32,664-Speed 13107.31 samples/sec Loss 5.3568 LearningRate 0.0594 Epoch: 19 Global Step: 48800 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 16:05:34,257-Speed 12864.38 samples/sec Loss 5.2872 LearningRate 0.0593 Epoch: 19 Global Step: 48810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:05:35,832-Speed 13011.27 samples/sec Loss 5.4064 LearningRate 0.0593 Epoch: 19 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:37,395-Speed 13114.34 samples/sec Loss 5.3801 LearningRate 0.0593 Epoch: 19 Global Step: 48830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:38,989-Speed 12848.92 samples/sec Loss 5.2989 LearningRate 0.0593 Epoch: 19 Global Step: 48840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:40,560-Speed 13046.23 samples/sec Loss 5.3120 LearningRate 0.0593 Epoch: 19 Global Step: 48850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:42,159-Speed 12813.23 samples/sec Loss 5.4524 LearningRate 0.0592 Epoch: 19 Global Step: 48860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:43,727-Speed 13068.89 samples/sec Loss 5.3572 LearningRate 0.0592 Epoch: 19 Global Step: 48870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:45,294-Speed 13074.12 samples/sec Loss 5.4420 LearningRate 0.0592 Epoch: 19 Global Step: 48880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:46,858-Speed 13121.63 samples/sec Loss 5.4498 LearningRate 0.0592 Epoch: 19 Global Step: 48890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:48,422-Speed 13099.70 samples/sec Loss 5.3113 LearningRate 0.0591 Epoch: 19 Global Step: 48900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:50,003-Speed 12964.05 samples/sec Loss 5.3603 LearningRate 0.0591 Epoch: 19 Global Step: 48910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:05:51,581-Speed 12987.79 samples/sec Loss 5.4887 LearningRate 0.0591 Epoch: 19 Global Step: 48920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
16:05:53,179-Speed 12820.86 samples/sec Loss 5.4936 LearningRate 0.0591 Epoch: 19 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:54,774-Speed 12849.51 samples/sec Loss 5.4503 LearningRate 0.0591 Epoch: 19 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:56,345-Speed 13038.78 samples/sec Loss 5.4292 LearningRate 0.0590 Epoch: 19 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:57,932-Speed 12915.21 samples/sec Loss 5.3379 LearningRate 0.0590 Epoch: 19 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:05:59,502-Speed 13054.81 samples/sec Loss 5.4973 LearningRate 0.0590 Epoch: 19 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:01,092-Speed 12885.53 samples/sec Loss 5.4295 LearningRate 0.0590 Epoch: 19 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:02,653-Speed 13121.38 samples/sec Loss 5.4802 LearningRate 0.0589 Epoch: 19 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:04,226-Speed 13034.15 samples/sec Loss 5.4365 LearningRate 0.0589 Epoch: 19 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:05,800-Speed 13017.15 samples/sec Loss 5.5069 LearningRate 0.0589 Epoch: 19 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:07,369-Speed 13056.03 samples/sec Loss 5.4674 LearningRate 0.0589 Epoch: 19 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:08,948-Speed 12980.65 samples/sec Loss 5.3766 LearningRate 0.0588 Epoch: 19 Global Step: 49030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:06:10,529-Speed 12957.26 samples/sec Loss 5.3116 LearningRate 0.0588 Epoch: 19 Global Step: 49040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:06:12,136-Speed 12756.55 samples/sec 
Loss 5.4573 LearningRate 0.0588 Epoch: 19 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:13,719-Speed 12939.17 samples/sec Loss 5.4856 LearningRate 0.0588 Epoch: 19 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:15,274-Speed 13187.12 samples/sec Loss 5.3633 LearningRate 0.0588 Epoch: 19 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:16,854-Speed 12965.69 samples/sec Loss 5.4882 LearningRate 0.0587 Epoch: 19 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:18,419-Speed 13094.39 samples/sec Loss 5.4546 LearningRate 0.0587 Epoch: 19 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:20,005-Speed 12922.92 samples/sec Loss 5.4363 LearningRate 0.0587 Epoch: 19 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:21,590-Speed 12927.09 samples/sec Loss 5.5247 LearningRate 0.0587 Epoch: 19 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:23,162-Speed 13028.93 samples/sec Loss 5.3825 LearningRate 0.0586 Epoch: 19 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:24,725-Speed 13116.44 samples/sec Loss 5.5038 LearningRate 0.0586 Epoch: 19 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:26,287-Speed 13114.73 samples/sec Loss 5.5042 LearningRate 0.0586 Epoch: 19 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:27,849-Speed 13117.70 samples/sec Loss 5.4735 LearningRate 0.0586 Epoch: 19 Global Step: 49150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:06:29,398-Speed 13222.26 samples/sec Loss 5.4752 LearningRate 0.0586 Epoch: 19 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:06:30,965-Speed 13085.38 samples/sec Loss 5.4976 LearningRate 0.0585 Epoch: 19 
Global Step: 49170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:32,509-Speed 13266.60 samples/sec Loss 5.4122 LearningRate 0.0585 Epoch: 19 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:34,088-Speed 12980.61 samples/sec Loss 5.5257 LearningRate 0.0585 Epoch: 19 Global Step: 49190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:35,667-Speed 12980.60 samples/sec Loss 5.4835 LearningRate 0.0585 Epoch: 19 Global Step: 49200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:37,230-Speed 13107.20 samples/sec Loss 5.4410 LearningRate 0.0584 Epoch: 19 Global Step: 49210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:38,829-Speed 12817.82 samples/sec Loss 5.5106 LearningRate 0.0584 Epoch: 19 Global Step: 49220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:40,417-Speed 12902.33 samples/sec Loss 5.5353 LearningRate 0.0584 Epoch: 19 Global Step: 49230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:41,999-Speed 12950.61 samples/sec Loss 5.5281 LearningRate 0.0584 Epoch: 19 Global Step: 49240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:43,588-Speed 12902.30 samples/sec Loss 5.4540 LearningRate 0.0584 Epoch: 19 Global Step: 49250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:45,173-Speed 12927.48 samples/sec Loss 5.5442 LearningRate 0.0583 Epoch: 19 Global Step: 49260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:46,739-Speed 13086.25 samples/sec Loss 5.5586 LearningRate 0.0583 Epoch: 19 Global Step: 49270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:48,312-Speed 13027.21 samples/sec Loss 5.4918 LearningRate 0.0583 Epoch: 19 Global Step: 49280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:06:49,897-Speed 12926.14 samples/sec Loss 5.5205 LearningRate 0.0583 Epoch: 19 Global Step: 49290 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 16:06:51,497-Speed 12813.96 samples/sec Loss 5.5001 LearningRate 0.0582 Epoch: 19 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:53,035-Speed 13320.91 samples/sec Loss 5.4696 LearningRate 0.0582 Epoch: 19 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:54,610-Speed 13016.88 samples/sec Loss 5.5234 LearningRate 0.0582 Epoch: 19 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:56,189-Speed 12971.81 samples/sec Loss 5.4629 LearningRate 0.0582 Epoch: 19 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:57,753-Speed 13095.93 samples/sec Loss 5.5390 LearningRate 0.0582 Epoch: 19 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:06:59,302-Speed 13232.11 samples/sec Loss 5.5382 LearningRate 0.0581 Epoch: 19 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:00,887-Speed 12930.60 samples/sec Loss 5.5439 LearningRate 0.0581 Epoch: 19 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:02,440-Speed 13195.98 samples/sec Loss 5.5170 LearningRate 0.0581 Epoch: 19 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:04,007-Speed 13080.94 samples/sec Loss 5.5729 LearningRate 0.0581 Epoch: 19 Global Step: 49380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:05,595-Speed 12912.65 samples/sec Loss 5.5767 LearningRate 0.0580 Epoch: 19 Global Step: 49390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:07,153-Speed 13151.36 samples/sec Loss 5.4803 LearningRate 0.0580 Epoch: 19 Global Step: 49400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:08,718-Speed 13090.71 samples/sec Loss 5.4322 LearningRate 0.0580 Epoch: 19 Global Step: 49410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 
16:07:10,304-Speed 12920.45 samples/sec Loss 5.4904 LearningRate 0.0580 Epoch: 19 Global Step: 49420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:11,865-Speed 13130.61 samples/sec Loss 5.5812 LearningRate 0.0579 Epoch: 19 Global Step: 49430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:13,422-Speed 13152.40 samples/sec Loss 5.5997 LearningRate 0.0579 Epoch: 19 Global Step: 49440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:14,991-Speed 13062.68 samples/sec Loss 5.5072 LearningRate 0.0579 Epoch: 19 Global Step: 49450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:16,573-Speed 12953.75 samples/sec Loss 5.5424 LearningRate 0.0579 Epoch: 19 Global Step: 49460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:18,168-Speed 12842.32 samples/sec Loss 5.5293 LearningRate 0.0579 Epoch: 19 Global Step: 49470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:19,754-Speed 12924.30 samples/sec Loss 5.5298 LearningRate 0.0578 Epoch: 19 Global Step: 49480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 16:07:21,324-Speed 13051.56 samples/sec Loss 5.5298 LearningRate 0.0578 Epoch: 19 Global Step: 49490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:22,890-Speed 13086.03 samples/sec Loss 5.6348 LearningRate 0.0578 Epoch: 19 Global Step: 49500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:24,458-Speed 13072.30 samples/sec Loss 5.5660 LearningRate 0.0578 Epoch: 19 Global Step: 49510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:26,011-Speed 13190.52 samples/sec Loss 5.5551 LearningRate 0.0577 Epoch: 19 Global Step: 49520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:27,567-Speed 13172.77 samples/sec Loss 5.5652 LearningRate 0.0577 Epoch: 19 Global Step: 49530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:29,115-Speed 13230.50 samples/sec 
Loss 5.4460 LearningRate 0.0577 Epoch: 19 Global Step: 49540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:30,687-Speed 13045.59 samples/sec Loss 5.5915 LearningRate 0.0577 Epoch: 19 Global Step: 49550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:32,271-Speed 12931.76 samples/sec Loss 5.5148 LearningRate 0.0577 Epoch: 19 Global Step: 49560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:33,831-Speed 13132.20 samples/sec Loss 5.4839 LearningRate 0.0576 Epoch: 19 Global Step: 49570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:35,413-Speed 12960.37 samples/sec Loss 5.4575 LearningRate 0.0576 Epoch: 19 Global Step: 49580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:07:36,970-Speed 13153.58 samples/sec Loss 5.6090 LearningRate 0.0576 Epoch: 19 Global Step: 49590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:38,535-Speed 13093.33 samples/sec Loss 5.4528 LearningRate 0.0576 Epoch: 19 Global Step: 49600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:40,109-Speed 13025.49 samples/sec Loss 5.5294 LearningRate 0.0575 Epoch: 19 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:41,670-Speed 13125.28 samples/sec Loss 5.4764 LearningRate 0.0575 Epoch: 19 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:43,230-Speed 13130.29 samples/sec Loss 5.5169 LearningRate 0.0575 Epoch: 19 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:44,803-Speed 13032.97 samples/sec Loss 5.4820 LearningRate 0.0575 Epoch: 19 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:46,366-Speed 13109.90 samples/sec Loss 5.5551 LearningRate 0.0575 Epoch: 19 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:47,954-Speed 12903.06 samples/sec Loss 5.5320 LearningRate 0.0574 Epoch: 19 
Global Step: 49660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:49,521-Speed 13079.86 samples/sec Loss 5.5244 LearningRate 0.0574 Epoch: 19 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:51,088-Speed 13081.31 samples/sec Loss 5.5474 LearningRate 0.0574 Epoch: 19 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:07:52,655-Speed 13071.40 samples/sec Loss 5.5742 LearningRate 0.0574 Epoch: 19 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:07:54,219-Speed 13096.07 samples/sec Loss 5.4856 LearningRate 0.0573 Epoch: 19 Global Step: 49700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:07:55,762-Speed 13280.82 samples/sec Loss 5.5563 LearningRate 0.0573 Epoch: 19 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:07:57,338-Speed 13005.82 samples/sec Loss 5.4740 LearningRate 0.0573 Epoch: 19 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:07:58,890-Speed 13202.68 samples/sec Loss 5.5974 LearningRate 0.0573 Epoch: 19 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:08:00,476-Speed 12920.29 samples/sec Loss 5.6553 LearningRate 0.0573 Epoch: 19 Global Step: 49740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:08:02,017-Speed 13294.41 samples/sec Loss 5.5284 LearningRate 0.0572 Epoch: 19 Global Step: 49750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:08:03,581-Speed 13107.00 samples/sec Loss 5.6072 LearningRate 0.0572 Epoch: 19 Global Step: 49760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:08:05,166-Speed 12924.17 samples/sec Loss 5.5595 LearningRate 0.0572 Epoch: 19 Global Step: 49770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:08:06,753-Speed 12913.79 samples/sec Loss 5.6077 LearningRate 0.0572 Epoch: 19 Global Step: 49780 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-14 16:08:08,333-Speed 12972.54 samples/sec Loss 5.5893 LearningRate 0.0571 Epoch: 19 Global Step: 49790 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-14 16:08:09,921-Speed 12901.90 samples/sec Loss 5.5785 LearningRate 0.0571 Epoch: 19 Global Step: 49800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 16:08:11,471-Speed 13220.79 samples/sec Loss 5.5679 LearningRate 0.0571 Epoch: 19 Global Step: 49810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:13,038-Speed 13072.20 samples/sec Loss 5.6012 LearningRate 0.0571 Epoch: 19 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:14,634-Speed 12840.84 samples/sec Loss 5.4906 LearningRate 0.0571 Epoch: 19 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:16,184-Speed 13225.66 samples/sec Loss 5.5792 LearningRate 0.0570 Epoch: 19 Global Step: 49840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:17,762-Speed 12983.16 samples/sec Loss 5.5211 LearningRate 0.0570 Epoch: 19 Global Step: 49850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:19,360-Speed 12823.27 samples/sec Loss 5.6318 LearningRate 0.0570 Epoch: 19 Global Step: 49860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:20,951-Speed 12885.43 samples/sec Loss 5.6383 LearningRate 0.0570 Epoch: 19 Global Step: 49870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:22,549-Speed 12820.19 samples/sec Loss 5.5252 LearningRate 0.0569 Epoch: 19 Global Step: 49880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:24,107-Speed 13155.84 samples/sec Loss 5.4907 LearningRate 0.0569 Epoch: 19 Global Step: 49890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:25,656-Speed 13252.36 samples/sec Loss 5.5923 LearningRate 0.0569 Epoch: 19 Global Step: 49900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-14 16:08:27,230-Speed 13019.29 samples/sec Loss 5.5714 LearningRate 0.0569 Epoch: 19 Global Step: 49910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:28,826-Speed 12840.22 samples/sec Loss 5.5669 LearningRate 0.0569 Epoch: 19 Global Step: 49920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:30,408-Speed 12958.78 samples/sec Loss 5.5232 LearningRate 0.0568 Epoch: 19 Global Step: 49930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:08:32,002-Speed 12850.97 samples/sec Loss 5.5628 LearningRate 0.0568 Epoch: 19 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:33,574-Speed 13041.59 samples/sec Loss 5.4946 LearningRate 0.0568 Epoch: 19 Global Step: 49950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:35,130-Speed 13168.01 samples/sec Loss 5.6443 LearningRate 0.0568 Epoch: 19 Global Step: 49960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:36,706-Speed 12996.98 samples/sec Loss 5.5646 LearningRate 0.0567 Epoch: 19 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:38,336-Speed 12574.68 samples/sec Loss 5.5679 LearningRate 0.0567 Epoch: 19 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:39,923-Speed 12922.32 samples/sec Loss 5.5996 LearningRate 0.0567 Epoch: 19 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:08:41,486-Speed 13104.43 samples/sec Loss 5.5486 LearningRate 0.0567 Epoch: 19 Global Step: 50000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:09:04,414-[lfw][50000]XNorm: 9.768951 Training: 2022-01-14 16:09:04,414-[lfw][50000]Accuracy-Flip: 0.99583+-0.00359 Training: 2022-01-14 16:09:04,415-[lfw][50000]Accuracy-Highest: 0.99583 Training: 2022-01-14 16:09:30,335-[cfp_fp][50000]XNorm: 8.214686 Training: 2022-01-14 16:09:30,336-[cfp_fp][50000]Accuracy-Flip: 0.95943+-0.01168 Training: 2022-01-14 
16:09:30,336-[cfp_fp][50000]Accuracy-Highest: 0.95943 Training: 2022-01-14 16:09:52,477-[agedb_30][50000]XNorm: 9.468005 Training: 2022-01-14 16:09:52,477-[agedb_30][50000]Accuracy-Flip: 0.95967+-0.00823 Training: 2022-01-14 16:09:52,478-[agedb_30][50000]Accuracy-Highest: 0.96567 Training: 2022-01-14 16:09:54,036-Speed 282.29 samples/sec Loss 5.5360 LearningRate 0.0567 Epoch: 19 Global Step: 50010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:09:55,605-Speed 13057.99 samples/sec Loss 5.6842 LearningRate 0.0566 Epoch: 19 Global Step: 50020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:09:57,166-Speed 13126.39 samples/sec Loss 5.5619 LearningRate 0.0566 Epoch: 19 Global Step: 50030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:09:58,747-Speed 12958.68 samples/sec Loss 5.5491 LearningRate 0.0566 Epoch: 19 Global Step: 50040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:00,310-Speed 13115.29 samples/sec Loss 5.5532 LearningRate 0.0566 Epoch: 19 Global Step: 50050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:01,855-Speed 13264.68 samples/sec Loss 5.6200 LearningRate 0.0565 Epoch: 19 Global Step: 50060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:03,429-Speed 13019.00 samples/sec Loss 5.5760 LearningRate 0.0565 Epoch: 19 Global Step: 50070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:05,010-Speed 12961.09 samples/sec Loss 5.5445 LearningRate 0.0565 Epoch: 19 Global Step: 50080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:06,579-Speed 13059.76 samples/sec Loss 5.5387 LearningRate 0.0565 Epoch: 19 Global Step: 50090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:08,141-Speed 13121.73 samples/sec Loss 5.5720 LearningRate 0.0565 Epoch: 19 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:09,719-Speed 12977.38 samples/sec Loss 5.4742 LearningRate 
0.0564 Epoch: 19 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:11,274-Speed 13179.21 samples/sec Loss 5.6141 LearningRate 0.0564 Epoch: 19 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:12,864-Speed 12891.66 samples/sec Loss 5.5502 LearningRate 0.0564 Epoch: 19 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:14,436-Speed 13034.08 samples/sec Loss 5.5399 LearningRate 0.0564 Epoch: 19 Global Step: 50140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:15,995-Speed 13137.45 samples/sec Loss 5.5668 LearningRate 0.0563 Epoch: 19 Global Step: 50150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:17,594-Speed 12816.72 samples/sec Loss 5.5791 LearningRate 0.0563 Epoch: 19 Global Step: 50160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:19,154-Speed 13133.60 samples/sec Loss 5.6367 LearningRate 0.0563 Epoch: 19 Global Step: 50170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:20,723-Speed 13063.05 samples/sec Loss 5.5504 LearningRate 0.0563 Epoch: 19 Global Step: 50180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:22,306-Speed 12942.92 samples/sec Loss 5.5679 LearningRate 0.0563 Epoch: 19 Global Step: 50190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:23,894-Speed 12901.09 samples/sec Loss 5.5760 LearningRate 0.0562 Epoch: 19 Global Step: 50200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:25,496-Speed 12789.16 samples/sec Loss 5.5224 LearningRate 0.0562 Epoch: 19 Global Step: 50210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:27,055-Speed 13145.49 samples/sec Loss 5.5776 LearningRate 0.0562 Epoch: 19 Global Step: 50220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:28,631-Speed 13009.15 samples/sec Loss 5.5578 LearningRate 0.0562 Epoch: 19 Global Step: 50230 Fp16 
Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:30,197-Speed 13083.06 samples/sec Loss 5.5151 LearningRate 0.0561 Epoch: 19 Global Step: 50240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 16:10:31,779-Speed 12953.93 samples/sec Loss 5.5466 LearningRate 0.0561 Epoch: 19 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:33,338-Speed 13150.95 samples/sec Loss 5.5360 LearningRate 0.0561 Epoch: 19 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:34,933-Speed 12859.58 samples/sec Loss 5.5275 LearningRate 0.0561 Epoch: 19 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:36,509-Speed 13008.61 samples/sec Loss 5.5946 LearningRate 0.0561 Epoch: 19 Global Step: 50280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:38,065-Speed 13168.09 samples/sec Loss 5.5348 LearningRate 0.0560 Epoch: 19 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:39,608-Speed 13275.17 samples/sec Loss 5.6094 LearningRate 0.0560 Epoch: 19 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:41,193-Speed 12933.12 samples/sec Loss 5.6400 LearningRate 0.0560 Epoch: 19 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:42,804-Speed 12741.24 samples/sec Loss 5.5607 LearningRate 0.0560 Epoch: 19 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:44,396-Speed 12874.15 samples/sec Loss 5.5688 LearningRate 0.0560 Epoch: 19 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:45,955-Speed 13139.27 samples/sec Loss 5.6005 LearningRate 0.0559 Epoch: 19 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 16:10:47,540-Speed 12929.80 samples/sec Loss 5.6412 LearningRate 0.0559 Epoch: 19 Global Step: 50350 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-01-14 16:10:49,127-Speed 12913.73 samples/sec Loss 5.6315 LearningRate 0.0559 Epoch: 19 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:10:50,712-Speed 12925.52 samples/sec Loss 5.4868 LearningRate 0.0559 Epoch: 19 Global Step: 50370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:10:52,292-Speed 12972.27 samples/sec Loss 5.5421 LearningRate 0.0558 Epoch: 19 Global Step: 50380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:10:53,874-Speed 12954.12 samples/sec Loss 5.6099 LearningRate 0.0558 Epoch: 19 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:10:55,448-Speed 13017.29 samples/sec Loss 5.5121 LearningRate 0.0558 Epoch: 19 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:10:57,012-Speed 13096.37 samples/sec Loss 5.5533 LearningRate 0.0558 Epoch: 19 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:10:58,582-Speed 13057.63 samples/sec Loss 5.5916 LearningRate 0.0558 Epoch: 19 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:11:00,142-Speed 13131.32 samples/sec Loss 5.5651 LearningRate 0.0557 Epoch: 19 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:01,732-Speed 12888.98 samples/sec Loss 5.5077 LearningRate 0.0557 Epoch: 19 Global Step: 50440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:03,283-Speed 13209.02 samples/sec Loss 5.6030 LearningRate 0.0557 Epoch: 19 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:04,839-Speed 13176.44 samples/sec Loss 5.5549 LearningRate 0.0557 Epoch: 19 Global Step: 50460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:06,398-Speed 13138.60 samples/sec Loss 5.5256 LearningRate 0.0556 Epoch: 19 Global Step: 50470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:11:08,004-Speed 12761.16 samples/sec Loss 5.5622 LearningRate 0.0556 Epoch: 19 Global Step: 50480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:09,564-Speed 13133.93 samples/sec Loss 5.5687 LearningRate 0.0556 Epoch: 19 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:11,148-Speed 12941.31 samples/sec Loss 5.6041 LearningRate 0.0556 Epoch: 19 Global Step: 50500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:12,723-Speed 13013.85 samples/sec Loss 5.5449 LearningRate 0.0556 Epoch: 19 Global Step: 50510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:14,299-Speed 13000.46 samples/sec Loss 5.5696 LearningRate 0.0555 Epoch: 19 Global Step: 50520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:15,876-Speed 12991.35 samples/sec Loss 5.4570 LearningRate 0.0555 Epoch: 19 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:11:17,438-Speed 13120.51 samples/sec Loss 5.6101 LearningRate 0.0555 Epoch: 19 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:11:19,004-Speed 13082.68 samples/sec Loss 5.6098 LearningRate 0.0555 Epoch: 19 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:11:20,570-Speed 13077.83 samples/sec Loss 5.6375 LearningRate 0.0554 Epoch: 19 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:11:22,218-Speed 12430.91 samples/sec Loss 5.5557 LearningRate 0.0554 Epoch: 19 Global Step: 50570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:23,710-Speed 13738.74 samples/sec Loss 5.6616 LearningRate 0.0554 Epoch: 19 Global Step: 50580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:39,562-Speed 1292.09 samples/sec Loss 4.8237 LearningRate 0.0554 Epoch: 20 Global Step: 50590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:41,157-Speed 12851.17 samples/sec 
Loss 4.8015 LearningRate 0.0554 Epoch: 20 Global Step: 50600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:42,747-Speed 12889.53 samples/sec Loss 4.7015 LearningRate 0.0553 Epoch: 20 Global Step: 50610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:44,308-Speed 13125.41 samples/sec Loss 4.7335 LearningRate 0.0553 Epoch: 20 Global Step: 50620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:45,920-Speed 12708.37 samples/sec Loss 4.6949 LearningRate 0.0553 Epoch: 20 Global Step: 50630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:47,506-Speed 12922.08 samples/sec Loss 4.8034 LearningRate 0.0553 Epoch: 20 Global Step: 50640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:49,059-Speed 13194.52 samples/sec Loss 4.8827 LearningRate 0.0552 Epoch: 20 Global Step: 50650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:50,639-Speed 12969.13 samples/sec Loss 4.8027 LearningRate 0.0552 Epoch: 20 Global Step: 50660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:52,211-Speed 13034.25 samples/sec Loss 4.7627 LearningRate 0.0552 Epoch: 20 Global Step: 50670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:53,793-Speed 12955.52 samples/sec Loss 4.7446 LearningRate 0.0552 Epoch: 20 Global Step: 50680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:11:55,346-Speed 13194.42 samples/sec Loss 4.7971 LearningRate 0.0552 Epoch: 20 Global Step: 50690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:56,932-Speed 12915.75 samples/sec Loss 4.7944 LearningRate 0.0551 Epoch: 20 Global Step: 50700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:11:58,500-Speed 13073.07 samples/sec Loss 4.7699 LearningRate 0.0551 Epoch: 20 Global Step: 50710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:00,067-Speed 13078.99 samples/sec Loss 4.8148 LearningRate 0.0551 Epoch: 20 
Global Step: 50720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:01,634-Speed 13078.49 samples/sec Loss 4.9100 LearningRate 0.0551 Epoch: 20 Global Step: 50730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:03,194-Speed 13141.00 samples/sec Loss 4.9949 LearningRate 0.0551 Epoch: 20 Global Step: 50740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:04,747-Speed 13188.27 samples/sec Loss 4.8980 LearningRate 0.0550 Epoch: 20 Global Step: 50750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:06,290-Speed 13284.36 samples/sec Loss 4.9220 LearningRate 0.0550 Epoch: 20 Global Step: 50760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:07,890-Speed 12806.42 samples/sec Loss 4.9448 LearningRate 0.0550 Epoch: 20 Global Step: 50770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:09,458-Speed 13069.04 samples/sec Loss 4.8520 LearningRate 0.0550 Epoch: 20 Global Step: 50780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:11,023-Speed 13091.65 samples/sec Loss 4.9807 LearningRate 0.0549 Epoch: 20 Global Step: 50790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:12,598-Speed 13015.97 samples/sec Loss 4.8873 LearningRate 0.0549 Epoch: 20 Global Step: 50800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:14,175-Speed 12987.57 samples/sec Loss 4.9795 LearningRate 0.0549 Epoch: 20 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:15,749-Speed 13015.00 samples/sec Loss 4.9716 LearningRate 0.0549 Epoch: 20 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:17,335-Speed 12927.86 samples/sec Loss 4.9742 LearningRate 0.0549 Epoch: 20 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:18,907-Speed 13032.68 samples/sec Loss 4.9337 LearningRate 0.0548 Epoch: 20 Global Step: 50840 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:12:20,485-Speed 12987.69 samples/sec Loss 4.9471 LearningRate 0.0548 Epoch: 20 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:22,045-Speed 13131.71 samples/sec Loss 4.9796 LearningRate 0.0548 Epoch: 20 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:23,606-Speed 13129.43 samples/sec Loss 5.0117 LearningRate 0.0548 Epoch: 20 Global Step: 50870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:25,167-Speed 13123.00 samples/sec Loss 4.9419 LearningRate 0.0547 Epoch: 20 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:26,734-Speed 13077.60 samples/sec Loss 5.0359 LearningRate 0.0547 Epoch: 20 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:28,331-Speed 12832.95 samples/sec Loss 5.0125 LearningRate 0.0547 Epoch: 20 Global Step: 50900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:12:29,905-Speed 13019.14 samples/sec Loss 4.9545 LearningRate 0.0547 Epoch: 20 Global Step: 50910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:12:31,490-Speed 12927.85 samples/sec Loss 4.9814 LearningRate 0.0547 Epoch: 20 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:12:33,052-Speed 13115.76 samples/sec Loss 4.9964 LearningRate 0.0546 Epoch: 20 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:12:34,590-Speed 13318.83 samples/sec Loss 5.0197 LearningRate 0.0546 Epoch: 20 Global Step: 50940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:36,162-Speed 13037.42 samples/sec Loss 5.1154 LearningRate 0.0546 Epoch: 20 Global Step: 50950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:37,704-Speed 13323.40 samples/sec Loss 5.1495 LearningRate 0.0546 Epoch: 20 Global Step: 50960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 
16:12:39,267-Speed 13107.61 samples/sec Loss 5.0298 LearningRate 0.0546 Epoch: 20 Global Step: 50970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:40,839-Speed 13032.99 samples/sec Loss 5.0733 LearningRate 0.0545 Epoch: 20 Global Step: 50980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:42,404-Speed 13099.92 samples/sec Loss 5.0309 LearningRate 0.0545 Epoch: 20 Global Step: 50990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:43,971-Speed 13071.35 samples/sec Loss 5.0717 LearningRate 0.0545 Epoch: 20 Global Step: 51000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:45,569-Speed 12828.34 samples/sec Loss 5.0075 LearningRate 0.0545 Epoch: 20 Global Step: 51010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:47,114-Speed 13260.62 samples/sec Loss 5.0712 LearningRate 0.0544 Epoch: 20 Global Step: 51020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:48,675-Speed 13131.14 samples/sec Loss 5.0519 LearningRate 0.0544 Epoch: 20 Global Step: 51030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:50,259-Speed 12928.13 samples/sec Loss 5.0918 LearningRate 0.0544 Epoch: 20 Global Step: 51040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:51,839-Speed 12968.63 samples/sec Loss 5.0605 LearningRate 0.0544 Epoch: 20 Global Step: 51050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:12:53,403-Speed 13106.81 samples/sec Loss 5.1023 LearningRate 0.0544 Epoch: 20 Global Step: 51060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:54,983-Speed 12968.30 samples/sec Loss 5.1225 LearningRate 0.0543 Epoch: 20 Global Step: 51070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:56,550-Speed 13079.38 samples/sec Loss 5.1352 LearningRate 0.0543 Epoch: 20 Global Step: 51080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:58,117-Speed 13103.72 samples/sec 
Loss 5.0615 LearningRate 0.0543 Epoch: 20 Global Step: 51090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:12:59,671-Speed 13183.86 samples/sec Loss 5.1262 LearningRate 0.0543 Epoch: 20 Global Step: 51100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:01,251-Speed 12973.26 samples/sec Loss 5.1389 LearningRate 0.0542 Epoch: 20 Global Step: 51110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:02,823-Speed 13035.55 samples/sec Loss 5.1812 LearningRate 0.0542 Epoch: 20 Global Step: 51120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:04,393-Speed 13049.54 samples/sec Loss 5.1008 LearningRate 0.0542 Epoch: 20 Global Step: 51130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:06,000-Speed 12749.72 samples/sec Loss 5.0928 LearningRate 0.0542 Epoch: 20 Global Step: 51140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:07,566-Speed 13090.46 samples/sec Loss 5.1338 LearningRate 0.0542 Epoch: 20 Global Step: 51150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:09,148-Speed 12950.37 samples/sec Loss 5.1783 LearningRate 0.0541 Epoch: 20 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:10,722-Speed 13021.52 samples/sec Loss 5.2448 LearningRate 0.0541 Epoch: 20 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:12,304-Speed 12949.01 samples/sec Loss 5.1744 LearningRate 0.0541 Epoch: 20 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:13,899-Speed 12851.48 samples/sec Loss 5.1045 LearningRate 0.0541 Epoch: 20 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:15,460-Speed 13126.13 samples/sec Loss 5.2142 LearningRate 0.0541 Epoch: 20 Global Step: 51200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:17,023-Speed 13107.80 samples/sec Loss 5.1677 LearningRate 0.0540 Epoch: 20 
Global Step: 51210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:18,580-Speed 13162.64 samples/sec Loss 5.2795 LearningRate 0.0540 Epoch: 20 Global Step: 51220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:20,140-Speed 13136.23 samples/sec Loss 5.2822 LearningRate 0.0540 Epoch: 20 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:21,707-Speed 13082.08 samples/sec Loss 5.2243 LearningRate 0.0540 Epoch: 20 Global Step: 51240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:23,283-Speed 13004.37 samples/sec Loss 5.1706 LearningRate 0.0539 Epoch: 20 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:24,856-Speed 13022.17 samples/sec Loss 5.2041 LearningRate 0.0539 Epoch: 20 Global Step: 51260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:13:26,408-Speed 13199.43 samples/sec Loss 5.2853 LearningRate 0.0539 Epoch: 20 Global Step: 51270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:13:27,977-Speed 13061.34 samples/sec Loss 5.2783 LearningRate 0.0539 Epoch: 20 Global Step: 51280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:13:29,519-Speed 13294.36 samples/sec Loss 5.2059 LearningRate 0.0539 Epoch: 20 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:31,082-Speed 13105.25 samples/sec Loss 5.2511 LearningRate 0.0538 Epoch: 20 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:32,652-Speed 13059.57 samples/sec Loss 5.2694 LearningRate 0.0538 Epoch: 20 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:34,206-Speed 13177.96 samples/sec Loss 5.2920 LearningRate 0.0538 Epoch: 20 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:35,771-Speed 13096.27 samples/sec Loss 5.2714 LearningRate 0.0538 Epoch: 20 Global Step: 51330 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-01-14 16:13:37,346-Speed 13007.80 samples/sec Loss 5.2283 LearningRate 0.0537 Epoch: 20 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:38,905-Speed 13152.67 samples/sec Loss 5.2668 LearningRate 0.0537 Epoch: 20 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:40,471-Speed 13076.19 samples/sec Loss 5.3123 LearningRate 0.0537 Epoch: 20 Global Step: 51360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:42,057-Speed 12920.70 samples/sec Loss 5.2479 LearningRate 0.0537 Epoch: 20 Global Step: 51370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:43,638-Speed 12967.45 samples/sec Loss 5.2449 LearningRate 0.0537 Epoch: 20 Global Step: 51380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:45,198-Speed 13132.13 samples/sec Loss 5.2237 LearningRate 0.0536 Epoch: 20 Global Step: 51390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:13:46,751-Speed 13191.37 samples/sec Loss 5.4033 LearningRate 0.0536 Epoch: 20 Global Step: 51400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:13:48,303-Speed 13207.72 samples/sec Loss 5.2758 LearningRate 0.0536 Epoch: 20 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:49,869-Speed 13081.58 samples/sec Loss 5.2475 LearningRate 0.0536 Epoch: 20 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:13:51,423-Speed 13184.26 samples/sec Loss 5.3393 LearningRate 0.0536 Epoch: 20 Global Step: 51430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:52,986-Speed 13115.32 samples/sec Loss 5.3225 LearningRate 0.0535 Epoch: 20 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:54,581-Speed 12845.97 samples/sec Loss 5.2261 LearningRate 0.0535 Epoch: 20 Global Step: 51450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-01-14 16:13:56,137-Speed 13169.02 samples/sec Loss 5.2739 LearningRate 0.0535 Epoch: 20 Global Step: 51460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:57,705-Speed 13073.35 samples/sec Loss 5.2167 LearningRate 0.0535 Epoch: 20 Global Step: 51470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:13:59,290-Speed 12927.30 samples/sec Loss 5.3068 LearningRate 0.0534 Epoch: 20 Global Step: 51480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:00,860-Speed 13046.53 samples/sec Loss 5.3529 LearningRate 0.0534 Epoch: 20 Global Step: 51490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:02,423-Speed 13106.01 samples/sec Loss 5.3021 LearningRate 0.0534 Epoch: 20 Global Step: 51500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:03,996-Speed 13028.27 samples/sec Loss 5.3109 LearningRate 0.0534 Epoch: 20 Global Step: 51510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:05,569-Speed 13027.71 samples/sec Loss 5.3017 LearningRate 0.0534 Epoch: 20 Global Step: 51520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:07,137-Speed 13071.68 samples/sec Loss 5.2420 LearningRate 0.0533 Epoch: 20 Global Step: 51530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:08,688-Speed 13214.59 samples/sec Loss 5.3481 LearningRate 0.0533 Epoch: 20 Global Step: 51540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:10,259-Speed 13043.96 samples/sec Loss 5.2752 LearningRate 0.0533 Epoch: 20 Global Step: 51550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:11,808-Speed 13223.86 samples/sec Loss 5.2956 LearningRate 0.0533 Epoch: 20 Global Step: 51560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:13,381-Speed 13030.33 samples/sec Loss 5.3942 LearningRate 0.0533 Epoch: 20 Global Step: 51570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:14,965-Speed 12933.36 
samples/sec Loss 5.2305 LearningRate 0.0532 Epoch: 20 Global Step: 51580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:16,535-Speed 13049.55 samples/sec Loss 5.2656 LearningRate 0.0532 Epoch: 20 Global Step: 51590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:18,103-Speed 13072.26 samples/sec Loss 5.2580 LearningRate 0.0532 Epoch: 20 Global Step: 51600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:19,670-Speed 13077.96 samples/sec Loss 5.1795 LearningRate 0.0532 Epoch: 20 Global Step: 51610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:21,223-Speed 13194.74 samples/sec Loss 5.3108 LearningRate 0.0531 Epoch: 20 Global Step: 51620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:22,790-Speed 13081.47 samples/sec Loss 5.3633 LearningRate 0.0531 Epoch: 20 Global Step: 51630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:24,374-Speed 12931.80 samples/sec Loss 5.3227 LearningRate 0.0531 Epoch: 20 Global Step: 51640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:25,939-Speed 13094.31 samples/sec Loss 5.4593 LearningRate 0.0531 Epoch: 20 Global Step: 51650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:27,489-Speed 13218.12 samples/sec Loss 5.4224 LearningRate 0.0531 Epoch: 20 Global Step: 51660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:29,067-Speed 12987.54 samples/sec Loss 5.2997 LearningRate 0.0530 Epoch: 20 Global Step: 51670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:30,644-Speed 12989.83 samples/sec Loss 5.4126 LearningRate 0.0530 Epoch: 20 Global Step: 51680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:32,192-Speed 13241.62 samples/sec Loss 5.3329 LearningRate 0.0530 Epoch: 20 Global Step: 51690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:33,767-Speed 13005.34 samples/sec Loss 5.3067 LearningRate 0.0530 
Epoch: 20 Global Step: 51700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:35,337-Speed 13057.63 samples/sec Loss 5.2696 LearningRate 0.0530 Epoch: 20 Global Step: 51710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:36,916-Speed 12970.58 samples/sec Loss 5.4136 LearningRate 0.0529 Epoch: 20 Global Step: 51720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:38,485-Speed 13068.89 samples/sec Loss 5.3190 LearningRate 0.0529 Epoch: 20 Global Step: 51730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:40,059-Speed 13018.88 samples/sec Loss 5.3463 LearningRate 0.0529 Epoch: 20 Global Step: 51740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:41,628-Speed 13059.73 samples/sec Loss 5.2869 LearningRate 0.0529 Epoch: 20 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:43,201-Speed 13031.79 samples/sec Loss 5.3759 LearningRate 0.0528 Epoch: 20 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:44,754-Speed 13185.88 samples/sec Loss 5.4060 LearningRate 0.0528 Epoch: 20 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:46,301-Speed 13244.37 samples/sec Loss 5.2567 LearningRate 0.0528 Epoch: 20 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:47,881-Speed 12971.16 samples/sec Loss 5.3305 LearningRate 0.0528 Epoch: 20 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:49,469-Speed 12906.70 samples/sec Loss 5.2889 LearningRate 0.0528 Epoch: 20 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:51,033-Speed 13098.26 samples/sec Loss 5.2916 LearningRate 0.0527 Epoch: 20 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:14:52,593-Speed 13157.57 samples/sec Loss 5.4109 LearningRate 0.0527 Epoch: 20 Global Step: 51820 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:54,161-Speed 13064.00 samples/sec Loss 5.4316 LearningRate 0.0527 Epoch: 20 Global Step: 51830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:55,725-Speed 13101.96 samples/sec Loss 5.4078 LearningRate 0.0527 Epoch: 20 Global Step: 51840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:57,273-Speed 13231.57 samples/sec Loss 5.3178 LearningRate 0.0527 Epoch: 20 Global Step: 51850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:14:58,846-Speed 13029.51 samples/sec Loss 5.3183 LearningRate 0.0526 Epoch: 20 Global Step: 51860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:15:00,430-Speed 12933.56 samples/sec Loss 5.3015 LearningRate 0.0526 Epoch: 20 Global Step: 51870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:15:01,998-Speed 13072.90 samples/sec Loss 5.4310 LearningRate 0.0526 Epoch: 20 Global Step: 51880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:15:03,565-Speed 13076.62 samples/sec Loss 5.3386 LearningRate 0.0526 Epoch: 20 Global Step: 51890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:15:05,133-Speed 13070.84 samples/sec Loss 5.4005 LearningRate 0.0525 Epoch: 20 Global Step: 51900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:15:06,693-Speed 13133.01 samples/sec Loss 5.3195 LearningRate 0.0525 Epoch: 20 Global Step: 51910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:15:08,262-Speed 13056.55 samples/sec Loss 5.3182 LearningRate 0.0525 Epoch: 20 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:09,809-Speed 13245.67 samples/sec Loss 5.3856 LearningRate 0.0525 Epoch: 20 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:11,378-Speed 13061.05 samples/sec Loss 5.4456 LearningRate 0.0525 Epoch: 20 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-01-14 16:15:12,948-Speed 13050.03 samples/sec Loss 5.3336 LearningRate 0.0524 Epoch: 20 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:14,515-Speed 13081.76 samples/sec Loss 5.3513 LearningRate 0.0524 Epoch: 20 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:16,105-Speed 12880.16 samples/sec Loss 5.4427 LearningRate 0.0524 Epoch: 20 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:17,681-Speed 13003.06 samples/sec Loss 5.4471 LearningRate 0.0524 Epoch: 20 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:19,244-Speed 13114.63 samples/sec Loss 5.3937 LearningRate 0.0524 Epoch: 20 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:20,798-Speed 13185.00 samples/sec Loss 5.4123 LearningRate 0.0523 Epoch: 20 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:22,373-Speed 13007.11 samples/sec Loss 5.4084 LearningRate 0.0523 Epoch: 20 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:23,945-Speed 13037.52 samples/sec Loss 5.4605 LearningRate 0.0523 Epoch: 20 Global Step: 52020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:15:25,514-Speed 13058.92 samples/sec Loss 5.3593 LearningRate 0.0523 Epoch: 20 Global Step: 52030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:15:27,056-Speed 13288.75 samples/sec Loss 5.4648 LearningRate 0.0522 Epoch: 20 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:28,622-Speed 13112.90 samples/sec Loss 5.4195 LearningRate 0.0522 Epoch: 20 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:30,188-Speed 13087.46 samples/sec Loss 5.3737 LearningRate 0.0522 Epoch: 20 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:31,764-Speed 13002.23 
samples/sec Loss 5.3284 LearningRate 0.0522 Epoch: 20 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:33,341-Speed 12992.60 samples/sec Loss 5.4491 LearningRate 0.0522 Epoch: 20 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:34,907-Speed 13079.83 samples/sec Loss 5.3935 LearningRate 0.0521 Epoch: 20 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:36,457-Speed 13223.54 samples/sec Loss 5.4817 LearningRate 0.0521 Epoch: 20 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:38,029-Speed 13038.23 samples/sec Loss 5.3960 LearningRate 0.0521 Epoch: 20 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:39,583-Speed 13178.82 samples/sec Loss 5.4890 LearningRate 0.0521 Epoch: 20 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:41,148-Speed 13094.26 samples/sec Loss 5.5114 LearningRate 0.0521 Epoch: 20 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:42,713-Speed 13108.77 samples/sec Loss 5.3290 LearningRate 0.0520 Epoch: 20 Global Step: 52140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:15:44,297-Speed 12930.28 samples/sec Loss 5.4381 LearningRate 0.0520 Epoch: 20 Global Step: 52150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:15:45,856-Speed 13142.50 samples/sec Loss 5.3921 LearningRate 0.0520 Epoch: 20 Global Step: 52160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:15:47,402-Speed 13255.14 samples/sec Loss 5.3675 LearningRate 0.0520 Epoch: 20 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:15:48,986-Speed 12941.19 samples/sec Loss 5.4166 LearningRate 0.0520 Epoch: 20 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:50,555-Speed 13052.52 samples/sec Loss 5.4008 LearningRate 
0.0519 Epoch: 20 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:52,124-Speed 13064.18 samples/sec Loss 5.4519 LearningRate 0.0519 Epoch: 20 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:53,679-Speed 13175.38 samples/sec Loss 5.4949 LearningRate 0.0519 Epoch: 20 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:55,245-Speed 13087.23 samples/sec Loss 5.4430 LearningRate 0.0519 Epoch: 20 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:56,825-Speed 12967.88 samples/sec Loss 5.4750 LearningRate 0.0518 Epoch: 20 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:15:58,433-Speed 12743.33 samples/sec Loss 5.4199 LearningRate 0.0518 Epoch: 20 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:00,003-Speed 13052.72 samples/sec Loss 5.4439 LearningRate 0.0518 Epoch: 20 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:01,588-Speed 12920.79 samples/sec Loss 5.3942 LearningRate 0.0518 Epoch: 20 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:03,152-Speed 13109.97 samples/sec Loss 5.4995 LearningRate 0.0518 Epoch: 20 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:04,710-Speed 13148.95 samples/sec Loss 5.4945 LearningRate 0.0517 Epoch: 20 Global Step: 52280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:06,285-Speed 13009.10 samples/sec Loss 5.3650 LearningRate 0.0517 Epoch: 20 Global Step: 52290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:07,839-Speed 13186.35 samples/sec Loss 5.3862 LearningRate 0.0517 Epoch: 20 Global Step: 52300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:09,418-Speed 12974.68 samples/sec Loss 5.4253 LearningRate 0.0517 Epoch: 20 Global Step: 52310 Fp16 
Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:10,985-Speed 13078.47 samples/sec Loss 5.4879 LearningRate 0.0517 Epoch: 20 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:12,562-Speed 12997.93 samples/sec Loss 5.4892 LearningRate 0.0516 Epoch: 20 Global Step: 52330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:14,145-Speed 12948.10 samples/sec Loss 5.4109 LearningRate 0.0516 Epoch: 20 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:15,726-Speed 12963.78 samples/sec Loss 5.3795 LearningRate 0.0516 Epoch: 20 Global Step: 52350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:17,322-Speed 12837.72 samples/sec Loss 5.3755 LearningRate 0.0516 Epoch: 20 Global Step: 52360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:18,889-Speed 13073.44 samples/sec Loss 5.4687 LearningRate 0.0515 Epoch: 20 Global Step: 52370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:20,493-Speed 12777.26 samples/sec Loss 5.3179 LearningRate 0.0515 Epoch: 20 Global Step: 52380 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-14 16:16:22,056-Speed 13112.42 samples/sec Loss 5.4803 LearningRate 0.0515 Epoch: 20 Global Step: 52390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:23,617-Speed 13128.81 samples/sec Loss 5.5272 LearningRate 0.0515 Epoch: 20 Global Step: 52400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:25,174-Speed 13158.83 samples/sec Loss 5.4046 LearningRate 0.0515 Epoch: 20 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:26,728-Speed 13182.98 samples/sec Loss 5.4376 LearningRate 0.0514 Epoch: 20 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:28,294-Speed 13083.52 samples/sec Loss 5.4878 LearningRate 0.0514 Epoch: 20 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-01-14 16:16:29,873-Speed 12979.16 samples/sec Loss 5.4342 LearningRate 0.0514 Epoch: 20 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:31,455-Speed 12946.85 samples/sec Loss 5.4119 LearningRate 0.0514 Epoch: 20 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:33,016-Speed 13134.22 samples/sec Loss 5.4660 LearningRate 0.0514 Epoch: 20 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:34,598-Speed 12953.25 samples/sec Loss 5.4046 LearningRate 0.0513 Epoch: 20 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:36,169-Speed 13043.36 samples/sec Loss 5.4251 LearningRate 0.0513 Epoch: 20 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:37,732-Speed 13108.32 samples/sec Loss 5.5547 LearningRate 0.0513 Epoch: 20 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:39,297-Speed 13090.90 samples/sec Loss 5.4911 LearningRate 0.0513 Epoch: 20 Global Step: 52500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:40,857-Speed 13133.23 samples/sec Loss 5.4078 LearningRate 0.0513 Epoch: 20 Global Step: 52510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:42,420-Speed 13106.63 samples/sec Loss 5.4749 LearningRate 0.0512 Epoch: 20 Global Step: 52520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:43,985-Speed 13097.87 samples/sec Loss 5.4663 LearningRate 0.0512 Epoch: 20 Global Step: 52530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:45,539-Speed 13187.08 samples/sec Loss 5.5086 LearningRate 0.0512 Epoch: 20 Global Step: 52540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:47,127-Speed 12901.75 samples/sec Loss 5.4777 LearningRate 0.0512 Epoch: 20 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:16:48,694-Speed 13074.26 samples/sec Loss 5.3723 LearningRate 0.0511 Epoch: 20 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:50,274-Speed 12972.92 samples/sec Loss 5.3458 LearningRate 0.0511 Epoch: 20 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:51,845-Speed 13038.64 samples/sec Loss 5.5476 LearningRate 0.0511 Epoch: 20 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:53,412-Speed 13088.48 samples/sec Loss 5.4707 LearningRate 0.0511 Epoch: 20 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:54,979-Speed 13073.59 samples/sec Loss 5.4570 LearningRate 0.0511 Epoch: 20 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:56,543-Speed 13102.72 samples/sec Loss 5.4264 LearningRate 0.0510 Epoch: 20 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:16:58,101-Speed 13149.52 samples/sec Loss 5.4132 LearningRate 0.0510 Epoch: 20 Global Step: 52620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:16:59,712-Speed 12719.89 samples/sec Loss 5.4729 LearningRate 0.0510 Epoch: 20 Global Step: 52630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:17:01,258-Speed 13252.96 samples/sec Loss 5.4898 LearningRate 0.0510 Epoch: 20 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:02,845-Speed 12911.43 samples/sec Loss 5.5015 LearningRate 0.0510 Epoch: 20 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:04,416-Speed 13044.57 samples/sec Loss 5.4286 LearningRate 0.0509 Epoch: 20 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:06,000-Speed 12928.82 samples/sec Loss 5.5006 LearningRate 0.0509 Epoch: 20 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:07,571-Speed 13047.21 samples/sec 
Loss 5.3548 LearningRate 0.0509 Epoch: 20 Global Step: 52680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:09,144-Speed 13025.30 samples/sec Loss 5.5122 LearningRate 0.0509 Epoch: 20 Global Step: 52690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:10,711-Speed 13075.83 samples/sec Loss 5.5282 LearningRate 0.0509 Epoch: 20 Global Step: 52700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:12,297-Speed 12924.70 samples/sec Loss 5.4313 LearningRate 0.0508 Epoch: 20 Global Step: 52710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:13,865-Speed 13070.21 samples/sec Loss 5.5081 LearningRate 0.0508 Epoch: 20 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:15,426-Speed 13122.04 samples/sec Loss 5.3326 LearningRate 0.0508 Epoch: 20 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:16,992-Speed 13086.49 samples/sec Loss 5.4417 LearningRate 0.0508 Epoch: 20 Global Step: 52740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:18,567-Speed 13010.67 samples/sec Loss 5.4634 LearningRate 0.0507 Epoch: 20 Global Step: 52750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:20,127-Speed 13130.02 samples/sec Loss 5.4645 LearningRate 0.0507 Epoch: 20 Global Step: 52760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:21,701-Speed 13024.10 samples/sec Loss 5.4128 LearningRate 0.0507 Epoch: 20 Global Step: 52770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:23,300-Speed 12814.52 samples/sec Loss 5.3846 LearningRate 0.0507 Epoch: 20 Global Step: 52780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:24,878-Speed 12986.64 samples/sec Loss 5.4108 LearningRate 0.0507 Epoch: 20 Global Step: 52790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:26,451-Speed 13027.59 samples/sec Loss 5.4514 LearningRate 0.0506 Epoch: 20 
Global Step: 52800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:28,025-Speed 13017.04 samples/sec Loss 5.5227 LearningRate 0.0506 Epoch: 20 Global Step: 52810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:29,607-Speed 12958.43 samples/sec Loss 5.4836 LearningRate 0.0506 Epoch: 20 Global Step: 52820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:31,197-Speed 12888.54 samples/sec Loss 5.4474 LearningRate 0.0506 Epoch: 20 Global Step: 52830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:32,778-Speed 12965.67 samples/sec Loss 5.5068 LearningRate 0.0506 Epoch: 20 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:34,348-Speed 13054.27 samples/sec Loss 5.5197 LearningRate 0.0505 Epoch: 20 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:35,908-Speed 13130.26 samples/sec Loss 5.3989 LearningRate 0.0505 Epoch: 20 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:37,474-Speed 13083.75 samples/sec Loss 5.5347 LearningRate 0.0505 Epoch: 20 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:39,063-Speed 12896.13 samples/sec Loss 5.5188 LearningRate 0.0505 Epoch: 20 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:40,634-Speed 13045.27 samples/sec Loss 5.4709 LearningRate 0.0505 Epoch: 20 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:42,201-Speed 13074.21 samples/sec Loss 5.5601 LearningRate 0.0504 Epoch: 20 Global Step: 52900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:43,767-Speed 13087.74 samples/sec Loss 5.4529 LearningRate 0.0504 Epoch: 20 Global Step: 52910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:45,333-Speed 13079.65 samples/sec Loss 5.4308 LearningRate 0.0504 Epoch: 20 Global Step: 52920 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 16:17:46,911-Speed 12991.98 samples/sec Loss 5.5599 LearningRate 0.0504 Epoch: 20 Global Step: 52930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:48,477-Speed 13085.51 samples/sec Loss 5.4520 LearningRate 0.0504 Epoch: 20 Global Step: 52940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:50,043-Speed 13084.04 samples/sec Loss 5.4710 LearningRate 0.0503 Epoch: 20 Global Step: 52950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:51,601-Speed 13148.73 samples/sec Loss 5.4293 LearningRate 0.0503 Epoch: 20 Global Step: 52960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:53,176-Speed 13006.65 samples/sec Loss 5.3509 LearningRate 0.0503 Epoch: 20 Global Step: 52970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:54,731-Speed 13182.37 samples/sec Loss 5.4258 LearningRate 0.0503 Epoch: 20 Global Step: 52980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:56,293-Speed 13113.12 samples/sec Loss 5.4847 LearningRate 0.0502 Epoch: 20 Global Step: 52990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:17:57,866-Speed 13032.96 samples/sec Loss 5.4258 LearningRate 0.0502 Epoch: 20 Global Step: 53000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:17:59,435-Speed 13057.71 samples/sec Loss 5.5093 LearningRate 0.0502 Epoch: 20 Global Step: 53010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:00,989-Speed 13180.63 samples/sec Loss 5.4454 LearningRate 0.0502 Epoch: 20 Global Step: 53020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:02,558-Speed 13060.16 samples/sec Loss 5.4778 LearningRate 0.0502 Epoch: 20 Global Step: 53030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:04,111-Speed 13204.59 samples/sec Loss 5.4671 LearningRate 0.0501 Epoch: 20 Global Step: 53040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:18:05,666-Speed 13176.01 samples/sec Loss 5.4708 LearningRate 0.0501 Epoch: 20 Global Step: 53050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:07,211-Speed 13258.21 samples/sec Loss 5.5068 LearningRate 0.0501 Epoch: 20 Global Step: 53060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:08,771-Speed 13143.57 samples/sec Loss 5.4134 LearningRate 0.0501 Epoch: 20 Global Step: 53070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:10,353-Speed 12952.19 samples/sec Loss 5.3926 LearningRate 0.0501 Epoch: 20 Global Step: 53080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:11,909-Speed 13166.71 samples/sec Loss 5.4603 LearningRate 0.0500 Epoch: 20 Global Step: 53090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:13,494-Speed 12929.63 samples/sec Loss 5.5332 LearningRate 0.0500 Epoch: 20 Global Step: 53100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:24,780-Speed 1814.92 samples/sec Loss 5.3820 LearningRate 0.0500 Epoch: 21 Global Step: 53110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:26,394-Speed 12700.05 samples/sec Loss 4.7001 LearningRate 0.0500 Epoch: 21 Global Step: 53120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:28,018-Speed 12614.42 samples/sec Loss 4.7217 LearningRate 0.0500 Epoch: 21 Global Step: 53130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:29,600-Speed 12953.07 samples/sec Loss 4.7342 LearningRate 0.0499 Epoch: 21 Global Step: 53140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:31,190-Speed 12886.70 samples/sec Loss 4.6880 LearningRate 0.0499 Epoch: 21 Global Step: 53150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:32,774-Speed 12947.91 samples/sec Loss 4.6064 LearningRate 0.0499 Epoch: 21 Global Step: 53160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:34,344-Speed 13048.47 samples/sec Loss 
4.7166 LearningRate 0.0499 Epoch: 21 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:35,926-Speed 12950.94 samples/sec Loss 4.8118 LearningRate 0.0499 Epoch: 21 Global Step: 53180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:37,485-Speed 13144.87 samples/sec Loss 4.7637 LearningRate 0.0498 Epoch: 21 Global Step: 53190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:39,046-Speed 13124.63 samples/sec Loss 4.6996 LearningRate 0.0498 Epoch: 21 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:40,624-Speed 12984.21 samples/sec Loss 4.7329 LearningRate 0.0498 Epoch: 21 Global Step: 53210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:42,204-Speed 12991.81 samples/sec Loss 4.7191 LearningRate 0.0498 Epoch: 21 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:43,789-Speed 12925.50 samples/sec Loss 4.7616 LearningRate 0.0497 Epoch: 21 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:45,348-Speed 13142.02 samples/sec Loss 4.7326 LearningRate 0.0497 Epoch: 21 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:46,916-Speed 13074.44 samples/sec Loss 4.7669 LearningRate 0.0497 Epoch: 21 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:18:48,468-Speed 13206.57 samples/sec Loss 4.7005 LearningRate 0.0497 Epoch: 21 Global Step: 53260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:50,059-Speed 12871.77 samples/sec Loss 4.7900 LearningRate 0.0497 Epoch: 21 Global Step: 53270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:51,623-Speed 13101.63 samples/sec Loss 4.7587 LearningRate 0.0496 Epoch: 21 Global Step: 53280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:53,210-Speed 12912.50 samples/sec Loss 4.7568 LearningRate 0.0496 Epoch: 21 Global 
Step: 53290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:54,783-Speed 13030.27 samples/sec Loss 4.8773 LearningRate 0.0496 Epoch: 21 Global Step: 53300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:56,343-Speed 13132.89 samples/sec Loss 4.7924 LearningRate 0.0496 Epoch: 21 Global Step: 53310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:57,919-Speed 13037.87 samples/sec Loss 4.9114 LearningRate 0.0496 Epoch: 21 Global Step: 53320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:18:59,492-Speed 13027.66 samples/sec Loss 4.8719 LearningRate 0.0495 Epoch: 21 Global Step: 53330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:01,068-Speed 12996.60 samples/sec Loss 4.7321 LearningRate 0.0495 Epoch: 21 Global Step: 53340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:02,620-Speed 13200.96 samples/sec Loss 4.8108 LearningRate 0.0495 Epoch: 21 Global Step: 53350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:04,179-Speed 13151.06 samples/sec Loss 4.9242 LearningRate 0.0495 Epoch: 21 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:05,739-Speed 13133.47 samples/sec Loss 4.8998 LearningRate 0.0495 Epoch: 21 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:07,299-Speed 13127.92 samples/sec Loss 4.8630 LearningRate 0.0494 Epoch: 21 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:08,887-Speed 12911.94 samples/sec Loss 4.8254 LearningRate 0.0494 Epoch: 21 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:10,451-Speed 13098.19 samples/sec Loss 4.9090 LearningRate 0.0494 Epoch: 21 Global Step: 53400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:12,030-Speed 12982.00 samples/sec Loss 4.9502 LearningRate 0.0494 Epoch: 21 Global Step: 53410 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 16:19:13,600-Speed 13056.83 samples/sec Loss 4.8372 LearningRate 0.0494 Epoch: 21 Global Step: 53420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:15,186-Speed 12919.44 samples/sec Loss 4.8456 LearningRate 0.0493 Epoch: 21 Global Step: 53430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:16,805-Speed 12652.87 samples/sec Loss 4.9515 LearningRate 0.0493 Epoch: 21 Global Step: 53440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:18,377-Speed 13038.36 samples/sec Loss 4.8382 LearningRate 0.0493 Epoch: 21 Global Step: 53450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:19,938-Speed 13120.10 samples/sec Loss 4.9323 LearningRate 0.0493 Epoch: 21 Global Step: 53460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:21,508-Speed 13050.10 samples/sec Loss 4.9106 LearningRate 0.0493 Epoch: 21 Global Step: 53470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:23,106-Speed 12828.68 samples/sec Loss 5.0195 LearningRate 0.0492 Epoch: 21 Global Step: 53480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:24,681-Speed 13009.79 samples/sec Loss 4.9266 LearningRate 0.0492 Epoch: 21 Global Step: 53490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:26,249-Speed 13062.25 samples/sec Loss 4.9445 LearningRate 0.0492 Epoch: 21 Global Step: 53500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:27,813-Speed 13106.86 samples/sec Loss 5.0101 LearningRate 0.0492 Epoch: 21 Global Step: 53510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:29,366-Speed 13192.76 samples/sec Loss 4.8892 LearningRate 0.0491 Epoch: 21 Global Step: 53520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:30,933-Speed 13072.89 samples/sec Loss 5.0052 LearningRate 0.0491 Epoch: 21 Global Step: 53530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:19:32,517-Speed 12935.38 samples/sec Loss 4.9504 LearningRate 0.0491 Epoch: 21 Global Step: 53540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:34,085-Speed 13073.51 samples/sec Loss 5.0208 LearningRate 0.0491 Epoch: 21 Global Step: 53550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:35,651-Speed 13086.65 samples/sec Loss 4.9701 LearningRate 0.0491 Epoch: 21 Global Step: 53560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:37,220-Speed 13059.72 samples/sec Loss 4.9834 LearningRate 0.0490 Epoch: 21 Global Step: 53570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:38,796-Speed 13004.55 samples/sec Loss 4.9006 LearningRate 0.0490 Epoch: 21 Global Step: 53580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:40,373-Speed 12989.98 samples/sec Loss 5.0342 LearningRate 0.0490 Epoch: 21 Global Step: 53590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:41,954-Speed 12962.32 samples/sec Loss 4.9023 LearningRate 0.0490 Epoch: 21 Global Step: 53600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:43,528-Speed 13022.39 samples/sec Loss 5.0626 LearningRate 0.0490 Epoch: 21 Global Step: 53610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:45,117-Speed 12887.82 samples/sec Loss 5.1137 LearningRate 0.0489 Epoch: 21 Global Step: 53620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:46,698-Speed 12968.38 samples/sec Loss 5.0111 LearningRate 0.0489 Epoch: 21 Global Step: 53630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:48,269-Speed 13041.41 samples/sec Loss 4.9902 LearningRate 0.0489 Epoch: 21 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:49,832-Speed 13112.58 samples/sec Loss 5.0753 LearningRate 0.0489 Epoch: 21 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:51,399-Speed 13072.95 samples/sec 
Loss 5.1049 LearningRate 0.0489 Epoch: 21 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:19:52,956-Speed 13152.56 samples/sec Loss 4.9475 LearningRate 0.0488 Epoch: 21 Global Step: 53670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:54,522-Speed 13094.39 samples/sec Loss 5.0618 LearningRate 0.0488 Epoch: 21 Global Step: 53680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:56,090-Speed 13067.52 samples/sec Loss 5.0449 LearningRate 0.0488 Epoch: 21 Global Step: 53690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:57,651-Speed 13128.79 samples/sec Loss 5.0685 LearningRate 0.0488 Epoch: 21 Global Step: 53700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:19:59,210-Speed 13161.31 samples/sec Loss 5.0304 LearningRate 0.0488 Epoch: 21 Global Step: 53710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:20:00,762-Speed 13202.94 samples/sec Loss 4.9869 LearningRate 0.0487 Epoch: 21 Global Step: 53720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:20:02,318-Speed 13167.45 samples/sec Loss 5.0771 LearningRate 0.0487 Epoch: 21 Global Step: 53730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:20:03,878-Speed 13133.93 samples/sec Loss 5.0460 LearningRate 0.0487 Epoch: 21 Global Step: 53740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:20:05,449-Speed 13048.14 samples/sec Loss 5.0858 LearningRate 0.0487 Epoch: 21 Global Step: 53750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:20:07,036-Speed 12905.68 samples/sec Loss 5.0765 LearningRate 0.0487 Epoch: 21 Global Step: 53760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:20:08,611-Speed 13011.90 samples/sec Loss 5.0355 LearningRate 0.0486 Epoch: 21 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:10,167-Speed 13168.17 samples/sec Loss 5.0269 LearningRate 0.0486 Epoch: 21 
Global Step: 53780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:11,738-Speed 13042.19 samples/sec Loss 5.0591 LearningRate 0.0486 Epoch: 21 Global Step: 53790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:13,293-Speed 13180.72 samples/sec Loss 5.1008 LearningRate 0.0486 Epoch: 21 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:14,848-Speed 13178.87 samples/sec Loss 5.0936 LearningRate 0.0486 Epoch: 21 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:16,422-Speed 13011.95 samples/sec Loss 5.1112 LearningRate 0.0485 Epoch: 21 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:17,993-Speed 13044.50 samples/sec Loss 5.1259 LearningRate 0.0485 Epoch: 21 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:19,564-Speed 13043.58 samples/sec Loss 5.1310 LearningRate 0.0485 Epoch: 21 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:21,147-Speed 12952.19 samples/sec Loss 5.0869 LearningRate 0.0485 Epoch: 21 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:22,716-Speed 13058.14 samples/sec Loss 5.1324 LearningRate 0.0484 Epoch: 21 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:24,290-Speed 13015.53 samples/sec Loss 5.0566 LearningRate 0.0484 Epoch: 21 Global Step: 53870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:20:25,841-Speed 13213.93 samples/sec Loss 5.1292 LearningRate 0.0484 Epoch: 21 Global Step: 53880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:20:27,444-Speed 12777.55 samples/sec Loss 5.1747 LearningRate 0.0484 Epoch: 21 Global Step: 53890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:20:29,028-Speed 12937.97 samples/sec Loss 5.1362 LearningRate 0.0484 Epoch: 21 Global Step: 53900 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-01-14 16:20:30,629-Speed 12797.59 samples/sec Loss 5.1630 LearningRate 0.0483 Epoch: 21 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:32,193-Speed 13100.51 samples/sec Loss 5.0865 LearningRate 0.0483 Epoch: 21 Global Step: 53920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:33,790-Speed 12831.51 samples/sec Loss 5.1168 LearningRate 0.0483 Epoch: 21 Global Step: 53930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:35,351-Speed 13133.24 samples/sec Loss 5.1375 LearningRate 0.0483 Epoch: 21 Global Step: 53940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:36,903-Speed 13206.88 samples/sec Loss 5.1219 LearningRate 0.0483 Epoch: 21 Global Step: 53950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:38,473-Speed 13049.68 samples/sec Loss 5.1418 LearningRate 0.0482 Epoch: 21 Global Step: 53960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:40,036-Speed 13115.56 samples/sec Loss 5.1396 LearningRate 0.0482 Epoch: 21 Global Step: 53970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:41,595-Speed 13143.56 samples/sec Loss 5.0856 LearningRate 0.0482 Epoch: 21 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:43,169-Speed 13015.57 samples/sec Loss 5.0787 LearningRate 0.0482 Epoch: 21 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:44,722-Speed 13193.78 samples/sec Loss 5.1383 LearningRate 0.0482 Epoch: 21 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:20:46,299-Speed 12991.69 samples/sec Loss 5.2011 LearningRate 0.0481 Epoch: 21 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:20:47,877-Speed 12980.54 samples/sec Loss 5.2031 LearningRate 0.0481 Epoch: 21 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-01-14 16:20:49,457-Speed 12977.28 samples/sec Loss 5.1944 LearningRate 0.0481 Epoch: 21 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:20:51,055-Speed 12819.77 samples/sec Loss 5.1482 LearningRate 0.0481 Epoch: 21 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:20:52,657-Speed 12802.81 samples/sec Loss 5.1241 LearningRate 0.0481 Epoch: 21 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:20:54,220-Speed 13112.54 samples/sec Loss 5.1397 LearningRate 0.0480 Epoch: 21 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:55,790-Speed 13053.50 samples/sec Loss 5.2826 LearningRate 0.0480 Epoch: 21 Global Step: 54070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:57,391-Speed 12801.86 samples/sec Loss 5.1080 LearningRate 0.0480 Epoch: 21 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:20:58,937-Speed 13257.78 samples/sec Loss 5.1779 LearningRate 0.0480 Epoch: 21 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:00,508-Speed 13045.48 samples/sec Loss 5.0889 LearningRate 0.0480 Epoch: 21 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:02,075-Speed 13072.11 samples/sec Loss 5.2268 LearningRate 0.0479 Epoch: 21 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:03,669-Speed 12855.88 samples/sec Loss 5.2342 LearningRate 0.0479 Epoch: 21 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:05,231-Speed 13125.06 samples/sec Loss 5.1952 LearningRate 0.0479 Epoch: 21 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:06,800-Speed 13053.87 samples/sec Loss 5.1636 LearningRate 0.0479 Epoch: 21 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:08,396-Speed 12841.59 
samples/sec Loss 5.2252 LearningRate 0.0479 Epoch: 21 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:09,960-Speed 13104.94 samples/sec Loss 5.2341 LearningRate 0.0478 Epoch: 21 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:21:11,525-Speed 13088.26 samples/sec Loss 5.1640 LearningRate 0.0478 Epoch: 21 Global Step: 54170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:21:13,065-Speed 13307.81 samples/sec Loss 5.2200 LearningRate 0.0478 Epoch: 21 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:14,645-Speed 12974.59 samples/sec Loss 5.1985 LearningRate 0.0478 Epoch: 21 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:16,215-Speed 13046.71 samples/sec Loss 5.2604 LearningRate 0.0478 Epoch: 21 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:17,799-Speed 12931.59 samples/sec Loss 5.2352 LearningRate 0.0477 Epoch: 21 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:19,398-Speed 12818.04 samples/sec Loss 5.2039 LearningRate 0.0477 Epoch: 21 Global Step: 54220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:20,951-Speed 13191.40 samples/sec Loss 5.2194 LearningRate 0.0477 Epoch: 21 Global Step: 54230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:22,522-Speed 13041.99 samples/sec Loss 5.1759 LearningRate 0.0477 Epoch: 21 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:24,106-Speed 12937.53 samples/sec Loss 5.1976 LearningRate 0.0477 Epoch: 21 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:25,662-Speed 13172.55 samples/sec Loss 5.2027 LearningRate 0.0476 Epoch: 21 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:27,258-Speed 12832.67 samples/sec Loss 5.1283 LearningRate 
0.0476 Epoch: 21 Global Step: 54270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:28,830-Speed 13043.51 samples/sec Loss 5.2324 LearningRate 0.0476 Epoch: 21 Global Step: 54280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:30,389-Speed 13140.56 samples/sec Loss 5.1889 LearningRate 0.0476 Epoch: 21 Global Step: 54290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:31,970-Speed 12985.71 samples/sec Loss 5.2483 LearningRate 0.0476 Epoch: 21 Global Step: 54300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:33,525-Speed 13183.39 samples/sec Loss 5.2836 LearningRate 0.0475 Epoch: 21 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:35,084-Speed 13144.06 samples/sec Loss 5.3928 LearningRate 0.0475 Epoch: 21 Global Step: 54320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:36,649-Speed 13085.65 samples/sec Loss 5.1732 LearningRate 0.0475 Epoch: 21 Global Step: 54330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:38,223-Speed 13021.58 samples/sec Loss 5.2065 LearningRate 0.0475 Epoch: 21 Global Step: 54340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:39,808-Speed 12934.54 samples/sec Loss 5.2180 LearningRate 0.0475 Epoch: 21 Global Step: 54350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:41,401-Speed 12858.80 samples/sec Loss 5.2351 LearningRate 0.0474 Epoch: 21 Global Step: 54360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:42,977-Speed 12999.59 samples/sec Loss 5.2335 LearningRate 0.0474 Epoch: 21 Global Step: 54370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:44,553-Speed 13001.82 samples/sec Loss 5.1789 LearningRate 0.0474 Epoch: 21 Global Step: 54380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:46,135-Speed 12956.75 samples/sec Loss 5.1930 LearningRate 0.0474 Epoch: 21 Global Step: 54390 Fp16 
Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:47,690-Speed 13174.20 samples/sec Loss 5.2231 LearningRate 0.0473 Epoch: 21 Global Step: 54400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:49,276-Speed 12917.24 samples/sec Loss 5.1747 LearningRate 0.0473 Epoch: 21 Global Step: 54410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:21:50,846-Speed 13056.46 samples/sec Loss 5.2744 LearningRate 0.0473 Epoch: 21 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:52,421-Speed 13004.77 samples/sec Loss 5.3039 LearningRate 0.0473 Epoch: 21 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:53,983-Speed 13125.42 samples/sec Loss 5.2263 LearningRate 0.0473 Epoch: 21 Global Step: 54440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:55,555-Speed 13031.83 samples/sec Loss 5.1979 LearningRate 0.0472 Epoch: 21 Global Step: 54450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:57,160-Speed 12766.16 samples/sec Loss 5.2337 LearningRate 0.0472 Epoch: 21 Global Step: 54460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:21:58,727-Speed 13099.06 samples/sec Loss 5.2583 LearningRate 0.0472 Epoch: 21 Global Step: 54470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:00,320-Speed 12864.98 samples/sec Loss 5.2656 LearningRate 0.0472 Epoch: 21 Global Step: 54480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:01,884-Speed 13107.39 samples/sec Loss 5.2808 LearningRate 0.0472 Epoch: 21 Global Step: 54490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:03,469-Speed 12932.05 samples/sec Loss 5.1854 LearningRate 0.0471 Epoch: 21 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:05,027-Speed 13149.86 samples/sec Loss 5.3249 LearningRate 0.0471 Epoch: 21 Global Step: 54510 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-01-14 16:22:06,573-Speed 13248.20 samples/sec Loss 5.2471 LearningRate 0.0471 Epoch: 21 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:22:08,114-Speed 13296.57 samples/sec Loss 5.2598 LearningRate 0.0471 Epoch: 21 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:09,678-Speed 13107.34 samples/sec Loss 5.3029 LearningRate 0.0471 Epoch: 21 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:11,255-Speed 12995.75 samples/sec Loss 5.3445 LearningRate 0.0470 Epoch: 21 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:12,816-Speed 13123.85 samples/sec Loss 5.2850 LearningRate 0.0470 Epoch: 21 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:14,396-Speed 12972.65 samples/sec Loss 5.2432 LearningRate 0.0470 Epoch: 21 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:15,965-Speed 13059.39 samples/sec Loss 5.2927 LearningRate 0.0470 Epoch: 21 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:17,538-Speed 13021.72 samples/sec Loss 5.2527 LearningRate 0.0470 Epoch: 21 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:19,119-Speed 12965.11 samples/sec Loss 5.2396 LearningRate 0.0469 Epoch: 21 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:20,699-Speed 12968.23 samples/sec Loss 5.2607 LearningRate 0.0469 Epoch: 21 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:22,273-Speed 13016.49 samples/sec Loss 5.2158 LearningRate 0.0469 Epoch: 21 Global Step: 54620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:23,881-Speed 12747.25 samples/sec Loss 5.3072 LearningRate 0.0469 Epoch: 21 Global Step: 54630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:22:25,456-Speed 
13010.82 samples/sec Loss 5.2232 LearningRate 0.0469 Epoch: 21 Global Step: 54640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:22:27,032-Speed 12998.50 samples/sec Loss 5.3593 LearningRate 0.0468 Epoch: 21 Global Step: 54650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:22:28,591-Speed 13146.25 samples/sec Loss 5.1794 LearningRate 0.0468 Epoch: 21 Global Step: 54660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:22:30,176-Speed 12925.72 samples/sec Loss 5.2624 LearningRate 0.0468 Epoch: 21 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:22:31,791-Speed 12688.00 samples/sec Loss 5.2845 LearningRate 0.0468 Epoch: 21 Global Step: 54680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:33,369-Speed 12987.18 samples/sec Loss 5.2724 LearningRate 0.0468 Epoch: 21 Global Step: 54690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:34,928-Speed 13147.64 samples/sec Loss 5.3370 LearningRate 0.0467 Epoch: 21 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:36,490-Speed 13117.54 samples/sec Loss 5.3054 LearningRate 0.0467 Epoch: 21 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:38,078-Speed 12922.10 samples/sec Loss 5.3200 LearningRate 0.0467 Epoch: 21 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:39,655-Speed 12993.23 samples/sec Loss 5.3636 LearningRate 0.0467 Epoch: 21 Global Step: 54730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:41,246-Speed 12882.09 samples/sec Loss 5.2628 LearningRate 0.0467 Epoch: 21 Global Step: 54740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:42,840-Speed 12851.43 samples/sec Loss 5.2578 LearningRate 0.0466 Epoch: 21 Global Step: 54750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:44,423-Speed 12949.41 samples/sec Loss 5.2626 
LearningRate 0.0466 Epoch: 21 Global Step: 54760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:45,996-Speed 13028.27 samples/sec Loss 5.3143 LearningRate 0.0466 Epoch: 21 Global Step: 54770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:47,571-Speed 13009.89 samples/sec Loss 5.3516 LearningRate 0.0466 Epoch: 21 Global Step: 54780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:49,132-Speed 13127.23 samples/sec Loss 5.2835 LearningRate 0.0466 Epoch: 21 Global Step: 54790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:50,708-Speed 13003.51 samples/sec Loss 5.3392 LearningRate 0.0465 Epoch: 21 Global Step: 54800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:52,282-Speed 13014.77 samples/sec Loss 5.4131 LearningRate 0.0465 Epoch: 21 Global Step: 54810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:53,859-Speed 12994.71 samples/sec Loss 5.3474 LearningRate 0.0465 Epoch: 21 Global Step: 54820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:22:55,429-Speed 13048.06 samples/sec Loss 5.3193 LearningRate 0.0465 Epoch: 21 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:56,993-Speed 13100.62 samples/sec Loss 5.2375 LearningRate 0.0465 Epoch: 21 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:22:58,551-Speed 13159.98 samples/sec Loss 5.3164 LearningRate 0.0464 Epoch: 21 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:00,131-Speed 12960.84 samples/sec Loss 5.2889 LearningRate 0.0464 Epoch: 21 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:01,687-Speed 13172.01 samples/sec Loss 5.2346 LearningRate 0.0464 Epoch: 21 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:03,251-Speed 13095.46 samples/sec Loss 5.3346 LearningRate 0.0464 Epoch: 21 Global Step: 
54880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:04,834-Speed 12952.00 samples/sec Loss 5.2903 LearningRate 0.0464 Epoch: 21 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:06,398-Speed 13103.92 samples/sec Loss 5.3591 LearningRate 0.0463 Epoch: 21 Global Step: 54900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:07,979-Speed 12960.75 samples/sec Loss 5.3641 LearningRate 0.0463 Epoch: 21 Global Step: 54910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:09,566-Speed 12914.07 samples/sec Loss 5.2587 LearningRate 0.0463 Epoch: 21 Global Step: 54920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:11,120-Speed 13179.54 samples/sec Loss 5.2900 LearningRate 0.0463 Epoch: 21 Global Step: 54930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:23:12,724-Speed 12777.41 samples/sec Loss 5.3039 LearningRate 0.0463 Epoch: 21 Global Step: 54940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:23:14,284-Speed 13136.99 samples/sec Loss 5.2997 LearningRate 0.0462 Epoch: 21 Global Step: 54950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:15,866-Speed 12948.73 samples/sec Loss 5.2818 LearningRate 0.0462 Epoch: 21 Global Step: 54960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:17,433-Speed 13082.14 samples/sec Loss 5.2881 LearningRate 0.0462 Epoch: 21 Global Step: 54970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:19,000-Speed 13075.85 samples/sec Loss 5.2577 LearningRate 0.0462 Epoch: 21 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:20,599-Speed 12813.39 samples/sec Loss 5.2910 LearningRate 0.0462 Epoch: 21 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:23:22,172-Speed 13025.76 samples/sec Loss 5.2856 LearningRate 0.0461 Epoch: 21 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-01-14 16:23:44,055-[lfw][55000]XNorm: 9.375085 Training: 2022-01-14 16:23:44,056-[lfw][55000]Accuracy-Flip: 0.99583+-0.00261 Training: 2022-01-14 16:23:44,057-[lfw][55000]Accuracy-Highest: 0.99583 Training: 2022-01-14 16:24:09,336-[cfp_fp][55000]XNorm: 7.977271 Training: 2022-01-14 16:24:09,337-[cfp_fp][55000]Accuracy-Flip: 0.96343+-0.01257 Training: 2022-01-14 16:24:09,337-[cfp_fp][55000]Accuracy-Highest: 0.96343 Training: 2022-01-14 16:24:31,378-[agedb_30][55000]XNorm: 9.103124 Training: 2022-01-14 16:24:31,379-[agedb_30][55000]Accuracy-Flip: 0.96700+-0.00865 Training: 2022-01-14 16:24:31,380-[agedb_30][55000]Accuracy-Highest: 0.96700 Training: 2022-01-14 16:24:32,953-Speed 289.35 samples/sec Loss 5.3033 LearningRate 0.0461 Epoch: 21 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:34,528-Speed 13010.31 samples/sec Loss 5.2698 LearningRate 0.0461 Epoch: 21 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:36,118-Speed 12889.15 samples/sec Loss 5.3207 LearningRate 0.0461 Epoch: 21 Global Step: 55030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:37,701-Speed 12939.24 samples/sec Loss 5.3186 LearningRate 0.0461 Epoch: 21 Global Step: 55040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:39,290-Speed 12895.34 samples/sec Loss 5.3147 LearningRate 0.0460 Epoch: 21 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:24:40,856-Speed 13083.02 samples/sec Loss 5.2737 LearningRate 0.0460 Epoch: 21 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:24:42,401-Speed 13272.80 samples/sec Loss 5.4029 LearningRate 0.0460 Epoch: 21 Global Step: 55070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:43,965-Speed 13098.49 samples/sec Loss 5.2632 LearningRate 0.0460 Epoch: 21 Global Step: 55080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:24:45,576-Speed 12716.38 samples/sec Loss 5.3558 LearningRate 0.0460 Epoch: 21 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:47,157-Speed 12969.69 samples/sec Loss 5.3213 LearningRate 0.0459 Epoch: 21 Global Step: 55100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:48,703-Speed 13253.08 samples/sec Loss 5.3278 LearningRate 0.0459 Epoch: 21 Global Step: 55110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:50,285-Speed 12946.09 samples/sec Loss 5.2948 LearningRate 0.0459 Epoch: 21 Global Step: 55120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:51,869-Speed 12942.51 samples/sec Loss 5.3544 LearningRate 0.0459 Epoch: 21 Global Step: 55130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:53,438-Speed 13060.38 samples/sec Loss 5.2570 LearningRate 0.0459 Epoch: 21 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:55,004-Speed 13081.86 samples/sec Loss 5.2707 LearningRate 0.0458 Epoch: 21 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:56,601-Speed 12836.28 samples/sec Loss 5.3118 LearningRate 0.0458 Epoch: 21 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:24:58,156-Speed 13171.42 samples/sec Loss 5.3477 LearningRate 0.0458 Epoch: 21 Global Step: 55170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:24:59,714-Speed 13152.04 samples/sec Loss 5.2640 LearningRate 0.0458 Epoch: 21 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:01,292-Speed 12984.48 samples/sec Loss 5.4177 LearningRate 0.0458 Epoch: 21 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:02,902-Speed 12727.21 samples/sec Loss 5.3081 LearningRate 0.0457 Epoch: 21 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:04,477-Speed 13010.06 samples/sec 
Loss 5.3657 LearningRate 0.0457 Epoch: 21 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:06,044-Speed 13089.54 samples/sec Loss 5.3305 LearningRate 0.0457 Epoch: 21 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:07,633-Speed 12899.34 samples/sec Loss 5.4399 LearningRate 0.0457 Epoch: 21 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:09,213-Speed 12967.67 samples/sec Loss 5.3524 LearningRate 0.0457 Epoch: 21 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:10,782-Speed 13050.61 samples/sec Loss 5.2978 LearningRate 0.0456 Epoch: 21 Global Step: 55250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:12,343-Speed 13128.23 samples/sec Loss 5.3252 LearningRate 0.0456 Epoch: 21 Global Step: 55260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:13,897-Speed 13184.04 samples/sec Loss 5.3856 LearningRate 0.0456 Epoch: 21 Global Step: 55270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:15,491-Speed 12857.20 samples/sec Loss 5.3042 LearningRate 0.0456 Epoch: 21 Global Step: 55280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:17,038-Speed 13244.51 samples/sec Loss 5.3244 LearningRate 0.0456 Epoch: 21 Global Step: 55290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:18,606-Speed 13071.06 samples/sec Loss 5.2929 LearningRate 0.0455 Epoch: 21 Global Step: 55300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:20,176-Speed 13050.90 samples/sec Loss 5.2998 LearningRate 0.0455 Epoch: 21 Global Step: 55310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:21,733-Speed 13162.52 samples/sec Loss 5.4162 LearningRate 0.0455 Epoch: 21 Global Step: 55320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:23,295-Speed 13111.27 samples/sec Loss 5.3813 LearningRate 0.0455 Epoch: 21 
Global Step: 55330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:24,898-Speed 12789.74 samples/sec Loss 5.3028 LearningRate 0.0455 Epoch: 21 Global Step: 55340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:26,462-Speed 13102.16 samples/sec Loss 5.3432 LearningRate 0.0454 Epoch: 21 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:28,032-Speed 13050.19 samples/sec Loss 5.3407 LearningRate 0.0454 Epoch: 21 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:29,592-Speed 13135.57 samples/sec Loss 5.3968 LearningRate 0.0454 Epoch: 21 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:31,167-Speed 13015.86 samples/sec Loss 5.3278 LearningRate 0.0454 Epoch: 21 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:32,753-Speed 12917.22 samples/sec Loss 5.2995 LearningRate 0.0454 Epoch: 21 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:34,323-Speed 13047.01 samples/sec Loss 5.3697 LearningRate 0.0453 Epoch: 21 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:35,897-Speed 13018.13 samples/sec Loss 5.3676 LearningRate 0.0453 Epoch: 21 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:37,490-Speed 12867.07 samples/sec Loss 5.2951 LearningRate 0.0453 Epoch: 21 Global Step: 55420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:39,091-Speed 12800.00 samples/sec Loss 5.4271 LearningRate 0.0453 Epoch: 21 Global Step: 55430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:40,679-Speed 12899.61 samples/sec Loss 5.3914 LearningRate 0.0453 Epoch: 21 Global Step: 55440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:42,261-Speed 12960.04 samples/sec Loss 5.3330 LearningRate 0.0452 Epoch: 21 Global Step: 55450 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-01-14 16:25:43,838-Speed 12987.86 samples/sec Loss 5.3357 LearningRate 0.0452 Epoch: 21 Global Step: 55460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:25:45,406-Speed 13066.23 samples/sec Loss 5.2835 LearningRate 0.0452 Epoch: 21 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:46,998-Speed 12873.76 samples/sec Loss 5.3828 LearningRate 0.0452 Epoch: 21 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:48,558-Speed 13133.22 samples/sec Loss 5.2944 LearningRate 0.0452 Epoch: 21 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:50,148-Speed 12885.01 samples/sec Loss 5.2736 LearningRate 0.0451 Epoch: 21 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:51,742-Speed 12863.54 samples/sec Loss 5.2892 LearningRate 0.0451 Epoch: 21 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:53,286-Speed 13270.23 samples/sec Loss 5.3508 LearningRate 0.0451 Epoch: 21 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:25:54,883-Speed 12828.19 samples/sec Loss 5.3369 LearningRate 0.0451 Epoch: 21 Global Step: 55530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:56,436-Speed 13199.09 samples/sec Loss 5.3206 LearningRate 0.0451 Epoch: 21 Global Step: 55540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:58,017-Speed 12963.50 samples/sec Loss 5.3117 LearningRate 0.0450 Epoch: 21 Global Step: 55550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:25:59,587-Speed 13046.22 samples/sec Loss 5.3510 LearningRate 0.0450 Epoch: 21 Global Step: 55560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:01,177-Speed 12892.53 samples/sec Loss 5.3300 LearningRate 0.0450 Epoch: 21 Global Step: 55570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:26:02,778-Speed 12790.81 samples/sec Loss 5.3937 LearningRate 0.0450 Epoch: 21 Global Step: 55580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:04,350-Speed 13061.06 samples/sec Loss 5.3439 LearningRate 0.0450 Epoch: 21 Global Step: 55590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:05,946-Speed 12839.42 samples/sec Loss 5.3186 LearningRate 0.0450 Epoch: 21 Global Step: 55600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:07,533-Speed 12911.74 samples/sec Loss 5.2398 LearningRate 0.0449 Epoch: 21 Global Step: 55610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:09,083-Speed 13221.78 samples/sec Loss 5.3938 LearningRate 0.0449 Epoch: 21 Global Step: 55620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:10,727-Speed 12459.64 samples/sec Loss 5.3599 LearningRate 0.0449 Epoch: 21 Global Step: 55630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:25,164-Speed 1418.84 samples/sec Loss 5.1816 LearningRate 0.0449 Epoch: 22 Global Step: 55640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:26,755-Speed 12881.72 samples/sec Loss 4.5198 LearningRate 0.0449 Epoch: 22 Global Step: 55650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:28,444-Speed 12134.16 samples/sec Loss 4.5136 LearningRate 0.0448 Epoch: 22 Global Step: 55660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:30,027-Speed 12942.15 samples/sec Loss 4.5955 LearningRate 0.0448 Epoch: 22 Global Step: 55670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:31,590-Speed 13113.77 samples/sec Loss 4.4937 LearningRate 0.0448 Epoch: 22 Global Step: 55680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:33,156-Speed 13087.90 samples/sec Loss 4.5624 LearningRate 0.0448 Epoch: 22 Global Step: 55690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:34,754-Speed 12820.51 samples/sec Loss 
4.5429 LearningRate 0.0448 Epoch: 22 Global Step: 55700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:36,327-Speed 13024.48 samples/sec Loss 4.6227 LearningRate 0.0447 Epoch: 22 Global Step: 55710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:26:37,898-Speed 13046.08 samples/sec Loss 4.6401 LearningRate 0.0447 Epoch: 22 Global Step: 55720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:39,522-Speed 12621.89 samples/sec Loss 4.6077 LearningRate 0.0447 Epoch: 22 Global Step: 55730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:41,119-Speed 12829.89 samples/sec Loss 4.6028 LearningRate 0.0447 Epoch: 22 Global Step: 55740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:42,726-Speed 12746.83 samples/sec Loss 4.6465 LearningRate 0.0447 Epoch: 22 Global Step: 55750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:44,295-Speed 13060.30 samples/sec Loss 4.6429 LearningRate 0.0446 Epoch: 22 Global Step: 55760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:45,878-Speed 12948.96 samples/sec Loss 4.5485 LearningRate 0.0446 Epoch: 22 Global Step: 55770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:47,432-Speed 13188.04 samples/sec Loss 4.6844 LearningRate 0.0446 Epoch: 22 Global Step: 55780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:49,008-Speed 13000.08 samples/sec Loss 4.5980 LearningRate 0.0446 Epoch: 22 Global Step: 55790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:50,585-Speed 12995.36 samples/sec Loss 4.6812 LearningRate 0.0446 Epoch: 22 Global Step: 55800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:52,154-Speed 13056.58 samples/sec Loss 4.7276 LearningRate 0.0445 Epoch: 22 Global Step: 55810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:26:53,709-Speed 13177.37 samples/sec Loss 4.6324 LearningRate 0.0445 Epoch: 22 Global 
Step: 55820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:26:55,285-Speed 13004.36 samples/sec Loss 4.7615 LearningRate 0.0445 Epoch: 22 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:26:56,851-Speed 13087.94 samples/sec Loss 4.7065 LearningRate 0.0445 Epoch: 22 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:26:58,442-Speed 12873.56 samples/sec Loss 4.6816 LearningRate 0.0445 Epoch: 22 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:00,009-Speed 13079.73 samples/sec Loss 4.7324 LearningRate 0.0444 Epoch: 22 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:01,590-Speed 12960.03 samples/sec Loss 4.7836 LearningRate 0.0444 Epoch: 22 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:03,177-Speed 12918.89 samples/sec Loss 4.7463 LearningRate 0.0444 Epoch: 22 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:04,757-Speed 12969.19 samples/sec Loss 4.7206 LearningRate 0.0444 Epoch: 22 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:06,333-Speed 12993.57 samples/sec Loss 4.7323 LearningRate 0.0444 Epoch: 22 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:07,907-Speed 13025.82 samples/sec Loss 4.6848 LearningRate 0.0443 Epoch: 22 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:09,522-Speed 12688.57 samples/sec Loss 4.7183 LearningRate 0.0443 Epoch: 22 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:27:11,076-Speed 13185.67 samples/sec Loss 4.7451 LearningRate 0.0443 Epoch: 22 Global Step: 55930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:12,655-Speed 12973.09 samples/sec Loss 4.7152 LearningRate 0.0443 Epoch: 22 Global Step: 55940 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:27:14,236-Speed 12960.87 samples/sec Loss 4.7384 LearningRate 0.0443 Epoch: 22 Global Step: 55950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:15,821-Speed 12953.20 samples/sec Loss 4.8668 LearningRate 0.0442 Epoch: 22 Global Step: 55960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:17,378-Speed 13162.78 samples/sec Loss 4.7646 LearningRate 0.0442 Epoch: 22 Global Step: 55970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:18,950-Speed 13035.33 samples/sec Loss 4.8783 LearningRate 0.0442 Epoch: 22 Global Step: 55980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:20,519-Speed 13053.07 samples/sec Loss 4.8711 LearningRate 0.0442 Epoch: 22 Global Step: 55990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:22,074-Speed 13181.73 samples/sec Loss 4.7378 LearningRate 0.0442 Epoch: 22 Global Step: 56000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:23,638-Speed 13099.34 samples/sec Loss 4.8366 LearningRate 0.0441 Epoch: 22 Global Step: 56010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:25,210-Speed 13040.78 samples/sec Loss 4.8329 LearningRate 0.0441 Epoch: 22 Global Step: 56020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:26,792-Speed 12957.96 samples/sec Loss 4.6709 LearningRate 0.0441 Epoch: 22 Global Step: 56030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:28,354-Speed 13121.48 samples/sec Loss 4.7648 LearningRate 0.0441 Epoch: 22 Global Step: 56040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:29,941-Speed 12908.20 samples/sec Loss 4.8359 LearningRate 0.0441 Epoch: 22 Global Step: 56050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:31,503-Speed 13119.68 samples/sec Loss 4.8504 LearningRate 0.0440 Epoch: 22 Global Step: 56060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:27:33,071-Speed 13074.12 samples/sec Loss 4.9816 LearningRate 0.0440 Epoch: 22 Global Step: 56070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:34,659-Speed 12899.45 samples/sec Loss 4.8559 LearningRate 0.0440 Epoch: 22 Global Step: 56080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:36,233-Speed 13013.88 samples/sec Loss 4.8504 LearningRate 0.0440 Epoch: 22 Global Step: 56090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:37,825-Speed 12878.00 samples/sec Loss 4.8965 LearningRate 0.0440 Epoch: 22 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:39,393-Speed 13065.73 samples/sec Loss 4.8474 LearningRate 0.0439 Epoch: 22 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:40,943-Speed 13218.18 samples/sec Loss 4.8968 LearningRate 0.0439 Epoch: 22 Global Step: 56120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:42,523-Speed 12979.12 samples/sec Loss 4.8742 LearningRate 0.0439 Epoch: 22 Global Step: 56130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:44,108-Speed 12927.27 samples/sec Loss 4.8849 LearningRate 0.0439 Epoch: 22 Global Step: 56140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:45,672-Speed 13096.03 samples/sec Loss 4.8227 LearningRate 0.0439 Epoch: 22 Global Step: 56150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:47,264-Speed 12876.96 samples/sec Loss 4.8966 LearningRate 0.0439 Epoch: 22 Global Step: 56160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:48,823-Speed 13146.23 samples/sec Loss 4.9162 LearningRate 0.0438 Epoch: 22 Global Step: 56170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:50,392-Speed 13061.76 samples/sec Loss 4.9849 LearningRate 0.0438 Epoch: 22 Global Step: 56180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:51,967-Speed 13011.06 samples/sec 
Loss 4.9398 LearningRate 0.0438 Epoch: 22 Global Step: 56190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:53,546-Speed 12976.26 samples/sec Loss 4.9375 LearningRate 0.0438 Epoch: 22 Global Step: 56200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:55,122-Speed 13001.22 samples/sec Loss 4.9530 LearningRate 0.0438 Epoch: 22 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:27:56,690-Speed 13065.15 samples/sec Loss 4.8845 LearningRate 0.0437 Epoch: 22 Global Step: 56220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:58,258-Speed 13075.72 samples/sec Loss 4.8590 LearningRate 0.0437 Epoch: 22 Global Step: 56230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:27:59,852-Speed 12850.92 samples/sec Loss 4.9615 LearningRate 0.0437 Epoch: 22 Global Step: 56240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:01,436-Speed 12932.19 samples/sec Loss 4.9818 LearningRate 0.0437 Epoch: 22 Global Step: 56250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:03,007-Speed 13047.87 samples/sec Loss 4.9720 LearningRate 0.0437 Epoch: 22 Global Step: 56260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:04,591-Speed 12938.40 samples/sec Loss 5.0731 LearningRate 0.0436 Epoch: 22 Global Step: 56270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:06,159-Speed 13068.26 samples/sec Loss 4.9589 LearningRate 0.0436 Epoch: 22 Global Step: 56280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:07,741-Speed 12955.48 samples/sec Loss 4.9000 LearningRate 0.0436 Epoch: 22 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:09,306-Speed 13093.24 samples/sec Loss 4.9174 LearningRate 0.0436 Epoch: 22 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:10,885-Speed 12979.23 samples/sec Loss 4.9676 LearningRate 0.0436 Epoch: 22 
Global Step: 56310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:12,435-Speed 13217.96 samples/sec Loss 5.0547 LearningRate 0.0435 Epoch: 22 Global Step: 56320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:28:13,988-Speed 13193.49 samples/sec Loss 4.9029 LearningRate 0.0435 Epoch: 22 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:15,588-Speed 12802.60 samples/sec Loss 4.9956 LearningRate 0.0435 Epoch: 22 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:17,205-Speed 12681.68 samples/sec Loss 4.9293 LearningRate 0.0435 Epoch: 22 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:18,777-Speed 13033.71 samples/sec Loss 4.9796 LearningRate 0.0435 Epoch: 22 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:20,331-Speed 13185.19 samples/sec Loss 4.9589 LearningRate 0.0434 Epoch: 22 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:21,888-Speed 13165.88 samples/sec Loss 4.9492 LearningRate 0.0434 Epoch: 22 Global Step: 56380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:23,455-Speed 13074.94 samples/sec Loss 4.9984 LearningRate 0.0434 Epoch: 22 Global Step: 56390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:25,018-Speed 13110.18 samples/sec Loss 4.9793 LearningRate 0.0434 Epoch: 22 Global Step: 56400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:26,583-Speed 13094.86 samples/sec Loss 4.9273 LearningRate 0.0434 Epoch: 22 Global Step: 56410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:28,143-Speed 13136.79 samples/sec Loss 4.9909 LearningRate 0.0433 Epoch: 22 Global Step: 56420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:29,721-Speed 12985.17 samples/sec Loss 4.9999 LearningRate 0.0433 Epoch: 22 Global Step: 56430 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 16:28:31,293-Speed 13032.85 samples/sec Loss 4.9493 LearningRate 0.0433 Epoch: 22 Global Step: 56440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:32,891-Speed 12823.04 samples/sec Loss 5.0091 LearningRate 0.0433 Epoch: 22 Global Step: 56450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:34,470-Speed 12977.31 samples/sec Loss 5.0238 LearningRate 0.0433 Epoch: 22 Global Step: 56460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:36,036-Speed 13089.17 samples/sec Loss 4.9897 LearningRate 0.0432 Epoch: 22 Global Step: 56470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:37,622-Speed 12920.66 samples/sec Loss 4.8942 LearningRate 0.0432 Epoch: 22 Global Step: 56480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:39,196-Speed 13013.06 samples/sec Loss 5.0553 LearningRate 0.0432 Epoch: 22 Global Step: 56490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:40,783-Speed 12912.99 samples/sec Loss 4.9166 LearningRate 0.0432 Epoch: 22 Global Step: 56500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:42,353-Speed 13055.65 samples/sec Loss 5.0734 LearningRate 0.0432 Epoch: 22 Global Step: 56510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:28:43,935-Speed 12956.96 samples/sec Loss 4.9952 LearningRate 0.0432 Epoch: 22 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:45,500-Speed 13094.25 samples/sec Loss 5.0221 LearningRate 0.0431 Epoch: 22 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:47,079-Speed 12982.84 samples/sec Loss 5.0762 LearningRate 0.0431 Epoch: 22 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:48,674-Speed 12870.83 samples/sec Loss 5.1545 LearningRate 0.0431 Epoch: 22 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:28:50,234-Speed 13133.24 samples/sec Loss 5.0395 LearningRate 0.0431 Epoch: 22 Global Step: 56560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:51,838-Speed 12777.43 samples/sec Loss 5.0489 LearningRate 0.0431 Epoch: 22 Global Step: 56570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:53,446-Speed 12739.08 samples/sec Loss 5.0411 LearningRate 0.0430 Epoch: 22 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:55,013-Speed 13084.89 samples/sec Loss 4.9318 LearningRate 0.0430 Epoch: 22 Global Step: 56590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:56,603-Speed 12886.68 samples/sec Loss 5.0918 LearningRate 0.0430 Epoch: 22 Global Step: 56600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:58,171-Speed 13071.24 samples/sec Loss 5.0403 LearningRate 0.0430 Epoch: 22 Global Step: 56610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:28:59,714-Speed 13271.58 samples/sec Loss 5.0254 LearningRate 0.0430 Epoch: 22 Global Step: 56620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:29:01,251-Speed 13333.82 samples/sec Loss 5.1416 LearningRate 0.0429 Epoch: 22 Global Step: 56630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:02,833-Speed 12953.63 samples/sec Loss 4.9953 LearningRate 0.0429 Epoch: 22 Global Step: 56640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:04,414-Speed 12957.37 samples/sec Loss 5.0710 LearningRate 0.0429 Epoch: 22 Global Step: 56650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:05,989-Speed 13011.43 samples/sec Loss 5.0241 LearningRate 0.0429 Epoch: 22 Global Step: 56660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:07,577-Speed 12906.66 samples/sec Loss 5.0168 LearningRate 0.0429 Epoch: 22 Global Step: 56670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:09,146-Speed 13055.46 samples/sec 
Loss 5.1036 LearningRate 0.0428 Epoch: 22 Global Step: 56680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:10,735-Speed 12894.59 samples/sec Loss 5.0429 LearningRate 0.0428 Epoch: 22 Global Step: 56690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:12,316-Speed 12962.25 samples/sec Loss 5.0465 LearningRate 0.0428 Epoch: 22 Global Step: 56700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:13,929-Speed 12711.42 samples/sec Loss 4.9895 LearningRate 0.0428 Epoch: 22 Global Step: 56710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:15,502-Speed 13030.74 samples/sec Loss 5.1248 LearningRate 0.0428 Epoch: 22 Global Step: 56720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:17,064-Speed 13122.87 samples/sec Loss 5.1134 LearningRate 0.0427 Epoch: 22 Global Step: 56730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:18,644-Speed 12968.32 samples/sec Loss 5.0996 LearningRate 0.0427 Epoch: 22 Global Step: 56740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:20,211-Speed 13072.26 samples/sec Loss 5.0963 LearningRate 0.0427 Epoch: 22 Global Step: 56750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:21,759-Speed 13239.47 samples/sec Loss 5.0162 LearningRate 0.0427 Epoch: 22 Global Step: 56760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:23,330-Speed 13046.83 samples/sec Loss 5.0419 LearningRate 0.0427 Epoch: 22 Global Step: 56770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:24,905-Speed 13005.61 samples/sec Loss 5.1173 LearningRate 0.0427 Epoch: 22 Global Step: 56780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:26,486-Speed 12957.83 samples/sec Loss 5.0806 LearningRate 0.0426 Epoch: 22 Global Step: 56790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:28,044-Speed 13159.14 samples/sec Loss 5.0925 LearningRate 0.0426 Epoch: 22 
Global Step: 56800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:29,610-Speed 13081.05 samples/sec Loss 5.0931 LearningRate 0.0426 Epoch: 22 Global Step: 56810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:31,170-Speed 13133.78 samples/sec Loss 5.0796 LearningRate 0.0426 Epoch: 22 Global Step: 56820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:32,750-Speed 12972.31 samples/sec Loss 5.1142 LearningRate 0.0426 Epoch: 22 Global Step: 56830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:29:34,273-Speed 13454.79 samples/sec Loss 5.1156 LearningRate 0.0425 Epoch: 22 Global Step: 56840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:35,847-Speed 13019.13 samples/sec Loss 5.1302 LearningRate 0.0425 Epoch: 22 Global Step: 56850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:37,428-Speed 12962.94 samples/sec Loss 5.1247 LearningRate 0.0425 Epoch: 22 Global Step: 56860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:38,983-Speed 13181.63 samples/sec Loss 4.9872 LearningRate 0.0425 Epoch: 22 Global Step: 56870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:40,592-Speed 12734.93 samples/sec Loss 5.1100 LearningRate 0.0425 Epoch: 22 Global Step: 56880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:42,148-Speed 13172.58 samples/sec Loss 5.1066 LearningRate 0.0424 Epoch: 22 Global Step: 56890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:43,715-Speed 13072.16 samples/sec Loss 5.0751 LearningRate 0.0424 Epoch: 22 Global Step: 56900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:45,278-Speed 13114.74 samples/sec Loss 5.0741 LearningRate 0.0424 Epoch: 22 Global Step: 56910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:46,842-Speed 13095.26 samples/sec Loss 5.0582 LearningRate 0.0424 Epoch: 22 Global Step: 56920 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-14 16:29:48,410-Speed 13067.17 samples/sec Loss 5.1174 LearningRate 0.0424 Epoch: 22 Global Step: 56930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:29:50,004-Speed 12855.73 samples/sec Loss 5.1073 LearningRate 0.0423 Epoch: 22 Global Step: 56940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:51,546-Speed 13289.05 samples/sec Loss 5.0860 LearningRate 0.0423 Epoch: 22 Global Step: 56950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:53,143-Speed 12833.66 samples/sec Loss 5.0959 LearningRate 0.0423 Epoch: 22 Global Step: 56960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:54,711-Speed 13065.12 samples/sec Loss 5.1108 LearningRate 0.0423 Epoch: 22 Global Step: 56970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:56,283-Speed 13032.20 samples/sec Loss 5.2041 LearningRate 0.0423 Epoch: 22 Global Step: 56980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:57,845-Speed 13120.94 samples/sec Loss 5.1490 LearningRate 0.0422 Epoch: 22 Global Step: 56990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:29:59,410-Speed 13097.68 samples/sec Loss 5.1610 LearningRate 0.0422 Epoch: 22 Global Step: 57000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:30:00,986-Speed 13002.29 samples/sec Loss 5.0758 LearningRate 0.0422 Epoch: 22 Global Step: 57010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:30:02,560-Speed 13022.69 samples/sec Loss 5.1476 LearningRate 0.0422 Epoch: 22 Global Step: 57020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:30:04,135-Speed 13007.22 samples/sec Loss 5.0211 LearningRate 0.0422 Epoch: 22 Global Step: 57030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:30:05,715-Speed 12963.16 samples/sec Loss 5.0652 LearningRate 0.0422 Epoch: 22 Global Step: 57040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:30:07,261-Speed 13259.80 samples/sec Loss 5.1431 LearningRate 0.0421 Epoch: 22 Global Step: 57050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:08,836-Speed 13009.27 samples/sec Loss 5.1318 LearningRate 0.0421 Epoch: 22 Global Step: 57060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:10,396-Speed 13130.03 samples/sec Loss 5.1968 LearningRate 0.0421 Epoch: 22 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:11,962-Speed 13089.91 samples/sec Loss 5.0882 LearningRate 0.0421 Epoch: 22 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:13,538-Speed 13003.77 samples/sec Loss 5.1417 LearningRate 0.0421 Epoch: 22 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:15,106-Speed 13065.42 samples/sec Loss 5.0833 LearningRate 0.0420 Epoch: 22 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:16,681-Speed 13007.74 samples/sec Loss 5.1801 LearningRate 0.0420 Epoch: 22 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:18,270-Speed 12894.71 samples/sec Loss 5.1870 LearningRate 0.0420 Epoch: 22 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:19,852-Speed 12958.08 samples/sec Loss 5.2120 LearningRate 0.0420 Epoch: 22 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:21,439-Speed 12907.23 samples/sec Loss 5.1821 LearningRate 0.0420 Epoch: 22 Global Step: 57140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:30:23,013-Speed 13025.03 samples/sec Loss 5.1718 LearningRate 0.0419 Epoch: 22 Global Step: 57150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:30:24,574-Speed 13128.25 samples/sec Loss 5.1826 LearningRate 0.0419 Epoch: 22 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:26,142-Speed 13067.09 samples/sec 
Loss 5.1787 LearningRate 0.0419 Epoch: 22 Global Step: 57170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:27,705-Speed 13114.72 samples/sec Loss 5.1213 LearningRate 0.0419 Epoch: 22 Global Step: 57180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:29,257-Speed 13194.61 samples/sec Loss 5.1195 LearningRate 0.0419 Epoch: 22 Global Step: 57190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:30,836-Speed 12983.55 samples/sec Loss 5.1937 LearningRate 0.0418 Epoch: 22 Global Step: 57200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:32,407-Speed 13045.56 samples/sec Loss 5.1021 LearningRate 0.0418 Epoch: 22 Global Step: 57210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:33,975-Speed 13061.39 samples/sec Loss 5.2108 LearningRate 0.0418 Epoch: 22 Global Step: 57220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:35,535-Speed 13138.17 samples/sec Loss 5.1231 LearningRate 0.0418 Epoch: 22 Global Step: 57230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:37,104-Speed 13058.96 samples/sec Loss 5.1178 LearningRate 0.0418 Epoch: 22 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:38,680-Speed 13001.06 samples/sec Loss 5.0804 LearningRate 0.0418 Epoch: 22 Global Step: 57250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:40,246-Speed 13080.44 samples/sec Loss 5.2903 LearningRate 0.0417 Epoch: 22 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:30:41,815-Speed 13062.57 samples/sec Loss 5.2639 LearningRate 0.0417 Epoch: 22 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:43,390-Speed 13015.18 samples/sec Loss 5.1957 LearningRate 0.0417 Epoch: 22 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:44,985-Speed 12847.33 samples/sec Loss 5.1411 LearningRate 0.0417 Epoch: 22 
Global Step: 57290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:46,516-Speed 13379.40 samples/sec Loss 5.1509 LearningRate 0.0417 Epoch: 22 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:48,084-Speed 13071.57 samples/sec Loss 5.2554 LearningRate 0.0416 Epoch: 22 Global Step: 57310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:49,672-Speed 12906.59 samples/sec Loss 5.2631 LearningRate 0.0416 Epoch: 22 Global Step: 57320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:51,240-Speed 13062.75 samples/sec Loss 5.1924 LearningRate 0.0416 Epoch: 22 Global Step: 57330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:52,830-Speed 12891.13 samples/sec Loss 5.2036 LearningRate 0.0416 Epoch: 22 Global Step: 57340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:54,384-Speed 13180.54 samples/sec Loss 5.1600 LearningRate 0.0416 Epoch: 22 Global Step: 57350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:55,977-Speed 12862.16 samples/sec Loss 5.0666 LearningRate 0.0415 Epoch: 22 Global Step: 57360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:30:57,554-Speed 12995.58 samples/sec Loss 5.1135 LearningRate 0.0415 Epoch: 22 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:30:59,132-Speed 12986.72 samples/sec Loss 5.1160 LearningRate 0.0415 Epoch: 22 Global Step: 57380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:31:00,684-Speed 13198.80 samples/sec Loss 5.1168 LearningRate 0.0415 Epoch: 22 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:31:02,255-Speed 13055.08 samples/sec Loss 5.1617 LearningRate 0.0415 Epoch: 22 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:31:03,831-Speed 13003.46 samples/sec Loss 5.2270 LearningRate 0.0414 Epoch: 22 Global Step: 57410 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-01-14 16:31:05,385-Speed 13182.42 samples/sec Loss 5.1350 LearningRate 0.0414 Epoch: 22 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:06,954-Speed 13058.01 samples/sec Loss 5.2144 LearningRate 0.0414 Epoch: 22 Global Step: 57430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:08,535-Speed 12959.03 samples/sec Loss 5.2206 LearningRate 0.0414 Epoch: 22 Global Step: 57440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:10,106-Speed 13040.52 samples/sec Loss 5.1433 LearningRate 0.0414 Epoch: 22 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:11,702-Speed 12844.84 samples/sec Loss 5.0601 LearningRate 0.0414 Epoch: 22 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:13,277-Speed 13012.83 samples/sec Loss 5.2117 LearningRate 0.0413 Epoch: 22 Global Step: 57470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:14,828-Speed 13208.31 samples/sec Loss 5.1164 LearningRate 0.0413 Epoch: 22 Global Step: 57480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:16,398-Speed 13049.18 samples/sec Loss 5.1116 LearningRate 0.0413 Epoch: 22 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:17,967-Speed 13064.60 samples/sec Loss 5.1724 LearningRate 0.0413 Epoch: 22 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:19,552-Speed 12924.55 samples/sec Loss 5.2352 LearningRate 0.0413 Epoch: 22 Global Step: 57510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:31:21,105-Speed 13201.97 samples/sec Loss 5.1470 LearningRate 0.0412 Epoch: 22 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:22,660-Speed 13174.77 samples/sec Loss 5.2305 LearningRate 0.0412 Epoch: 22 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-01-14 16:31:24,238-Speed 12989.55 samples/sec Loss 5.2076 LearningRate 0.0412 Epoch: 22 Global Step: 57540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:25,814-Speed 13000.55 samples/sec Loss 5.1580 LearningRate 0.0412 Epoch: 22 Global Step: 57550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:27,382-Speed 13064.28 samples/sec Loss 5.2056 LearningRate 0.0412 Epoch: 22 Global Step: 57560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:28,968-Speed 12922.35 samples/sec Loss 5.2484 LearningRate 0.0411 Epoch: 22 Global Step: 57570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:30,600-Speed 12555.44 samples/sec Loss 5.2360 LearningRate 0.0411 Epoch: 22 Global Step: 57580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:32,173-Speed 13024.19 samples/sec Loss 5.0858 LearningRate 0.0411 Epoch: 22 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:33,723-Speed 13225.19 samples/sec Loss 5.2001 LearningRate 0.0411 Epoch: 22 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:35,298-Speed 13010.52 samples/sec Loss 5.1895 LearningRate 0.0411 Epoch: 22 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:36,876-Speed 12984.20 samples/sec Loss 5.2046 LearningRate 0.0411 Epoch: 22 Global Step: 57620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:31:38,448-Speed 13039.71 samples/sec Loss 5.2092 LearningRate 0.0410 Epoch: 22 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:40,023-Speed 13005.73 samples/sec Loss 5.1401 LearningRate 0.0410 Epoch: 22 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:41,602-Speed 12980.08 samples/sec Loss 5.1967 LearningRate 0.0410 Epoch: 22 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:43,174-Speed 13036.78 
samples/sec Loss 5.1874 LearningRate 0.0410 Epoch: 22 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:44,752-Speed 12980.25 samples/sec Loss 5.2349 LearningRate 0.0410 Epoch: 22 Global Step: 57670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:46,321-Speed 13059.95 samples/sec Loss 5.2554 LearningRate 0.0409 Epoch: 22 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:47,900-Speed 12982.51 samples/sec Loss 5.2151 LearningRate 0.0409 Epoch: 22 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:49,471-Speed 13043.55 samples/sec Loss 5.1558 LearningRate 0.0409 Epoch: 22 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:51,030-Speed 13142.00 samples/sec Loss 5.2315 LearningRate 0.0409 Epoch: 22 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:52,637-Speed 12753.21 samples/sec Loss 5.1936 LearningRate 0.0409 Epoch: 22 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:54,196-Speed 13148.18 samples/sec Loss 5.1840 LearningRate 0.0408 Epoch: 22 Global Step: 57730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:31:55,772-Speed 13000.99 samples/sec Loss 5.2342 LearningRate 0.0408 Epoch: 22 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:31:57,319-Speed 13254.93 samples/sec Loss 5.1962 LearningRate 0.0408 Epoch: 22 Global Step: 57750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:31:58,888-Speed 13056.12 samples/sec Loss 5.1863 LearningRate 0.0408 Epoch: 22 Global Step: 57760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:00,446-Speed 13155.06 samples/sec Loss 5.1754 LearningRate 0.0408 Epoch: 22 Global Step: 57770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:02,015-Speed 13056.76 samples/sec Loss 5.2376 LearningRate 
0.0408 Epoch: 22 Global Step: 57780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:03,582-Speed 13075.35 samples/sec Loss 5.2289 LearningRate 0.0407 Epoch: 22 Global Step: 57790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:05,177-Speed 12846.11 samples/sec Loss 5.1754 LearningRate 0.0407 Epoch: 22 Global Step: 57800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:06,764-Speed 12910.42 samples/sec Loss 5.2127 LearningRate 0.0407 Epoch: 22 Global Step: 57810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:08,356-Speed 12879.67 samples/sec Loss 5.1253 LearningRate 0.0407 Epoch: 22 Global Step: 57820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:09,930-Speed 13019.85 samples/sec Loss 5.2058 LearningRate 0.0407 Epoch: 22 Global Step: 57830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:11,487-Speed 13163.76 samples/sec Loss 5.2007 LearningRate 0.0406 Epoch: 22 Global Step: 57840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:32:13,066-Speed 12974.58 samples/sec Loss 5.2776 LearningRate 0.0406 Epoch: 22 Global Step: 57850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:14,637-Speed 13045.24 samples/sec Loss 5.1884 LearningRate 0.0406 Epoch: 22 Global Step: 57860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:16,193-Speed 13171.40 samples/sec Loss 5.2179 LearningRate 0.0406 Epoch: 22 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:17,768-Speed 13010.60 samples/sec Loss 5.1859 LearningRate 0.0406 Epoch: 22 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:19,322-Speed 13188.68 samples/sec Loss 5.1797 LearningRate 0.0405 Epoch: 22 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:20,921-Speed 12812.32 samples/sec Loss 5.2427 LearningRate 0.0405 Epoch: 22 Global Step: 57900 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:22,505-Speed 12934.66 samples/sec Loss 5.2238 LearningRate 0.0405 Epoch: 22 Global Step: 57910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:24,069-Speed 13102.06 samples/sec Loss 5.2481 LearningRate 0.0405 Epoch: 22 Global Step: 57920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:25,642-Speed 13027.62 samples/sec Loss 5.2435 LearningRate 0.0405 Epoch: 22 Global Step: 57930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:27,206-Speed 13104.51 samples/sec Loss 5.3014 LearningRate 0.0405 Epoch: 22 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:28,767-Speed 13125.68 samples/sec Loss 5.2443 LearningRate 0.0404 Epoch: 22 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:30,328-Speed 13124.70 samples/sec Loss 5.1872 LearningRate 0.0404 Epoch: 22 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:31,901-Speed 13025.55 samples/sec Loss 5.2508 LearningRate 0.0404 Epoch: 22 Global Step: 57970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:33,473-Speed 13038.68 samples/sec Loss 5.1298 LearningRate 0.0404 Epoch: 22 Global Step: 57980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:35,044-Speed 13040.41 samples/sec Loss 5.2786 LearningRate 0.0404 Epoch: 22 Global Step: 57990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:36,612-Speed 13064.39 samples/sec Loss 5.2492 LearningRate 0.0403 Epoch: 22 Global Step: 58000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:38,171-Speed 13153.70 samples/sec Loss 5.2427 LearningRate 0.0403 Epoch: 22 Global Step: 58010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:39,741-Speed 13047.24 samples/sec Loss 5.2240 LearningRate 0.0403 Epoch: 22 Global Step: 58020 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-01-14 16:32:41,302-Speed 13130.54 samples/sec Loss 5.2396 LearningRate 0.0403 Epoch: 22 Global Step: 58030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:42,855-Speed 13195.76 samples/sec Loss 5.2510 LearningRate 0.0403 Epoch: 22 Global Step: 58040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:44,430-Speed 13012.88 samples/sec Loss 5.1995 LearningRate 0.0402 Epoch: 22 Global Step: 58050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:46,021-Speed 12871.13 samples/sec Loss 5.1077 LearningRate 0.0402 Epoch: 22 Global Step: 58060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:47,604-Speed 12953.26 samples/sec Loss 5.2399 LearningRate 0.0402 Epoch: 22 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:49,177-Speed 13026.17 samples/sec Loss 5.1981 LearningRate 0.0402 Epoch: 22 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:50,765-Speed 12899.39 samples/sec Loss 5.1692 LearningRate 0.0402 Epoch: 22 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:52,332-Speed 13084.21 samples/sec Loss 5.2420 LearningRate 0.0402 Epoch: 22 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:53,911-Speed 12971.24 samples/sec Loss 5.2064 LearningRate 0.0401 Epoch: 22 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:55,486-Speed 13014.18 samples/sec Loss 5.2769 LearningRate 0.0401 Epoch: 22 Global Step: 58120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:57,045-Speed 13139.41 samples/sec Loss 5.1885 LearningRate 0.0401 Epoch: 22 Global Step: 58130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:32:58,623-Speed 12994.52 samples/sec Loss 5.1928 LearningRate 0.0401 Epoch: 22 Global Step: 58140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:00,192-Speed 
13052.18 samples/sec Loss 5.2491 LearningRate 0.0401 Epoch: 22 Global Step: 58150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:33:01,847-Speed 12396.86 samples/sec Loss 5.1894 LearningRate 0.0400 Epoch: 22 Global Step: 58160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:33:17,011-Speed 1350.79 samples/sec Loss 4.9247 LearningRate 0.0400 Epoch: 23 Global Step: 58170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:18,700-Speed 12135.13 samples/sec Loss 4.4248 LearningRate 0.0400 Epoch: 23 Global Step: 58180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:20,507-Speed 11343.50 samples/sec Loss 4.5582 LearningRate 0.0400 Epoch: 23 Global Step: 58190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:22,118-Speed 12712.39 samples/sec Loss 4.4555 LearningRate 0.0400 Epoch: 23 Global Step: 58200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:23,700-Speed 12955.20 samples/sec Loss 4.4696 LearningRate 0.0399 Epoch: 23 Global Step: 58210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:25,265-Speed 13092.48 samples/sec Loss 4.4902 LearningRate 0.0399 Epoch: 23 Global Step: 58220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:26,848-Speed 12945.05 samples/sec Loss 4.4945 LearningRate 0.0399 Epoch: 23 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:28,402-Speed 13196.33 samples/sec Loss 4.4560 LearningRate 0.0399 Epoch: 23 Global Step: 58240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:29,983-Speed 12958.78 samples/sec Loss 4.5154 LearningRate 0.0399 Epoch: 23 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:31,572-Speed 12891.04 samples/sec Loss 4.4947 LearningRate 0.0399 Epoch: 23 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:33,132-Speed 13146.45 samples/sec Loss 4.5643 
LearningRate 0.0398 Epoch: 23 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:33:34,709-Speed 12987.36 samples/sec Loss 4.5718 LearningRate 0.0398 Epoch: 23 Global Step: 58280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:33:36,284-Speed 13014.71 samples/sec Loss 4.4708 LearningRate 0.0398 Epoch: 23 Global Step: 58290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:33:37,854-Speed 13051.97 samples/sec Loss 4.5215 LearningRate 0.0398 Epoch: 23 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:33:39,398-Speed 13271.93 samples/sec Loss 4.5820 LearningRate 0.0398 Epoch: 23 Global Step: 58310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:40,972-Speed 13013.75 samples/sec Loss 4.5898 LearningRate 0.0397 Epoch: 23 Global Step: 58320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:42,545-Speed 13034.46 samples/sec Loss 4.4680 LearningRate 0.0397 Epoch: 23 Global Step: 58330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:44,133-Speed 12901.68 samples/sec Loss 4.5221 LearningRate 0.0397 Epoch: 23 Global Step: 58340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:45,714-Speed 12959.80 samples/sec Loss 4.4978 LearningRate 0.0397 Epoch: 23 Global Step: 58350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:47,286-Speed 13037.66 samples/sec Loss 4.7031 LearningRate 0.0397 Epoch: 23 Global Step: 58360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:48,844-Speed 13147.75 samples/sec Loss 4.5319 LearningRate 0.0397 Epoch: 23 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:50,410-Speed 13088.29 samples/sec Loss 4.5734 LearningRate 0.0396 Epoch: 23 Global Step: 58380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:52,005-Speed 12844.89 samples/sec Loss 4.6127 LearningRate 0.0396 Epoch: 23 Global 
Step: 58390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:53,606-Speed 12802.48 samples/sec Loss 4.5829 LearningRate 0.0396 Epoch: 23 Global Step: 58400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:55,164-Speed 13156.27 samples/sec Loss 4.6112 LearningRate 0.0396 Epoch: 23 Global Step: 58410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:56,727-Speed 13105.55 samples/sec Loss 4.5916 LearningRate 0.0396 Epoch: 23 Global Step: 58420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:58,278-Speed 13217.22 samples/sec Loss 4.6538 LearningRate 0.0395 Epoch: 23 Global Step: 58430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:33:59,843-Speed 13090.53 samples/sec Loss 4.5380 LearningRate 0.0395 Epoch: 23 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:01,413-Speed 13049.92 samples/sec Loss 4.6572 LearningRate 0.0395 Epoch: 23 Global Step: 58450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:02,980-Speed 13081.07 samples/sec Loss 4.5956 LearningRate 0.0395 Epoch: 23 Global Step: 58460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:04,556-Speed 13003.19 samples/sec Loss 4.5812 LearningRate 0.0395 Epoch: 23 Global Step: 58470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:06,110-Speed 13190.75 samples/sec Loss 4.6065 LearningRate 0.0394 Epoch: 23 Global Step: 58480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:07,692-Speed 12951.78 samples/sec Loss 4.7114 LearningRate 0.0394 Epoch: 23 Global Step: 58490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:09,262-Speed 13052.19 samples/sec Loss 4.7261 LearningRate 0.0394 Epoch: 23 Global Step: 58500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:10,841-Speed 12977.21 samples/sec Loss 4.6524 LearningRate 0.0394 Epoch: 23 Global Step: 58510 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-01-14 16:34:12,400-Speed 13140.37 samples/sec Loss 4.6540 LearningRate 0.0394 Epoch: 23 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:34:13,969-Speed 13063.99 samples/sec Loss 4.6155 LearningRate 0.0394 Epoch: 23 Global Step: 58530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:34:15,527-Speed 13149.85 samples/sec Loss 4.6933 LearningRate 0.0393 Epoch: 23 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:34:17,091-Speed 13099.43 samples/sec Loss 4.7140 LearningRate 0.0393 Epoch: 23 Global Step: 58550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:18,676-Speed 12934.95 samples/sec Loss 4.7078 LearningRate 0.0393 Epoch: 23 Global Step: 58560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:20,249-Speed 13027.01 samples/sec Loss 4.5957 LearningRate 0.0393 Epoch: 23 Global Step: 58570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:21,825-Speed 12998.16 samples/sec Loss 4.7290 LearningRate 0.0393 Epoch: 23 Global Step: 58580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:23,389-Speed 13106.96 samples/sec Loss 4.6376 LearningRate 0.0392 Epoch: 23 Global Step: 58590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:24,964-Speed 13003.85 samples/sec Loss 4.7717 LearningRate 0.0392 Epoch: 23 Global Step: 58600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:26,535-Speed 13045.20 samples/sec Loss 4.7433 LearningRate 0.0392 Epoch: 23 Global Step: 58610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:28,097-Speed 13114.99 samples/sec Loss 4.6860 LearningRate 0.0392 Epoch: 23 Global Step: 58620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:29,675-Speed 12982.84 samples/sec Loss 4.6993 LearningRate 0.0392 Epoch: 23 Global Step: 58630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:34:31,248-Speed 13031.77 samples/sec Loss 4.6519 LearningRate 0.0392 Epoch: 23 Global Step: 58640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:32,839-Speed 12877.03 samples/sec Loss 4.7023 LearningRate 0.0391 Epoch: 23 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:34:34,402-Speed 13114.38 samples/sec Loss 4.8185 LearningRate 0.0391 Epoch: 23 Global Step: 58660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:35,953-Speed 13211.15 samples/sec Loss 4.7415 LearningRate 0.0391 Epoch: 23 Global Step: 58670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:37,514-Speed 13126.13 samples/sec Loss 4.7292 LearningRate 0.0391 Epoch: 23 Global Step: 58680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:39,079-Speed 13101.14 samples/sec Loss 4.7488 LearningRate 0.0391 Epoch: 23 Global Step: 58690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:40,652-Speed 13027.25 samples/sec Loss 4.7963 LearningRate 0.0390 Epoch: 23 Global Step: 58700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:42,217-Speed 13092.03 samples/sec Loss 4.6989 LearningRate 0.0390 Epoch: 23 Global Step: 58710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:43,816-Speed 12810.03 samples/sec Loss 4.7809 LearningRate 0.0390 Epoch: 23 Global Step: 58720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:45,386-Speed 13055.62 samples/sec Loss 4.7544 LearningRate 0.0390 Epoch: 23 Global Step: 58730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:46,953-Speed 13070.96 samples/sec Loss 4.7369 LearningRate 0.0390 Epoch: 23 Global Step: 58740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:48,518-Speed 13094.67 samples/sec Loss 4.7495 LearningRate 0.0389 Epoch: 23 Global Step: 58750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:50,083-Speed 13097.15 samples/sec 
Loss 4.7974 LearningRate 0.0389 Epoch: 23 Global Step: 58760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:51,677-Speed 12846.45 samples/sec Loss 4.8115 LearningRate 0.0389 Epoch: 23 Global Step: 58770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:53,263-Speed 12930.47 samples/sec Loss 4.8316 LearningRate 0.0389 Epoch: 23 Global Step: 58780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:54,849-Speed 12916.97 samples/sec Loss 4.7536 LearningRate 0.0389 Epoch: 23 Global Step: 58790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:56,422-Speed 13027.18 samples/sec Loss 4.7500 LearningRate 0.0389 Epoch: 23 Global Step: 58800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:34:57,983-Speed 13126.70 samples/sec Loss 4.7821 LearningRate 0.0388 Epoch: 23 Global Step: 58810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:34:59,531-Speed 13240.61 samples/sec Loss 4.7459 LearningRate 0.0388 Epoch: 23 Global Step: 58820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:01,105-Speed 13013.56 samples/sec Loss 4.8996 LearningRate 0.0388 Epoch: 23 Global Step: 58830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:02,679-Speed 13021.73 samples/sec Loss 4.8958 LearningRate 0.0388 Epoch: 23 Global Step: 58840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:04,247-Speed 13070.59 samples/sec Loss 4.8809 LearningRate 0.0388 Epoch: 23 Global Step: 58850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:05,834-Speed 12915.61 samples/sec Loss 4.8389 LearningRate 0.0387 Epoch: 23 Global Step: 58860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:07,397-Speed 13107.44 samples/sec Loss 4.8023 LearningRate 0.0387 Epoch: 23 Global Step: 58870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:08,963-Speed 13088.33 samples/sec Loss 4.8270 LearningRate 0.0387 Epoch: 23 
Global Step: 58880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:10,530-Speed 13072.04 samples/sec Loss 4.8993 LearningRate 0.0387 Epoch: 23 Global Step: 58890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:12,117-Speed 12913.75 samples/sec Loss 4.8177 LearningRate 0.0387 Epoch: 23 Global Step: 58900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:13,685-Speed 13065.10 samples/sec Loss 4.9084 LearningRate 0.0387 Epoch: 23 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:15,266-Speed 12962.02 samples/sec Loss 4.8370 LearningRate 0.0386 Epoch: 23 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:16,830-Speed 13095.32 samples/sec Loss 4.8616 LearningRate 0.0386 Epoch: 23 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:18,392-Speed 13123.55 samples/sec Loss 4.9359 LearningRate 0.0386 Epoch: 23 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:19,973-Speed 12957.51 samples/sec Loss 4.8306 LearningRate 0.0386 Epoch: 23 Global Step: 58950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:21,527-Speed 13190.87 samples/sec Loss 4.8913 LearningRate 0.0386 Epoch: 23 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:23,089-Speed 13117.16 samples/sec Loss 4.9169 LearningRate 0.0385 Epoch: 23 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:24,666-Speed 12993.65 samples/sec Loss 4.8804 LearningRate 0.0385 Epoch: 23 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:26,240-Speed 13020.68 samples/sec Loss 4.9436 LearningRate 0.0385 Epoch: 23 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:27,805-Speed 13094.39 samples/sec Loss 4.8858 LearningRate 0.0385 Epoch: 23 Global Step: 59000 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:35:29,369-Speed 13104.52 samples/sec Loss 4.8900 LearningRate 0.0385 Epoch: 23 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:30,940-Speed 13041.17 samples/sec Loss 4.8964 LearningRate 0.0385 Epoch: 23 Global Step: 59020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:32,518-Speed 12984.97 samples/sec Loss 4.8054 LearningRate 0.0384 Epoch: 23 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:34,083-Speed 13088.70 samples/sec Loss 4.8435 LearningRate 0.0384 Epoch: 23 Global Step: 59040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:35,644-Speed 13130.78 samples/sec Loss 4.9415 LearningRate 0.0384 Epoch: 23 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:37,211-Speed 13077.28 samples/sec Loss 4.8793 LearningRate 0.0384 Epoch: 23 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:38,763-Speed 13202.09 samples/sec Loss 4.8793 LearningRate 0.0384 Epoch: 23 Global Step: 59070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:40,348-Speed 12925.97 samples/sec Loss 4.9110 LearningRate 0.0383 Epoch: 23 Global Step: 59080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:41,916-Speed 13069.86 samples/sec Loss 4.9503 LearningRate 0.0383 Epoch: 23 Global Step: 59090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:43,489-Speed 13025.92 samples/sec Loss 4.8715 LearningRate 0.0383 Epoch: 23 Global Step: 59100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:45,080-Speed 12883.85 samples/sec Loss 4.9418 LearningRate 0.0383 Epoch: 23 Global Step: 59110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:35:46,640-Speed 13134.17 samples/sec Loss 4.8735 LearningRate 0.0383 Epoch: 23 Global Step: 59120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:35:48,189-Speed 13232.42 samples/sec Loss 4.9334 LearningRate 0.0383 Epoch: 23 Global Step: 59130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:49,774-Speed 12923.01 samples/sec Loss 4.8907 LearningRate 0.0382 Epoch: 23 Global Step: 59140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:51,328-Speed 13186.05 samples/sec Loss 4.8470 LearningRate 0.0382 Epoch: 23 Global Step: 59150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:52,902-Speed 13020.55 samples/sec Loss 4.8850 LearningRate 0.0382 Epoch: 23 Global Step: 59160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:54,484-Speed 12950.12 samples/sec Loss 4.9165 LearningRate 0.0382 Epoch: 23 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:56,070-Speed 12918.25 samples/sec Loss 4.9547 LearningRate 0.0382 Epoch: 23 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:35:57,606-Speed 13339.92 samples/sec Loss 4.8500 LearningRate 0.0381 Epoch: 23 Global Step: 59190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:35:59,166-Speed 13141.11 samples/sec Loss 4.9573 LearningRate 0.0381 Epoch: 23 Global Step: 59200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:00,723-Speed 13159.00 samples/sec Loss 4.9493 LearningRate 0.0381 Epoch: 23 Global Step: 59210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:02,344-Speed 12637.81 samples/sec Loss 4.9133 LearningRate 0.0381 Epoch: 23 Global Step: 59220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:03,928-Speed 12940.79 samples/sec Loss 4.9286 LearningRate 0.0381 Epoch: 23 Global Step: 59230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:05,493-Speed 13088.61 samples/sec Loss 4.9739 LearningRate 0.0381 Epoch: 23 Global Step: 59240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:07,034-Speed 13297.63 samples/sec 
Loss 4.9178 LearningRate 0.0380 Epoch: 23 Global Step: 59250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:08,613-Speed 12983.78 samples/sec Loss 4.9200 LearningRate 0.0380 Epoch: 23 Global Step: 59260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:10,185-Speed 13027.26 samples/sec Loss 4.9763 LearningRate 0.0380 Epoch: 23 Global Step: 59270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:11,744-Speed 13142.82 samples/sec Loss 4.8664 LearningRate 0.0380 Epoch: 23 Global Step: 59280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:13,315-Speed 13045.45 samples/sec Loss 4.9397 LearningRate 0.0380 Epoch: 23 Global Step: 59290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:14,900-Speed 12926.48 samples/sec Loss 4.9476 LearningRate 0.0379 Epoch: 23 Global Step: 59300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:16,483-Speed 12942.83 samples/sec Loss 5.0219 LearningRate 0.0379 Epoch: 23 Global Step: 59310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:18,053-Speed 13053.09 samples/sec Loss 4.9041 LearningRate 0.0379 Epoch: 23 Global Step: 59320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:19,635-Speed 12948.57 samples/sec Loss 4.9820 LearningRate 0.0379 Epoch: 23 Global Step: 59330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:21,202-Speed 13075.36 samples/sec Loss 4.9819 LearningRate 0.0379 Epoch: 23 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:22,776-Speed 13023.27 samples/sec Loss 5.0623 LearningRate 0.0379 Epoch: 23 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:24,337-Speed 13128.58 samples/sec Loss 4.9470 LearningRate 0.0378 Epoch: 23 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:25,886-Speed 13224.13 samples/sec Loss 5.0734 LearningRate 0.0378 Epoch: 23 
Global Step: 59370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:27,437-Speed 13209.70 samples/sec Loss 4.9236 LearningRate 0.0378 Epoch: 23 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:29,039-Speed 12797.89 samples/sec Loss 4.9834 LearningRate 0.0378 Epoch: 23 Global Step: 59390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:36:30,583-Speed 13264.09 samples/sec Loss 4.9364 LearningRate 0.0378 Epoch: 23 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:32,162-Speed 12984.07 samples/sec Loss 4.9796 LearningRate 0.0377 Epoch: 23 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:33,726-Speed 13097.59 samples/sec Loss 5.0147 LearningRate 0.0377 Epoch: 23 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:35,304-Speed 12985.85 samples/sec Loss 5.0003 LearningRate 0.0377 Epoch: 23 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:36,900-Speed 12838.89 samples/sec Loss 5.0317 LearningRate 0.0377 Epoch: 23 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:38,479-Speed 12979.94 samples/sec Loss 4.9096 LearningRate 0.0377 Epoch: 23 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:40,052-Speed 13022.38 samples/sec Loss 4.9706 LearningRate 0.0377 Epoch: 23 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:41,648-Speed 12837.97 samples/sec Loss 5.0166 LearningRate 0.0376 Epoch: 23 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:43,221-Speed 13032.80 samples/sec Loss 4.9731 LearningRate 0.0376 Epoch: 23 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:44,799-Speed 12982.03 samples/sec Loss 5.0600 LearningRate 0.0376 Epoch: 23 Global Step: 59490 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:36:46,358-Speed 13144.41 samples/sec Loss 5.0179 LearningRate 0.0376 Epoch: 23 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:36:47,937-Speed 12986.54 samples/sec Loss 5.0033 LearningRate 0.0376 Epoch: 23 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:49,516-Speed 12976.83 samples/sec Loss 5.0311 LearningRate 0.0375 Epoch: 23 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:51,094-Speed 12983.71 samples/sec Loss 5.0396 LearningRate 0.0375 Epoch: 23 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:52,646-Speed 13201.88 samples/sec Loss 4.9991 LearningRate 0.0375 Epoch: 23 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:54,237-Speed 12887.95 samples/sec Loss 4.9177 LearningRate 0.0375 Epoch: 23 Global Step: 59550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:55,803-Speed 13080.86 samples/sec Loss 5.0390 LearningRate 0.0375 Epoch: 23 Global Step: 59560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:36:57,368-Speed 13095.75 samples/sec Loss 5.0272 LearningRate 0.0375 Epoch: 23 Global Step: 59570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:36:58,928-Speed 13138.76 samples/sec Loss 5.0334 LearningRate 0.0374 Epoch: 23 Global Step: 59580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:00,483-Speed 13172.42 samples/sec Loss 5.0863 LearningRate 0.0374 Epoch: 23 Global Step: 59590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:02,044-Speed 13122.21 samples/sec Loss 4.9422 LearningRate 0.0374 Epoch: 23 Global Step: 59600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:03,634-Speed 12893.32 samples/sec Loss 5.0139 LearningRate 0.0374 Epoch: 23 Global Step: 59610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:37:05,203-Speed 13055.43 samples/sec Loss 4.9864 LearningRate 0.0374 Epoch: 23 Global Step: 59620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:06,815-Speed 12715.62 samples/sec Loss 5.0041 LearningRate 0.0373 Epoch: 23 Global Step: 59630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:08,391-Speed 13002.06 samples/sec Loss 5.0075 LearningRate 0.0373 Epoch: 23 Global Step: 59640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:09,961-Speed 13051.79 samples/sec Loss 5.0373 LearningRate 0.0373 Epoch: 23 Global Step: 59650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:11,531-Speed 13048.02 samples/sec Loss 5.1247 LearningRate 0.0373 Epoch: 23 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:13,120-Speed 12896.12 samples/sec Loss 4.9322 LearningRate 0.0373 Epoch: 23 Global Step: 59670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:14,673-Speed 13198.57 samples/sec Loss 5.0203 LearningRate 0.0373 Epoch: 23 Global Step: 59680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:16,249-Speed 12992.71 samples/sec Loss 5.0840 LearningRate 0.0372 Epoch: 23 Global Step: 59690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:17,831-Speed 12955.93 samples/sec Loss 5.0471 LearningRate 0.0372 Epoch: 23 Global Step: 59700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:19,397-Speed 13092.23 samples/sec Loss 5.0014 LearningRate 0.0372 Epoch: 23 Global Step: 59710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:20,974-Speed 12993.02 samples/sec Loss 5.0045 LearningRate 0.0372 Epoch: 23 Global Step: 59720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:22,535-Speed 13120.59 samples/sec Loss 4.9553 LearningRate 0.0372 Epoch: 23 Global Step: 59730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:24,098-Speed 13112.21 samples/sec 
Loss 4.9780 LearningRate 0.0372 Epoch: 23 Global Step: 59740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:25,681-Speed 12950.62 samples/sec Loss 4.9665 LearningRate 0.0371 Epoch: 23 Global Step: 59750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:27,244-Speed 13108.08 samples/sec Loss 5.0183 LearningRate 0.0371 Epoch: 23 Global Step: 59760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:28,845-Speed 12806.41 samples/sec Loss 5.0081 LearningRate 0.0371 Epoch: 23 Global Step: 59770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:30,421-Speed 12998.53 samples/sec Loss 5.0548 LearningRate 0.0371 Epoch: 23 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:31,998-Speed 12993.19 samples/sec Loss 5.0679 LearningRate 0.0371 Epoch: 23 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:33,564-Speed 13086.09 samples/sec Loss 5.0050 LearningRate 0.0370 Epoch: 23 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:35,153-Speed 12896.62 samples/sec Loss 4.9555 LearningRate 0.0370 Epoch: 23 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:36,747-Speed 12852.80 samples/sec Loss 5.1225 LearningRate 0.0370 Epoch: 23 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:38,305-Speed 13158.30 samples/sec Loss 5.0186 LearningRate 0.0370 Epoch: 23 Global Step: 59830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:39,887-Speed 12947.53 samples/sec Loss 5.0000 LearningRate 0.0370 Epoch: 23 Global Step: 59840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:41,458-Speed 13048.52 samples/sec Loss 4.9898 LearningRate 0.0370 Epoch: 23 Global Step: 59850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:43,028-Speed 13050.86 samples/sec Loss 4.9975 LearningRate 0.0369 Epoch: 23 
Global Step: 59860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:44,597-Speed 13062.72 samples/sec Loss 5.0790 LearningRate 0.0369 Epoch: 23 Global Step: 59870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:46,172-Speed 13008.56 samples/sec Loss 5.0896 LearningRate 0.0369 Epoch: 23 Global Step: 59880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:47,725-Speed 13195.18 samples/sec Loss 4.9330 LearningRate 0.0369 Epoch: 23 Global Step: 59890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:49,301-Speed 12998.63 samples/sec Loss 5.0455 LearningRate 0.0369 Epoch: 23 Global Step: 59900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:50,871-Speed 13048.89 samples/sec Loss 5.0280 LearningRate 0.0368 Epoch: 23 Global Step: 59910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:52,446-Speed 13009.29 samples/sec Loss 5.0001 LearningRate 0.0368 Epoch: 23 Global Step: 59920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:37:54,003-Speed 13166.27 samples/sec Loss 5.0580 LearningRate 0.0368 Epoch: 23 Global Step: 59930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:55,582-Speed 12979.79 samples/sec Loss 5.0816 LearningRate 0.0368 Epoch: 23 Global Step: 59940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:57,157-Speed 13005.19 samples/sec Loss 5.0738 LearningRate 0.0368 Epoch: 23 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:37:58,712-Speed 13181.33 samples/sec Loss 5.1169 LearningRate 0.0368 Epoch: 23 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:38:00,314-Speed 12784.76 samples/sec Loss 5.1235 LearningRate 0.0367 Epoch: 23 Global Step: 59970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:38:01,876-Speed 13117.91 samples/sec Loss 5.1203 LearningRate 0.0367 Epoch: 23 Global Step: 59980 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 16:38:03,445-Speed 13065.08 samples/sec Loss 5.1189 LearningRate 0.0367 Epoch: 23 Global Step: 59990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:38:05,038-Speed 12858.49 samples/sec Loss 5.0292 LearningRate 0.0367 Epoch: 23 Global Step: 60000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:38:27,196-[lfw][60000]XNorm: 8.864315 Training: 2022-01-14 16:38:27,197-[lfw][60000]Accuracy-Flip: 0.99617+-0.00334 Training: 2022-01-14 16:38:27,197-[lfw][60000]Accuracy-Highest: 0.99617 Training: 2022-01-14 16:38:52,918-[cfp_fp][60000]XNorm: 7.479175 Training: 2022-01-14 16:38:52,919-[cfp_fp][60000]Accuracy-Flip: 0.96471+-0.01161 Training: 2022-01-14 16:38:52,920-[cfp_fp][60000]Accuracy-Highest: 0.96471 Training: 2022-01-14 16:39:14,730-[agedb_30][60000]XNorm: 8.588790 Training: 2022-01-14 16:39:14,731-[agedb_30][60000]Accuracy-Flip: 0.96450+-0.00730 Training: 2022-01-14 16:39:14,732-[agedb_30][60000]Accuracy-Highest: 0.96700 Training: 2022-01-14 16:39:16,299-Speed 287.40 samples/sec Loss 5.1065 LearningRate 0.0367 Epoch: 23 Global Step: 60010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:17,860-Speed 13125.61 samples/sec Loss 5.0283 LearningRate 0.0367 Epoch: 23 Global Step: 60020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:19,416-Speed 13169.87 samples/sec Loss 5.0681 LearningRate 0.0366 Epoch: 23 Global Step: 60030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:21,017-Speed 12797.55 samples/sec Loss 5.0416 LearningRate 0.0366 Epoch: 23 Global Step: 60040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:22,598-Speed 12980.21 samples/sec Loss 5.1033 LearningRate 0.0366 Epoch: 23 Global Step: 60050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:24,184-Speed 12916.62 samples/sec Loss 5.0653 LearningRate 0.0366 Epoch: 23 Global Step: 60060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-01-14 16:39:25,749-Speed 13098.24 samples/sec Loss 5.0367 LearningRate 0.0366 Epoch: 23 Global Step: 60070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:39:27,321-Speed 13033.41 samples/sec Loss 5.0350 LearningRate 0.0365 Epoch: 23 Global Step: 60080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:39:28,880-Speed 13148.98 samples/sec Loss 5.1157 LearningRate 0.0365 Epoch: 23 Global Step: 60090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:39:30,441-Speed 13130.33 samples/sec Loss 5.1331 LearningRate 0.0365 Epoch: 23 Global Step: 60100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:32,007-Speed 13084.84 samples/sec Loss 5.0730 LearningRate 0.0365 Epoch: 23 Global Step: 60110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:33,622-Speed 12703.59 samples/sec Loss 5.0000 LearningRate 0.0365 Epoch: 23 Global Step: 60120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:35,173-Speed 13212.22 samples/sec Loss 5.0618 LearningRate 0.0365 Epoch: 23 Global Step: 60130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:36,742-Speed 13062.05 samples/sec Loss 5.0157 LearningRate 0.0364 Epoch: 23 Global Step: 60140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:38,315-Speed 13030.55 samples/sec Loss 4.9499 LearningRate 0.0364 Epoch: 23 Global Step: 60150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:39,884-Speed 13062.55 samples/sec Loss 5.0305 LearningRate 0.0364 Epoch: 23 Global Step: 60160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:41,452-Speed 13060.96 samples/sec Loss 4.9602 LearningRate 0.0364 Epoch: 23 Global Step: 60170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:43,009-Speed 13160.07 samples/sec Loss 5.0402 LearningRate 0.0364 Epoch: 23 Global Step: 60180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:44,582-Speed 13035.43 
samples/sec Loss 5.0414 LearningRate 0.0363 Epoch: 23 Global Step: 60190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:46,170-Speed 12898.61 samples/sec Loss 5.0999 LearningRate 0.0363 Epoch: 23 Global Step: 60200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:39:47,732-Speed 13113.48 samples/sec Loss 5.0707 LearningRate 0.0363 Epoch: 23 Global Step: 60210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:49,304-Speed 13037.88 samples/sec Loss 5.0923 LearningRate 0.0363 Epoch: 23 Global Step: 60220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:50,859-Speed 13180.71 samples/sec Loss 5.0560 LearningRate 0.0363 Epoch: 23 Global Step: 60230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:52,427-Speed 13064.93 samples/sec Loss 5.1126 LearningRate 0.0363 Epoch: 23 Global Step: 60240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:54,006-Speed 12975.59 samples/sec Loss 5.0374 LearningRate 0.0362 Epoch: 23 Global Step: 60250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:55,561-Speed 13191.58 samples/sec Loss 5.0904 LearningRate 0.0362 Epoch: 23 Global Step: 60260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:57,130-Speed 13057.88 samples/sec Loss 5.1364 LearningRate 0.0362 Epoch: 23 Global Step: 60270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:39:58,697-Speed 13077.32 samples/sec Loss 5.0663 LearningRate 0.0362 Epoch: 23 Global Step: 60280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:00,269-Speed 13034.59 samples/sec Loss 5.0488 LearningRate 0.0362 Epoch: 23 Global Step: 60290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:01,852-Speed 12942.29 samples/sec Loss 5.1017 LearningRate 0.0362 Epoch: 23 Global Step: 60300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:03,406-Speed 13188.96 samples/sec Loss 5.0462 LearningRate 0.0361 
Epoch: 23 Global Step: 60310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:04,985-Speed 12972.61 samples/sec Loss 5.1071 LearningRate 0.0361 Epoch: 23 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:06,553-Speed 13068.59 samples/sec Loss 5.0081 LearningRate 0.0361 Epoch: 23 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:08,129-Speed 13002.87 samples/sec Loss 5.0511 LearningRate 0.0361 Epoch: 23 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:09,715-Speed 12925.31 samples/sec Loss 5.0940 LearningRate 0.0361 Epoch: 23 Global Step: 60350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:11,281-Speed 13079.86 samples/sec Loss 5.0701 LearningRate 0.0360 Epoch: 23 Global Step: 60360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:12,849-Speed 13067.82 samples/sec Loss 5.0830 LearningRate 0.0360 Epoch: 23 Global Step: 60370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:14,442-Speed 12868.60 samples/sec Loss 5.0487 LearningRate 0.0360 Epoch: 23 Global Step: 60380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:16,010-Speed 13067.49 samples/sec Loss 5.1344 LearningRate 0.0360 Epoch: 23 Global Step: 60390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:17,577-Speed 13074.65 samples/sec Loss 5.1861 LearningRate 0.0360 Epoch: 23 Global Step: 60400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:19,162-Speed 12924.92 samples/sec Loss 5.0190 LearningRate 0.0360 Epoch: 23 Global Step: 60410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:20,723-Speed 13132.02 samples/sec Loss 4.9864 LearningRate 0.0359 Epoch: 23 Global Step: 60420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:22,308-Speed 12930.75 samples/sec Loss 5.0744 LearningRate 0.0359 Epoch: 23 Global Step: 60430 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:23,879-Speed 13048.13 samples/sec Loss 5.0919 LearningRate 0.0359 Epoch: 23 Global Step: 60440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:25,454-Speed 13007.41 samples/sec Loss 5.0967 LearningRate 0.0359 Epoch: 23 Global Step: 60450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:27,016-Speed 13122.16 samples/sec Loss 5.1210 LearningRate 0.0359 Epoch: 23 Global Step: 60460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:28,574-Speed 13147.91 samples/sec Loss 5.0213 LearningRate 0.0359 Epoch: 23 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:30,152-Speed 12989.82 samples/sec Loss 5.1420 LearningRate 0.0358 Epoch: 23 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:31,727-Speed 13005.98 samples/sec Loss 5.0849 LearningRate 0.0358 Epoch: 23 Global Step: 60490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:33,336-Speed 12735.84 samples/sec Loss 5.0863 LearningRate 0.0358 Epoch: 23 Global Step: 60500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:34,936-Speed 12811.85 samples/sec Loss 5.0944 LearningRate 0.0358 Epoch: 23 Global Step: 60510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:36,519-Speed 12945.21 samples/sec Loss 5.1676 LearningRate 0.0358 Epoch: 23 Global Step: 60520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:38,067-Speed 13232.87 samples/sec Loss 5.1227 LearningRate 0.0357 Epoch: 23 Global Step: 60530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:39,618-Speed 13209.47 samples/sec Loss 5.0425 LearningRate 0.0357 Epoch: 23 Global Step: 60540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:41,205-Speed 12912.69 samples/sec Loss 5.0997 LearningRate 0.0357 Epoch: 23 Global Step: 60550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-01-14 16:40:42,783-Speed 12990.60 samples/sec Loss 5.0684 LearningRate 0.0357 Epoch: 23 Global Step: 60560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:44,357-Speed 13014.19 samples/sec Loss 5.0888 LearningRate 0.0357 Epoch: 23 Global Step: 60570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:45,935-Speed 12992.27 samples/sec Loss 5.1915 LearningRate 0.0357 Epoch: 23 Global Step: 60580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:47,502-Speed 13072.22 samples/sec Loss 5.0581 LearningRate 0.0356 Epoch: 23 Global Step: 60590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:49,062-Speed 13134.81 samples/sec Loss 5.1261 LearningRate 0.0356 Epoch: 23 Global Step: 60600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:50,635-Speed 13027.96 samples/sec Loss 5.0911 LearningRate 0.0356 Epoch: 23 Global Step: 60610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:52,217-Speed 12955.81 samples/sec Loss 5.1217 LearningRate 0.0356 Epoch: 23 Global Step: 60620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:40:53,772-Speed 13172.85 samples/sec Loss 5.2209 LearningRate 0.0356 Epoch: 23 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:55,332-Speed 13142.13 samples/sec Loss 5.1222 LearningRate 0.0356 Epoch: 23 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:56,924-Speed 12870.88 samples/sec Loss 5.1346 LearningRate 0.0355 Epoch: 23 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:40:58,505-Speed 12955.38 samples/sec Loss 5.0329 LearningRate 0.0355 Epoch: 23 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:00,072-Speed 13083.42 samples/sec Loss 5.0614 LearningRate 0.0355 Epoch: 23 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:01,652-Speed 12963.54 
samples/sec Loss 5.0810 LearningRate 0.0355 Epoch: 23 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:03,215-Speed 13105.82 samples/sec Loss 5.1010 LearningRate 0.0355 Epoch: 23 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:17,542-Speed 1429.66 samples/sec Loss 4.8326 LearningRate 0.0355 Epoch: 24 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:19,135-Speed 12866.93 samples/sec Loss 4.3337 LearningRate 0.0354 Epoch: 24 Global Step: 60710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:20,772-Speed 12523.78 samples/sec Loss 4.4121 LearningRate 0.0354 Epoch: 24 Global Step: 60720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:22,338-Speed 13086.91 samples/sec Loss 4.4070 LearningRate 0.0354 Epoch: 24 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:41:23,976-Speed 12506.14 samples/sec Loss 4.2958 LearningRate 0.0354 Epoch: 24 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:41:25,558-Speed 12954.84 samples/sec Loss 4.3658 LearningRate 0.0354 Epoch: 24 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:27,117-Speed 13145.63 samples/sec Loss 4.4191 LearningRate 0.0353 Epoch: 24 Global Step: 60760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:28,733-Speed 12678.25 samples/sec Loss 4.3978 LearningRate 0.0353 Epoch: 24 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:30,321-Speed 12907.93 samples/sec Loss 4.4586 LearningRate 0.0353 Epoch: 24 Global Step: 60780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:31,879-Speed 13154.94 samples/sec Loss 4.3723 LearningRate 0.0353 Epoch: 24 Global Step: 60790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:33,529-Speed 12421.94 samples/sec Loss 4.4125 LearningRate 
0.0353 Epoch: 24 Global Step: 60800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:35,113-Speed 12945.75 samples/sec Loss 4.4792 LearningRate 0.0353 Epoch: 24 Global Step: 60810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:36,688-Speed 13010.04 samples/sec Loss 4.4243 LearningRate 0.0352 Epoch: 24 Global Step: 60820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:38,291-Speed 12785.56 samples/sec Loss 4.4026 LearningRate 0.0352 Epoch: 24 Global Step: 60830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:39,880-Speed 12893.16 samples/sec Loss 4.4662 LearningRate 0.0352 Epoch: 24 Global Step: 60840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:41,474-Speed 12859.08 samples/sec Loss 4.5065 LearningRate 0.0352 Epoch: 24 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:41:43,032-Speed 13147.94 samples/sec Loss 4.4427 LearningRate 0.0352 Epoch: 24 Global Step: 60860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:44,593-Speed 13131.58 samples/sec Loss 4.4897 LearningRate 0.0352 Epoch: 24 Global Step: 60870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:46,157-Speed 13104.56 samples/sec Loss 4.4524 LearningRate 0.0351 Epoch: 24 Global Step: 60880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:47,725-Speed 13066.91 samples/sec Loss 4.5102 LearningRate 0.0351 Epoch: 24 Global Step: 60890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:49,289-Speed 13100.68 samples/sec Loss 4.5197 LearningRate 0.0351 Epoch: 24 Global Step: 60900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:50,872-Speed 12939.27 samples/sec Loss 4.3909 LearningRate 0.0351 Epoch: 24 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:52,457-Speed 12933.10 samples/sec Loss 4.4796 LearningRate 0.0351 Epoch: 24 Global Step: 60920 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:54,042-Speed 12923.59 samples/sec Loss 4.4798 LearningRate 0.0350 Epoch: 24 Global Step: 60930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:55,619-Speed 12999.29 samples/sec Loss 4.4566 LearningRate 0.0350 Epoch: 24 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:57,167-Speed 13230.47 samples/sec Loss 4.4546 LearningRate 0.0350 Epoch: 24 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:41:58,735-Speed 13073.66 samples/sec Loss 4.5350 LearningRate 0.0350 Epoch: 24 Global Step: 60960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:42:00,317-Speed 12969.25 samples/sec Loss 4.4188 LearningRate 0.0350 Epoch: 24 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:42:01,887-Speed 13048.32 samples/sec Loss 4.4543 LearningRate 0.0350 Epoch: 24 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:42:03,424-Speed 13332.15 samples/sec Loss 4.5146 LearningRate 0.0349 Epoch: 24 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:04,993-Speed 13060.56 samples/sec Loss 4.4581 LearningRate 0.0349 Epoch: 24 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:06,567-Speed 13022.20 samples/sec Loss 4.5897 LearningRate 0.0349 Epoch: 24 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:08,138-Speed 13041.58 samples/sec Loss 4.5041 LearningRate 0.0349 Epoch: 24 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:09,739-Speed 12796.96 samples/sec Loss 4.5685 LearningRate 0.0349 Epoch: 24 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:11,310-Speed 13046.91 samples/sec Loss 4.5465 LearningRate 0.0349 Epoch: 24 Global Step: 61040 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-01-14 16:42:12,892-Speed 12948.34 samples/sec Loss 4.5556 LearningRate 0.0348 Epoch: 24 Global Step: 61050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:14,479-Speed 12912.52 samples/sec Loss 4.5683 LearningRate 0.0348 Epoch: 24 Global Step: 61060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:16,069-Speed 12891.17 samples/sec Loss 4.5631 LearningRate 0.0348 Epoch: 24 Global Step: 61070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:17,665-Speed 12834.95 samples/sec Loss 4.5307 LearningRate 0.0348 Epoch: 24 Global Step: 61080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:19,215-Speed 13222.91 samples/sec Loss 4.5729 LearningRate 0.0348 Epoch: 24 Global Step: 61090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:20,778-Speed 13104.79 samples/sec Loss 4.5957 LearningRate 0.0348 Epoch: 24 Global Step: 61100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:22,338-Speed 13139.01 samples/sec Loss 4.5928 LearningRate 0.0347 Epoch: 24 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:23,932-Speed 12851.60 samples/sec Loss 4.5767 LearningRate 0.0347 Epoch: 24 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:25,520-Speed 12906.92 samples/sec Loss 4.5777 LearningRate 0.0347 Epoch: 24 Global Step: 61130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:27,073-Speed 13191.50 samples/sec Loss 4.5966 LearningRate 0.0347 Epoch: 24 Global Step: 61140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:28,649-Speed 13001.37 samples/sec Loss 4.5515 LearningRate 0.0347 Epoch: 24 Global Step: 61150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:30,215-Speed 13086.95 samples/sec Loss 4.5417 LearningRate 0.0346 Epoch: 24 Global Step: 61160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:31,791-Speed 
13001.29 samples/sec Loss 4.6049 LearningRate 0.0346 Epoch: 24 Global Step: 61170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:33,370-Speed 12976.79 samples/sec Loss 4.5716 LearningRate 0.0346 Epoch: 24 Global Step: 61180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:34,957-Speed 12907.97 samples/sec Loss 4.5529 LearningRate 0.0346 Epoch: 24 Global Step: 61190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:36,515-Speed 13157.01 samples/sec Loss 4.5988 LearningRate 0.0346 Epoch: 24 Global Step: 61200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:38,082-Speed 13070.00 samples/sec Loss 4.5972 LearningRate 0.0346 Epoch: 24 Global Step: 61210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:39,657-Speed 13010.03 samples/sec Loss 4.6974 LearningRate 0.0345 Epoch: 24 Global Step: 61220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:42:41,240-Speed 12947.55 samples/sec Loss 4.5417 LearningRate 0.0345 Epoch: 24 Global Step: 61230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:42,801-Speed 13127.57 samples/sec Loss 4.7085 LearningRate 0.0345 Epoch: 24 Global Step: 61240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:44,383-Speed 12967.89 samples/sec Loss 4.6537 LearningRate 0.0345 Epoch: 24 Global Step: 61250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:45,947-Speed 13111.60 samples/sec Loss 4.5984 LearningRate 0.0345 Epoch: 24 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:47,516-Speed 13051.02 samples/sec Loss 4.7272 LearningRate 0.0345 Epoch: 24 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:49,094-Speed 12983.37 samples/sec Loss 4.6985 LearningRate 0.0344 Epoch: 24 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:50,674-Speed 12973.95 samples/sec Loss 4.7121 
LearningRate 0.0344 Epoch: 24 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:52,263-Speed 12895.02 samples/sec Loss 4.6274 LearningRate 0.0344 Epoch: 24 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:53,846-Speed 12937.01 samples/sec Loss 4.7315 LearningRate 0.0344 Epoch: 24 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:55,442-Speed 12842.56 samples/sec Loss 4.5745 LearningRate 0.0344 Epoch: 24 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:57,013-Speed 13041.25 samples/sec Loss 4.6746 LearningRate 0.0344 Epoch: 24 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:42:58,587-Speed 13021.10 samples/sec Loss 4.7097 LearningRate 0.0343 Epoch: 24 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:43:00,143-Speed 13169.95 samples/sec Loss 4.6449 LearningRate 0.0343 Epoch: 24 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:43:01,710-Speed 13077.98 samples/sec Loss 4.7169 LearningRate 0.0343 Epoch: 24 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:43:03,267-Speed 13154.32 samples/sec Loss 4.7561 LearningRate 0.0343 Epoch: 24 Global Step: 61370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:04,857-Speed 12895.30 samples/sec Loss 4.7078 LearningRate 0.0343 Epoch: 24 Global Step: 61380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:06,417-Speed 13133.06 samples/sec Loss 4.7025 LearningRate 0.0343 Epoch: 24 Global Step: 61390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:08,014-Speed 12827.05 samples/sec Loss 4.6875 LearningRate 0.0342 Epoch: 24 Global Step: 61400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:09,571-Speed 13163.39 samples/sec Loss 4.6516 LearningRate 0.0342 Epoch: 24 Global Step: 
61410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:11,147-Speed 13002.92 samples/sec Loss 4.7423 LearningRate 0.0342 Epoch: 24 Global Step: 61420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:12,713-Speed 13084.51 samples/sec Loss 4.7601 LearningRate 0.0342 Epoch: 24 Global Step: 61430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:14,277-Speed 13098.46 samples/sec Loss 4.7450 LearningRate 0.0342 Epoch: 24 Global Step: 61440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:15,893-Speed 12678.52 samples/sec Loss 4.7163 LearningRate 0.0341 Epoch: 24 Global Step: 61450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:17,471-Speed 13004.74 samples/sec Loss 4.6720 LearningRate 0.0341 Epoch: 24 Global Step: 61460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:19,030-Speed 13144.75 samples/sec Loss 4.7519 LearningRate 0.0341 Epoch: 24 Global Step: 61470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:20,604-Speed 13013.78 samples/sec Loss 4.6388 LearningRate 0.0341 Epoch: 24 Global Step: 61480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:22,177-Speed 13030.47 samples/sec Loss 4.7625 LearningRate 0.0341 Epoch: 24 Global Step: 61490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:23,764-Speed 12911.32 samples/sec Loss 4.7396 LearningRate 0.0341 Epoch: 24 Global Step: 61500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:25,324-Speed 13137.23 samples/sec Loss 4.7191 LearningRate 0.0340 Epoch: 24 Global Step: 61510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:26,871-Speed 13243.55 samples/sec Loss 4.7309 LearningRate 0.0340 Epoch: 24 Global Step: 61520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:28,464-Speed 12866.13 samples/sec Loss 4.7614 LearningRate 0.0340 Epoch: 24 Global Step: 61530 Fp16 Grad Scale: 16384 Required: 2 
hours Training: 2022-01-14 16:43:30,035-Speed 13051.69 samples/sec Loss 4.7343 LearningRate 0.0340 Epoch: 24 Global Step: 61540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:31,614-Speed 12972.95 samples/sec Loss 4.8194 LearningRate 0.0340 Epoch: 24 Global Step: 61550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:33,230-Speed 12682.36 samples/sec Loss 4.7553 LearningRate 0.0340 Epoch: 24 Global Step: 61560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:34,782-Speed 13199.86 samples/sec Loss 4.7579 LearningRate 0.0339 Epoch: 24 Global Step: 61570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:36,367-Speed 12929.55 samples/sec Loss 4.7852 LearningRate 0.0339 Epoch: 24 Global Step: 61580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:37,926-Speed 13139.83 samples/sec Loss 4.7007 LearningRate 0.0339 Epoch: 24 Global Step: 61590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:39,484-Speed 13153.41 samples/sec Loss 4.7608 LearningRate 0.0339 Epoch: 24 Global Step: 61600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:43:41,082-Speed 12822.33 samples/sec Loss 4.7283 LearningRate 0.0339 Epoch: 24 Global Step: 61610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:42,687-Speed 12764.34 samples/sec Loss 4.8115 LearningRate 0.0339 Epoch: 24 Global Step: 61620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:44,254-Speed 13082.17 samples/sec Loss 4.8378 LearningRate 0.0338 Epoch: 24 Global Step: 61630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:45,835-Speed 12965.12 samples/sec Loss 4.7545 LearningRate 0.0338 Epoch: 24 Global Step: 61640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:47,401-Speed 13081.11 samples/sec Loss 4.7197 LearningRate 0.0338 Epoch: 24 Global Step: 61650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:43:48,968-Speed 13074.91 samples/sec Loss 4.7745 LearningRate 0.0338 Epoch: 24 Global Step: 61660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:50,562-Speed 12861.94 samples/sec Loss 4.8428 LearningRate 0.0338 Epoch: 24 Global Step: 61670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:52,125-Speed 13106.00 samples/sec Loss 4.8369 LearningRate 0.0338 Epoch: 24 Global Step: 61680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:53,703-Speed 12986.33 samples/sec Loss 4.7468 LearningRate 0.0337 Epoch: 24 Global Step: 61690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:55,291-Speed 12900.31 samples/sec Loss 4.7856 LearningRate 0.0337 Epoch: 24 Global Step: 61700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:43:56,832-Speed 13301.82 samples/sec Loss 4.8058 LearningRate 0.0337 Epoch: 24 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:43:58,417-Speed 12922.81 samples/sec Loss 4.7945 LearningRate 0.0337 Epoch: 24 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:43:59,987-Speed 13052.65 samples/sec Loss 4.8233 LearningRate 0.0337 Epoch: 24 Global Step: 61730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:01,573-Speed 12928.23 samples/sec Loss 4.7690 LearningRate 0.0337 Epoch: 24 Global Step: 61740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:03,144-Speed 13039.83 samples/sec Loss 4.8327 LearningRate 0.0336 Epoch: 24 Global Step: 61750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:04,713-Speed 13058.81 samples/sec Loss 4.7601 LearningRate 0.0336 Epoch: 24 Global Step: 61760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:06,286-Speed 13030.73 samples/sec Loss 4.8052 LearningRate 0.0336 Epoch: 24 Global Step: 61770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:07,859-Speed 13024.28 samples/sec 
Loss 4.7734 LearningRate 0.0336 Epoch: 24 Global Step: 61780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:09,438-Speed 12975.95 samples/sec Loss 4.8079 LearningRate 0.0336 Epoch: 24 Global Step: 61790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:10,996-Speed 13154.48 samples/sec Loss 4.8095 LearningRate 0.0335 Epoch: 24 Global Step: 61800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:12,560-Speed 13100.13 samples/sec Loss 4.8395 LearningRate 0.0335 Epoch: 24 Global Step: 61810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:14,146-Speed 12918.93 samples/sec Loss 4.7993 LearningRate 0.0335 Epoch: 24 Global Step: 61820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:15,719-Speed 13031.38 samples/sec Loss 4.8217 LearningRate 0.0335 Epoch: 24 Global Step: 61830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:17,282-Speed 13102.01 samples/sec Loss 4.8689 LearningRate 0.0335 Epoch: 24 Global Step: 61840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:18,876-Speed 12854.76 samples/sec Loss 4.8398 LearningRate 0.0335 Epoch: 24 Global Step: 61850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:20,472-Speed 12841.82 samples/sec Loss 4.8442 LearningRate 0.0334 Epoch: 24 Global Step: 61860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:22,044-Speed 13031.36 samples/sec Loss 4.7841 LearningRate 0.0334 Epoch: 24 Global Step: 61870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:23,633-Speed 12923.34 samples/sec Loss 4.8027 LearningRate 0.0334 Epoch: 24 Global Step: 61880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:25,191-Speed 13160.61 samples/sec Loss 4.8109 LearningRate 0.0334 Epoch: 24 Global Step: 61890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:44:26,742-Speed 13206.42 samples/sec Loss 4.8649 LearningRate 0.0334 Epoch: 24 
Global Step: 61900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:28,316-Speed 13022.13 samples/sec Loss 4.8686 LearningRate 0.0334 Epoch: 24 Global Step: 61910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:29,901-Speed 12930.60 samples/sec Loss 4.7627 LearningRate 0.0333 Epoch: 24 Global Step: 61920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:31,468-Speed 13068.89 samples/sec Loss 4.7480 LearningRate 0.0333 Epoch: 24 Global Step: 61930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:33,048-Speed 12969.93 samples/sec Loss 4.8717 LearningRate 0.0333 Epoch: 24 Global Step: 61940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:34,632-Speed 12934.66 samples/sec Loss 4.8850 LearningRate 0.0333 Epoch: 24 Global Step: 61950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:36,216-Speed 12941.41 samples/sec Loss 4.8525 LearningRate 0.0333 Epoch: 24 Global Step: 61960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:37,791-Speed 13001.79 samples/sec Loss 4.8348 LearningRate 0.0333 Epoch: 24 Global Step: 61970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:39,340-Speed 13228.17 samples/sec Loss 4.8179 LearningRate 0.0332 Epoch: 24 Global Step: 61980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:40,911-Speed 13043.63 samples/sec Loss 4.9214 LearningRate 0.0332 Epoch: 24 Global Step: 61990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:44:42,505-Speed 12857.81 samples/sec Loss 4.8592 LearningRate 0.0332 Epoch: 24 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:44,083-Speed 12984.79 samples/sec Loss 4.7880 LearningRate 0.0332 Epoch: 24 Global Step: 62010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:45,663-Speed 12971.85 samples/sec Loss 4.9628 LearningRate 0.0332 Epoch: 24 Global Step: 62020 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:44:47,230-Speed 13077.93 samples/sec Loss 4.8880 LearningRate 0.0332 Epoch: 24 Global Step: 62030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:48,809-Speed 12977.60 samples/sec Loss 4.8448 LearningRate 0.0331 Epoch: 24 Global Step: 62040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:50,411-Speed 12795.04 samples/sec Loss 4.8369 LearningRate 0.0331 Epoch: 24 Global Step: 62050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:51,978-Speed 13076.24 samples/sec Loss 4.8262 LearningRate 0.0331 Epoch: 24 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:53,526-Speed 13231.57 samples/sec Loss 4.8273 LearningRate 0.0331 Epoch: 24 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:55,106-Speed 12973.71 samples/sec Loss 4.8306 LearningRate 0.0331 Epoch: 24 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:56,692-Speed 12915.52 samples/sec Loss 4.9570 LearningRate 0.0331 Epoch: 24 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:44:58,254-Speed 13121.87 samples/sec Loss 4.9230 LearningRate 0.0330 Epoch: 24 Global Step: 62100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:44:59,817-Speed 13106.57 samples/sec Loss 4.9145 LearningRate 0.0330 Epoch: 24 Global Step: 62110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:45:01,409-Speed 12878.74 samples/sec Loss 4.9324 LearningRate 0.0330 Epoch: 24 Global Step: 62120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:45:02,980-Speed 13043.34 samples/sec Loss 4.8552 LearningRate 0.0330 Epoch: 24 Global Step: 62130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:45:04,555-Speed 13005.03 samples/sec Loss 4.8795 LearningRate 0.0330 Epoch: 24 Global Step: 62140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 
16:45:06,130-Speed 13016.86 samples/sec Loss 4.7952 LearningRate 0.0330 Epoch: 24 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:07,704-Speed 13015.86 samples/sec Loss 4.9270 LearningRate 0.0329 Epoch: 24 Global Step: 62160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:09,297-Speed 12862.01 samples/sec Loss 4.9070 LearningRate 0.0329 Epoch: 24 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:10,841-Speed 13279.22 samples/sec Loss 4.8525 LearningRate 0.0329 Epoch: 24 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:12,418-Speed 12992.67 samples/sec Loss 4.8432 LearningRate 0.0329 Epoch: 24 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:13,985-Speed 13067.82 samples/sec Loss 5.0035 LearningRate 0.0329 Epoch: 24 Global Step: 62200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:15,555-Speed 13058.85 samples/sec Loss 4.9384 LearningRate 0.0329 Epoch: 24 Global Step: 62210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:17,110-Speed 13176.80 samples/sec Loss 4.8362 LearningRate 0.0328 Epoch: 24 Global Step: 62220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:18,682-Speed 13029.27 samples/sec Loss 4.9099 LearningRate 0.0328 Epoch: 24 Global Step: 62230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:20,260-Speed 12987.96 samples/sec Loss 4.9453 LearningRate 0.0328 Epoch: 24 Global Step: 62240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:21,829-Speed 13054.36 samples/sec Loss 4.9083 LearningRate 0.0328 Epoch: 24 Global Step: 62250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:23,398-Speed 13062.71 samples/sec Loss 4.8180 LearningRate 0.0328 Epoch: 24 Global Step: 62260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:24,974-Speed 13002.00 samples/sec 
Loss 4.8275 LearningRate 0.0328 Epoch: 24 Global Step: 62270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:26,548-Speed 13024.85 samples/sec Loss 4.9010 LearningRate 0.0327 Epoch: 24 Global Step: 62280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:28,119-Speed 13043.77 samples/sec Loss 4.9131 LearningRate 0.0327 Epoch: 24 Global Step: 62290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:29,687-Speed 13065.78 samples/sec Loss 4.8864 LearningRate 0.0327 Epoch: 24 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:31,227-Speed 13304.79 samples/sec Loss 4.9020 LearningRate 0.0327 Epoch: 24 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:32,805-Speed 12985.48 samples/sec Loss 4.9052 LearningRate 0.0327 Epoch: 24 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:34,375-Speed 13045.46 samples/sec Loss 4.8822 LearningRate 0.0327 Epoch: 24 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:35,959-Speed 12945.12 samples/sec Loss 4.9156 LearningRate 0.0326 Epoch: 24 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:37,523-Speed 13100.44 samples/sec Loss 4.9338 LearningRate 0.0326 Epoch: 24 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:39,082-Speed 13139.63 samples/sec Loss 5.0117 LearningRate 0.0326 Epoch: 24 Global Step: 62360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:40,686-Speed 12775.64 samples/sec Loss 4.8769 LearningRate 0.0326 Epoch: 24 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:42,279-Speed 12862.80 samples/sec Loss 5.0296 LearningRate 0.0326 Epoch: 24 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:43,859-Speed 12969.24 samples/sec Loss 4.9425 LearningRate 0.0326 Epoch: 24 
Global Step: 62390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:45,418-Speed 13145.09 samples/sec Loss 4.8732 LearningRate 0.0325 Epoch: 24 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:46,997-Speed 12977.82 samples/sec Loss 4.8737 LearningRate 0.0325 Epoch: 24 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:45:48,553-Speed 13171.48 samples/sec Loss 4.9792 LearningRate 0.0325 Epoch: 24 Global Step: 62420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:50,132-Speed 12976.48 samples/sec Loss 4.8778 LearningRate 0.0325 Epoch: 24 Global Step: 62430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:51,705-Speed 13026.28 samples/sec Loss 4.9464 LearningRate 0.0325 Epoch: 24 Global Step: 62440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:53,271-Speed 13083.01 samples/sec Loss 4.8765 LearningRate 0.0324 Epoch: 24 Global Step: 62450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:54,840-Speed 13055.65 samples/sec Loss 4.8565 LearningRate 0.0324 Epoch: 24 Global Step: 62460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:56,405-Speed 13093.27 samples/sec Loss 4.8907 LearningRate 0.0324 Epoch: 24 Global Step: 62470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:58,014-Speed 12734.74 samples/sec Loss 4.8951 LearningRate 0.0324 Epoch: 24 Global Step: 62480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:45:59,594-Speed 12968.66 samples/sec Loss 4.9203 LearningRate 0.0324 Epoch: 24 Global Step: 62490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:01,159-Speed 13101.21 samples/sec Loss 4.9911 LearningRate 0.0324 Epoch: 24 Global Step: 62500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:02,744-Speed 12920.60 samples/sec Loss 4.9242 LearningRate 0.0323 Epoch: 24 Global Step: 62510 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 16:46:04,340-Speed 12845.91 samples/sec Loss 4.8497 LearningRate 0.0323 Epoch: 24 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:05,935-Speed 12842.97 samples/sec Loss 4.9338 LearningRate 0.0323 Epoch: 24 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:07,518-Speed 12945.30 samples/sec Loss 4.8789 LearningRate 0.0323 Epoch: 24 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:09,095-Speed 12990.92 samples/sec Loss 4.9037 LearningRate 0.0323 Epoch: 24 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:10,679-Speed 12937.47 samples/sec Loss 4.8992 LearningRate 0.0323 Epoch: 24 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:12,253-Speed 13022.42 samples/sec Loss 4.9207 LearningRate 0.0322 Epoch: 24 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:13,820-Speed 13070.62 samples/sec Loss 4.9437 LearningRate 0.0322 Epoch: 24 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:15,389-Speed 13064.52 samples/sec Loss 4.8866 LearningRate 0.0322 Epoch: 24 Global Step: 62590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:16,963-Speed 13022.50 samples/sec Loss 4.9462 LearningRate 0.0322 Epoch: 24 Global Step: 62600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:18,532-Speed 13053.90 samples/sec Loss 4.9660 LearningRate 0.0322 Epoch: 24 Global Step: 62610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:20,118-Speed 12930.32 samples/sec Loss 5.0341 LearningRate 0.0322 Epoch: 24 Global Step: 62620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:46:21,675-Speed 13157.77 samples/sec Loss 4.9134 LearningRate 0.0321 Epoch: 24 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:46:23,243-Speed 13064.54 samples/sec Loss 4.8168 LearningRate 0.0321 Epoch: 24 Global Step: 62640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:24,831-Speed 12907.18 samples/sec Loss 4.8545 LearningRate 0.0321 Epoch: 24 Global Step: 62650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:26,401-Speed 13050.07 samples/sec Loss 4.9031 LearningRate 0.0321 Epoch: 24 Global Step: 62660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:27,987-Speed 12924.99 samples/sec Loss 4.8545 LearningRate 0.0321 Epoch: 24 Global Step: 62670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:29,547-Speed 13130.98 samples/sec Loss 4.8404 LearningRate 0.0321 Epoch: 24 Global Step: 62680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:31,101-Speed 13187.08 samples/sec Loss 4.8599 LearningRate 0.0320 Epoch: 24 Global Step: 62690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:32,679-Speed 12986.00 samples/sec Loss 4.9685 LearningRate 0.0320 Epoch: 24 Global Step: 62700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:34,241-Speed 13117.47 samples/sec Loss 4.9594 LearningRate 0.0320 Epoch: 24 Global Step: 62710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:35,828-Speed 12912.46 samples/sec Loss 4.9398 LearningRate 0.0320 Epoch: 24 Global Step: 62720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:37,391-Speed 13103.47 samples/sec Loss 4.9444 LearningRate 0.0320 Epoch: 24 Global Step: 62730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:38,964-Speed 13029.29 samples/sec Loss 4.9673 LearningRate 0.0320 Epoch: 24 Global Step: 62740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:40,539-Speed 13010.75 samples/sec Loss 4.9364 LearningRate 0.0319 Epoch: 24 Global Step: 62750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:42,106-Speed 13076.38 samples/sec 
Loss 4.8635 LearningRate 0.0319 Epoch: 24 Global Step: 62760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:43,678-Speed 13031.99 samples/sec Loss 4.8946 LearningRate 0.0319 Epoch: 24 Global Step: 62770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:45,241-Speed 13111.01 samples/sec Loss 4.8901 LearningRate 0.0319 Epoch: 24 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:46:46,808-Speed 13076.35 samples/sec Loss 4.9386 LearningRate 0.0319 Epoch: 24 Global Step: 62790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:48,382-Speed 13016.92 samples/sec Loss 5.0146 LearningRate 0.0319 Epoch: 24 Global Step: 62800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:49,949-Speed 13080.70 samples/sec Loss 4.9929 LearningRate 0.0318 Epoch: 24 Global Step: 62810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:51,529-Speed 12963.14 samples/sec Loss 4.9535 LearningRate 0.0318 Epoch: 24 Global Step: 62820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:53,100-Speed 13044.93 samples/sec Loss 4.9432 LearningRate 0.0318 Epoch: 24 Global Step: 62830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:54,675-Speed 13010.48 samples/sec Loss 4.9445 LearningRate 0.0318 Epoch: 24 Global Step: 62840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:56,240-Speed 13090.92 samples/sec Loss 4.9804 LearningRate 0.0318 Epoch: 24 Global Step: 62850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:57,811-Speed 13044.26 samples/sec Loss 4.9770 LearningRate 0.0318 Epoch: 24 Global Step: 62860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:46:59,403-Speed 12875.72 samples/sec Loss 4.8904 LearningRate 0.0317 Epoch: 24 Global Step: 62870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:00,973-Speed 13056.03 samples/sec Loss 4.9805 LearningRate 0.0317 Epoch: 24 
Global Step: 62880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:02,544-Speed 13036.17 samples/sec Loss 4.9541 LearningRate 0.0317 Epoch: 24 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:47:04,096-Speed 13203.77 samples/sec Loss 4.9773 LearningRate 0.0317 Epoch: 24 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:47:05,677-Speed 12962.14 samples/sec Loss 4.9490 LearningRate 0.0317 Epoch: 24 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:47:07,269-Speed 12866.26 samples/sec Loss 4.9360 LearningRate 0.0317 Epoch: 24 Global Step: 62920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:08,835-Speed 13088.14 samples/sec Loss 4.9704 LearningRate 0.0317 Epoch: 24 Global Step: 62930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:10,394-Speed 13141.62 samples/sec Loss 4.9940 LearningRate 0.0316 Epoch: 24 Global Step: 62940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:11,960-Speed 13090.15 samples/sec Loss 4.8678 LearningRate 0.0316 Epoch: 24 Global Step: 62950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:13,535-Speed 13003.19 samples/sec Loss 4.9422 LearningRate 0.0316 Epoch: 24 Global Step: 62960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:15,086-Speed 13220.49 samples/sec Loss 4.9555 LearningRate 0.0316 Epoch: 24 Global Step: 62970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:16,640-Speed 13179.68 samples/sec Loss 4.9673 LearningRate 0.0316 Epoch: 24 Global Step: 62980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:18,216-Speed 12998.40 samples/sec Loss 4.9965 LearningRate 0.0316 Epoch: 24 Global Step: 62990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:19,795-Speed 12976.49 samples/sec Loss 4.9984 LearningRate 0.0315 Epoch: 24 Global Step: 63000 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-14 16:47:21,387-Speed 12872.94 samples/sec Loss 4.9225 LearningRate 0.0315 Epoch: 24 Global Step: 63010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:22,968-Speed 12960.54 samples/sec Loss 4.9666 LearningRate 0.0315 Epoch: 24 Global Step: 63020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:24,517-Speed 13229.03 samples/sec Loss 5.0553 LearningRate 0.0315 Epoch: 24 Global Step: 63030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:47:26,087-Speed 13052.97 samples/sec Loss 5.0095 LearningRate 0.0315 Epoch: 24 Global Step: 63040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:27,679-Speed 12869.47 samples/sec Loss 4.9001 LearningRate 0.0315 Epoch: 24 Global Step: 63050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:29,255-Speed 13004.43 samples/sec Loss 4.9913 LearningRate 0.0314 Epoch: 24 Global Step: 63060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:30,821-Speed 13086.72 samples/sec Loss 4.9229 LearningRate 0.0314 Epoch: 24 Global Step: 63070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:32,393-Speed 13027.23 samples/sec Loss 4.9742 LearningRate 0.0314 Epoch: 24 Global Step: 63080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:33,967-Speed 13018.09 samples/sec Loss 4.9857 LearningRate 0.0314 Epoch: 24 Global Step: 63090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:35,533-Speed 13090.78 samples/sec Loss 4.9714 LearningRate 0.0314 Epoch: 24 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:37,116-Speed 12949.37 samples/sec Loss 5.0860 LearningRate 0.0314 Epoch: 24 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:38,693-Speed 12993.69 samples/sec Loss 5.0611 LearningRate 0.0313 Epoch: 24 Global Step: 63120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:47:40,276-Speed 12946.91 samples/sec Loss 5.0189 LearningRate 0.0313 Epoch: 24 Global Step: 63130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:41,837-Speed 13120.16 samples/sec Loss 4.9542 LearningRate 0.0313 Epoch: 24 Global Step: 63140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:43,408-Speed 13047.39 samples/sec Loss 4.9785 LearningRate 0.0313 Epoch: 24 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:44,946-Speed 13315.98 samples/sec Loss 4.9628 LearningRate 0.0313 Epoch: 24 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:46,532-Speed 12924.13 samples/sec Loss 5.0563 LearningRate 0.0313 Epoch: 24 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:48,105-Speed 13024.53 samples/sec Loss 5.0059 LearningRate 0.0312 Epoch: 24 Global Step: 63180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:49,670-Speed 13092.93 samples/sec Loss 4.9707 LearningRate 0.0312 Epoch: 24 Global Step: 63190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:51,249-Speed 12975.50 samples/sec Loss 4.9248 LearningRate 0.0312 Epoch: 24 Global Step: 63200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:52,916-Speed 12295.95 samples/sec Loss 5.0242 LearningRate 0.0312 Epoch: 24 Global Step: 63210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:47:54,376-Speed 14038.11 samples/sec Loss 5.0442 LearningRate 0.0312 Epoch: 24 Global Step: 63220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:08,097-Speed 1492.63 samples/sec Loss 4.5684 LearningRate 0.0312 Epoch: 25 Global Step: 63230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:09,688-Speed 12881.74 samples/sec Loss 4.3250 LearningRate 0.0311 Epoch: 25 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:11,274-Speed 12922.30 samples/sec Loss 
4.2279 LearningRate 0.0311 Epoch: 25 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:12,854-Speed 12968.93 samples/sec Loss 4.2440 LearningRate 0.0311 Epoch: 25 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:14,442-Speed 12905.66 samples/sec Loss 4.2608 LearningRate 0.0311 Epoch: 25 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:16,023-Speed 12960.98 samples/sec Loss 4.2401 LearningRate 0.0311 Epoch: 25 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:17,596-Speed 13021.71 samples/sec Loss 4.2945 LearningRate 0.0311 Epoch: 25 Global Step: 63290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:19,161-Speed 13091.04 samples/sec Loss 4.2998 LearningRate 0.0310 Epoch: 25 Global Step: 63300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:20,731-Speed 13051.45 samples/sec Loss 4.2853 LearningRate 0.0310 Epoch: 25 Global Step: 63310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:22,295-Speed 13104.11 samples/sec Loss 4.2108 LearningRate 0.0310 Epoch: 25 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:23,866-Speed 13040.59 samples/sec Loss 4.2197 LearningRate 0.0310 Epoch: 25 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:25,455-Speed 12902.97 samples/sec Loss 4.2848 LearningRate 0.0310 Epoch: 25 Global Step: 63340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:48:27,006-Speed 13211.04 samples/sec Loss 4.3257 LearningRate 0.0310 Epoch: 25 Global Step: 63350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:48:28,565-Speed 13141.54 samples/sec Loss 4.3429 LearningRate 0.0309 Epoch: 25 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:30,162-Speed 12834.33 samples/sec Loss 4.3265 LearningRate 0.0309 Epoch: 25 
Global Step: 63370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:31,726-Speed 13104.09 samples/sec Loss 4.2518 LearningRate 0.0309 Epoch: 25 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:33,291-Speed 13086.09 samples/sec Loss 4.3100 LearningRate 0.0309 Epoch: 25 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:34,861-Speed 13050.39 samples/sec Loss 4.3495 LearningRate 0.0309 Epoch: 25 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:36,425-Speed 13108.52 samples/sec Loss 4.3249 LearningRate 0.0309 Epoch: 25 Global Step: 63410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:37,991-Speed 13077.25 samples/sec Loss 4.2468 LearningRate 0.0308 Epoch: 25 Global Step: 63420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:39,565-Speed 13017.24 samples/sec Loss 4.3443 LearningRate 0.0308 Epoch: 25 Global Step: 63430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:41,135-Speed 13056.52 samples/sec Loss 4.3497 LearningRate 0.0308 Epoch: 25 Global Step: 63440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:42,699-Speed 13100.79 samples/sec Loss 4.3307 LearningRate 0.0308 Epoch: 25 Global Step: 63450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:44,262-Speed 13110.18 samples/sec Loss 4.3988 LearningRate 0.0308 Epoch: 25 Global Step: 63460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:45,818-Speed 13171.37 samples/sec Loss 4.3434 LearningRate 0.0308 Epoch: 25 Global Step: 63470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:47,417-Speed 12817.60 samples/sec Loss 4.4366 LearningRate 0.0307 Epoch: 25 Global Step: 63480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:48,987-Speed 13046.55 samples/sec Loss 4.3407 LearningRate 0.0307 Epoch: 25 Global Step: 63490 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 16:48:50,561-Speed 13020.85 samples/sec Loss 4.3397 LearningRate 0.0307 Epoch: 25 Global Step: 63500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:48:52,126-Speed 13089.49 samples/sec Loss 4.4773 LearningRate 0.0307 Epoch: 25 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:53,706-Speed 12972.94 samples/sec Loss 4.3508 LearningRate 0.0307 Epoch: 25 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:55,276-Speed 13050.95 samples/sec Loss 4.3684 LearningRate 0.0307 Epoch: 25 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:56,862-Speed 12917.97 samples/sec Loss 4.3333 LearningRate 0.0306 Epoch: 25 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:48:58,434-Speed 13033.63 samples/sec Loss 4.3955 LearningRate 0.0306 Epoch: 25 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:00,012-Speed 12985.84 samples/sec Loss 4.4907 LearningRate 0.0306 Epoch: 25 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:01,576-Speed 13106.69 samples/sec Loss 4.3696 LearningRate 0.0306 Epoch: 25 Global Step: 63570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:03,159-Speed 12936.95 samples/sec Loss 4.4238 LearningRate 0.0306 Epoch: 25 Global Step: 63580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:04,718-Speed 13152.66 samples/sec Loss 4.4997 LearningRate 0.0306 Epoch: 25 Global Step: 63590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:06,273-Speed 13172.59 samples/sec Loss 4.4380 LearningRate 0.0306 Epoch: 25 Global Step: 63600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:07,856-Speed 12950.48 samples/sec Loss 4.3895 LearningRate 0.0305 Epoch: 25 Global Step: 63610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 
16:49:09,414-Speed 13148.20 samples/sec Loss 4.5256 LearningRate 0.0305 Epoch: 25 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:10,977-Speed 13113.82 samples/sec Loss 4.4838 LearningRate 0.0305 Epoch: 25 Global Step: 63630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:12,568-Speed 12880.83 samples/sec Loss 4.5166 LearningRate 0.0305 Epoch: 25 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:14,108-Speed 13300.00 samples/sec Loss 4.5105 LearningRate 0.0305 Epoch: 25 Global Step: 63650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:15,715-Speed 12754.63 samples/sec Loss 4.3934 LearningRate 0.0305 Epoch: 25 Global Step: 63660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:17,259-Speed 13270.21 samples/sec Loss 4.5185 LearningRate 0.0304 Epoch: 25 Global Step: 63670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:18,833-Speed 13017.53 samples/sec Loss 4.5172 LearningRate 0.0304 Epoch: 25 Global Step: 63680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:20,376-Speed 13287.47 samples/sec Loss 4.4260 LearningRate 0.0304 Epoch: 25 Global Step: 63690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:21,947-Speed 13041.56 samples/sec Loss 4.4866 LearningRate 0.0304 Epoch: 25 Global Step: 63700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:23,534-Speed 12905.29 samples/sec Loss 4.5250 LearningRate 0.0304 Epoch: 25 Global Step: 63710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:25,117-Speed 12942.21 samples/sec Loss 4.5076 LearningRate 0.0304 Epoch: 25 Global Step: 63720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:26,887-Speed 11583.73 samples/sec Loss 4.4879 LearningRate 0.0303 Epoch: 25 Global Step: 63730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:28,470-Speed 12944.81 samples/sec 
Loss 4.5118 LearningRate 0.0303 Epoch: 25 Global Step: 63740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:49:30,048-Speed 12986.51 samples/sec Loss 4.5477 LearningRate 0.0303 Epoch: 25 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:31,618-Speed 13048.53 samples/sec Loss 4.4736 LearningRate 0.0303 Epoch: 25 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:33,181-Speed 13108.03 samples/sec Loss 4.5113 LearningRate 0.0303 Epoch: 25 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:34,763-Speed 12960.47 samples/sec Loss 4.5769 LearningRate 0.0303 Epoch: 25 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:36,338-Speed 13018.32 samples/sec Loss 4.5495 LearningRate 0.0302 Epoch: 25 Global Step: 63790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:37,921-Speed 12938.49 samples/sec Loss 4.5096 LearningRate 0.0302 Epoch: 25 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:39,507-Speed 12919.96 samples/sec Loss 4.5230 LearningRate 0.0302 Epoch: 25 Global Step: 63810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:41,080-Speed 13027.58 samples/sec Loss 4.4562 LearningRate 0.0302 Epoch: 25 Global Step: 63820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:42,644-Speed 13101.61 samples/sec Loss 4.4983 LearningRate 0.0302 Epoch: 25 Global Step: 63830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:44,212-Speed 13066.09 samples/sec Loss 4.5119 LearningRate 0.0302 Epoch: 25 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:45,759-Speed 13250.38 samples/sec Loss 4.5788 LearningRate 0.0301 Epoch: 25 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:47,331-Speed 13031.77 samples/sec Loss 4.5057 LearningRate 0.0301 Epoch: 25 
Global Step: 63860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:48,913-Speed 12950.78 samples/sec Loss 4.6092 LearningRate 0.0301 Epoch: 25 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:50,495-Speed 12963.05 samples/sec Loss 4.5217 LearningRate 0.0301 Epoch: 25 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:52,063-Speed 13066.41 samples/sec Loss 4.5577 LearningRate 0.0301 Epoch: 25 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:53,631-Speed 13066.46 samples/sec Loss 4.6186 LearningRate 0.0301 Epoch: 25 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:55,200-Speed 13063.31 samples/sec Loss 4.6453 LearningRate 0.0300 Epoch: 25 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:56,781-Speed 12968.21 samples/sec Loss 4.5470 LearningRate 0.0300 Epoch: 25 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:58,335-Speed 13182.05 samples/sec Loss 4.6072 LearningRate 0.0300 Epoch: 25 Global Step: 63930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:49:59,920-Speed 12929.84 samples/sec Loss 4.5997 LearningRate 0.0300 Epoch: 25 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:01,485-Speed 13095.82 samples/sec Loss 4.5555 LearningRate 0.0300 Epoch: 25 Global Step: 63950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:50:03,053-Speed 13068.28 samples/sec Loss 4.6152 LearningRate 0.0300 Epoch: 25 Global Step: 63960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:50:04,612-Speed 13140.88 samples/sec Loss 4.5691 LearningRate 0.0300 Epoch: 25 Global Step: 63970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:50:06,174-Speed 13120.21 samples/sec Loss 4.6196 LearningRate 0.0299 Epoch: 25 Global Step: 63980 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-01-14 16:50:07,740-Speed 13079.74 samples/sec Loss 4.6097 LearningRate 0.0299 Epoch: 25 Global Step: 63990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:09,312-Speed 13039.31 samples/sec Loss 4.5486 LearningRate 0.0299 Epoch: 25 Global Step: 64000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:10,860-Speed 13240.61 samples/sec Loss 4.5378 LearningRate 0.0299 Epoch: 25 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:12,427-Speed 13080.02 samples/sec Loss 4.6103 LearningRate 0.0299 Epoch: 25 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:14,010-Speed 12940.44 samples/sec Loss 4.5955 LearningRate 0.0299 Epoch: 25 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:15,566-Speed 13169.16 samples/sec Loss 4.7022 LearningRate 0.0298 Epoch: 25 Global Step: 64040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:17,124-Speed 13153.04 samples/sec Loss 4.6398 LearningRate 0.0298 Epoch: 25 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:18,684-Speed 13136.61 samples/sec Loss 4.5651 LearningRate 0.0298 Epoch: 25 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:20,244-Speed 13137.21 samples/sec Loss 4.6302 LearningRate 0.0298 Epoch: 25 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:21,785-Speed 13302.53 samples/sec Loss 4.6598 LearningRate 0.0298 Epoch: 25 Global Step: 64080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:50:23,347-Speed 13111.91 samples/sec Loss 4.6223 LearningRate 0.0298 Epoch: 25 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:24,916-Speed 13057.64 samples/sec Loss 4.5837 LearningRate 0.0297 Epoch: 25 Global Step: 64100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-01-14 16:50:26,495-Speed 12983.14 samples/sec Loss 4.5994 LearningRate 0.0297 Epoch: 25 Global Step: 64110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:28,062-Speed 13077.08 samples/sec Loss 4.6558 LearningRate 0.0297 Epoch: 25 Global Step: 64120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:29,641-Speed 12980.76 samples/sec Loss 4.6557 LearningRate 0.0297 Epoch: 25 Global Step: 64130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:31,185-Speed 13272.21 samples/sec Loss 4.5867 LearningRate 0.0297 Epoch: 25 Global Step: 64140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:32,764-Speed 12977.83 samples/sec Loss 4.6352 LearningRate 0.0297 Epoch: 25 Global Step: 64150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:34,360-Speed 12836.40 samples/sec Loss 4.6105 LearningRate 0.0296 Epoch: 25 Global Step: 64160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:35,939-Speed 12981.62 samples/sec Loss 4.7233 LearningRate 0.0296 Epoch: 25 Global Step: 64170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:37,487-Speed 13233.74 samples/sec Loss 4.6677 LearningRate 0.0296 Epoch: 25 Global Step: 64180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:39,076-Speed 12896.86 samples/sec Loss 4.6659 LearningRate 0.0296 Epoch: 25 Global Step: 64190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:50:40,667-Speed 12881.45 samples/sec Loss 4.6686 LearningRate 0.0296 Epoch: 25 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:42,239-Speed 13029.98 samples/sec Loss 4.6723 LearningRate 0.0296 Epoch: 25 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:43,814-Speed 13011.98 samples/sec Loss 4.7027 LearningRate 0.0296 Epoch: 25 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:45,391-Speed 12994.24 
samples/sec Loss 4.6255 LearningRate 0.0295 Epoch: 25 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:46,969-Speed 12982.83 samples/sec Loss 4.6644 LearningRate 0.0295 Epoch: 25 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:48,536-Speed 13076.01 samples/sec Loss 4.6410 LearningRate 0.0295 Epoch: 25 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:50,125-Speed 12896.36 samples/sec Loss 4.7277 LearningRate 0.0295 Epoch: 25 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:51,677-Speed 13205.46 samples/sec Loss 4.6792 LearningRate 0.0295 Epoch: 25 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:53,236-Speed 13140.44 samples/sec Loss 4.7634 LearningRate 0.0295 Epoch: 25 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:54,829-Speed 12863.16 samples/sec Loss 4.7159 LearningRate 0.0294 Epoch: 25 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:56,392-Speed 13115.53 samples/sec Loss 4.6376 LearningRate 0.0294 Epoch: 25 Global Step: 64300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:50:57,972-Speed 12967.27 samples/sec Loss 4.7614 LearningRate 0.0294 Epoch: 25 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:50:59,539-Speed 13094.11 samples/sec Loss 4.6524 LearningRate 0.0294 Epoch: 25 Global Step: 64320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:01,120-Speed 12959.83 samples/sec Loss 4.6725 LearningRate 0.0294 Epoch: 25 Global Step: 64330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:02,679-Speed 13140.81 samples/sec Loss 4.6988 LearningRate 0.0294 Epoch: 25 Global Step: 64340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:04,233-Speed 13181.99 samples/sec Loss 4.7750 LearningRate 
0.0293 Epoch: 25 Global Step: 64350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:05,808-Speed 13015.03 samples/sec Loss 4.6720 LearningRate 0.0293 Epoch: 25 Global Step: 64360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:07,354-Speed 13254.76 samples/sec Loss 4.7026 LearningRate 0.0293 Epoch: 25 Global Step: 64370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:08,935-Speed 12956.93 samples/sec Loss 4.7266 LearningRate 0.0293 Epoch: 25 Global Step: 64380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:10,484-Speed 13233.37 samples/sec Loss 4.7356 LearningRate 0.0293 Epoch: 25 Global Step: 64390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:12,068-Speed 12930.77 samples/sec Loss 4.7197 LearningRate 0.0293 Epoch: 25 Global Step: 64400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:13,615-Speed 13246.56 samples/sec Loss 4.7004 LearningRate 0.0292 Epoch: 25 Global Step: 64410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:15,202-Speed 12910.23 samples/sec Loss 4.7828 LearningRate 0.0292 Epoch: 25 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:16,782-Speed 12980.79 samples/sec Loss 4.7264 LearningRate 0.0292 Epoch: 25 Global Step: 64430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:18,339-Speed 13155.63 samples/sec Loss 4.6948 LearningRate 0.0292 Epoch: 25 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:19,908-Speed 13059.84 samples/sec Loss 4.6974 LearningRate 0.0292 Epoch: 25 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:21,482-Speed 13015.31 samples/sec Loss 4.7160 LearningRate 0.0292 Epoch: 25 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:23,058-Speed 13002.83 samples/sec Loss 4.7134 LearningRate 0.0292 Epoch: 25 Global Step: 64470 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:24,617-Speed 13144.14 samples/sec Loss 4.6704 LearningRate 0.0291 Epoch: 25 Global Step: 64480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:26,176-Speed 13142.90 samples/sec Loss 4.7406 LearningRate 0.0291 Epoch: 25 Global Step: 64490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:27,756-Speed 12967.50 samples/sec Loss 4.7198 LearningRate 0.0291 Epoch: 25 Global Step: 64500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:29,335-Speed 12981.39 samples/sec Loss 4.7105 LearningRate 0.0291 Epoch: 25 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:30,890-Speed 13173.56 samples/sec Loss 4.7651 LearningRate 0.0291 Epoch: 25 Global Step: 64520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:51:32,455-Speed 13095.09 samples/sec Loss 4.6522 LearningRate 0.0291 Epoch: 25 Global Step: 64530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:51:33,998-Speed 13286.24 samples/sec Loss 4.7168 LearningRate 0.0290 Epoch: 25 Global Step: 64540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:35,558-Speed 13139.36 samples/sec Loss 4.6638 LearningRate 0.0290 Epoch: 25 Global Step: 64550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:37,113-Speed 13173.71 samples/sec Loss 4.7102 LearningRate 0.0290 Epoch: 25 Global Step: 64560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:38,694-Speed 12964.24 samples/sec Loss 4.7721 LearningRate 0.0290 Epoch: 25 Global Step: 64570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:40,255-Speed 13125.75 samples/sec Loss 4.6709 LearningRate 0.0290 Epoch: 25 Global Step: 64580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:41,817-Speed 13140.01 samples/sec Loss 4.7334 LearningRate 0.0290 Epoch: 25 Global Step: 64590 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-01-14 16:51:43,395-Speed 12991.43 samples/sec Loss 4.7428 LearningRate 0.0289 Epoch: 25 Global Step: 64600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:44,956-Speed 13144.63 samples/sec Loss 4.7903 LearningRate 0.0289 Epoch: 25 Global Step: 64610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:46,530-Speed 13023.29 samples/sec Loss 4.7506 LearningRate 0.0289 Epoch: 25 Global Step: 64620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:48,113-Speed 12937.73 samples/sec Loss 4.7965 LearningRate 0.0289 Epoch: 25 Global Step: 64630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:49,692-Speed 12981.11 samples/sec Loss 4.7254 LearningRate 0.0289 Epoch: 25 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:51,252-Speed 13132.22 samples/sec Loss 4.7804 LearningRate 0.0289 Epoch: 25 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:52,811-Speed 13142.52 samples/sec Loss 4.7517 LearningRate 0.0289 Epoch: 25 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:54,358-Speed 13239.82 samples/sec Loss 4.8598 LearningRate 0.0288 Epoch: 25 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:51:55,917-Speed 13191.05 samples/sec Loss 4.7402 LearningRate 0.0288 Epoch: 25 Global Step: 64680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:57,481-Speed 13099.49 samples/sec Loss 4.8243 LearningRate 0.0288 Epoch: 25 Global Step: 64690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:51:59,058-Speed 12994.96 samples/sec Loss 4.7721 LearningRate 0.0288 Epoch: 25 Global Step: 64700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:00,624-Speed 13086.46 samples/sec Loss 4.7684 LearningRate 0.0288 Epoch: 25 Global Step: 64710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:02,172-Speed 
13239.76 samples/sec Loss 4.7196 LearningRate 0.0288 Epoch: 25 Global Step: 64720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:03,758-Speed 12919.23 samples/sec Loss 4.7586 LearningRate 0.0287 Epoch: 25 Global Step: 64730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:05,324-Speed 13083.83 samples/sec Loss 4.7143 LearningRate 0.0287 Epoch: 25 Global Step: 64740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:06,894-Speed 13053.00 samples/sec Loss 4.7215 LearningRate 0.0287 Epoch: 25 Global Step: 64750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:08,462-Speed 13067.88 samples/sec Loss 4.7529 LearningRate 0.0287 Epoch: 25 Global Step: 64760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:10,021-Speed 13139.75 samples/sec Loss 4.7875 LearningRate 0.0287 Epoch: 25 Global Step: 64770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:11,550-Speed 13404.14 samples/sec Loss 4.7826 LearningRate 0.0287 Epoch: 25 Global Step: 64780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:13,114-Speed 13099.47 samples/sec Loss 4.7251 LearningRate 0.0286 Epoch: 25 Global Step: 64790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:14,675-Speed 13123.68 samples/sec Loss 4.7140 LearningRate 0.0286 Epoch: 25 Global Step: 64800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:16,256-Speed 12964.71 samples/sec Loss 4.7075 LearningRate 0.0286 Epoch: 25 Global Step: 64810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:17,824-Speed 13067.59 samples/sec Loss 4.8748 LearningRate 0.0286 Epoch: 25 Global Step: 64820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:19,384-Speed 13141.11 samples/sec Loss 4.7757 LearningRate 0.0286 Epoch: 25 Global Step: 64830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:20,944-Speed 13134.27 samples/sec Loss 4.8640 
LearningRate 0.0286 Epoch: 25 Global Step: 64840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:22,531-Speed 12913.10 samples/sec Loss 4.8416 LearningRate 0.0286 Epoch: 25 Global Step: 64850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:24,107-Speed 13016.56 samples/sec Loss 4.8442 LearningRate 0.0285 Epoch: 25 Global Step: 64860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:25,662-Speed 13181.94 samples/sec Loss 4.8044 LearningRate 0.0285 Epoch: 25 Global Step: 64870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:52:27,224-Speed 13115.41 samples/sec Loss 4.7159 LearningRate 0.0285 Epoch: 25 Global Step: 64880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:28,780-Speed 13170.96 samples/sec Loss 4.8163 LearningRate 0.0285 Epoch: 25 Global Step: 64890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:30,345-Speed 13088.46 samples/sec Loss 4.8104 LearningRate 0.0285 Epoch: 25 Global Step: 64900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:31,923-Speed 12988.16 samples/sec Loss 4.8060 LearningRate 0.0285 Epoch: 25 Global Step: 64910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:33,486-Speed 13107.06 samples/sec Loss 4.7412 LearningRate 0.0284 Epoch: 25 Global Step: 64920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:35,048-Speed 13119.56 samples/sec Loss 4.7562 LearningRate 0.0284 Epoch: 25 Global Step: 64930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:36,602-Speed 13192.84 samples/sec Loss 4.8135 LearningRate 0.0284 Epoch: 25 Global Step: 64940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:38,158-Speed 13163.29 samples/sec Loss 4.7322 LearningRate 0.0284 Epoch: 25 Global Step: 64950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:39,731-Speed 13023.96 samples/sec Loss 4.8022 LearningRate 0.0284 Epoch: 25 Global Step: 
64960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:41,310-Speed 12987.50 samples/sec Loss 4.7452 LearningRate 0.0284 Epoch: 25 Global Step: 64970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:52:42,877-Speed 13072.89 samples/sec Loss 4.7077 LearningRate 0.0283 Epoch: 25 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:52:44,459-Speed 12950.41 samples/sec Loss 4.8524 LearningRate 0.0283 Epoch: 25 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:52:46,042-Speed 12950.80 samples/sec Loss 4.7724 LearningRate 0.0283 Epoch: 25 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:53:08,214-[lfw][65000]XNorm: 8.712195 Training: 2022-01-14 16:53:08,214-[lfw][65000]Accuracy-Flip: 0.99650+-0.00383 Training: 2022-01-14 16:53:08,215-[lfw][65000]Accuracy-Highest: 0.99650 Training: 2022-01-14 16:53:35,774-[cfp_fp][65000]XNorm: 7.344439 Training: 2022-01-14 16:53:35,775-[cfp_fp][65000]Accuracy-Flip: 0.96329+-0.01119 Training: 2022-01-14 16:53:35,776-[cfp_fp][65000]Accuracy-Highest: 0.96471 Training: 2022-01-14 16:53:58,476-[agedb_30][65000]XNorm: 8.451928 Training: 2022-01-14 16:53:58,477-[agedb_30][65000]Accuracy-Flip: 0.96800+-0.00662 Training: 2022-01-14 16:53:58,477-[agedb_30][65000]Accuracy-Highest: 0.96800 Training: 2022-01-14 16:54:00,020-Speed 276.84 samples/sec Loss 4.6924 LearningRate 0.0283 Epoch: 25 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:01,583-Speed 13108.74 samples/sec Loss 4.7185 LearningRate 0.0283 Epoch: 25 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:03,157-Speed 13020.71 samples/sec Loss 4.8083 LearningRate 0.0283 Epoch: 25 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:04,730-Speed 13020.92 samples/sec Loss 4.8060 LearningRate 0.0283 Epoch: 25 Global Step: 65040 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:54:06,310-Speed 12967.88 samples/sec Loss 4.7639 LearningRate 0.0282 Epoch: 25 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:07,872-Speed 13121.11 samples/sec Loss 4.7832 LearningRate 0.0282 Epoch: 25 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:09,426-Speed 13188.82 samples/sec Loss 4.7789 LearningRate 0.0282 Epoch: 25 Global Step: 65070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:10,986-Speed 13126.64 samples/sec Loss 4.8156 LearningRate 0.0282 Epoch: 25 Global Step: 65080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:12,550-Speed 13100.47 samples/sec Loss 4.8376 LearningRate 0.0282 Epoch: 25 Global Step: 65090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:14,100-Speed 13226.55 samples/sec Loss 4.8189 LearningRate 0.0282 Epoch: 25 Global Step: 65100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:15,652-Speed 13201.73 samples/sec Loss 4.6937 LearningRate 0.0281 Epoch: 25 Global Step: 65110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:17,216-Speed 13099.19 samples/sec Loss 4.7784 LearningRate 0.0281 Epoch: 25 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:18,774-Speed 13153.49 samples/sec Loss 4.8327 LearningRate 0.0281 Epoch: 25 Global Step: 65130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:20,327-Speed 13197.64 samples/sec Loss 4.8328 LearningRate 0.0281 Epoch: 25 Global Step: 65140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:21,901-Speed 13015.92 samples/sec Loss 4.8121 LearningRate 0.0281 Epoch: 25 Global Step: 65150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:23,458-Speed 13180.96 samples/sec Loss 4.8010 LearningRate 0.0281 Epoch: 25 Global Step: 65160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:54:25,021-Speed 13108.12 samples/sec Loss 4.7182 LearningRate 0.0280 Epoch: 25 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:26,571-Speed 13216.69 samples/sec Loss 4.9068 LearningRate 0.0280 Epoch: 25 Global Step: 65180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:28,139-Speed 13073.04 samples/sec Loss 4.7599 LearningRate 0.0280 Epoch: 25 Global Step: 65190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:29,699-Speed 13132.43 samples/sec Loss 4.8499 LearningRate 0.0280 Epoch: 25 Global Step: 65200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:31,269-Speed 13053.32 samples/sec Loss 4.7332 LearningRate 0.0280 Epoch: 25 Global Step: 65210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:32,846-Speed 12986.40 samples/sec Loss 4.8452 LearningRate 0.0280 Epoch: 25 Global Step: 65220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:34,397-Speed 13214.56 samples/sec Loss 4.8164 LearningRate 0.0280 Epoch: 25 Global Step: 65230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:35,979-Speed 12956.78 samples/sec Loss 4.7834 LearningRate 0.0279 Epoch: 25 Global Step: 65240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:37,532-Speed 13185.88 samples/sec Loss 4.8785 LearningRate 0.0279 Epoch: 25 Global Step: 65250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:39,111-Speed 12989.16 samples/sec Loss 4.9115 LearningRate 0.0279 Epoch: 25 Global Step: 65260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:40,685-Speed 13030.72 samples/sec Loss 4.7652 LearningRate 0.0279 Epoch: 25 Global Step: 65270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:42,245-Speed 13142.27 samples/sec Loss 4.8708 LearningRate 0.0279 Epoch: 25 Global Step: 65280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:43,800-Speed 13178.70 samples/sec 
Loss 4.8560 LearningRate 0.0279 Epoch: 25 Global Step: 65290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:45,360-Speed 13135.94 samples/sec Loss 4.7921 LearningRate 0.0278 Epoch: 25 Global Step: 65300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:46,923-Speed 13100.01 samples/sec Loss 4.8125 LearningRate 0.0278 Epoch: 25 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:48,478-Speed 13182.85 samples/sec Loss 4.8233 LearningRate 0.0278 Epoch: 25 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:50,039-Speed 13127.69 samples/sec Loss 4.7762 LearningRate 0.0278 Epoch: 25 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:51,612-Speed 13026.61 samples/sec Loss 4.8438 LearningRate 0.0278 Epoch: 25 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:53,187-Speed 13012.34 samples/sec Loss 4.7257 LearningRate 0.0278 Epoch: 25 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:54,757-Speed 13075.16 samples/sec Loss 4.7275 LearningRate 0.0278 Epoch: 25 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:56,349-Speed 12869.87 samples/sec Loss 4.8887 LearningRate 0.0277 Epoch: 25 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:54:57,916-Speed 13071.87 samples/sec Loss 4.8468 LearningRate 0.0277 Epoch: 25 Global Step: 65380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:54:59,475-Speed 13149.62 samples/sec Loss 4.7983 LearningRate 0.0277 Epoch: 25 Global Step: 65390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:01,061-Speed 12916.64 samples/sec Loss 4.8233 LearningRate 0.0277 Epoch: 25 Global Step: 65400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:02,612-Speed 13214.13 samples/sec Loss 4.8178 LearningRate 0.0277 Epoch: 25 
Global Step: 65410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:04,211-Speed 12814.25 samples/sec Loss 4.8807 LearningRate 0.0277 Epoch: 25 Global Step: 65420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:05,797-Speed 12917.97 samples/sec Loss 4.8766 LearningRate 0.0276 Epoch: 25 Global Step: 65430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:07,367-Speed 13052.91 samples/sec Loss 4.8845 LearningRate 0.0276 Epoch: 25 Global Step: 65440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:08,939-Speed 13032.02 samples/sec Loss 4.8662 LearningRate 0.0276 Epoch: 25 Global Step: 65450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:10,492-Speed 13192.85 samples/sec Loss 4.7918 LearningRate 0.0276 Epoch: 25 Global Step: 65460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:12,041-Speed 13228.15 samples/sec Loss 4.8368 LearningRate 0.0276 Epoch: 25 Global Step: 65470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:13,617-Speed 13002.50 samples/sec Loss 4.7830 LearningRate 0.0276 Epoch: 25 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:15,168-Speed 13215.36 samples/sec Loss 4.9393 LearningRate 0.0276 Epoch: 25 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:16,735-Speed 13071.93 samples/sec Loss 4.6820 LearningRate 0.0275 Epoch: 25 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:18,294-Speed 13147.95 samples/sec Loss 4.7810 LearningRate 0.0275 Epoch: 25 Global Step: 65510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:19,887-Speed 12857.24 samples/sec Loss 4.8456 LearningRate 0.0275 Epoch: 25 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:21,453-Speed 13086.86 samples/sec Loss 4.8369 LearningRate 0.0275 Epoch: 25 Global Step: 65530 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:55:23,018-Speed 13092.60 samples/sec Loss 4.8288 LearningRate 0.0275 Epoch: 25 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:24,571-Speed 13196.56 samples/sec Loss 4.8018 LearningRate 0.0275 Epoch: 25 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:26,133-Speed 13121.40 samples/sec Loss 4.9188 LearningRate 0.0274 Epoch: 25 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:27,694-Speed 13127.37 samples/sec Loss 4.7915 LearningRate 0.0274 Epoch: 25 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:29,256-Speed 13113.16 samples/sec Loss 4.8047 LearningRate 0.0274 Epoch: 25 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:30,831-Speed 13013.86 samples/sec Loss 4.8728 LearningRate 0.0274 Epoch: 25 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:32,390-Speed 13143.49 samples/sec Loss 4.8409 LearningRate 0.0274 Epoch: 25 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:33,959-Speed 13056.22 samples/sec Loss 4.7273 LearningRate 0.0274 Epoch: 25 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:35,514-Speed 13174.06 samples/sec Loss 4.9210 LearningRate 0.0274 Epoch: 25 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:37,086-Speed 13037.89 samples/sec Loss 4.8701 LearningRate 0.0273 Epoch: 25 Global Step: 65630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:38,636-Speed 13224.35 samples/sec Loss 4.8651 LearningRate 0.0273 Epoch: 25 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:55:40,212-Speed 12999.20 samples/sec Loss 4.7462 LearningRate 0.0273 Epoch: 25 Global Step: 65650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
16:55:41,775-Speed 13105.33 samples/sec Loss 4.8014 LearningRate 0.0273 Epoch: 25 Global Step: 65660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:43,330-Speed 13187.81 samples/sec Loss 4.8526 LearningRate 0.0273 Epoch: 25 Global Step: 65670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:44,901-Speed 13046.56 samples/sec Loss 4.9003 LearningRate 0.0273 Epoch: 25 Global Step: 65680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:46,478-Speed 12992.19 samples/sec Loss 4.8503 LearningRate 0.0272 Epoch: 25 Global Step: 65690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:48,050-Speed 13036.01 samples/sec Loss 4.8990 LearningRate 0.0272 Epoch: 25 Global Step: 65700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:49,614-Speed 13122.26 samples/sec Loss 4.8538 LearningRate 0.0272 Epoch: 25 Global Step: 65710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:51,216-Speed 12787.49 samples/sec Loss 4.8934 LearningRate 0.0272 Epoch: 25 Global Step: 65720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:52,780-Speed 13102.76 samples/sec Loss 4.8115 LearningRate 0.0272 Epoch: 25 Global Step: 65730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:54,453-Speed 12250.08 samples/sec Loss 4.8803 LearningRate 0.0272 Epoch: 25 Global Step: 65740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:55:55,926-Speed 13910.76 samples/sec Loss 4.7792 LearningRate 0.0272 Epoch: 25 Global Step: 65750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:11,545-Speed 1311.51 samples/sec Loss 4.4609 LearningRate 0.0271 Epoch: 26 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:13,149-Speed 12779.48 samples/sec Loss 4.1122 LearningRate 0.0271 Epoch: 26 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:14,741-Speed 12868.76 samples/sec Loss 
4.0974 LearningRate 0.0271 Epoch: 26 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:16,311-Speed 13050.11 samples/sec Loss 4.1414 LearningRate 0.0271 Epoch: 26 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:17,896-Speed 12927.82 samples/sec Loss 4.1818 LearningRate 0.0271 Epoch: 26 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:19,482-Speed 12923.85 samples/sec Loss 4.1823 LearningRate 0.0271 Epoch: 26 Global Step: 65810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:21,059-Speed 12992.76 samples/sec Loss 4.1911 LearningRate 0.0270 Epoch: 26 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:22,668-Speed 12729.71 samples/sec Loss 4.1717 LearningRate 0.0270 Epoch: 26 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:24,274-Speed 12770.73 samples/sec Loss 4.1588 LearningRate 0.0270 Epoch: 26 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:25,842-Speed 13067.13 samples/sec Loss 4.1378 LearningRate 0.0270 Epoch: 26 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:27,393-Speed 13207.18 samples/sec Loss 4.1661 LearningRate 0.0270 Epoch: 26 Global Step: 65860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 16:56:28,969-Speed 13009.84 samples/sec Loss 4.2095 LearningRate 0.0270 Epoch: 26 Global Step: 65870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:30,559-Speed 12881.11 samples/sec Loss 4.3157 LearningRate 0.0270 Epoch: 26 Global Step: 65880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:32,096-Speed 13335.54 samples/sec Loss 4.3345 LearningRate 0.0269 Epoch: 26 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:33,688-Speed 12870.10 samples/sec Loss 4.2321 LearningRate 0.0269 Epoch: 26 
Global Step: 65900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:35,254-Speed 13088.20 samples/sec Loss 4.2205 LearningRate 0.0269 Epoch: 26 Global Step: 65910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:36,810-Speed 13164.07 samples/sec Loss 4.2116 LearningRate 0.0269 Epoch: 26 Global Step: 65920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:38,383-Speed 13031.19 samples/sec Loss 4.2547 LearningRate 0.0269 Epoch: 26 Global Step: 65930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:39,962-Speed 12975.38 samples/sec Loss 4.2263 LearningRate 0.0269 Epoch: 26 Global Step: 65940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:41,546-Speed 12936.43 samples/sec Loss 4.2114 LearningRate 0.0268 Epoch: 26 Global Step: 65950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:43,130-Speed 12931.00 samples/sec Loss 4.2546 LearningRate 0.0268 Epoch: 26 Global Step: 65960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:44,695-Speed 13097.37 samples/sec Loss 4.3263 LearningRate 0.0268 Epoch: 26 Global Step: 65970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:46,265-Speed 13057.49 samples/sec Loss 4.3447 LearningRate 0.0268 Epoch: 26 Global Step: 65980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:47,833-Speed 13063.59 samples/sec Loss 4.3265 LearningRate 0.0268 Epoch: 26 Global Step: 65990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:49,429-Speed 12846.08 samples/sec Loss 4.2765 LearningRate 0.0268 Epoch: 26 Global Step: 66000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:56:51,006-Speed 12992.00 samples/sec Loss 4.2217 LearningRate 0.0268 Epoch: 26 Global Step: 66010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:52,582-Speed 12993.48 samples/sec Loss 4.2413 LearningRate 0.0267 Epoch: 26 Global Step: 66020 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:56:54,153-Speed 13045.30 samples/sec Loss 4.2679 LearningRate 0.0267 Epoch: 26 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:55,747-Speed 12856.09 samples/sec Loss 4.2508 LearningRate 0.0267 Epoch: 26 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:57,320-Speed 13027.87 samples/sec Loss 4.3465 LearningRate 0.0267 Epoch: 26 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:56:58,860-Speed 13308.73 samples/sec Loss 4.2908 LearningRate 0.0267 Epoch: 26 Global Step: 66060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:00,454-Speed 12856.28 samples/sec Loss 4.2425 LearningRate 0.0267 Epoch: 26 Global Step: 66070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:02,009-Speed 13177.68 samples/sec Loss 4.3271 LearningRate 0.0266 Epoch: 26 Global Step: 66080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:03,582-Speed 13032.03 samples/sec Loss 4.2919 LearningRate 0.0266 Epoch: 26 Global Step: 66090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:05,158-Speed 12993.82 samples/sec Loss 4.2724 LearningRate 0.0266 Epoch: 26 Global Step: 66100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:06,708-Speed 13227.81 samples/sec Loss 4.3128 LearningRate 0.0266 Epoch: 26 Global Step: 66110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:08,291-Speed 12958.34 samples/sec Loss 4.3412 LearningRate 0.0266 Epoch: 26 Global Step: 66120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:09,863-Speed 13039.36 samples/sec Loss 4.2756 LearningRate 0.0266 Epoch: 26 Global Step: 66130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:11,463-Speed 12805.44 samples/sec Loss 4.3617 LearningRate 0.0266 Epoch: 26 Global Step: 66140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:57:13,027-Speed 13102.71 samples/sec Loss 4.3670 LearningRate 0.0265 Epoch: 26 Global Step: 66150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:14,607-Speed 12967.88 samples/sec Loss 4.3299 LearningRate 0.0265 Epoch: 26 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:16,152-Speed 13261.52 samples/sec Loss 4.2826 LearningRate 0.0265 Epoch: 26 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:17,701-Speed 13227.28 samples/sec Loss 4.3378 LearningRate 0.0265 Epoch: 26 Global Step: 66180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:19,265-Speed 13102.22 samples/sec Loss 4.3713 LearningRate 0.0265 Epoch: 26 Global Step: 66190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:20,850-Speed 12931.24 samples/sec Loss 4.3367 LearningRate 0.0265 Epoch: 26 Global Step: 66200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:22,413-Speed 13109.20 samples/sec Loss 4.4295 LearningRate 0.0265 Epoch: 26 Global Step: 66210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:24,000-Speed 12910.45 samples/sec Loss 4.3030 LearningRate 0.0264 Epoch: 26 Global Step: 66220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:25,569-Speed 13057.63 samples/sec Loss 4.4433 LearningRate 0.0264 Epoch: 26 Global Step: 66230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:27,137-Speed 13066.75 samples/sec Loss 4.3839 LearningRate 0.0264 Epoch: 26 Global Step: 66240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:28,702-Speed 13101.76 samples/sec Loss 4.3186 LearningRate 0.0264 Epoch: 26 Global Step: 66250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:30,289-Speed 12914.49 samples/sec Loss 4.3751 LearningRate 0.0264 Epoch: 26 Global Step: 66260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:31,859-Speed 13054.58 samples/sec 
Loss 4.4587 LearningRate 0.0264 Epoch: 26 Global Step: 66270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:33,428-Speed 13058.02 samples/sec Loss 4.3697 LearningRate 0.0263 Epoch: 26 Global Step: 66280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:34,999-Speed 13044.73 samples/sec Loss 4.3373 LearningRate 0.0263 Epoch: 26 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:36,567-Speed 13065.91 samples/sec Loss 4.4457 LearningRate 0.0263 Epoch: 26 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:38,136-Speed 13058.93 samples/sec Loss 4.3782 LearningRate 0.0263 Epoch: 26 Global Step: 66310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:39,728-Speed 12871.91 samples/sec Loss 4.3872 LearningRate 0.0263 Epoch: 26 Global Step: 66320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:41,306-Speed 12987.45 samples/sec Loss 4.3752 LearningRate 0.0263 Epoch: 26 Global Step: 66330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:42,873-Speed 13071.23 samples/sec Loss 4.4098 LearningRate 0.0263 Epoch: 26 Global Step: 66340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:44,454-Speed 12966.04 samples/sec Loss 4.4589 LearningRate 0.0262 Epoch: 26 Global Step: 66350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:46,041-Speed 12914.07 samples/sec Loss 4.4382 LearningRate 0.0262 Epoch: 26 Global Step: 66360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:47,616-Speed 13002.01 samples/sec Loss 4.4489 LearningRate 0.0262 Epoch: 26 Global Step: 66370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:49,174-Speed 13163.68 samples/sec Loss 4.4532 LearningRate 0.0262 Epoch: 26 Global Step: 66380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:50,796-Speed 12627.92 samples/sec Loss 4.4361 LearningRate 0.0262 Epoch: 26 
Global Step: 66390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:52,367-Speed 13051.18 samples/sec Loss 4.4217 LearningRate 0.0262 Epoch: 26 Global Step: 66400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:53,949-Speed 12953.02 samples/sec Loss 4.3257 LearningRate 0.0262 Epoch: 26 Global Step: 66410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:55,515-Speed 13085.72 samples/sec Loss 4.3791 LearningRate 0.0261 Epoch: 26 Global Step: 66420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:57:57,070-Speed 13167.85 samples/sec Loss 4.4855 LearningRate 0.0261 Epoch: 26 Global Step: 66430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:57:58,660-Speed 12896.85 samples/sec Loss 4.3714 LearningRate 0.0261 Epoch: 26 Global Step: 66440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:00,250-Speed 12881.86 samples/sec Loss 4.4136 LearningRate 0.0261 Epoch: 26 Global Step: 66450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:01,834-Speed 12936.17 samples/sec Loss 4.4324 LearningRate 0.0261 Epoch: 26 Global Step: 66460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:03,395-Speed 13123.65 samples/sec Loss 4.4067 LearningRate 0.0261 Epoch: 26 Global Step: 66470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:04,973-Speed 12988.02 samples/sec Loss 4.4224 LearningRate 0.0260 Epoch: 26 Global Step: 66480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:06,547-Speed 13014.86 samples/sec Loss 4.4554 LearningRate 0.0260 Epoch: 26 Global Step: 66490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:08,164-Speed 12670.64 samples/sec Loss 4.4714 LearningRate 0.0260 Epoch: 26 Global Step: 66500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:09,739-Speed 13035.00 samples/sec Loss 4.4417 LearningRate 0.0260 Epoch: 26 Global Step: 66510 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 16:58:11,331-Speed 12874.30 samples/sec Loss 4.5306 LearningRate 0.0260 Epoch: 26 Global Step: 66520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:12,887-Speed 13173.46 samples/sec Loss 4.4345 LearningRate 0.0260 Epoch: 26 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:14,460-Speed 13050.70 samples/sec Loss 4.4355 LearningRate 0.0260 Epoch: 26 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:16,015-Speed 13179.44 samples/sec Loss 4.4887 LearningRate 0.0259 Epoch: 26 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:17,586-Speed 13037.83 samples/sec Loss 4.5048 LearningRate 0.0259 Epoch: 26 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:19,170-Speed 12937.59 samples/sec Loss 4.4573 LearningRate 0.0259 Epoch: 26 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:20,739-Speed 13060.86 samples/sec Loss 4.5454 LearningRate 0.0259 Epoch: 26 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:22,308-Speed 13058.92 samples/sec Loss 4.5544 LearningRate 0.0259 Epoch: 26 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:23,885-Speed 12992.65 samples/sec Loss 4.4828 LearningRate 0.0259 Epoch: 26 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:25,482-Speed 12834.31 samples/sec Loss 4.4628 LearningRate 0.0259 Epoch: 26 Global Step: 66610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:27,071-Speed 12894.30 samples/sec Loss 4.5014 LearningRate 0.0258 Epoch: 26 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:58:28,615-Speed 13270.31 samples/sec Loss 4.5874 LearningRate 0.0258 Epoch: 26 Global Step: 66630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:58:30,177-Speed 13118.47 samples/sec Loss 4.5750 LearningRate 0.0258 Epoch: 26 Global Step: 66640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:31,753-Speed 13002.90 samples/sec Loss 4.5369 LearningRate 0.0258 Epoch: 26 Global Step: 66650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:33,323-Speed 13047.90 samples/sec Loss 4.5223 LearningRate 0.0258 Epoch: 26 Global Step: 66660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:34,882-Speed 13151.56 samples/sec Loss 4.5277 LearningRate 0.0258 Epoch: 26 Global Step: 66670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:36,449-Speed 13072.93 samples/sec Loss 4.4983 LearningRate 0.0257 Epoch: 26 Global Step: 66680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:38,029-Speed 12969.03 samples/sec Loss 4.4932 LearningRate 0.0257 Epoch: 26 Global Step: 66690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:39,580-Speed 13210.89 samples/sec Loss 4.4660 LearningRate 0.0257 Epoch: 26 Global Step: 66700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:41,165-Speed 12930.96 samples/sec Loss 4.5430 LearningRate 0.0257 Epoch: 26 Global Step: 66710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:42,715-Speed 13218.16 samples/sec Loss 4.5324 LearningRate 0.0257 Epoch: 26 Global Step: 66720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:44,284-Speed 13059.67 samples/sec Loss 4.4889 LearningRate 0.0257 Epoch: 26 Global Step: 66730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:45,847-Speed 13105.67 samples/sec Loss 4.5592 LearningRate 0.0257 Epoch: 26 Global Step: 66740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:47,418-Speed 13046.74 samples/sec Loss 4.5540 LearningRate 0.0256 Epoch: 26 Global Step: 66750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:48,981-Speed 13108.90 samples/sec 
Loss 4.5099 LearningRate 0.0256 Epoch: 26 Global Step: 66760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:50,562-Speed 12963.98 samples/sec Loss 4.4728 LearningRate 0.0256 Epoch: 26 Global Step: 66770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:52,159-Speed 12828.74 samples/sec Loss 4.5414 LearningRate 0.0256 Epoch: 26 Global Step: 66780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:53,716-Speed 13160.65 samples/sec Loss 4.5591 LearningRate 0.0256 Epoch: 26 Global Step: 66790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 16:58:55,312-Speed 12841.31 samples/sec Loss 4.5346 LearningRate 0.0256 Epoch: 26 Global Step: 66800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:56,874-Speed 13118.16 samples/sec Loss 4.5118 LearningRate 0.0256 Epoch: 26 Global Step: 66810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:58:58,431-Speed 13160.43 samples/sec Loss 4.5378 LearningRate 0.0255 Epoch: 26 Global Step: 66820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:00,012-Speed 12961.78 samples/sec Loss 4.5710 LearningRate 0.0255 Epoch: 26 Global Step: 66830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:01,581-Speed 13055.46 samples/sec Loss 4.5905 LearningRate 0.0255 Epoch: 26 Global Step: 66840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:03,147-Speed 13088.45 samples/sec Loss 4.5768 LearningRate 0.0255 Epoch: 26 Global Step: 66850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:04,723-Speed 13004.61 samples/sec Loss 4.5494 LearningRate 0.0255 Epoch: 26 Global Step: 66860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:06,282-Speed 13145.33 samples/sec Loss 4.5405 LearningRate 0.0255 Epoch: 26 Global Step: 66870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:07,851-Speed 13064.59 samples/sec Loss 4.6034 LearningRate 0.0254 Epoch: 26 
Global Step: 66880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:09,427-Speed 13001.15 samples/sec Loss 4.5300 LearningRate 0.0254 Epoch: 26 Global Step: 66890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:10,990-Speed 13105.92 samples/sec Loss 4.5001 LearningRate 0.0254 Epoch: 26 Global Step: 66900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:12,580-Speed 12887.72 samples/sec Loss 4.6279 LearningRate 0.0254 Epoch: 26 Global Step: 66910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:14,141-Speed 13146.99 samples/sec Loss 4.5066 LearningRate 0.0254 Epoch: 26 Global Step: 66920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:15,708-Speed 13078.09 samples/sec Loss 4.6297 LearningRate 0.0254 Epoch: 26 Global Step: 66930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:17,281-Speed 13029.85 samples/sec Loss 4.6439 LearningRate 0.0254 Epoch: 26 Global Step: 66940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:18,838-Speed 13157.49 samples/sec Loss 4.5447 LearningRate 0.0253 Epoch: 26 Global Step: 66950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:20,423-Speed 12928.07 samples/sec Loss 4.5681 LearningRate 0.0253 Epoch: 26 Global Step: 66960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:21,997-Speed 13019.19 samples/sec Loss 4.6285 LearningRate 0.0253 Epoch: 26 Global Step: 66970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:23,559-Speed 13124.63 samples/sec Loss 4.5320 LearningRate 0.0253 Epoch: 26 Global Step: 66980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:25,159-Speed 12799.30 samples/sec Loss 4.6477 LearningRate 0.0253 Epoch: 26 Global Step: 66990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:26,729-Speed 13053.17 samples/sec Loss 4.5492 LearningRate 0.0253 Epoch: 26 Global Step: 67000 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 16:59:28,308-Speed 12976.26 samples/sec Loss 4.6396 LearningRate 0.0253 Epoch: 26 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:29,891-Speed 12951.32 samples/sec Loss 4.6409 LearningRate 0.0252 Epoch: 26 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:31,453-Speed 13112.13 samples/sec Loss 4.6833 LearningRate 0.0252 Epoch: 26 Global Step: 67030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:33,049-Speed 12840.37 samples/sec Loss 4.7006 LearningRate 0.0252 Epoch: 26 Global Step: 67040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:34,620-Speed 13045.96 samples/sec Loss 4.5794 LearningRate 0.0252 Epoch: 26 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:36,204-Speed 12933.02 samples/sec Loss 4.5649 LearningRate 0.0252 Epoch: 26 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:37,771-Speed 13079.14 samples/sec Loss 4.6119 LearningRate 0.0252 Epoch: 26 Global Step: 67070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:39,332-Speed 13125.82 samples/sec Loss 4.6238 LearningRate 0.0252 Epoch: 26 Global Step: 67080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:40,942-Speed 12729.52 samples/sec Loss 4.6727 LearningRate 0.0251 Epoch: 26 Global Step: 67090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:42,515-Speed 13026.47 samples/sec Loss 4.5543 LearningRate 0.0251 Epoch: 26 Global Step: 67100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:44,100-Speed 12927.67 samples/sec Loss 4.6001 LearningRate 0.0251 Epoch: 26 Global Step: 67110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:45,668-Speed 13065.36 samples/sec Loss 4.6147 LearningRate 0.0251 Epoch: 26 Global Step: 67120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
16:59:47,252-Speed 12934.07 samples/sec Loss 4.5629 LearningRate 0.0251 Epoch: 26 Global Step: 67130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:48,821-Speed 13069.18 samples/sec Loss 4.6320 LearningRate 0.0251 Epoch: 26 Global Step: 67140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:50,394-Speed 13020.33 samples/sec Loss 4.5798 LearningRate 0.0250 Epoch: 26 Global Step: 67150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:51,951-Speed 13161.21 samples/sec Loss 4.6719 LearningRate 0.0250 Epoch: 26 Global Step: 67160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 16:59:53,522-Speed 13044.39 samples/sec Loss 4.6105 LearningRate 0.0250 Epoch: 26 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:55,103-Speed 12961.80 samples/sec Loss 4.5567 LearningRate 0.0250 Epoch: 26 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:56,657-Speed 13188.45 samples/sec Loss 4.6944 LearningRate 0.0250 Epoch: 26 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:58,232-Speed 13007.08 samples/sec Loss 4.6481 LearningRate 0.0250 Epoch: 26 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 16:59:59,782-Speed 13228.97 samples/sec Loss 4.6426 LearningRate 0.0250 Epoch: 26 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:00:01,375-Speed 12864.67 samples/sec Loss 4.6606 LearningRate 0.0249 Epoch: 26 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:00:02,928-Speed 13195.20 samples/sec Loss 4.6799 LearningRate 0.0249 Epoch: 26 Global Step: 67230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:04,498-Speed 13052.30 samples/sec Loss 4.6350 LearningRate 0.0249 Epoch: 26 Global Step: 67240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:06,037-Speed 13310.81 samples/sec 
Loss 4.6182 LearningRate 0.0249 Epoch: 26 Global Step: 67250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:07,627-Speed 12890.30 samples/sec Loss 4.6604 LearningRate 0.0249 Epoch: 26 Global Step: 67260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:09,193-Speed 13086.21 samples/sec Loss 4.5164 LearningRate 0.0249 Epoch: 26 Global Step: 67270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:10,753-Speed 13135.96 samples/sec Loss 4.6229 LearningRate 0.0249 Epoch: 26 Global Step: 67280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:12,322-Speed 13053.92 samples/sec Loss 4.6297 LearningRate 0.0248 Epoch: 26 Global Step: 67290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:13,883-Speed 13220.83 samples/sec Loss 4.6651 LearningRate 0.0248 Epoch: 26 Global Step: 67300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:15,436-Speed 13190.59 samples/sec Loss 4.6027 LearningRate 0.0248 Epoch: 26 Global Step: 67310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:17,015-Speed 12978.00 samples/sec Loss 4.5642 LearningRate 0.0248 Epoch: 26 Global Step: 67320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:18,610-Speed 12841.71 samples/sec Loss 4.5941 LearningRate 0.0248 Epoch: 26 Global Step: 67330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:20,189-Speed 12981.38 samples/sec Loss 4.7406 LearningRate 0.0248 Epoch: 26 Global Step: 67340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:21,759-Speed 13051.19 samples/sec Loss 4.6320 LearningRate 0.0248 Epoch: 26 Global Step: 67350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:23,332-Speed 13026.40 samples/sec Loss 4.6931 LearningRate 0.0247 Epoch: 26 Global Step: 67360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:24,902-Speed 13051.74 samples/sec Loss 4.6982 LearningRate 0.0247 Epoch: 26 
Global Step: 67370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:26,479-Speed 12994.10 samples/sec Loss 4.5869 LearningRate 0.0247 Epoch: 26 Global Step: 67380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:28,051-Speed 13033.40 samples/sec Loss 4.6862 LearningRate 0.0247 Epoch: 26 Global Step: 67390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:29,612-Speed 13128.37 samples/sec Loss 4.6053 LearningRate 0.0247 Epoch: 26 Global Step: 67400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:00:31,200-Speed 12906.97 samples/sec Loss 4.6926 LearningRate 0.0247 Epoch: 26 Global Step: 67410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:32,761-Speed 13123.58 samples/sec Loss 4.6965 LearningRate 0.0247 Epoch: 26 Global Step: 67420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:34,343-Speed 12957.99 samples/sec Loss 4.5956 LearningRate 0.0246 Epoch: 26 Global Step: 67430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:35,904-Speed 13119.82 samples/sec Loss 4.5815 LearningRate 0.0246 Epoch: 26 Global Step: 67440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:37,464-Speed 13132.12 samples/sec Loss 4.6272 LearningRate 0.0246 Epoch: 26 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:39,045-Speed 12965.53 samples/sec Loss 4.6278 LearningRate 0.0246 Epoch: 26 Global Step: 67460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:40,626-Speed 12964.61 samples/sec Loss 4.6580 LearningRate 0.0246 Epoch: 26 Global Step: 67470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:42,193-Speed 13074.79 samples/sec Loss 4.7559 LearningRate 0.0246 Epoch: 26 Global Step: 67480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:43,781-Speed 12906.53 samples/sec Loss 4.6298 LearningRate 0.0245 Epoch: 26 Global Step: 67490 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 17:00:45,340-Speed 13136.28 samples/sec Loss 4.6494 LearningRate 0.0245 Epoch: 26 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:46,901-Speed 13129.60 samples/sec Loss 4.5880 LearningRate 0.0245 Epoch: 26 Global Step: 67510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:48,454-Speed 13195.12 samples/sec Loss 4.6233 LearningRate 0.0245 Epoch: 26 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:50,028-Speed 13040.68 samples/sec Loss 4.6727 LearningRate 0.0245 Epoch: 26 Global Step: 67530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:51,630-Speed 12787.46 samples/sec Loss 4.6695 LearningRate 0.0245 Epoch: 26 Global Step: 67540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:53,199-Speed 13062.38 samples/sec Loss 4.5800 LearningRate 0.0245 Epoch: 26 Global Step: 67550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:54,768-Speed 13056.70 samples/sec Loss 4.6955 LearningRate 0.0244 Epoch: 26 Global Step: 67560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:56,328-Speed 13137.21 samples/sec Loss 4.6897 LearningRate 0.0244 Epoch: 26 Global Step: 67570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:57,900-Speed 13031.66 samples/sec Loss 4.6431 LearningRate 0.0244 Epoch: 26 Global Step: 67580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:00:59,489-Speed 12903.57 samples/sec Loss 4.6747 LearningRate 0.0244 Epoch: 26 Global Step: 67590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:01,086-Speed 12827.87 samples/sec Loss 4.6501 LearningRate 0.0244 Epoch: 26 Global Step: 67600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:02,633-Speed 13245.78 samples/sec Loss 4.6692 LearningRate 0.0244 Epoch: 26 Global Step: 67610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
17:01:04,227-Speed 12861.37 samples/sec Loss 4.7402 LearningRate 0.0244 Epoch: 26 Global Step: 67620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:05,827-Speed 12808.29 samples/sec Loss 4.6829 LearningRate 0.0243 Epoch: 26 Global Step: 67630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:07,416-Speed 12893.83 samples/sec Loss 4.6565 LearningRate 0.0243 Epoch: 26 Global Step: 67640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:08,989-Speed 13029.36 samples/sec Loss 4.7080 LearningRate 0.0243 Epoch: 26 Global Step: 67650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:10,563-Speed 13013.25 samples/sec Loss 4.7206 LearningRate 0.0243 Epoch: 26 Global Step: 67660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:12,142-Speed 12976.37 samples/sec Loss 4.6371 LearningRate 0.0243 Epoch: 26 Global Step: 67670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:13,732-Speed 12892.46 samples/sec Loss 4.6851 LearningRate 0.0243 Epoch: 26 Global Step: 67680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:15,292-Speed 13135.33 samples/sec Loss 4.6862 LearningRate 0.0243 Epoch: 26 Global Step: 67690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:16,855-Speed 13112.47 samples/sec Loss 4.7094 LearningRate 0.0242 Epoch: 26 Global Step: 67700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:18,422-Speed 13071.05 samples/sec Loss 4.6524 LearningRate 0.0242 Epoch: 26 Global Step: 67710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:20,002-Speed 12968.01 samples/sec Loss 4.7122 LearningRate 0.0242 Epoch: 26 Global Step: 67720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:21,579-Speed 12996.18 samples/sec Loss 4.7477 LearningRate 0.0242 Epoch: 26 Global Step: 67730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:23,162-Speed 12945.52 samples/sec 
Loss 4.6040 LearningRate 0.0242 Epoch: 26 Global Step: 67740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:24,729-Speed 13074.97 samples/sec Loss 4.6625 LearningRate 0.0242 Epoch: 26 Global Step: 67750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:26,298-Speed 13062.72 samples/sec Loss 4.6719 LearningRate 0.0242 Epoch: 26 Global Step: 67760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:27,860-Speed 13112.35 samples/sec Loss 4.7128 LearningRate 0.0241 Epoch: 26 Global Step: 67770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:29,466-Speed 12763.84 samples/sec Loss 4.7132 LearningRate 0.0241 Epoch: 26 Global Step: 67780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:31,021-Speed 13175.14 samples/sec Loss 4.7685 LearningRate 0.0241 Epoch: 26 Global Step: 67790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:32,622-Speed 12800.56 samples/sec Loss 4.7022 LearningRate 0.0241 Epoch: 26 Global Step: 67800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:34,196-Speed 13019.30 samples/sec Loss 4.6790 LearningRate 0.0241 Epoch: 26 Global Step: 67810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:35,753-Speed 13163.45 samples/sec Loss 4.6714 LearningRate 0.0241 Epoch: 26 Global Step: 67820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:37,330-Speed 12991.24 samples/sec Loss 4.6477 LearningRate 0.0241 Epoch: 26 Global Step: 67830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:38,893-Speed 13113.66 samples/sec Loss 4.7402 LearningRate 0.0240 Epoch: 26 Global Step: 67840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:40,455-Speed 13118.82 samples/sec Loss 4.7123 LearningRate 0.0240 Epoch: 26 Global Step: 67850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:42,011-Speed 13162.21 samples/sec Loss 4.6823 LearningRate 0.0240 Epoch: 26 
Global Step: 67860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:01:43,575-Speed 13099.80 samples/sec Loss 4.7132 LearningRate 0.0240 Epoch: 26 Global Step: 67870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:45,151-Speed 13018.92 samples/sec Loss 4.7408 LearningRate 0.0240 Epoch: 26 Global Step: 67880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:46,714-Speed 13111.19 samples/sec Loss 4.6629 LearningRate 0.0240 Epoch: 26 Global Step: 67890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:48,292-Speed 12985.41 samples/sec Loss 4.6855 LearningRate 0.0240 Epoch: 26 Global Step: 67900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:49,879-Speed 12912.42 samples/sec Loss 4.7632 LearningRate 0.0239 Epoch: 26 Global Step: 67910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:51,455-Speed 13005.94 samples/sec Loss 4.7004 LearningRate 0.0239 Epoch: 26 Global Step: 67920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:53,030-Speed 13007.20 samples/sec Loss 4.7140 LearningRate 0.0239 Epoch: 26 Global Step: 67930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:01:54,614-Speed 12936.85 samples/sec Loss 4.6378 LearningRate 0.0239 Epoch: 26 Global Step: 67940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:01:56,164-Speed 13222.82 samples/sec Loss 4.7053 LearningRate 0.0239 Epoch: 26 Global Step: 67950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:01:57,734-Speed 13048.16 samples/sec Loss 4.6763 LearningRate 0.0239 Epoch: 26 Global Step: 67960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:01:59,315-Speed 12970.52 samples/sec Loss 4.7197 LearningRate 0.0239 Epoch: 26 Global Step: 67970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:02:00,896-Speed 12956.06 samples/sec Loss 4.6446 LearningRate 0.0238 Epoch: 26 Global Step: 67980 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-14 17:02:02,465-Speed 13059.55 samples/sec Loss 4.7125 LearningRate 0.0238 Epoch: 26 Global Step: 67990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:02:04,058-Speed 12867.37 samples/sec Loss 4.6409 LearningRate 0.0238 Epoch: 26 Global Step: 68000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:02:05,626-Speed 13065.06 samples/sec Loss 4.7086 LearningRate 0.0238 Epoch: 26 Global Step: 68010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:02:07,192-Speed 13083.50 samples/sec Loss 4.6619 LearningRate 0.0238 Epoch: 26 Global Step: 68020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:02:08,760-Speed 13065.15 samples/sec Loss 4.7502 LearningRate 0.0238 Epoch: 26 Global Step: 68030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 17:02:10,341-Speed 12963.92 samples/sec Loss 4.7283 LearningRate 0.0238 Epoch: 26 Global Step: 68040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:11,941-Speed 12804.61 samples/sec Loss 4.6975 LearningRate 0.0237 Epoch: 26 Global Step: 68050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:13,535-Speed 12851.44 samples/sec Loss 4.7802 LearningRate 0.0237 Epoch: 26 Global Step: 68060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:15,100-Speed 13100.60 samples/sec Loss 4.6844 LearningRate 0.0237 Epoch: 26 Global Step: 68070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:16,676-Speed 13005.45 samples/sec Loss 4.6390 LearningRate 0.0237 Epoch: 26 Global Step: 68080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:18,258-Speed 12943.30 samples/sec Loss 4.6614 LearningRate 0.0237 Epoch: 26 Global Step: 68090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:19,847-Speed 12902.18 samples/sec Loss 4.7259 LearningRate 0.0237 Epoch: 26 Global Step: 68100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
17:02:21,407-Speed 13132.87 samples/sec Loss 4.7220 LearningRate 0.0237 Epoch: 26 Global Step: 68110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:22,998-Speed 12875.37 samples/sec Loss 4.7064 LearningRate 0.0236 Epoch: 26 Global Step: 68120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:24,591-Speed 12861.77 samples/sec Loss 4.7134 LearningRate 0.0236 Epoch: 26 Global Step: 68130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:02:26,167-Speed 13005.77 samples/sec Loss 4.7261 LearningRate 0.0236 Epoch: 26 Global Step: 68140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:27,746-Speed 12979.10 samples/sec Loss 4.7681 LearningRate 0.0236 Epoch: 26 Global Step: 68150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:29,319-Speed 13024.89 samples/sec Loss 4.8237 LearningRate 0.0236 Epoch: 26 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:30,898-Speed 12977.46 samples/sec Loss 4.7034 LearningRate 0.0236 Epoch: 26 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:32,485-Speed 12912.56 samples/sec Loss 4.7167 LearningRate 0.0236 Epoch: 26 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:34,042-Speed 13163.96 samples/sec Loss 4.6802 LearningRate 0.0235 Epoch: 26 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:35,631-Speed 12893.47 samples/sec Loss 4.7056 LearningRate 0.0235 Epoch: 26 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:37,211-Speed 12964.71 samples/sec Loss 4.6297 LearningRate 0.0235 Epoch: 26 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:38,783-Speed 13036.52 samples/sec Loss 4.6970 LearningRate 0.0235 Epoch: 26 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:40,353-Speed 13056.78 samples/sec 
Loss 4.6489 LearningRate 0.0235 Epoch: 26 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:41,933-Speed 12964.47 samples/sec Loss 4.6949 LearningRate 0.0235 Epoch: 26 Global Step: 68240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 17:02:43,486-Speed 13193.94 samples/sec Loss 4.6323 LearningRate 0.0235 Epoch: 26 Global Step: 68250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 17:02:45,054-Speed 13072.70 samples/sec Loss 4.7074 LearningRate 0.0234 Epoch: 26 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:46,714-Speed 12342.65 samples/sec Loss 4.6093 LearningRate 0.0234 Epoch: 26 Global Step: 68270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:02:48,172-Speed 14053.22 samples/sec Loss 4.7717 LearningRate 0.0234 Epoch: 26 Global Step: 68280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:03,683-Speed 1320.45 samples/sec Loss 4.2382 LearningRate 0.0234 Epoch: 27 Global Step: 68290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:05,281-Speed 12828.50 samples/sec Loss 4.1262 LearningRate 0.0234 Epoch: 27 Global Step: 68300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:06,832-Speed 13207.90 samples/sec Loss 4.0391 LearningRate 0.0234 Epoch: 27 Global Step: 68310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:08,433-Speed 12823.89 samples/sec Loss 4.0518 LearningRate 0.0234 Epoch: 27 Global Step: 68320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:10,013-Speed 12966.84 samples/sec Loss 4.0273 LearningRate 0.0233 Epoch: 27 Global Step: 68330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:11,591-Speed 12979.00 samples/sec Loss 4.0337 LearningRate 0.0233 Epoch: 27 Global Step: 68340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:13,183-Speed 12874.27 samples/sec Loss 4.0942 LearningRate 0.0233 Epoch: 27 
Global Step: 68350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:14,757-Speed 13025.11 samples/sec Loss 4.0301 LearningRate 0.0233 Epoch: 27 Global Step: 68360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:16,312-Speed 13175.78 samples/sec Loss 4.1146 LearningRate 0.0233 Epoch: 27 Global Step: 68370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:17,865-Speed 13190.53 samples/sec Loss 4.0785 LearningRate 0.0233 Epoch: 27 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:03:19,425-Speed 13142.33 samples/sec Loss 4.0761 LearningRate 0.0233 Epoch: 27 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:03:20,981-Speed 13169.49 samples/sec Loss 4.1367 LearningRate 0.0232 Epoch: 27 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:03:22,566-Speed 12926.00 samples/sec Loss 4.0354 LearningRate 0.0232 Epoch: 27 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:03:24,119-Speed 13197.66 samples/sec Loss 4.1416 LearningRate 0.0232 Epoch: 27 Global Step: 68420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:25,686-Speed 13076.14 samples/sec Loss 4.1458 LearningRate 0.0232 Epoch: 27 Global Step: 68430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:27,249-Speed 13109.73 samples/sec Loss 4.0967 LearningRate 0.0232 Epoch: 27 Global Step: 68440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:28,830-Speed 12960.93 samples/sec Loss 4.1471 LearningRate 0.0232 Epoch: 27 Global Step: 68450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:30,410-Speed 12972.72 samples/sec Loss 4.0822 LearningRate 0.0232 Epoch: 27 Global Step: 68460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:32,001-Speed 12882.85 samples/sec Loss 4.1340 LearningRate 0.0231 Epoch: 27 Global Step: 68470 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 17:03:33,542-Speed 13297.03 samples/sec Loss 4.1668 LearningRate 0.0231 Epoch: 27 Global Step: 68480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:35,137-Speed 12848.78 samples/sec Loss 4.1441 LearningRate 0.0231 Epoch: 27 Global Step: 68490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:36,700-Speed 13106.86 samples/sec Loss 4.1479 LearningRate 0.0231 Epoch: 27 Global Step: 68500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:38,276-Speed 12999.89 samples/sec Loss 4.1572 LearningRate 0.0231 Epoch: 27 Global Step: 68510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:39,816-Speed 13303.45 samples/sec Loss 4.1730 LearningRate 0.0231 Epoch: 27 Global Step: 68520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:03:41,403-Speed 12912.11 samples/sec Loss 4.1307 LearningRate 0.0231 Epoch: 27 Global Step: 68530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:03:42,988-Speed 12928.62 samples/sec Loss 4.1664 LearningRate 0.0230 Epoch: 27 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:03:44,542-Speed 13187.24 samples/sec Loss 4.1775 LearningRate 0.0230 Epoch: 27 Global Step: 68550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:46,124-Speed 12954.21 samples/sec Loss 4.1735 LearningRate 0.0230 Epoch: 27 Global Step: 68560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:47,673-Speed 13225.91 samples/sec Loss 4.1750 LearningRate 0.0230 Epoch: 27 Global Step: 68570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:49,259-Speed 12917.05 samples/sec Loss 4.2280 LearningRate 0.0230 Epoch: 27 Global Step: 68580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:50,832-Speed 13026.28 samples/sec Loss 4.1999 LearningRate 0.0230 Epoch: 27 Global Step: 68590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
17:03:52,374-Speed 13292.65 samples/sec Loss 4.1909 LearningRate 0.0230 Epoch: 27 Global Step: 68600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:53,964-Speed 12885.85 samples/sec Loss 4.1547 LearningRate 0.0229 Epoch: 27 Global Step: 68610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:55,530-Speed 13092.89 samples/sec Loss 4.2497 LearningRate 0.0229 Epoch: 27 Global Step: 68620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:57,085-Speed 13175.35 samples/sec Loss 4.2143 LearningRate 0.0229 Epoch: 27 Global Step: 68630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:03:58,652-Speed 13070.79 samples/sec Loss 4.2001 LearningRate 0.0229 Epoch: 27 Global Step: 68640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:00,201-Speed 13236.64 samples/sec Loss 4.1465 LearningRate 0.0229 Epoch: 27 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:01,756-Speed 13176.03 samples/sec Loss 4.2374 LearningRate 0.0229 Epoch: 27 Global Step: 68660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:03,336-Speed 12969.20 samples/sec Loss 4.2376 LearningRate 0.0229 Epoch: 27 Global Step: 68670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:04,910-Speed 13022.35 samples/sec Loss 4.1682 LearningRate 0.0228 Epoch: 27 Global Step: 68680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:06,477-Speed 13074.69 samples/sec Loss 4.2010 LearningRate 0.0228 Epoch: 27 Global Step: 68690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:08,086-Speed 12731.48 samples/sec Loss 4.2458 LearningRate 0.0228 Epoch: 27 Global Step: 68700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:09,649-Speed 13113.17 samples/sec Loss 4.2570 LearningRate 0.0228 Epoch: 27 Global Step: 68710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:11,204-Speed 13172.15 samples/sec 
Loss 4.2900 LearningRate 0.0228 Epoch: 27 Global Step: 68720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:12,781-Speed 12998.82 samples/sec Loss 4.2075 LearningRate 0.0228 Epoch: 27 Global Step: 68730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:14,346-Speed 13094.94 samples/sec Loss 4.1978 LearningRate 0.0228 Epoch: 27 Global Step: 68740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:15,904-Speed 13154.21 samples/sec Loss 4.1941 LearningRate 0.0227 Epoch: 27 Global Step: 68750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:17,464-Speed 13129.36 samples/sec Loss 4.2583 LearningRate 0.0227 Epoch: 27 Global Step: 68760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:19,253-Speed 11459.39 samples/sec Loss 4.2936 LearningRate 0.0227 Epoch: 27 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:20,810-Speed 13153.61 samples/sec Loss 4.1976 LearningRate 0.0227 Epoch: 27 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:22,379-Speed 13060.33 samples/sec Loss 4.2637 LearningRate 0.0227 Epoch: 27 Global Step: 68790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:23,938-Speed 13140.36 samples/sec Loss 4.2927 LearningRate 0.0227 Epoch: 27 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:25,503-Speed 13101.85 samples/sec Loss 4.2356 LearningRate 0.0227 Epoch: 27 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:27,072-Speed 13056.34 samples/sec Loss 4.3704 LearningRate 0.0226 Epoch: 27 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:28,651-Speed 12977.46 samples/sec Loss 4.3364 LearningRate 0.0226 Epoch: 27 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:30,210-Speed 13141.58 samples/sec Loss 4.2892 LearningRate 0.0226 Epoch: 27 
Global Step: 68840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:31,779-Speed 13060.24 samples/sec Loss 4.2772 LearningRate 0.0226 Epoch: 27 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:33,365-Speed 12916.73 samples/sec Loss 4.2573 LearningRate 0.0226 Epoch: 27 Global Step: 68860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:34,923-Speed 13157.75 samples/sec Loss 4.2985 LearningRate 0.0226 Epoch: 27 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 17:04:36,462-Speed 13318.92 samples/sec Loss 4.3035 LearningRate 0.0226 Epoch: 27 Global Step: 68880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:38,029-Speed 13069.72 samples/sec Loss 4.3151 LearningRate 0.0226 Epoch: 27 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:39,618-Speed 12901.74 samples/sec Loss 4.2085 LearningRate 0.0225 Epoch: 27 Global Step: 68900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:41,183-Speed 13090.68 samples/sec Loss 4.2939 LearningRate 0.0225 Epoch: 27 Global Step: 68910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:42,731-Speed 13236.89 samples/sec Loss 4.3692 LearningRate 0.0225 Epoch: 27 Global Step: 68920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:44,295-Speed 13106.51 samples/sec Loss 4.2684 LearningRate 0.0225 Epoch: 27 Global Step: 68930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:45,870-Speed 13006.10 samples/sec Loss 4.2945 LearningRate 0.0225 Epoch: 27 Global Step: 68940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:47,438-Speed 13071.30 samples/sec Loss 4.2957 LearningRate 0.0225 Epoch: 27 Global Step: 68950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:49,008-Speed 13044.20 samples/sec Loss 4.2743 LearningRate 0.0225 Epoch: 27 Global Step: 68960 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 17:04:50,573-Speed 13100.77 samples/sec Loss 4.3224 LearningRate 0.0224 Epoch: 27 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:52,146-Speed 13019.08 samples/sec Loss 4.3276 LearningRate 0.0224 Epoch: 27 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:53,710-Speed 13102.28 samples/sec Loss 4.3295 LearningRate 0.0224 Epoch: 27 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:55,300-Speed 12890.59 samples/sec Loss 4.3645 LearningRate 0.0224 Epoch: 27 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:04:56,862-Speed 13113.37 samples/sec Loss 4.2922 LearningRate 0.0224 Epoch: 27 Global Step: 69010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:58,417-Speed 13182.82 samples/sec Loss 4.3242 LearningRate 0.0224 Epoch: 27 Global Step: 69020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:04:59,980-Speed 13112.41 samples/sec Loss 4.3312 LearningRate 0.0224 Epoch: 27 Global Step: 69030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:01,550-Speed 13050.19 samples/sec Loss 4.2839 LearningRate 0.0223 Epoch: 27 Global Step: 69040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:03,107-Speed 13163.98 samples/sec Loss 4.3353 LearningRate 0.0223 Epoch: 27 Global Step: 69050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:04,681-Speed 13018.73 samples/sec Loss 4.3781 LearningRate 0.0223 Epoch: 27 Global Step: 69060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:06,239-Speed 13149.84 samples/sec Loss 4.3721 LearningRate 0.0223 Epoch: 27 Global Step: 69070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:07,830-Speed 12879.11 samples/sec Loss 4.3529 LearningRate 0.0223 Epoch: 27 Global Step: 69080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
17:05:09,394-Speed 13102.71 samples/sec Loss 4.3047 LearningRate 0.0223 Epoch: 27 Global Step: 69090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:10,998-Speed 12773.19 samples/sec Loss 4.3759 LearningRate 0.0223 Epoch: 27 Global Step: 69100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:12,559-Speed 13131.80 samples/sec Loss 4.3877 LearningRate 0.0222 Epoch: 27 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:14,159-Speed 12805.65 samples/sec Loss 4.3311 LearningRate 0.0222 Epoch: 27 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:15,749-Speed 12883.30 samples/sec Loss 4.3541 LearningRate 0.0222 Epoch: 27 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:17,298-Speed 13230.41 samples/sec Loss 4.4406 LearningRate 0.0222 Epoch: 27 Global Step: 69140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:18,852-Speed 13185.87 samples/sec Loss 4.4289 LearningRate 0.0222 Epoch: 27 Global Step: 69150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:20,431-Speed 12981.96 samples/sec Loss 4.4457 LearningRate 0.0222 Epoch: 27 Global Step: 69160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:22,017-Speed 12920.61 samples/sec Loss 4.3637 LearningRate 0.0222 Epoch: 27 Global Step: 69170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:23,609-Speed 12867.02 samples/sec Loss 4.3630 LearningRate 0.0221 Epoch: 27 Global Step: 69180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:25,194-Speed 12942.53 samples/sec Loss 4.4027 LearningRate 0.0221 Epoch: 27 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:26,740-Speed 13247.31 samples/sec Loss 4.3628 LearningRate 0.0221 Epoch: 27 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:28,296-Speed 13170.02 samples/sec 
Loss 4.3777 LearningRate 0.0221 Epoch: 27 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:29,865-Speed 13059.15 samples/sec Loss 4.4510 LearningRate 0.0221 Epoch: 27 Global Step: 69220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:31,429-Speed 13106.06 samples/sec Loss 4.4205 LearningRate 0.0221 Epoch: 27 Global Step: 69230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:33,004-Speed 13008.50 samples/sec Loss 4.3705 LearningRate 0.0221 Epoch: 27 Global Step: 69240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:34,585-Speed 12962.90 samples/sec Loss 4.4630 LearningRate 0.0221 Epoch: 27 Global Step: 69250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:36,149-Speed 13095.73 samples/sec Loss 4.3875 LearningRate 0.0220 Epoch: 27 Global Step: 69260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:37,726-Speed 12996.84 samples/sec Loss 4.3922 LearningRate 0.0220 Epoch: 27 Global Step: 69270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:39,291-Speed 13088.86 samples/sec Loss 4.4682 LearningRate 0.0220 Epoch: 27 Global Step: 69280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:40,862-Speed 13041.59 samples/sec Loss 4.4083 LearningRate 0.0220 Epoch: 27 Global Step: 69290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:42,432-Speed 13053.85 samples/sec Loss 4.4592 LearningRate 0.0220 Epoch: 27 Global Step: 69300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:44,000-Speed 13075.07 samples/sec Loss 4.4264 LearningRate 0.0220 Epoch: 27 Global Step: 69310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:45,559-Speed 13138.48 samples/sec Loss 4.3895 LearningRate 0.0220 Epoch: 27 Global Step: 69320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:47,127-Speed 13070.09 samples/sec Loss 4.3679 LearningRate 0.0219 Epoch: 27 
Global Step: 69330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:05:48,681-Speed 13181.77 samples/sec Loss 4.4049 LearningRate 0.0219 Epoch: 27 Global Step: 69340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:50,244-Speed 13118.34 samples/sec Loss 4.3759 LearningRate 0.0219 Epoch: 27 Global Step: 69350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:51,833-Speed 12897.22 samples/sec Loss 4.3946 LearningRate 0.0219 Epoch: 27 Global Step: 69360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:53,426-Speed 12860.48 samples/sec Loss 4.4675 LearningRate 0.0219 Epoch: 27 Global Step: 69370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:55,005-Speed 12979.80 samples/sec Loss 4.4578 LearningRate 0.0219 Epoch: 27 Global Step: 69380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:56,570-Speed 13090.35 samples/sec Loss 4.3742 LearningRate 0.0219 Epoch: 27 Global Step: 69390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:58,137-Speed 13071.53 samples/sec Loss 4.4014 LearningRate 0.0218 Epoch: 27 Global Step: 69400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:05:59,691-Speed 13189.85 samples/sec Loss 4.4617 LearningRate 0.0218 Epoch: 27 Global Step: 69410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:01,255-Speed 13097.71 samples/sec Loss 4.4802 LearningRate 0.0218 Epoch: 27 Global Step: 69420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:02,838-Speed 12949.33 samples/sec Loss 4.4513 LearningRate 0.0218 Epoch: 27 Global Step: 69430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:04,402-Speed 13105.52 samples/sec Loss 4.3941 LearningRate 0.0218 Epoch: 27 Global Step: 69440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:06:05,976-Speed 13012.61 samples/sec Loss 4.4518 LearningRate 0.0218 Epoch: 27 Global Step: 69450 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 17:06:07,556-Speed 12969.83 samples/sec Loss 4.4114 LearningRate 0.0218 Epoch: 27 Global Step: 69460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:06:09,132-Speed 13004.32 samples/sec Loss 4.4029 LearningRate 0.0217 Epoch: 27 Global Step: 69470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:06:10,697-Speed 13095.13 samples/sec Loss 4.4812 LearningRate 0.0217 Epoch: 27 Global Step: 69480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:12,267-Speed 13052.79 samples/sec Loss 4.3248 LearningRate 0.0217 Epoch: 27 Global Step: 69490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:13,847-Speed 12967.79 samples/sec Loss 4.4857 LearningRate 0.0217 Epoch: 27 Global Step: 69500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:15,409-Speed 13121.37 samples/sec Loss 4.3647 LearningRate 0.0217 Epoch: 27 Global Step: 69510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:16,962-Speed 13194.68 samples/sec Loss 4.4100 LearningRate 0.0217 Epoch: 27 Global Step: 69520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:18,530-Speed 13065.26 samples/sec Loss 4.4630 LearningRate 0.0217 Epoch: 27 Global Step: 69530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:20,104-Speed 13021.58 samples/sec Loss 4.5046 LearningRate 0.0217 Epoch: 27 Global Step: 69540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:21,707-Speed 12780.80 samples/sec Loss 4.4106 LearningRate 0.0216 Epoch: 27 Global Step: 69550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:23,277-Speed 13050.95 samples/sec Loss 4.4450 LearningRate 0.0216 Epoch: 27 Global Step: 69560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:24,837-Speed 13136.00 samples/sec Loss 4.4677 LearningRate 0.0216 Epoch: 27 Global Step: 69570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
17:06:26,387-Speed 13217.48 samples/sec Loss 4.4614 LearningRate 0.0216 Epoch: 27 Global Step: 69580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:27,959-Speed 13036.40 samples/sec Loss 4.4740 LearningRate 0.0216 Epoch: 27 Global Step: 69590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:29,519-Speed 13165.09 samples/sec Loss 4.4603 LearningRate 0.0216 Epoch: 27 Global Step: 69600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:31,115-Speed 12841.69 samples/sec Loss 4.4603 LearningRate 0.0216 Epoch: 27 Global Step: 69610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:32,667-Speed 13198.82 samples/sec Loss 4.3845 LearningRate 0.0215 Epoch: 27 Global Step: 69620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:34,241-Speed 13012.72 samples/sec Loss 4.4443 LearningRate 0.0215 Epoch: 27 Global Step: 69630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:35,810-Speed 13065.88 samples/sec Loss 4.5256 LearningRate 0.0215 Epoch: 27 Global Step: 69640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:37,384-Speed 13017.23 samples/sec Loss 4.4722 LearningRate 0.0215 Epoch: 27 Global Step: 69650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:38,980-Speed 12836.03 samples/sec Loss 4.4469 LearningRate 0.0215 Epoch: 27 Global Step: 69660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:40,578-Speed 12833.04 samples/sec Loss 4.4972 LearningRate 0.0215 Epoch: 27 Global Step: 69670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:42,142-Speed 13095.77 samples/sec Loss 4.4764 LearningRate 0.0215 Epoch: 27 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:06:43,748-Speed 12757.79 samples/sec Loss 4.5180 LearningRate 0.0214 Epoch: 27 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:06:45,331-Speed 12945.45 samples/sec 
Loss 4.4941 LearningRate 0.0214 Epoch: 27 Global Step: 69700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:46,925-Speed 12852.74 samples/sec Loss 4.5413 LearningRate 0.0214 Epoch: 27 Global Step: 69710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:48,493-Speed 13070.91 samples/sec Loss 4.4616 LearningRate 0.0214 Epoch: 27 Global Step: 69720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:50,063-Speed 13056.77 samples/sec Loss 4.5350 LearningRate 0.0214 Epoch: 27 Global Step: 69730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:51,626-Speed 13108.23 samples/sec Loss 4.4695 LearningRate 0.0214 Epoch: 27 Global Step: 69740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:53,221-Speed 12850.89 samples/sec Loss 4.4958 LearningRate 0.0214 Epoch: 27 Global Step: 69750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:54,793-Speed 13035.63 samples/sec Loss 4.5441 LearningRate 0.0214 Epoch: 27 Global Step: 69760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:56,372-Speed 12976.90 samples/sec Loss 4.4722 LearningRate 0.0213 Epoch: 27 Global Step: 69770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:57,943-Speed 13035.62 samples/sec Loss 4.5128 LearningRate 0.0213 Epoch: 27 Global Step: 69780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:06:59,517-Speed 13025.44 samples/sec Loss 4.4664 LearningRate 0.0213 Epoch: 27 Global Step: 69790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:07:01,117-Speed 12805.73 samples/sec Loss 4.4727 LearningRate 0.0213 Epoch: 27 Global Step: 69800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:02,711-Speed 12853.09 samples/sec Loss 4.4915 LearningRate 0.0213 Epoch: 27 Global Step: 69810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:04,283-Speed 13036.28 samples/sec Loss 4.4816 LearningRate 0.0213 Epoch: 27 
Global Step: 69820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:05,862-Speed 12982.15 samples/sec Loss 4.5362 LearningRate 0.0213 Epoch: 27 Global Step: 69830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:07,441-Speed 12970.68 samples/sec Loss 4.5009 LearningRate 0.0212 Epoch: 27 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:09,004-Speed 13112.68 samples/sec Loss 4.5016 LearningRate 0.0212 Epoch: 27 Global Step: 69850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:10,575-Speed 13039.11 samples/sec Loss 4.4811 LearningRate 0.0212 Epoch: 27 Global Step: 69860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:12,152-Speed 13000.78 samples/sec Loss 4.4910 LearningRate 0.0212 Epoch: 27 Global Step: 69870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:13,732-Speed 12965.10 samples/sec Loss 4.4502 LearningRate 0.0212 Epoch: 27 Global Step: 69880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:15,311-Speed 12979.84 samples/sec Loss 4.5661 LearningRate 0.0212 Epoch: 27 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:16,875-Speed 13103.20 samples/sec Loss 4.5254 LearningRate 0.0212 Epoch: 27 Global Step: 69900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 17:07:18,463-Speed 12902.18 samples/sec Loss 4.5738 LearningRate 0.0211 Epoch: 27 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:20,039-Speed 13010.65 samples/sec Loss 4.4965 LearningRate 0.0211 Epoch: 27 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:21,619-Speed 12969.93 samples/sec Loss 4.4527 LearningRate 0.0211 Epoch: 27 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:07:23,191-Speed 13028.96 samples/sec Loss 4.5475 LearningRate 0.0211 Epoch: 27 Global Step: 69940 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 17:07:24,766-Speed 13016.06 samples/sec Loss 4.4788 LearningRate 0.0211 Epoch: 27 Global Step: 69950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:07:26,332-Speed 13078.69 samples/sec Loss 4.5164 LearningRate 0.0211 Epoch: 27 Global Step: 69960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:07:27,914-Speed 12953.59 samples/sec Loss 4.4904 LearningRate 0.0211 Epoch: 27 Global Step: 69970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:07:29,491-Speed 12996.92 samples/sec Loss 4.5547 LearningRate 0.0211 Epoch: 27 Global Step: 69980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:07:31,066-Speed 13009.84 samples/sec Loss 4.5127 LearningRate 0.0210 Epoch: 27 Global Step: 69990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:07:32,654-Speed 12903.29 samples/sec Loss 4.6339 LearningRate 0.0210 Epoch: 27 Global Step: 70000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:07:54,764-[lfw][70000]XNorm: 8.229949 Training: 2022-01-14 17:07:54,765-[lfw][70000]Accuracy-Flip: 0.99567+-0.00343 Training: 2022-01-14 17:07:54,765-[lfw][70000]Accuracy-Highest: 0.99650 Training: 2022-01-14 17:08:20,703-[cfp_fp][70000]XNorm: 6.972436 Training: 2022-01-14 17:08:20,704-[cfp_fp][70000]Accuracy-Flip: 0.96771+-0.00633 Training: 2022-01-14 17:08:20,705-[cfp_fp][70000]Accuracy-Highest: 0.96771 Training: 2022-01-14 17:08:43,127-[agedb_30][70000]XNorm: 7.961451 Training: 2022-01-14 17:08:43,128-[agedb_30][70000]Accuracy-Flip: 0.96533+-0.00547 Training: 2022-01-14 17:08:43,129-[agedb_30][70000]Accuracy-Highest: 0.96800 Training: 2022-01-14 17:08:44,691-Speed 284.30 samples/sec Loss 4.5618 LearningRate 0.0210 Epoch: 27 Global Step: 70010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:08:46,257-Speed 13081.38 samples/sec Loss 4.4888 LearningRate 0.0210 Epoch: 27 Global Step: 70020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-01-14 17:08:47,826-Speed 13059.83 samples/sec Loss 4.5061 LearningRate 0.0210 Epoch: 27 Global Step: 70030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:08:49,408-Speed 12945.13 samples/sec Loss 4.4795 LearningRate 0.0210 Epoch: 27 Global Step: 70040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:08:50,972-Speed 13105.47 samples/sec Loss 4.4867 LearningRate 0.0210 Epoch: 27 Global Step: 70050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:08:52,518-Speed 13252.16 samples/sec Loss 4.4696 LearningRate 0.0209 Epoch: 27 Global Step: 70060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:08:54,098-Speed 12982.65 samples/sec Loss 4.5294 LearningRate 0.0209 Epoch: 27 Global Step: 70070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:08:55,677-Speed 12975.73 samples/sec Loss 4.4762 LearningRate 0.0209 Epoch: 27 Global Step: 70080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:08:57,242-Speed 13090.58 samples/sec Loss 4.5123 LearningRate 0.0209 Epoch: 27 Global Step: 70090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:08:58,801-Speed 13149.25 samples/sec Loss 4.5568 LearningRate 0.0209 Epoch: 27 Global Step: 70100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:00,369-Speed 13065.76 samples/sec Loss 4.5198 LearningRate 0.0209 Epoch: 27 Global Step: 70110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:01,941-Speed 13035.35 samples/sec Loss 4.5636 LearningRate 0.0209 Epoch: 27 Global Step: 70120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:03,497-Speed 13173.33 samples/sec Loss 4.4967 LearningRate 0.0209 Epoch: 27 Global Step: 70130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:05,043-Speed 13258.56 samples/sec Loss 4.6082 LearningRate 0.0208 Epoch: 27 Global Step: 70140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:06,604-Speed 13123.67 
samples/sec Loss 4.5512 LearningRate 0.0208 Epoch: 27 Global Step: 70150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:08,175-Speed 13045.61 samples/sec Loss 4.5950 LearningRate 0.0208 Epoch: 27 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:09,754-Speed 12976.24 samples/sec Loss 4.5779 LearningRate 0.0208 Epoch: 27 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:11,304-Speed 13216.40 samples/sec Loss 4.4886 LearningRate 0.0208 Epoch: 27 Global Step: 70180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:12,868-Speed 13107.07 samples/sec Loss 4.5514 LearningRate 0.0208 Epoch: 27 Global Step: 70190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:14,437-Speed 13054.69 samples/sec Loss 4.5902 LearningRate 0.0208 Epoch: 27 Global Step: 70200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:16,023-Speed 12916.90 samples/sec Loss 4.5719 LearningRate 0.0207 Epoch: 27 Global Step: 70210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:17,606-Speed 12946.72 samples/sec Loss 4.6036 LearningRate 0.0207 Epoch: 27 Global Step: 70220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:19,179-Speed 13030.58 samples/sec Loss 4.6301 LearningRate 0.0207 Epoch: 27 Global Step: 70230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:20,760-Speed 12981.65 samples/sec Loss 4.4600 LearningRate 0.0207 Epoch: 27 Global Step: 70240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:22,321-Speed 13128.51 samples/sec Loss 4.5093 LearningRate 0.0207 Epoch: 27 Global Step: 70250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:24,142-Speed 11252.09 samples/sec Loss 4.5479 LearningRate 0.0207 Epoch: 27 Global Step: 70260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:26,313-Speed 9437.55 samples/sec Loss 4.5466 LearningRate 0.0207 
Epoch: 27 Global Step: 70270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:28,105-Speed 11431.87 samples/sec Loss 4.5769 LearningRate 0.0206 Epoch: 27 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:29,683-Speed 12988.95 samples/sec Loss 4.5813 LearningRate 0.0206 Epoch: 27 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:31,786-Speed 9741.35 samples/sec Loss 4.5301 LearningRate 0.0206 Epoch: 27 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:33,339-Speed 13193.24 samples/sec Loss 4.4039 LearningRate 0.0206 Epoch: 27 Global Step: 70310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:34,904-Speed 13092.82 samples/sec Loss 4.5950 LearningRate 0.0206 Epoch: 27 Global Step: 70320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:36,481-Speed 12990.98 samples/sec Loss 4.4527 LearningRate 0.0206 Epoch: 27 Global Step: 70330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:38,057-Speed 13006.94 samples/sec Loss 4.6179 LearningRate 0.0206 Epoch: 27 Global Step: 70340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:39,626-Speed 13061.05 samples/sec Loss 4.4734 LearningRate 0.0206 Epoch: 27 Global Step: 70350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:41,195-Speed 13057.93 samples/sec Loss 4.5072 LearningRate 0.0205 Epoch: 27 Global Step: 70360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:42,786-Speed 12874.76 samples/sec Loss 4.5949 LearningRate 0.0205 Epoch: 27 Global Step: 70370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:44,359-Speed 13031.08 samples/sec Loss 4.6654 LearningRate 0.0205 Epoch: 27 Global Step: 70380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:45,904-Speed 13263.23 samples/sec Loss 4.5265 LearningRate 0.0205 Epoch: 27 Global Step: 70390 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:47,481-Speed 12991.84 samples/sec Loss 4.5393 LearningRate 0.0205 Epoch: 27 Global Step: 70400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:09:49,041-Speed 13138.64 samples/sec Loss 4.6375 LearningRate 0.0205 Epoch: 27 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:50,617-Speed 12999.72 samples/sec Loss 4.6197 LearningRate 0.0205 Epoch: 27 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:52,174-Speed 13163.50 samples/sec Loss 4.5697 LearningRate 0.0204 Epoch: 27 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:53,741-Speed 13078.74 samples/sec Loss 4.5971 LearningRate 0.0204 Epoch: 27 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:55,296-Speed 13177.27 samples/sec Loss 4.6063 LearningRate 0.0204 Epoch: 27 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:56,853-Speed 13151.69 samples/sec Loss 4.5059 LearningRate 0.0204 Epoch: 27 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:09:58,426-Speed 13032.84 samples/sec Loss 4.5317 LearningRate 0.0204 Epoch: 27 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:10:00,016-Speed 12889.46 samples/sec Loss 4.6605 LearningRate 0.0204 Epoch: 27 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:10:01,571-Speed 13174.49 samples/sec Loss 4.6609 LearningRate 0.0204 Epoch: 27 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:10:03,139-Speed 13066.61 samples/sec Loss 4.5577 LearningRate 0.0204 Epoch: 27 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:10:04,714-Speed 13014.72 samples/sec Loss 4.6349 LearningRate 0.0203 Epoch: 27 Global Step: 70510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-01-14 17:10:06,261-Speed 13244.45 samples/sec Loss 4.6145 LearningRate 0.0203 Epoch: 27 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:10:07,844-Speed 12948.43 samples/sec Loss 4.5923 LearningRate 0.0203 Epoch: 27 Global Step: 70530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 17:10:09,386-Speed 13288.28 samples/sec Loss 4.5945 LearningRate 0.0203 Epoch: 27 Global Step: 70540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:10,939-Speed 13189.73 samples/sec Loss 4.6760 LearningRate 0.0203 Epoch: 27 Global Step: 70550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:12,520-Speed 12962.57 samples/sec Loss 4.5749 LearningRate 0.0203 Epoch: 27 Global Step: 70560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:14,082-Speed 13130.97 samples/sec Loss 4.6022 LearningRate 0.0203 Epoch: 27 Global Step: 70570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:15,669-Speed 12911.57 samples/sec Loss 4.5359 LearningRate 0.0203 Epoch: 27 Global Step: 70580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:17,224-Speed 13175.46 samples/sec Loss 4.5024 LearningRate 0.0202 Epoch: 27 Global Step: 70590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:18,811-Speed 12915.82 samples/sec Loss 4.6098 LearningRate 0.0202 Epoch: 27 Global Step: 70600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:20,392-Speed 12960.76 samples/sec Loss 4.6011 LearningRate 0.0202 Epoch: 27 Global Step: 70610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:21,953-Speed 13118.83 samples/sec Loss 4.5951 LearningRate 0.0202 Epoch: 27 Global Step: 70620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:23,539-Speed 12927.63 samples/sec Loss 4.5873 LearningRate 0.0202 Epoch: 27 Global Step: 70630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 17:10:25,086-Speed 13248.41 
samples/sec Loss 4.6208 LearningRate 0.0202 Epoch: 27 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:10:26,669-Speed 12936.92 samples/sec Loss 4.4743 LearningRate 0.0202 Epoch: 27 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:10:28,227-Speed 13159.44 samples/sec Loss 4.5515 LearningRate 0.0201 Epoch: 27 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:10:29,790-Speed 13111.14 samples/sec Loss 4.6096 LearningRate 0.0201 Epoch: 27 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:10:31,329-Speed 13315.41 samples/sec Loss 4.5378 LearningRate 0.0201 Epoch: 27 Global Step: 70680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:32,922-Speed 12862.59 samples/sec Loss 4.6397 LearningRate 0.0201 Epoch: 27 Global Step: 70690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:34,499-Speed 13006.42 samples/sec Loss 4.5745 LearningRate 0.0201 Epoch: 27 Global Step: 70700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:36,047-Speed 13241.66 samples/sec Loss 4.6781 LearningRate 0.0201 Epoch: 27 Global Step: 70710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:37,622-Speed 13003.00 samples/sec Loss 4.6058 LearningRate 0.0201 Epoch: 27 Global Step: 70720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:39,194-Speed 13034.77 samples/sec Loss 4.6231 LearningRate 0.0201 Epoch: 27 Global Step: 70730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:40,773-Speed 12981.15 samples/sec Loss 4.5798 LearningRate 0.0200 Epoch: 27 Global Step: 70740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:42,322-Speed 13228.15 samples/sec Loss 4.6112 LearningRate 0.0200 Epoch: 27 Global Step: 70750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:43,882-Speed 13135.60 samples/sec Loss 4.5867 LearningRate 0.0200 
Epoch: 27 Global Step: 70760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:45,441-Speed 13140.20 samples/sec Loss 4.5825 LearningRate 0.0200 Epoch: 27 Global Step: 70770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:46,992-Speed 13213.82 samples/sec Loss 4.6124 LearningRate 0.0200 Epoch: 27 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:10:48,537-Speed 13264.92 samples/sec Loss 4.6262 LearningRate 0.0200 Epoch: 27 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:10:50,263-Speed 11868.78 samples/sec Loss 4.5647 LearningRate 0.0200 Epoch: 27 Global Step: 70800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:10:51,763-Speed 13657.08 samples/sec Loss 4.5369 LearningRate 0.0199 Epoch: 27 Global Step: 70810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:07,391-Speed 1310.61 samples/sec Loss 4.0819 LearningRate 0.0199 Epoch: 28 Global Step: 70820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:08,966-Speed 13014.94 samples/sec Loss 3.9158 LearningRate 0.0199 Epoch: 28 Global Step: 70830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:10,529-Speed 13118.74 samples/sec Loss 3.9538 LearningRate 0.0199 Epoch: 28 Global Step: 70840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:12,118-Speed 12895.48 samples/sec Loss 3.9866 LearningRate 0.0199 Epoch: 28 Global Step: 70850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:13,727-Speed 12737.04 samples/sec Loss 3.9540 LearningRate 0.0199 Epoch: 28 Global Step: 70860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:15,288-Speed 13128.29 samples/sec Loss 3.9356 LearningRate 0.0199 Epoch: 28 Global Step: 70870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:16,863-Speed 13006.00 samples/sec Loss 3.9545 LearningRate 0.0199 Epoch: 28 Global Step: 70880 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:18,456-Speed 12864.62 samples/sec Loss 4.0225 LearningRate 0.0198 Epoch: 28 Global Step: 70890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:20,040-Speed 12936.11 samples/sec Loss 4.0206 LearningRate 0.0198 Epoch: 28 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:21,640-Speed 12800.24 samples/sec Loss 4.0724 LearningRate 0.0198 Epoch: 28 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:23,203-Speed 13118.67 samples/sec Loss 3.9934 LearningRate 0.0198 Epoch: 28 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:24,784-Speed 12954.67 samples/sec Loss 4.0603 LearningRate 0.0198 Epoch: 28 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:26,387-Speed 12783.84 samples/sec Loss 3.9935 LearningRate 0.0198 Epoch: 28 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:27,961-Speed 13017.29 samples/sec Loss 4.0727 LearningRate 0.0198 Epoch: 28 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:29,524-Speed 13119.23 samples/sec Loss 3.9893 LearningRate 0.0198 Epoch: 28 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:31,111-Speed 12911.09 samples/sec Loss 4.0387 LearningRate 0.0197 Epoch: 28 Global Step: 70970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:32,716-Speed 12764.01 samples/sec Loss 4.0816 LearningRate 0.0197 Epoch: 28 Global Step: 70980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:34,271-Speed 13176.36 samples/sec Loss 4.0288 LearningRate 0.0197 Epoch: 28 Global Step: 70990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:35,850-Speed 12976.07 samples/sec Loss 4.0158 LearningRate 0.0197 Epoch: 28 Global Step: 71000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 
2022-01-14 17:11:37,441-Speed 12879.80 samples/sec Loss 4.0673 LearningRate 0.0197 Epoch: 28 Global Step: 71010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 17:11:38,989-Speed 13239.86 samples/sec Loss 4.1141 LearningRate 0.0197 Epoch: 28 Global Step: 71020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:40,565-Speed 12998.29 samples/sec Loss 4.1023 LearningRate 0.0197 Epoch: 28 Global Step: 71030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:42,132-Speed 13073.54 samples/sec Loss 4.1096 LearningRate 0.0196 Epoch: 28 Global Step: 71040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:43,696-Speed 13109.60 samples/sec Loss 4.1139 LearningRate 0.0196 Epoch: 28 Global Step: 71050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:45,276-Speed 12966.20 samples/sec Loss 4.0544 LearningRate 0.0196 Epoch: 28 Global Step: 71060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:46,865-Speed 12894.52 samples/sec Loss 3.9950 LearningRate 0.0196 Epoch: 28 Global Step: 71070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:48,434-Speed 13061.76 samples/sec Loss 4.0866 LearningRate 0.0196 Epoch: 28 Global Step: 71080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:50,030-Speed 12839.38 samples/sec Loss 4.0707 LearningRate 0.0196 Epoch: 28 Global Step: 71090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:51,626-Speed 12837.43 samples/sec Loss 4.0715 LearningRate 0.0196 Epoch: 28 Global Step: 71100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:53,199-Speed 13029.22 samples/sec Loss 4.0925 LearningRate 0.0196 Epoch: 28 Global Step: 71110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:11:54,800-Speed 12797.36 samples/sec Loss 4.0559 LearningRate 0.0195 Epoch: 28 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:56,374-Speed 13019.17 
samples/sec Loss 4.0252 LearningRate 0.0195 Epoch: 28 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:57,939-Speed 13089.43 samples/sec Loss 4.0546 LearningRate 0.0195 Epoch: 28 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:11:59,519-Speed 12968.97 samples/sec Loss 4.0807 LearningRate 0.0195 Epoch: 28 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:01,124-Speed 12766.20 samples/sec Loss 4.0789 LearningRate 0.0195 Epoch: 28 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:02,691-Speed 13074.57 samples/sec Loss 4.1620 LearningRate 0.0195 Epoch: 28 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:04,274-Speed 12950.68 samples/sec Loss 4.1357 LearningRate 0.0195 Epoch: 28 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:05,840-Speed 13085.44 samples/sec Loss 4.1116 LearningRate 0.0195 Epoch: 28 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:07,418-Speed 12982.92 samples/sec Loss 4.1396 LearningRate 0.0194 Epoch: 28 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:09,031-Speed 12705.20 samples/sec Loss 4.0486 LearningRate 0.0194 Epoch: 28 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:10,607-Speed 12999.60 samples/sec Loss 4.0991 LearningRate 0.0194 Epoch: 28 Global Step: 71220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:12,198-Speed 12876.73 samples/sec Loss 4.1536 LearningRate 0.0194 Epoch: 28 Global Step: 71230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:13,784-Speed 12926.82 samples/sec Loss 4.1704 LearningRate 0.0194 Epoch: 28 Global Step: 71240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:15,365-Speed 12965.87 samples/sec Loss 4.1396 LearningRate 0.0194 
Epoch: 28 Global Step: 71250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:16,929-Speed 13100.79 samples/sec Loss 4.1594 LearningRate 0.0194 Epoch: 28 Global Step: 71260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:18,502-Speed 13024.86 samples/sec Loss 4.1431 LearningRate 0.0193 Epoch: 28 Global Step: 71270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:20,077-Speed 13011.81 samples/sec Loss 4.1917 LearningRate 0.0193 Epoch: 28 Global Step: 71280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:21,631-Speed 13189.59 samples/sec Loss 4.2383 LearningRate 0.0193 Epoch: 28 Global Step: 71290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:23,216-Speed 12923.74 samples/sec Loss 4.0728 LearningRate 0.0193 Epoch: 28 Global Step: 71300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:24,787-Speed 13048.14 samples/sec Loss 4.2245 LearningRate 0.0193 Epoch: 28 Global Step: 71310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:26,353-Speed 13077.89 samples/sec Loss 4.1033 LearningRate 0.0193 Epoch: 28 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:27,922-Speed 13056.40 samples/sec Loss 4.1775 LearningRate 0.0193 Epoch: 28 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:29,528-Speed 12765.60 samples/sec Loss 4.1448 LearningRate 0.0193 Epoch: 28 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:31,110-Speed 12950.95 samples/sec Loss 4.1395 LearningRate 0.0192 Epoch: 28 Global Step: 71350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:32,707-Speed 12829.46 samples/sec Loss 4.1106 LearningRate 0.0192 Epoch: 28 Global Step: 71360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:34,285-Speed 12985.18 samples/sec Loss 4.1338 LearningRate 0.0192 Epoch: 28 Global Step: 71370 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:35,838-Speed 13200.69 samples/sec Loss 4.2493 LearningRate 0.0192 Epoch: 28 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:37,411-Speed 13025.76 samples/sec Loss 4.1098 LearningRate 0.0192 Epoch: 28 Global Step: 71390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:38,989-Speed 12984.50 samples/sec Loss 4.0892 LearningRate 0.0192 Epoch: 28 Global Step: 71400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:40,581-Speed 12872.16 samples/sec Loss 4.1802 LearningRate 0.0192 Epoch: 28 Global Step: 71410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:42,171-Speed 12887.91 samples/sec Loss 4.1601 LearningRate 0.0192 Epoch: 28 Global Step: 71420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 17:12:43,726-Speed 13179.16 samples/sec Loss 4.0962 LearningRate 0.0191 Epoch: 28 Global Step: 71430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 17:12:45,296-Speed 13049.79 samples/sec Loss 4.2875 LearningRate 0.0191 Epoch: 28 Global Step: 71440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:46,851-Speed 13177.27 samples/sec Loss 4.2355 LearningRate 0.0191 Epoch: 28 Global Step: 71450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:12:48,422-Speed 13047.54 samples/sec Loss 4.1677 LearningRate 0.0191 Epoch: 28 Global Step: 71460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:50,004-Speed 12954.56 samples/sec Loss 4.1218 LearningRate 0.0191 Epoch: 28 Global Step: 71470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:51,564-Speed 13129.72 samples/sec Loss 4.1519 LearningRate 0.0191 Epoch: 28 Global Step: 71480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:53,151-Speed 12917.75 samples/sec Loss 4.1828 LearningRate 0.0191 Epoch: 28 Global Step: 71490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:12:54,754-Speed 12783.81 samples/sec Loss 4.2079 LearningRate 0.0191 Epoch: 28 Global Step: 71500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:56,310-Speed 13174.19 samples/sec Loss 4.2168 LearningRate 0.0190 Epoch: 28 Global Step: 71510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:57,880-Speed 13053.61 samples/sec Loss 4.2312 LearningRate 0.0190 Epoch: 28 Global Step: 71520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:12:59,489-Speed 12739.00 samples/sec Loss 4.2279 LearningRate 0.0190 Epoch: 28 Global Step: 71530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:01,064-Speed 13011.49 samples/sec Loss 4.2485 LearningRate 0.0190 Epoch: 28 Global Step: 71540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:02,645-Speed 12956.66 samples/sec Loss 4.1539 LearningRate 0.0190 Epoch: 28 Global Step: 71550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:04,232-Speed 12915.31 samples/sec Loss 4.2417 LearningRate 0.0190 Epoch: 28 Global Step: 71560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:05,804-Speed 13033.63 samples/sec Loss 4.2566 LearningRate 0.0190 Epoch: 28 Global Step: 71570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:07,391-Speed 12916.29 samples/sec Loss 4.2820 LearningRate 0.0189 Epoch: 28 Global Step: 71580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:08,975-Speed 12939.98 samples/sec Loss 4.2317 LearningRate 0.0189 Epoch: 28 Global Step: 71590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:10,549-Speed 13013.69 samples/sec Loss 4.1792 LearningRate 0.0189 Epoch: 28 Global Step: 71600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:12,143-Speed 12858.42 samples/sec Loss 4.2988 LearningRate 0.0189 Epoch: 28 Global Step: 71610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:13,714-Speed 13043.94 
samples/sec Loss 4.2608 LearningRate 0.0189 Epoch: 28 Global Step: 71620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:15,296-Speed 12951.70 samples/sec Loss 4.2865 LearningRate 0.0189 Epoch: 28 Global Step: 71630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:16,902-Speed 12760.56 samples/sec Loss 4.1908 LearningRate 0.0189 Epoch: 28 Global Step: 71640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:18,469-Speed 13077.89 samples/sec Loss 4.2168 LearningRate 0.0189 Epoch: 28 Global Step: 71650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:20,036-Speed 13071.52 samples/sec Loss 4.2737 LearningRate 0.0188 Epoch: 28 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:21,606-Speed 13056.95 samples/sec Loss 4.3256 LearningRate 0.0188 Epoch: 28 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:23,190-Speed 12934.03 samples/sec Loss 4.2624 LearningRate 0.0188 Epoch: 28 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:24,786-Speed 12836.27 samples/sec Loss 4.3089 LearningRate 0.0188 Epoch: 28 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:26,334-Speed 13243.73 samples/sec Loss 4.3377 LearningRate 0.0188 Epoch: 28 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:27,930-Speed 12833.42 samples/sec Loss 4.2349 LearningRate 0.0188 Epoch: 28 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:29,518-Speed 12909.15 samples/sec Loss 4.3460 LearningRate 0.0188 Epoch: 28 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:31,103-Speed 12928.94 samples/sec Loss 4.2930 LearningRate 0.0188 Epoch: 28 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:32,651-Speed 13233.33 samples/sec Loss 4.2644 LearningRate 0.0187 
Epoch: 28 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:34,224-Speed 13025.53 samples/sec Loss 4.2959 LearningRate 0.0187 Epoch: 28 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:35,781-Speed 13162.41 samples/sec Loss 4.2687 LearningRate 0.0187 Epoch: 28 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:37,343-Speed 13117.48 samples/sec Loss 4.3582 LearningRate 0.0187 Epoch: 28 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:38,931-Speed 12901.12 samples/sec Loss 4.2224 LearningRate 0.0187 Epoch: 28 Global Step: 71780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:40,516-Speed 12930.74 samples/sec Loss 4.2282 LearningRate 0.0187 Epoch: 28 Global Step: 71790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:42,091-Speed 13006.03 samples/sec Loss 4.3282 LearningRate 0.0187 Epoch: 28 Global Step: 71800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:43,656-Speed 13100.81 samples/sec Loss 4.3249 LearningRate 0.0187 Epoch: 28 Global Step: 71810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:45,268-Speed 12705.23 samples/sec Loss 4.2947 LearningRate 0.0186 Epoch: 28 Global Step: 71820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:46,846-Speed 12990.30 samples/sec Loss 4.1950 LearningRate 0.0186 Epoch: 28 Global Step: 71830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:48,424-Speed 13003.72 samples/sec Loss 4.2548 LearningRate 0.0186 Epoch: 28 Global Step: 71840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:50,006-Speed 12950.88 samples/sec Loss 4.3467 LearningRate 0.0186 Epoch: 28 Global Step: 71850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:51,571-Speed 13089.97 samples/sec Loss 4.3346 LearningRate 0.0186 Epoch: 28 Global Step: 71860 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:53,157-Speed 12923.02 samples/sec Loss 4.3224 LearningRate 0.0186 Epoch: 28 Global Step: 71870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:13:54,732-Speed 13010.12 samples/sec Loss 4.2774 LearningRate 0.0186 Epoch: 28 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:56,319-Speed 12911.72 samples/sec Loss 4.2758 LearningRate 0.0186 Epoch: 28 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:57,882-Speed 13111.85 samples/sec Loss 4.3064 LearningRate 0.0185 Epoch: 28 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:13:59,486-Speed 12771.18 samples/sec Loss 4.3124 LearningRate 0.0185 Epoch: 28 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:01,037-Speed 13212.20 samples/sec Loss 4.3137 LearningRate 0.0185 Epoch: 28 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:02,621-Speed 12935.10 samples/sec Loss 4.3528 LearningRate 0.0185 Epoch: 28 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:04,200-Speed 12985.27 samples/sec Loss 4.2739 LearningRate 0.0185 Epoch: 28 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:05,765-Speed 13095.01 samples/sec Loss 4.3243 LearningRate 0.0185 Epoch: 28 Global Step: 71950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:07,345-Speed 12965.67 samples/sec Loss 4.3590 LearningRate 0.0185 Epoch: 28 Global Step: 71960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:08,942-Speed 12835.39 samples/sec Loss 4.3437 LearningRate 0.0185 Epoch: 28 Global Step: 71970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:10,526-Speed 12928.87 samples/sec Loss 4.2739 LearningRate 0.0184 Epoch: 28 Global Step: 71980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:14:12,108-Speed 12956.57 samples/sec Loss 4.2444 LearningRate 0.0184 Epoch: 28 Global Step: 71990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:13,678-Speed 13053.50 samples/sec Loss 4.2475 LearningRate 0.0184 Epoch: 28 Global Step: 72000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:15,260-Speed 12949.14 samples/sec Loss 4.3610 LearningRate 0.0184 Epoch: 28 Global Step: 72010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:16,849-Speed 12896.45 samples/sec Loss 4.3822 LearningRate 0.0184 Epoch: 28 Global Step: 72020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:18,439-Speed 12884.74 samples/sec Loss 4.3246 LearningRate 0.0184 Epoch: 28 Global Step: 72030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:20,011-Speed 13033.99 samples/sec Loss 4.3250 LearningRate 0.0184 Epoch: 28 Global Step: 72040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:21,587-Speed 12999.52 samples/sec Loss 4.3655 LearningRate 0.0184 Epoch: 28 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:23,166-Speed 12982.35 samples/sec Loss 4.3215 LearningRate 0.0183 Epoch: 28 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:24,723-Speed 13159.95 samples/sec Loss 4.3648 LearningRate 0.0183 Epoch: 28 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:26,292-Speed 13062.08 samples/sec Loss 4.3072 LearningRate 0.0183 Epoch: 28 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:27,890-Speed 12823.58 samples/sec Loss 4.3850 LearningRate 0.0183 Epoch: 28 Global Step: 72090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:29,465-Speed 13008.27 samples/sec Loss 4.3591 LearningRate 0.0183 Epoch: 28 Global Step: 72100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:31,040-Speed 13010.22 
samples/sec Loss 4.3699 LearningRate 0.0183 Epoch: 28 Global Step: 72110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:32,599-Speed 13147.64 samples/sec Loss 4.2505 LearningRate 0.0183 Epoch: 28 Global Step: 72120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:34,199-Speed 12809.01 samples/sec Loss 4.3058 LearningRate 0.0182 Epoch: 28 Global Step: 72130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:35,779-Speed 12961.24 samples/sec Loss 4.3639 LearningRate 0.0182 Epoch: 28 Global Step: 72140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:37,340-Speed 13127.16 samples/sec Loss 4.3367 LearningRate 0.0182 Epoch: 28 Global Step: 72150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:38,917-Speed 12998.75 samples/sec Loss 4.3376 LearningRate 0.0182 Epoch: 28 Global Step: 72160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:40,500-Speed 12944.98 samples/sec Loss 4.3316 LearningRate 0.0182 Epoch: 28 Global Step: 72170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:42,069-Speed 13056.51 samples/sec Loss 4.2638 LearningRate 0.0182 Epoch: 28 Global Step: 72180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:43,659-Speed 12892.57 samples/sec Loss 4.3173 LearningRate 0.0182 Epoch: 28 Global Step: 72190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:45,224-Speed 13090.71 samples/sec Loss 4.3858 LearningRate 0.0182 Epoch: 28 Global Step: 72200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:46,794-Speed 13054.84 samples/sec Loss 4.3547 LearningRate 0.0181 Epoch: 28 Global Step: 72210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:14:48,355-Speed 13127.20 samples/sec Loss 4.3805 LearningRate 0.0181 Epoch: 28 Global Step: 72220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:49,947-Speed 12870.21 samples/sec Loss 4.3065 LearningRate 0.0181 
Epoch: 28 Global Step: 72230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:51,532-Speed 12928.44 samples/sec Loss 4.3174 LearningRate 0.0181 Epoch: 28 Global Step: 72240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:53,094-Speed 13117.45 samples/sec Loss 4.3564 LearningRate 0.0181 Epoch: 28 Global Step: 72250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:54,653-Speed 13145.57 samples/sec Loss 4.3988 LearningRate 0.0181 Epoch: 28 Global Step: 72260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:56,233-Speed 12964.55 samples/sec Loss 4.3353 LearningRate 0.0181 Epoch: 28 Global Step: 72270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:57,813-Speed 12970.46 samples/sec Loss 4.4117 LearningRate 0.0181 Epoch: 28 Global Step: 72280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:14:59,403-Speed 12891.19 samples/sec Loss 4.2634 LearningRate 0.0180 Epoch: 28 Global Step: 72290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:00,954-Speed 13208.82 samples/sec Loss 4.3178 LearningRate 0.0180 Epoch: 28 Global Step: 72300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:02,564-Speed 12729.43 samples/sec Loss 4.3707 LearningRate 0.0180 Epoch: 28 Global Step: 72310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:04,131-Speed 13075.90 samples/sec Loss 4.3841 LearningRate 0.0180 Epoch: 28 Global Step: 72320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:05,694-Speed 13111.11 samples/sec Loss 4.3292 LearningRate 0.0180 Epoch: 28 Global Step: 72330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:07,267-Speed 13030.95 samples/sec Loss 4.2923 LearningRate 0.0180 Epoch: 28 Global Step: 72340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:08,859-Speed 12870.05 samples/sec Loss 4.3099 LearningRate 0.0180 Epoch: 28 Global Step: 72350 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:10,451-Speed 12875.01 samples/sec Loss 4.4089 LearningRate 0.0180 Epoch: 28 Global Step: 72360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:12,038-Speed 12904.44 samples/sec Loss 4.3772 LearningRate 0.0179 Epoch: 28 Global Step: 72370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:13,602-Speed 13103.82 samples/sec Loss 4.3961 LearningRate 0.0179 Epoch: 28 Global Step: 72380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:15,180-Speed 12989.69 samples/sec Loss 4.3421 LearningRate 0.0179 Epoch: 28 Global Step: 72390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:16,756-Speed 13000.73 samples/sec Loss 4.3397 LearningRate 0.0179 Epoch: 28 Global Step: 72400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:18,335-Speed 12977.33 samples/sec Loss 4.3729 LearningRate 0.0179 Epoch: 28 Global Step: 72410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:19,922-Speed 12911.21 samples/sec Loss 4.3494 LearningRate 0.0179 Epoch: 28 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:21,486-Speed 13100.56 samples/sec Loss 4.3830 LearningRate 0.0179 Epoch: 28 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:23,059-Speed 13018.22 samples/sec Loss 4.3795 LearningRate 0.0179 Epoch: 28 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:24,638-Speed 12987.04 samples/sec Loss 4.3689 LearningRate 0.0178 Epoch: 28 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:26,270-Speed 12557.07 samples/sec Loss 4.3488 LearningRate 0.0178 Epoch: 28 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:27,855-Speed 12924.44 samples/sec Loss 4.3551 LearningRate 0.0178 Epoch: 28 Global Step: 72470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:15:29,432-Speed 12997.44 samples/sec Loss 4.3891 LearningRate 0.0178 Epoch: 28 Global Step: 72480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:30,997-Speed 13095.90 samples/sec Loss 4.2702 LearningRate 0.0178 Epoch: 28 Global Step: 72490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:32,567-Speed 13045.72 samples/sec Loss 4.3649 LearningRate 0.0178 Epoch: 28 Global Step: 72500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:34,143-Speed 13009.54 samples/sec Loss 4.3978 LearningRate 0.0178 Epoch: 28 Global Step: 72510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:35,720-Speed 12993.04 samples/sec Loss 4.4098 LearningRate 0.0178 Epoch: 28 Global Step: 72520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:37,305-Speed 12927.09 samples/sec Loss 4.3320 LearningRate 0.0178 Epoch: 28 Global Step: 72530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:38,886-Speed 12961.85 samples/sec Loss 4.4635 LearningRate 0.0177 Epoch: 28 Global Step: 72540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:40,464-Speed 12977.50 samples/sec Loss 4.3716 LearningRate 0.0177 Epoch: 28 Global Step: 72550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:42,027-Speed 13110.37 samples/sec Loss 4.5203 LearningRate 0.0177 Epoch: 28 Global Step: 72560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:15:43,599-Speed 13036.43 samples/sec Loss 4.3597 LearningRate 0.0177 Epoch: 28 Global Step: 72570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:45,159-Speed 13133.63 samples/sec Loss 4.3631 LearningRate 0.0177 Epoch: 28 Global Step: 72580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:46,725-Speed 13085.98 samples/sec Loss 4.3833 LearningRate 0.0177 Epoch: 28 Global Step: 72590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:48,314-Speed 12895.95 
samples/sec Loss 4.4314 LearningRate 0.0177 Epoch: 28 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:49,893-Speed 12975.34 samples/sec Loss 4.4756 LearningRate 0.0177 Epoch: 28 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:51,476-Speed 12943.84 samples/sec Loss 4.3898 LearningRate 0.0176 Epoch: 28 Global Step: 72620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:53,045-Speed 13057.39 samples/sec Loss 4.3321 LearningRate 0.0176 Epoch: 28 Global Step: 72630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:54,608-Speed 13115.29 samples/sec Loss 4.3492 LearningRate 0.0176 Epoch: 28 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:56,180-Speed 13036.95 samples/sec Loss 4.4145 LearningRate 0.0176 Epoch: 28 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:57,729-Speed 13224.70 samples/sec Loss 4.2988 LearningRate 0.0176 Epoch: 28 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:15:59,287-Speed 13154.64 samples/sec Loss 4.3936 LearningRate 0.0176 Epoch: 28 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:16:00,845-Speed 13153.80 samples/sec Loss 4.3725 LearningRate 0.0176 Epoch: 28 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:16:02,403-Speed 13149.51 samples/sec Loss 4.3827 LearningRate 0.0176 Epoch: 28 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:16:04,002-Speed 12814.74 samples/sec Loss 4.4188 LearningRate 0.0175 Epoch: 28 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:16:05,580-Speed 12982.41 samples/sec Loss 4.4000 LearningRate 0.0175 Epoch: 28 Global Step: 72710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:07,144-Speed 13106.49 samples/sec Loss 4.4222 LearningRate 0.0175 
Epoch: 28 Global Step: 72720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:08,747-Speed 12779.64 samples/sec Loss 4.3989 LearningRate 0.0175 Epoch: 28 Global Step: 72730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:10,314-Speed 13081.47 samples/sec Loss 4.3920 LearningRate 0.0175 Epoch: 28 Global Step: 72740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:11,905-Speed 12876.84 samples/sec Loss 4.3915 LearningRate 0.0175 Epoch: 28 Global Step: 72750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:13,504-Speed 12814.06 samples/sec Loss 4.4308 LearningRate 0.0175 Epoch: 28 Global Step: 72760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:15,090-Speed 12920.78 samples/sec Loss 4.3551 LearningRate 0.0175 Epoch: 28 Global Step: 72770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:16,651-Speed 13125.54 samples/sec Loss 4.3932 LearningRate 0.0174 Epoch: 28 Global Step: 72780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:18,242-Speed 12882.69 samples/sec Loss 4.4711 LearningRate 0.0174 Epoch: 28 Global Step: 72790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:19,804-Speed 13118.18 samples/sec Loss 4.3346 LearningRate 0.0174 Epoch: 28 Global Step: 72800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:21,375-Speed 13062.18 samples/sec Loss 4.3394 LearningRate 0.0174 Epoch: 28 Global Step: 72810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:22,981-Speed 12761.47 samples/sec Loss 4.3907 LearningRate 0.0174 Epoch: 28 Global Step: 72820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:24,572-Speed 12883.88 samples/sec Loss 4.4095 LearningRate 0.0174 Epoch: 28 Global Step: 72830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:26,139-Speed 13068.60 samples/sec Loss 4.3930 LearningRate 0.0174 Epoch: 28 Global Step: 72840 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:27,710-Speed 13048.77 samples/sec Loss 4.4307 LearningRate 0.0174 Epoch: 28 Global Step: 72850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:29,305-Speed 12843.77 samples/sec Loss 4.4735 LearningRate 0.0173 Epoch: 28 Global Step: 72860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:30,870-Speed 13091.60 samples/sec Loss 4.3983 LearningRate 0.0173 Epoch: 28 Global Step: 72870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:32,438-Speed 13067.65 samples/sec Loss 4.5148 LearningRate 0.0173 Epoch: 28 Global Step: 72880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:34,026-Speed 12908.25 samples/sec Loss 4.4792 LearningRate 0.0173 Epoch: 28 Global Step: 72890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:35,580-Speed 13186.82 samples/sec Loss 4.5076 LearningRate 0.0173 Epoch: 28 Global Step: 72900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:37,205-Speed 12604.52 samples/sec Loss 4.3517 LearningRate 0.0173 Epoch: 28 Global Step: 72910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:38,772-Speed 13081.05 samples/sec Loss 4.3810 LearningRate 0.0173 Epoch: 28 Global Step: 72920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:40,362-Speed 12888.23 samples/sec Loss 4.3494 LearningRate 0.0173 Epoch: 28 Global Step: 72930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:41,918-Speed 13167.70 samples/sec Loss 4.4552 LearningRate 0.0172 Epoch: 28 Global Step: 72940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:43,498-Speed 12970.85 samples/sec Loss 4.5233 LearningRate 0.0172 Epoch: 28 Global Step: 72950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:45,080-Speed 12951.18 samples/sec Loss 4.4281 LearningRate 0.0172 Epoch: 28 Global Step: 72960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-01-14 17:16:46,650-Speed 13050.52 samples/sec Loss 4.4660 LearningRate 0.0172 Epoch: 28 Global Step: 72970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:48,214-Speed 13106.81 samples/sec Loss 4.4573 LearningRate 0.0172 Epoch: 28 Global Step: 72980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:49,809-Speed 12847.78 samples/sec Loss 4.4893 LearningRate 0.0172 Epoch: 28 Global Step: 72990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:51,379-Speed 13048.85 samples/sec Loss 4.5096 LearningRate 0.0172 Epoch: 28 Global Step: 73000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:16:52,943-Speed 13098.85 samples/sec Loss 4.5259 LearningRate 0.0172 Epoch: 28 Global Step: 73010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:54,512-Speed 13065.87 samples/sec Loss 4.4380 LearningRate 0.0171 Epoch: 28 Global Step: 73020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:56,085-Speed 13020.25 samples/sec Loss 4.4182 LearningRate 0.0171 Epoch: 28 Global Step: 73030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:57,691-Speed 12762.57 samples/sec Loss 4.4319 LearningRate 0.0171 Epoch: 28 Global Step: 73040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:16:59,271-Speed 12971.68 samples/sec Loss 4.3796 LearningRate 0.0171 Epoch: 28 Global Step: 73050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:00,860-Speed 12889.17 samples/sec Loss 4.4364 LearningRate 0.0171 Epoch: 28 Global Step: 73060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:02,450-Speed 12890.17 samples/sec Loss 4.3958 LearningRate 0.0171 Epoch: 28 Global Step: 73070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:04,011-Speed 13127.61 samples/sec Loss 4.5330 LearningRate 0.0171 Epoch: 28 Global Step: 73080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:05,585-Speed 13018.83 
samples/sec Loss 4.5013 LearningRate 0.0171 Epoch: 28 Global Step: 73090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:07,178-Speed 12866.12 samples/sec Loss 4.5228 LearningRate 0.0171 Epoch: 28 Global Step: 73100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:08,751-Speed 13027.39 samples/sec Loss 4.4907 LearningRate 0.0170 Epoch: 28 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:10,352-Speed 12797.05 samples/sec Loss 4.4109 LearningRate 0.0170 Epoch: 28 Global Step: 73120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:11,913-Speed 13123.37 samples/sec Loss 4.3747 LearningRate 0.0170 Epoch: 28 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:13,480-Speed 13076.19 samples/sec Loss 4.3742 LearningRate 0.0170 Epoch: 28 Global Step: 73140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:15,068-Speed 12900.72 samples/sec Loss 4.4541 LearningRate 0.0170 Epoch: 28 Global Step: 73150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:16,626-Speed 13155.51 samples/sec Loss 4.4513 LearningRate 0.0170 Epoch: 28 Global Step: 73160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:18,211-Speed 12925.37 samples/sec Loss 4.4599 LearningRate 0.0170 Epoch: 28 Global Step: 73170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:19,812-Speed 12801.33 samples/sec Loss 4.3713 LearningRate 0.0170 Epoch: 28 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:21,373-Speed 13127.34 samples/sec Loss 4.4804 LearningRate 0.0169 Epoch: 28 Global Step: 73190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:22,937-Speed 13099.37 samples/sec Loss 4.4331 LearningRate 0.0169 Epoch: 28 Global Step: 73200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:24,516-Speed 12985.64 samples/sec Loss 4.4600 LearningRate 0.0169 
Epoch: 28 Global Step: 73210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:26,099-Speed 12943.19 samples/sec Loss 4.4791 LearningRate 0.0169 Epoch: 28 Global Step: 73220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:27,667-Speed 13070.36 samples/sec Loss 4.5003 LearningRate 0.0169 Epoch: 28 Global Step: 73230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:29,241-Speed 13051.14 samples/sec Loss 4.5220 LearningRate 0.0169 Epoch: 28 Global Step: 73240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:30,823-Speed 12951.57 samples/sec Loss 4.3841 LearningRate 0.0169 Epoch: 28 Global Step: 73250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:32,389-Speed 13087.36 samples/sec Loss 4.4909 LearningRate 0.0169 Epoch: 28 Global Step: 73260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:33,961-Speed 13033.93 samples/sec Loss 4.4316 LearningRate 0.0168 Epoch: 28 Global Step: 73270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:35,547-Speed 12918.63 samples/sec Loss 4.4558 LearningRate 0.0168 Epoch: 28 Global Step: 73280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:37,104-Speed 13163.38 samples/sec Loss 4.4086 LearningRate 0.0168 Epoch: 28 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:38,677-Speed 13028.90 samples/sec Loss 4.4452 LearningRate 0.0168 Epoch: 28 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:40,287-Speed 12720.51 samples/sec Loss 4.4686 LearningRate 0.0168 Epoch: 28 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:41,857-Speed 13057.11 samples/sec Loss 4.4636 LearningRate 0.0168 Epoch: 28 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:17:43,480-Speed 12618.60 samples/sec Loss 4.5343 LearningRate 0.0168 Epoch: 28 Global Step: 73330 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:17:44,959-Speed 13862.10 samples/sec Loss 4.4399 LearningRate 0.0168 Epoch: 28 Global Step: 73340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:00,337-Speed 1331.95 samples/sec Loss 3.9091 LearningRate 0.0167 Epoch: 29 Global Step: 73350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:02,039-Speed 12042.18 samples/sec Loss 3.8386 LearningRate 0.0167 Epoch: 29 Global Step: 73360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:03,612-Speed 13023.57 samples/sec Loss 3.9163 LearningRate 0.0167 Epoch: 29 Global Step: 73370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:05,187-Speed 13005.39 samples/sec Loss 3.8890 LearningRate 0.0167 Epoch: 29 Global Step: 73380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:06,767-Speed 12972.99 samples/sec Loss 3.8677 LearningRate 0.0167 Epoch: 29 Global Step: 73390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:08,379-Speed 12711.10 samples/sec Loss 3.8987 LearningRate 0.0167 Epoch: 29 Global Step: 73400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:09,963-Speed 12939.98 samples/sec Loss 3.8796 LearningRate 0.0167 Epoch: 29 Global Step: 73410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:11,543-Speed 12967.79 samples/sec Loss 3.8641 LearningRate 0.0167 Epoch: 29 Global Step: 73420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:13,134-Speed 12878.90 samples/sec Loss 3.8247 LearningRate 0.0167 Epoch: 29 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:14,711-Speed 12993.99 samples/sec Loss 3.8744 LearningRate 0.0166 Epoch: 29 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:16,337-Speed 12599.74 samples/sec Loss 3.9038 LearningRate 0.0166 Epoch: 29 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-01-14 17:18:17,895-Speed 13160.52 samples/sec Loss 3.8546 LearningRate 0.0166 Epoch: 29 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:19,494-Speed 12814.14 samples/sec Loss 3.9273 LearningRate 0.0166 Epoch: 29 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:21,081-Speed 12912.50 samples/sec Loss 3.9384 LearningRate 0.0166 Epoch: 29 Global Step: 73480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:22,645-Speed 13103.46 samples/sec Loss 3.8670 LearningRate 0.0166 Epoch: 29 Global Step: 73490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:24,209-Speed 13100.40 samples/sec Loss 3.9507 LearningRate 0.0166 Epoch: 29 Global Step: 73500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:25,765-Speed 13169.88 samples/sec Loss 3.9706 LearningRate 0.0166 Epoch: 29 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:27,337-Speed 13038.27 samples/sec Loss 3.9073 LearningRate 0.0165 Epoch: 29 Global Step: 73520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:28,910-Speed 13026.06 samples/sec Loss 3.8467 LearningRate 0.0165 Epoch: 29 Global Step: 73530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 17:18:30,458-Speed 13240.67 samples/sec Loss 3.8193 LearningRate 0.0165 Epoch: 29 Global Step: 73540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:32,032-Speed 13010.73 samples/sec Loss 3.9169 LearningRate 0.0165 Epoch: 29 Global Step: 73550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:33,575-Speed 13282.42 samples/sec Loss 3.9645 LearningRate 0.0165 Epoch: 29 Global Step: 73560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:35,137-Speed 13119.32 samples/sec Loss 3.9765 LearningRate 0.0165 Epoch: 29 Global Step: 73570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:36,717-Speed 12966.43 
samples/sec Loss 3.8519 LearningRate 0.0165 Epoch: 29 Global Step: 73580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:38,301-Speed 12939.33 samples/sec Loss 3.9512 LearningRate 0.0165 Epoch: 29 Global Step: 73590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:39,872-Speed 13043.94 samples/sec Loss 3.9843 LearningRate 0.0164 Epoch: 29 Global Step: 73600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:41,447-Speed 13015.82 samples/sec Loss 3.9695 LearningRate 0.0164 Epoch: 29 Global Step: 73610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:43,018-Speed 13039.62 samples/sec Loss 3.9372 LearningRate 0.0164 Epoch: 29 Global Step: 73620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:44,598-Speed 12971.03 samples/sec Loss 3.9165 LearningRate 0.0164 Epoch: 29 Global Step: 73630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:46,172-Speed 13014.17 samples/sec Loss 4.0190 LearningRate 0.0164 Epoch: 29 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:47,744-Speed 13037.28 samples/sec Loss 4.0228 LearningRate 0.0164 Epoch: 29 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:18:49,311-Speed 13072.89 samples/sec Loss 3.9508 LearningRate 0.0164 Epoch: 29 Global Step: 73660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:50,886-Speed 13009.57 samples/sec Loss 3.9856 LearningRate 0.0164 Epoch: 29 Global Step: 73670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:52,409-Speed 13460.06 samples/sec Loss 4.0058 LearningRate 0.0164 Epoch: 29 Global Step: 73680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:53,985-Speed 12997.91 samples/sec Loss 4.0071 LearningRate 0.0163 Epoch: 29 Global Step: 73690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:55,559-Speed 13023.31 samples/sec Loss 4.0328 LearningRate 0.0163 
Epoch: 29 Global Step: 73700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:57,129-Speed 13050.44 samples/sec Loss 4.0424 LearningRate 0.0163 Epoch: 29 Global Step: 73710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:18:58,703-Speed 13014.08 samples/sec Loss 4.0270 LearningRate 0.0163 Epoch: 29 Global Step: 73720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:00,283-Speed 12973.54 samples/sec Loss 3.9723 LearningRate 0.0163 Epoch: 29 Global Step: 73730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:01,850-Speed 13078.82 samples/sec Loss 4.0452 LearningRate 0.0163 Epoch: 29 Global Step: 73740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:03,437-Speed 12909.01 samples/sec Loss 4.0549 LearningRate 0.0163 Epoch: 29 Global Step: 73750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:05,031-Speed 12853.36 samples/sec Loss 4.0497 LearningRate 0.0163 Epoch: 29 Global Step: 73760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:19:06,586-Speed 13179.74 samples/sec Loss 4.0454 LearningRate 0.0162 Epoch: 29 Global Step: 73770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:19:08,165-Speed 12978.12 samples/sec Loss 3.9579 LearningRate 0.0162 Epoch: 29 Global Step: 73780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:09,752-Speed 12914.44 samples/sec Loss 4.0671 LearningRate 0.0162 Epoch: 29 Global Step: 73790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:11,348-Speed 12837.11 samples/sec Loss 4.0452 LearningRate 0.0162 Epoch: 29 Global Step: 73800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:12,939-Speed 12876.90 samples/sec Loss 4.0659 LearningRate 0.0162 Epoch: 29 Global Step: 73810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:14,515-Speed 13005.71 samples/sec Loss 4.0472 LearningRate 0.0162 Epoch: 29 Global Step: 73820 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:16,075-Speed 13132.39 samples/sec Loss 3.9729 LearningRate 0.0162 Epoch: 29 Global Step: 73830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:17,681-Speed 12757.03 samples/sec Loss 4.0359 LearningRate 0.0162 Epoch: 29 Global Step: 73840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:19,295-Speed 12698.95 samples/sec Loss 4.0644 LearningRate 0.0162 Epoch: 29 Global Step: 73850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:20,877-Speed 12953.96 samples/sec Loss 3.8839 LearningRate 0.0161 Epoch: 29 Global Step: 73860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:22,460-Speed 12940.92 samples/sec Loss 4.0122 LearningRate 0.0161 Epoch: 29 Global Step: 73870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:24,056-Speed 12849.14 samples/sec Loss 4.0861 LearningRate 0.0161 Epoch: 29 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:19:25,598-Speed 13285.72 samples/sec Loss 3.9870 LearningRate 0.0161 Epoch: 29 Global Step: 73890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:27,150-Speed 13205.72 samples/sec Loss 4.0125 LearningRate 0.0161 Epoch: 29 Global Step: 73900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:28,725-Speed 13005.73 samples/sec Loss 4.1112 LearningRate 0.0161 Epoch: 29 Global Step: 73910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:30,305-Speed 12964.65 samples/sec Loss 4.0976 LearningRate 0.0161 Epoch: 29 Global Step: 73920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:31,898-Speed 12868.80 samples/sec Loss 4.1266 LearningRate 0.0161 Epoch: 29 Global Step: 73930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:33,457-Speed 13140.42 samples/sec Loss 3.9918 LearningRate 0.0160 Epoch: 29 Global Step: 73940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-01-14 17:19:35,031-Speed 13017.96 samples/sec Loss 4.0951 LearningRate 0.0160 Epoch: 29 Global Step: 73950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:36,605-Speed 13021.30 samples/sec Loss 4.0358 LearningRate 0.0160 Epoch: 29 Global Step: 73960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:38,187-Speed 12946.41 samples/sec Loss 4.1034 LearningRate 0.0160 Epoch: 29 Global Step: 73970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:39,780-Speed 12872.35 samples/sec Loss 4.0793 LearningRate 0.0160 Epoch: 29 Global Step: 73980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:41,371-Speed 12871.47 samples/sec Loss 4.0650 LearningRate 0.0160 Epoch: 29 Global Step: 73990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:19:42,940-Speed 13068.28 samples/sec Loss 4.1174 LearningRate 0.0160 Epoch: 29 Global Step: 74000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:44,526-Speed 12925.01 samples/sec Loss 4.0591 LearningRate 0.0160 Epoch: 29 Global Step: 74010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:46,117-Speed 12873.46 samples/sec Loss 4.0903 LearningRate 0.0160 Epoch: 29 Global Step: 74020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:47,697-Speed 12971.47 samples/sec Loss 4.1526 LearningRate 0.0159 Epoch: 29 Global Step: 74030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:49,267-Speed 13055.72 samples/sec Loss 4.0723 LearningRate 0.0159 Epoch: 29 Global Step: 74040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:50,879-Speed 12706.20 samples/sec Loss 4.0562 LearningRate 0.0159 Epoch: 29 Global Step: 74050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:52,460-Speed 12963.90 samples/sec Loss 4.1327 LearningRate 0.0159 Epoch: 29 Global Step: 74060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:54,052-Speed 12867.30 
samples/sec Loss 4.0668 LearningRate 0.0159 Epoch: 29 Global Step: 74070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:55,620-Speed 13070.30 samples/sec Loss 4.1039 LearningRate 0.0159 Epoch: 29 Global Step: 74080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:57,184-Speed 13097.44 samples/sec Loss 4.0856 LearningRate 0.0159 Epoch: 29 Global Step: 74090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:19:58,778-Speed 12855.31 samples/sec Loss 4.0607 LearningRate 0.0159 Epoch: 29 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:00,347-Speed 13061.63 samples/sec Loss 4.0933 LearningRate 0.0158 Epoch: 29 Global Step: 74110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:01,966-Speed 12659.63 samples/sec Loss 4.2073 LearningRate 0.0158 Epoch: 29 Global Step: 74120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:03,555-Speed 12894.15 samples/sec Loss 4.2031 LearningRate 0.0158 Epoch: 29 Global Step: 74130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:05,116-Speed 13136.83 samples/sec Loss 4.1437 LearningRate 0.0158 Epoch: 29 Global Step: 74140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:06,683-Speed 13073.00 samples/sec Loss 4.1286 LearningRate 0.0158 Epoch: 29 Global Step: 74150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:08,255-Speed 13034.26 samples/sec Loss 4.1686 LearningRate 0.0158 Epoch: 29 Global Step: 74160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:09,847-Speed 12876.20 samples/sec Loss 4.1211 LearningRate 0.0158 Epoch: 29 Global Step: 74170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:11,412-Speed 13090.26 samples/sec Loss 4.0959 LearningRate 0.0158 Epoch: 29 Global Step: 74180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:12,987-Speed 13003.11 samples/sec Loss 4.0955 LearningRate 0.0158 
Epoch: 29 Global Step: 74190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:14,597-Speed 12729.63 samples/sec Loss 4.0760 LearningRate 0.0157 Epoch: 29 Global Step: 74200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:16,164-Speed 13079.41 samples/sec Loss 4.1435 LearningRate 0.0157 Epoch: 29 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:17,785-Speed 12639.17 samples/sec Loss 4.0710 LearningRate 0.0157 Epoch: 29 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:19,365-Speed 12972.70 samples/sec Loss 4.1276 LearningRate 0.0157 Epoch: 29 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:20,975-Speed 12725.03 samples/sec Loss 4.2010 LearningRate 0.0157 Epoch: 29 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:22,573-Speed 12823.76 samples/sec Loss 4.1287 LearningRate 0.0157 Epoch: 29 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:24,144-Speed 13040.17 samples/sec Loss 4.1429 LearningRate 0.0157 Epoch: 29 Global Step: 74260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:25,724-Speed 12968.33 samples/sec Loss 4.1082 LearningRate 0.0157 Epoch: 29 Global Step: 74270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:27,291-Speed 13083.58 samples/sec Loss 4.0940 LearningRate 0.0156 Epoch: 29 Global Step: 74280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:28,882-Speed 12874.97 samples/sec Loss 4.2158 LearningRate 0.0156 Epoch: 29 Global Step: 74290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:30,456-Speed 13020.07 samples/sec Loss 4.1183 LearningRate 0.0156 Epoch: 29 Global Step: 74300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:32,042-Speed 12916.63 samples/sec Loss 4.1612 LearningRate 0.0156 Epoch: 29 Global Step: 74310 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:33,620-Speed 12988.87 samples/sec Loss 4.1608 LearningRate 0.0156 Epoch: 29 Global Step: 74320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:35,193-Speed 13021.31 samples/sec Loss 4.1810 LearningRate 0.0156 Epoch: 29 Global Step: 74330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:36,746-Speed 13198.33 samples/sec Loss 4.1408 LearningRate 0.0156 Epoch: 29 Global Step: 74340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:38,330-Speed 12933.26 samples/sec Loss 4.1885 LearningRate 0.0156 Epoch: 29 Global Step: 74350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:39,926-Speed 12843.71 samples/sec Loss 4.1371 LearningRate 0.0156 Epoch: 29 Global Step: 74360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:41,490-Speed 13097.41 samples/sec Loss 4.1877 LearningRate 0.0155 Epoch: 29 Global Step: 74370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:43,061-Speed 13038.58 samples/sec Loss 4.1310 LearningRate 0.0155 Epoch: 29 Global Step: 74380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:44,623-Speed 13126.96 samples/sec Loss 4.1480 LearningRate 0.0155 Epoch: 29 Global Step: 74390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:46,198-Speed 13004.53 samples/sec Loss 4.1580 LearningRate 0.0155 Epoch: 29 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:47,807-Speed 12730.36 samples/sec Loss 4.1127 LearningRate 0.0155 Epoch: 29 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:20:49,363-Speed 13178.81 samples/sec Loss 4.1222 LearningRate 0.0155 Epoch: 29 Global Step: 74420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:50,941-Speed 12982.74 samples/sec Loss 4.1546 LearningRate 0.0155 Epoch: 29 Global Step: 74430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:20:52,518-Speed 12993.52 samples/sec Loss 4.2006 LearningRate 0.0155 Epoch: 29 Global Step: 74440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:54,077-Speed 13146.88 samples/sec Loss 4.1681 LearningRate 0.0154 Epoch: 29 Global Step: 74450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:55,649-Speed 13030.39 samples/sec Loss 4.1379 LearningRate 0.0154 Epoch: 29 Global Step: 74460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:57,226-Speed 12999.14 samples/sec Loss 4.1554 LearningRate 0.0154 Epoch: 29 Global Step: 74470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:20:58,782-Speed 13165.75 samples/sec Loss 4.1222 LearningRate 0.0154 Epoch: 29 Global Step: 74480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:00,371-Speed 12897.14 samples/sec Loss 4.2717 LearningRate 0.0154 Epoch: 29 Global Step: 74490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:01,941-Speed 13047.21 samples/sec Loss 4.2274 LearningRate 0.0154 Epoch: 29 Global Step: 74500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:03,526-Speed 12932.45 samples/sec Loss 4.1698 LearningRate 0.0154 Epoch: 29 Global Step: 74510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:05,104-Speed 12986.92 samples/sec Loss 4.2229 LearningRate 0.0154 Epoch: 29 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:06,683-Speed 12968.30 samples/sec Loss 4.2854 LearningRate 0.0154 Epoch: 29 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:08,253-Speed 13052.43 samples/sec Loss 4.3087 LearningRate 0.0153 Epoch: 29 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:09,839-Speed 12945.49 samples/sec Loss 4.1666 LearningRate 0.0153 Epoch: 29 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:11,422-Speed 12941.03 
samples/sec Loss 4.2166 LearningRate 0.0153 Epoch: 29 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:13,000-Speed 12979.99 samples/sec Loss 4.1899 LearningRate 0.0153 Epoch: 29 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:14,575-Speed 13014.29 samples/sec Loss 4.1803 LearningRate 0.0153 Epoch: 29 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:16,136-Speed 13128.47 samples/sec Loss 4.2008 LearningRate 0.0153 Epoch: 29 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:17,724-Speed 12903.71 samples/sec Loss 4.1794 LearningRate 0.0153 Epoch: 29 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:19,324-Speed 12803.26 samples/sec Loss 4.1609 LearningRate 0.0153 Epoch: 29 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:20,910-Speed 12922.45 samples/sec Loss 4.2045 LearningRate 0.0153 Epoch: 29 Global Step: 74620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 17:21:22,489-Speed 12977.68 samples/sec Loss 4.2134 LearningRate 0.0152 Epoch: 29 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:24,059-Speed 13053.05 samples/sec Loss 4.1978 LearningRate 0.0152 Epoch: 29 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:25,640-Speed 12961.00 samples/sec Loss 4.2510 LearningRate 0.0152 Epoch: 29 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:27,198-Speed 13150.64 samples/sec Loss 4.1274 LearningRate 0.0152 Epoch: 29 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:28,765-Speed 13074.93 samples/sec Loss 4.2150 LearningRate 0.0152 Epoch: 29 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:30,325-Speed 13141.84 samples/sec Loss 4.1825 LearningRate 
0.0152 Epoch: 29 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:31,900-Speed 13010.37 samples/sec Loss 4.2462 LearningRate 0.0152 Epoch: 29 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:33,467-Speed 13071.25 samples/sec Loss 4.1676 LearningRate 0.0152 Epoch: 29 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:35,033-Speed 13091.94 samples/sec Loss 4.2590 LearningRate 0.0152 Epoch: 29 Global Step: 74710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:36,622-Speed 12892.06 samples/sec Loss 4.2254 LearningRate 0.0151 Epoch: 29 Global Step: 74720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:38,217-Speed 12847.56 samples/sec Loss 4.2494 LearningRate 0.0151 Epoch: 29 Global Step: 74730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:39,793-Speed 13008.41 samples/sec Loss 4.2851 LearningRate 0.0151 Epoch: 29 Global Step: 74740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:41,355-Speed 13114.74 samples/sec Loss 4.2062 LearningRate 0.0151 Epoch: 29 Global Step: 74750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:42,932-Speed 12993.53 samples/sec Loss 4.2561 LearningRate 0.0151 Epoch: 29 Global Step: 74760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:44,519-Speed 12918.90 samples/sec Loss 4.2966 LearningRate 0.0151 Epoch: 29 Global Step: 74770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:46,076-Speed 13154.09 samples/sec Loss 4.1844 LearningRate 0.0151 Epoch: 29 Global Step: 74780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:47,653-Speed 12997.40 samples/sec Loss 4.2441 LearningRate 0.0151 Epoch: 29 Global Step: 74790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:49,225-Speed 13037.78 samples/sec Loss 4.1418 LearningRate 0.0150 Epoch: 29 Global Step: 74800 Fp16 
Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:50,796-Speed 13037.89 samples/sec Loss 4.2505 LearningRate 0.0150 Epoch: 29 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:52,348-Speed 13203.96 samples/sec Loss 4.2505 LearningRate 0.0150 Epoch: 29 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:53,941-Speed 12862.71 samples/sec Loss 4.2626 LearningRate 0.0150 Epoch: 29 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:55,504-Speed 13115.67 samples/sec Loss 4.2354 LearningRate 0.0150 Epoch: 29 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:21:57,111-Speed 12752.89 samples/sec Loss 4.2437 LearningRate 0.0150 Epoch: 29 Global Step: 74850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:21:58,687-Speed 12998.27 samples/sec Loss 4.2179 LearningRate 0.0150 Epoch: 29 Global Step: 74860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:00,270-Speed 12942.50 samples/sec Loss 4.3001 LearningRate 0.0150 Epoch: 29 Global Step: 74870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:01,837-Speed 13076.64 samples/sec Loss 4.2388 LearningRate 0.0150 Epoch: 29 Global Step: 74880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:03,414-Speed 12992.57 samples/sec Loss 4.2277 LearningRate 0.0149 Epoch: 29 Global Step: 74890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:04,978-Speed 13107.58 samples/sec Loss 4.2138 LearningRate 0.0149 Epoch: 29 Global Step: 74900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:06,548-Speed 13047.55 samples/sec Loss 4.2587 LearningRate 0.0149 Epoch: 29 Global Step: 74910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:08,110-Speed 13122.78 samples/sec Loss 4.2353 LearningRate 0.0149 Epoch: 29 Global Step: 74920 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-01-14 17:22:09,683-Speed 13030.31 samples/sec Loss 4.2665 LearningRate 0.0149 Epoch: 29 Global Step: 74930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:11,276-Speed 12854.79 samples/sec Loss 4.2514 LearningRate 0.0149 Epoch: 29 Global Step: 74940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:12,862-Speed 12922.58 samples/sec Loss 4.2286 LearningRate 0.0149 Epoch: 29 Global Step: 74950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:22:14,435-Speed 13030.36 samples/sec Loss 4.2680 LearningRate 0.0149 Epoch: 29 Global Step: 74960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:15,986-Speed 13207.95 samples/sec Loss 4.3010 LearningRate 0.0149 Epoch: 29 Global Step: 74970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:17,551-Speed 13091.01 samples/sec Loss 4.2567 LearningRate 0.0148 Epoch: 29 Global Step: 74980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:19,113-Speed 13126.86 samples/sec Loss 4.2621 LearningRate 0.0148 Epoch: 29 Global Step: 74990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:20,679-Speed 13082.03 samples/sec Loss 4.1904 LearningRate 0.0148 Epoch: 29 Global Step: 75000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:22:42,583-[lfw][75000]XNorm: 7.884357 Training: 2022-01-14 17:22:42,584-[lfw][75000]Accuracy-Flip: 0.99583+-0.00396 Training: 2022-01-14 17:22:42,585-[lfw][75000]Accuracy-Highest: 0.99650 Training: 2022-01-14 17:23:08,677-[cfp_fp][75000]XNorm: 6.647636 Training: 2022-01-14 17:23:08,678-[cfp_fp][75000]Accuracy-Flip: 0.96586+-0.00853 Training: 2022-01-14 17:23:08,679-[cfp_fp][75000]Accuracy-Highest: 0.96771 Training: 2022-01-14 17:23:31,444-[agedb_30][75000]XNorm: 7.609590 Training: 2022-01-14 17:23:31,445-[agedb_30][75000]Accuracy-Flip: 0.96567+-0.00684 Training: 2022-01-14 17:23:31,445-[agedb_30][75000]Accuracy-Highest: 0.96800 Training: 2022-01-14 
17:23:33,034-Speed 283.05 samples/sec Loss 4.2618 LearningRate 0.0148 Epoch: 29 Global Step: 75010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:34,604-Speed 13053.29 samples/sec Loss 4.2962 LearningRate 0.0148 Epoch: 29 Global Step: 75020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:36,202-Speed 12825.20 samples/sec Loss 4.3161 LearningRate 0.0148 Epoch: 29 Global Step: 75030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:37,782-Speed 12963.72 samples/sec Loss 4.2915 LearningRate 0.0148 Epoch: 29 Global Step: 75040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:39,334-Speed 13208.07 samples/sec Loss 4.1964 LearningRate 0.0148 Epoch: 29 Global Step: 75050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:40,919-Speed 12928.30 samples/sec Loss 4.2365 LearningRate 0.0148 Epoch: 29 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:23:42,501-Speed 12953.78 samples/sec Loss 4.2794 LearningRate 0.0147 Epoch: 29 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:23:44,088-Speed 12905.56 samples/sec Loss 4.3496 LearningRate 0.0147 Epoch: 29 Global Step: 75080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:23:45,664-Speed 13001.39 samples/sec Loss 4.2542 LearningRate 0.0147 Epoch: 29 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:23:47,230-Speed 13092.82 samples/sec Loss 4.2540 LearningRate 0.0147 Epoch: 29 Global Step: 75100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:48,806-Speed 12998.13 samples/sec Loss 4.3033 LearningRate 0.0147 Epoch: 29 Global Step: 75110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:50,374-Speed 13068.04 samples/sec Loss 4.2918 LearningRate 0.0147 Epoch: 29 Global Step: 75120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:51,933-Speed 13142.21 samples/sec Loss 
4.3010 LearningRate 0.0147 Epoch: 29 Global Step: 75130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:53,507-Speed 13021.31 samples/sec Loss 4.3045 LearningRate 0.0147 Epoch: 29 Global Step: 75140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:55,120-Speed 12702.37 samples/sec Loss 4.3023 LearningRate 0.0147 Epoch: 29 Global Step: 75150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:56,680-Speed 13158.20 samples/sec Loss 4.2277 LearningRate 0.0146 Epoch: 29 Global Step: 75160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:58,253-Speed 13025.54 samples/sec Loss 4.2860 LearningRate 0.0146 Epoch: 29 Global Step: 75170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:23:59,849-Speed 12841.30 samples/sec Loss 4.2452 LearningRate 0.0146 Epoch: 29 Global Step: 75180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:01,425-Speed 13003.15 samples/sec Loss 4.2136 LearningRate 0.0146 Epoch: 29 Global Step: 75190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:03,009-Speed 12932.19 samples/sec Loss 4.2965 LearningRate 0.0146 Epoch: 29 Global Step: 75200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:04,566-Speed 13158.06 samples/sec Loss 4.2866 LearningRate 0.0146 Epoch: 29 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:06,119-Speed 13196.40 samples/sec Loss 4.2862 LearningRate 0.0146 Epoch: 29 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:07,694-Speed 13012.17 samples/sec Loss 4.2652 LearningRate 0.0146 Epoch: 29 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:09,280-Speed 12919.58 samples/sec Loss 4.2770 LearningRate 0.0145 Epoch: 29 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:10,847-Speed 13073.89 samples/sec Loss 4.3077 LearningRate 0.0145 Epoch: 29 Global 
Step: 75250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:12,406-Speed 13144.63 samples/sec Loss 4.3959 LearningRate 0.0145 Epoch: 29 Global Step: 75260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:14,005-Speed 12819.64 samples/sec Loss 4.2705 LearningRate 0.0145 Epoch: 29 Global Step: 75270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:15,565-Speed 13132.97 samples/sec Loss 4.4020 LearningRate 0.0145 Epoch: 29 Global Step: 75280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:17,134-Speed 13065.39 samples/sec Loss 4.3315 LearningRate 0.0145 Epoch: 29 Global Step: 75290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:18,733-Speed 12814.60 samples/sec Loss 4.2943 LearningRate 0.0145 Epoch: 29 Global Step: 75300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:20,309-Speed 12999.78 samples/sec Loss 4.2789 LearningRate 0.0145 Epoch: 29 Global Step: 75310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:21,874-Speed 13095.74 samples/sec Loss 4.2968 LearningRate 0.0145 Epoch: 29 Global Step: 75320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:23,441-Speed 13075.73 samples/sec Loss 4.2892 LearningRate 0.0144 Epoch: 29 Global Step: 75330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:25,017-Speed 13002.73 samples/sec Loss 4.3451 LearningRate 0.0144 Epoch: 29 Global Step: 75340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:26,568-Speed 13212.57 samples/sec Loss 4.3055 LearningRate 0.0144 Epoch: 29 Global Step: 75350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:28,141-Speed 13026.97 samples/sec Loss 4.2822 LearningRate 0.0144 Epoch: 29 Global Step: 75360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:29,699-Speed 13155.81 samples/sec Loss 4.1955 LearningRate 0.0144 Epoch: 29 Global Step: 75370 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 17:24:31,268-Speed 13058.07 samples/sec Loss 4.3685 LearningRate 0.0144 Epoch: 29 Global Step: 75380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:32,837-Speed 13058.78 samples/sec Loss 4.3213 LearningRate 0.0144 Epoch: 29 Global Step: 75390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:34,401-Speed 13096.62 samples/sec Loss 4.2772 LearningRate 0.0144 Epoch: 29 Global Step: 75400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:36,000-Speed 12816.70 samples/sec Loss 4.3083 LearningRate 0.0144 Epoch: 29 Global Step: 75410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:37,575-Speed 13010.44 samples/sec Loss 4.2603 LearningRate 0.0143 Epoch: 29 Global Step: 75420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:39,175-Speed 12808.38 samples/sec Loss 4.2987 LearningRate 0.0143 Epoch: 29 Global Step: 75430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:40,754-Speed 12976.38 samples/sec Loss 4.3484 LearningRate 0.0143 Epoch: 29 Global Step: 75440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:42,315-Speed 13134.40 samples/sec Loss 4.2582 LearningRate 0.0143 Epoch: 29 Global Step: 75450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:43,874-Speed 13137.40 samples/sec Loss 4.2509 LearningRate 0.0143 Epoch: 29 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:45,454-Speed 12995.52 samples/sec Loss 4.2794 LearningRate 0.0143 Epoch: 29 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:47,022-Speed 13062.83 samples/sec Loss 4.2372 LearningRate 0.0143 Epoch: 29 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:48,591-Speed 13059.67 samples/sec Loss 4.3540 LearningRate 0.0143 Epoch: 29 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 
17:24:50,157-Speed 13081.78 samples/sec Loss 4.3383 LearningRate 0.0143 Epoch: 29 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:51,727-Speed 13060.84 samples/sec Loss 4.3212 LearningRate 0.0142 Epoch: 29 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:53,298-Speed 13039.17 samples/sec Loss 4.2848 LearningRate 0.0142 Epoch: 29 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:24:54,851-Speed 13196.62 samples/sec Loss 4.2879 LearningRate 0.0142 Epoch: 29 Global Step: 75530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:56,432-Speed 12958.70 samples/sec Loss 4.3193 LearningRate 0.0142 Epoch: 29 Global Step: 75540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:57,999-Speed 13073.64 samples/sec Loss 4.2367 LearningRate 0.0142 Epoch: 29 Global Step: 75550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:24:59,568-Speed 13065.50 samples/sec Loss 4.2420 LearningRate 0.0142 Epoch: 29 Global Step: 75560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:01,128-Speed 13129.67 samples/sec Loss 4.3074 LearningRate 0.0142 Epoch: 29 Global Step: 75570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:02,675-Speed 13257.26 samples/sec Loss 4.1972 LearningRate 0.0142 Epoch: 29 Global Step: 75580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:04,252-Speed 12988.83 samples/sec Loss 4.3738 LearningRate 0.0142 Epoch: 29 Global Step: 75590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:05,815-Speed 13108.35 samples/sec Loss 4.3114 LearningRate 0.0141 Epoch: 29 Global Step: 75600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:07,383-Speed 13070.57 samples/sec Loss 4.2867 LearningRate 0.0141 Epoch: 29 Global Step: 75610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:08,943-Speed 13134.16 samples/sec 
Loss 4.2666 LearningRate 0.0141 Epoch: 29 Global Step: 75620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:10,519-Speed 13003.80 samples/sec Loss 4.2906 LearningRate 0.0141 Epoch: 29 Global Step: 75630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:12,097-Speed 12996.45 samples/sec Loss 4.3391 LearningRate 0.0141 Epoch: 29 Global Step: 75640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:13,683-Speed 12913.96 samples/sec Loss 4.3227 LearningRate 0.0141 Epoch: 29 Global Step: 75650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:15,259-Speed 13004.01 samples/sec Loss 4.3418 LearningRate 0.0141 Epoch: 29 Global Step: 75660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:16,816-Speed 13156.85 samples/sec Loss 4.3355 LearningRate 0.0141 Epoch: 29 Global Step: 75670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:18,385-Speed 13081.16 samples/sec Loss 4.3504 LearningRate 0.0141 Epoch: 29 Global Step: 75680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:19,973-Speed 12906.27 samples/sec Loss 4.3690 LearningRate 0.0140 Epoch: 29 Global Step: 75690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:21,547-Speed 13019.86 samples/sec Loss 4.3252 LearningRate 0.0140 Epoch: 29 Global Step: 75700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:23,102-Speed 13190.54 samples/sec Loss 4.3388 LearningRate 0.0140 Epoch: 29 Global Step: 75710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:25:24,664-Speed 13121.80 samples/sec Loss 4.2413 LearningRate 0.0140 Epoch: 29 Global Step: 75720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:26,217-Speed 13198.81 samples/sec Loss 4.2519 LearningRate 0.0140 Epoch: 29 Global Step: 75730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:27,796-Speed 12978.37 samples/sec Loss 4.3970 LearningRate 0.0140 Epoch: 29 
Global Step: 75740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:29,379-Speed 12939.84 samples/sec Loss 4.2886 LearningRate 0.0140 Epoch: 29 Global Step: 75750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:30,950-Speed 13039.65 samples/sec Loss 4.3222 LearningRate 0.0140 Epoch: 29 Global Step: 75760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:32,522-Speed 13042.27 samples/sec Loss 4.2769 LearningRate 0.0140 Epoch: 29 Global Step: 75770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:34,088-Speed 13083.30 samples/sec Loss 4.2676 LearningRate 0.0139 Epoch: 29 Global Step: 75780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:35,644-Speed 13163.64 samples/sec Loss 4.3775 LearningRate 0.0139 Epoch: 29 Global Step: 75790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:37,223-Speed 12980.44 samples/sec Loss 4.3643 LearningRate 0.0139 Epoch: 29 Global Step: 75800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:38,807-Speed 12932.72 samples/sec Loss 4.2738 LearningRate 0.0139 Epoch: 29 Global Step: 75810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:40,375-Speed 13072.04 samples/sec Loss 4.2619 LearningRate 0.0139 Epoch: 29 Global Step: 75820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:25:41,926-Speed 13211.90 samples/sec Loss 4.3447 LearningRate 0.0139 Epoch: 29 Global Step: 75830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:43,516-Speed 12884.44 samples/sec Loss 4.2959 LearningRate 0.0139 Epoch: 29 Global Step: 75840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:45,118-Speed 12787.36 samples/sec Loss 4.3043 LearningRate 0.0139 Epoch: 29 Global Step: 75850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:25:46,735-Speed 12683.20 samples/sec Loss 4.2817 LearningRate 0.0139 Epoch: 29 Global Step: 75860 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 17:25:48,194-Speed 14045.64 samples/sec Loss 4.2998 LearningRate 0.0139 Epoch: 29 Global Step: 75870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:01,861-Speed 1498.59 samples/sec Loss 3.7599 LearningRate 0.0138 Epoch: 30 Global Step: 75880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:03,460-Speed 12820.59 samples/sec Loss 3.8309 LearningRate 0.0138 Epoch: 30 Global Step: 75890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:05,026-Speed 13084.02 samples/sec Loss 3.8185 LearningRate 0.0138 Epoch: 30 Global Step: 75900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:06,609-Speed 12940.10 samples/sec Loss 3.7792 LearningRate 0.0138 Epoch: 30 Global Step: 75910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:08,165-Speed 13173.70 samples/sec Loss 3.7609 LearningRate 0.0138 Epoch: 30 Global Step: 75920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:09,752-Speed 12909.89 samples/sec Loss 3.8056 LearningRate 0.0138 Epoch: 30 Global Step: 75930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:11,318-Speed 13086.33 samples/sec Loss 3.8479 LearningRate 0.0138 Epoch: 30 Global Step: 75940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:12,887-Speed 13053.62 samples/sec Loss 3.8119 LearningRate 0.0138 Epoch: 30 Global Step: 75950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:14,461-Speed 13017.03 samples/sec Loss 3.8134 LearningRate 0.0138 Epoch: 30 Global Step: 75960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:16,020-Speed 13145.77 samples/sec Loss 3.8002 LearningRate 0.0137 Epoch: 30 Global Step: 75970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:17,598-Speed 12991.30 samples/sec Loss 3.7657 LearningRate 0.0137 Epoch: 30 Global Step: 75980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 
17:26:19,149-Speed 13205.16 samples/sec Loss 3.9058 LearningRate 0.0137 Epoch: 30 Global Step: 75990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:20,740-Speed 12883.86 samples/sec Loss 3.7707 LearningRate 0.0137 Epoch: 30 Global Step: 76000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:22,290-Speed 13228.10 samples/sec Loss 3.8814 LearningRate 0.0137 Epoch: 30 Global Step: 76010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:26:23,851-Speed 13121.78 samples/sec Loss 3.8486 LearningRate 0.0137 Epoch: 30 Global Step: 76020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:25,441-Speed 12885.21 samples/sec Loss 3.8272 LearningRate 0.0137 Epoch: 30 Global Step: 76030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:27,021-Speed 12971.32 samples/sec Loss 3.8400 LearningRate 0.0137 Epoch: 30 Global Step: 76040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:28,584-Speed 13113.47 samples/sec Loss 3.7888 LearningRate 0.0137 Epoch: 30 Global Step: 76050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:30,149-Speed 13091.30 samples/sec Loss 3.8694 LearningRate 0.0136 Epoch: 30 Global Step: 76060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:31,726-Speed 12995.27 samples/sec Loss 3.8371 LearningRate 0.0136 Epoch: 30 Global Step: 76070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:33,318-Speed 12865.53 samples/sec Loss 3.8468 LearningRate 0.0136 Epoch: 30 Global Step: 76080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:34,878-Speed 13140.37 samples/sec Loss 3.8735 LearningRate 0.0136 Epoch: 30 Global Step: 76090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:36,444-Speed 13082.49 samples/sec Loss 3.8710 LearningRate 0.0136 Epoch: 30 Global Step: 76100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:38,030-Speed 12918.72 samples/sec 
Loss 3.8218 LearningRate 0.0136 Epoch: 30 Global Step: 76110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:39,606-Speed 13004.16 samples/sec Loss 3.8704 LearningRate 0.0136 Epoch: 30 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:26:41,171-Speed 13089.55 samples/sec Loss 3.9118 LearningRate 0.0136 Epoch: 30 Global Step: 76130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:42,737-Speed 13095.20 samples/sec Loss 3.8754 LearningRate 0.0136 Epoch: 30 Global Step: 76140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:44,323-Speed 12934.38 samples/sec Loss 3.8961 LearningRate 0.0135 Epoch: 30 Global Step: 76150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:45,884-Speed 13132.54 samples/sec Loss 3.9079 LearningRate 0.0135 Epoch: 30 Global Step: 76160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:47,450-Speed 13081.82 samples/sec Loss 3.8779 LearningRate 0.0135 Epoch: 30 Global Step: 76170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:49,026-Speed 13004.44 samples/sec Loss 3.9539 LearningRate 0.0135 Epoch: 30 Global Step: 76180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:50,592-Speed 13088.13 samples/sec Loss 3.9053 LearningRate 0.0135 Epoch: 30 Global Step: 76190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:52,153-Speed 13120.90 samples/sec Loss 3.8772 LearningRate 0.0135 Epoch: 30 Global Step: 76200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:53,729-Speed 13006.91 samples/sec Loss 3.8377 LearningRate 0.0135 Epoch: 30 Global Step: 76210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:55,296-Speed 13074.09 samples/sec Loss 3.9170 LearningRate 0.0135 Epoch: 30 Global Step: 76220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:26:56,873-Speed 12997.06 samples/sec Loss 3.8660 LearningRate 0.0135 Epoch: 30 
Global Step: 76230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:26:58,429-Speed 13167.45 samples/sec Loss 3.8544 LearningRate 0.0134 Epoch: 30 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:00,025-Speed 12834.41 samples/sec Loss 3.9372 LearningRate 0.0134 Epoch: 30 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:01,598-Speed 13028.23 samples/sec Loss 3.8755 LearningRate 0.0134 Epoch: 30 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:03,152-Speed 13188.36 samples/sec Loss 3.8552 LearningRate 0.0134 Epoch: 30 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:04,726-Speed 13015.90 samples/sec Loss 3.8959 LearningRate 0.0134 Epoch: 30 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:06,295-Speed 13063.57 samples/sec Loss 3.9436 LearningRate 0.0134 Epoch: 30 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:07,852-Speed 13159.17 samples/sec Loss 3.9270 LearningRate 0.0134 Epoch: 30 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:09,419-Speed 13075.33 samples/sec Loss 3.9105 LearningRate 0.0134 Epoch: 30 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:11,021-Speed 12790.75 samples/sec Loss 3.9161 LearningRate 0.0134 Epoch: 30 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:12,601-Speed 12975.20 samples/sec Loss 3.9133 LearningRate 0.0134 Epoch: 30 Global Step: 76330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 17:27:14,162-Speed 13123.55 samples/sec Loss 3.8852 LearningRate 0.0133 Epoch: 30 Global Step: 76340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:15,776-Speed 12693.79 samples/sec Loss 4.0093 LearningRate 0.0133 Epoch: 30 Global Step: 76350 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-14 17:27:17,342-Speed 13088.48 samples/sec Loss 3.9179 LearningRate 0.0133 Epoch: 30 Global Step: 76360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:18,903-Speed 13125.33 samples/sec Loss 3.9787 LearningRate 0.0133 Epoch: 30 Global Step: 76370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:20,493-Speed 12881.96 samples/sec Loss 3.8728 LearningRate 0.0133 Epoch: 30 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:22,069-Speed 13002.80 samples/sec Loss 3.9503 LearningRate 0.0133 Epoch: 30 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:23,640-Speed 13049.15 samples/sec Loss 4.0643 LearningRate 0.0133 Epoch: 30 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:27:25,210-Speed 13047.48 samples/sec Loss 3.9707 LearningRate 0.0133 Epoch: 30 Global Step: 76410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:26,764-Speed 13186.11 samples/sec Loss 3.9962 LearningRate 0.0133 Epoch: 30 Global Step: 76420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:28,349-Speed 12930.01 samples/sec Loss 3.9065 LearningRate 0.0132 Epoch: 30 Global Step: 76430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:29,903-Speed 13189.41 samples/sec Loss 3.9569 LearningRate 0.0132 Epoch: 30 Global Step: 76440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:31,465-Speed 13114.70 samples/sec Loss 3.9436 LearningRate 0.0132 Epoch: 30 Global Step: 76450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:33,019-Speed 13187.61 samples/sec Loss 3.9697 LearningRate 0.0132 Epoch: 30 Global Step: 76460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:34,592-Speed 13023.82 samples/sec Loss 4.0335 LearningRate 0.0132 Epoch: 30 Global Step: 76470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 
17:27:36,145-Speed 13197.31 samples/sec Loss 3.9472 LearningRate 0.0132 Epoch: 30 Global Step: 76480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:37,703-Speed 13152.92 samples/sec Loss 3.9433 LearningRate 0.0132 Epoch: 30 Global Step: 76490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:39,261-Speed 13152.62 samples/sec Loss 3.9004 LearningRate 0.0132 Epoch: 30 Global Step: 76500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:40,825-Speed 13097.64 samples/sec Loss 3.9065 LearningRate 0.0132 Epoch: 30 Global Step: 76510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:42,383-Speed 13155.91 samples/sec Loss 3.9366 LearningRate 0.0131 Epoch: 30 Global Step: 76520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:43,973-Speed 12882.52 samples/sec Loss 3.9340 LearningRate 0.0131 Epoch: 30 Global Step: 76530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:27:45,571-Speed 12841.89 samples/sec Loss 3.9593 LearningRate 0.0131 Epoch: 30 Global Step: 76540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:47,134-Speed 13105.56 samples/sec Loss 3.9709 LearningRate 0.0131 Epoch: 30 Global Step: 76550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:48,703-Speed 13065.56 samples/sec Loss 3.9620 LearningRate 0.0131 Epoch: 30 Global Step: 76560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:50,272-Speed 13061.11 samples/sec Loss 3.9004 LearningRate 0.0131 Epoch: 30 Global Step: 76570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:51,835-Speed 13110.11 samples/sec Loss 3.9550 LearningRate 0.0131 Epoch: 30 Global Step: 76580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:53,415-Speed 12968.73 samples/sec Loss 3.9675 LearningRate 0.0131 Epoch: 30 Global Step: 76590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:54,973-Speed 13150.00 samples/sec 
Loss 3.9288 LearningRate 0.0131 Epoch: 30 Global Step: 76600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:56,542-Speed 13058.30 samples/sec Loss 3.9536 LearningRate 0.0131 Epoch: 30 Global Step: 76610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:58,094-Speed 13202.72 samples/sec Loss 3.9865 LearningRate 0.0130 Epoch: 30 Global Step: 76620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:27:59,677-Speed 12938.53 samples/sec Loss 3.8873 LearningRate 0.0130 Epoch: 30 Global Step: 76630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:01,237-Speed 13135.27 samples/sec Loss 3.9860 LearningRate 0.0130 Epoch: 30 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:28:02,788-Speed 13214.00 samples/sec Loss 3.9879 LearningRate 0.0130 Epoch: 30 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:28:04,356-Speed 13068.05 samples/sec Loss 4.0341 LearningRate 0.0130 Epoch: 30 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:28:05,923-Speed 13075.42 samples/sec Loss 3.9439 LearningRate 0.0130 Epoch: 30 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:28:07,496-Speed 13036.85 samples/sec Loss 4.0290 LearningRate 0.0130 Epoch: 30 Global Step: 76680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:09,056-Speed 13125.13 samples/sec Loss 3.9829 LearningRate 0.0130 Epoch: 30 Global Step: 76690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:10,633-Speed 13003.70 samples/sec Loss 4.0589 LearningRate 0.0130 Epoch: 30 Global Step: 76700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:12,181-Speed 13234.54 samples/sec Loss 3.9954 LearningRate 0.0129 Epoch: 30 Global Step: 76710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:13,740-Speed 13145.69 samples/sec Loss 4.0325 LearningRate 0.0129 Epoch: 30 
Global Step: 76720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:15,302-Speed 13114.55 samples/sec Loss 3.9915 LearningRate 0.0129 Epoch: 30 Global Step: 76730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:16,859-Speed 13169.62 samples/sec Loss 4.0243 LearningRate 0.0129 Epoch: 30 Global Step: 76740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:18,409-Speed 13213.97 samples/sec Loss 4.0373 LearningRate 0.0129 Epoch: 30 Global Step: 76750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:19,968-Speed 13143.25 samples/sec Loss 4.1096 LearningRate 0.0129 Epoch: 30 Global Step: 76760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:21,534-Speed 13087.91 samples/sec Loss 3.9598 LearningRate 0.0129 Epoch: 30 Global Step: 76770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:23,096-Speed 13139.59 samples/sec Loss 4.0090 LearningRate 0.0129 Epoch: 30 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:28:24,661-Speed 13089.78 samples/sec Loss 4.0873 LearningRate 0.0129 Epoch: 30 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:28:26,238-Speed 12994.36 samples/sec Loss 4.0440 LearningRate 0.0129 Epoch: 30 Global Step: 76800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:27,803-Speed 13095.28 samples/sec Loss 3.9483 LearningRate 0.0128 Epoch: 30 Global Step: 76810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:29,364-Speed 13127.47 samples/sec Loss 4.0278 LearningRate 0.0128 Epoch: 30 Global Step: 76820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:30,935-Speed 13041.92 samples/sec Loss 3.9233 LearningRate 0.0128 Epoch: 30 Global Step: 76830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:32,506-Speed 13053.62 samples/sec Loss 4.0204 LearningRate 0.0128 Epoch: 30 Global Step: 76840 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-01-14 17:28:34,086-Speed 12964.35 samples/sec Loss 4.0178 LearningRate 0.0128 Epoch: 30 Global Step: 76850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:35,643-Speed 13166.42 samples/sec Loss 3.9165 LearningRate 0.0128 Epoch: 30 Global Step: 76860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:37,220-Speed 12993.98 samples/sec Loss 4.1083 LearningRate 0.0128 Epoch: 30 Global Step: 76870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:38,797-Speed 12996.26 samples/sec Loss 4.1227 LearningRate 0.0128 Epoch: 30 Global Step: 76880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:40,355-Speed 13148.27 samples/sec Loss 4.1107 LearningRate 0.0128 Epoch: 30 Global Step: 76890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:41,951-Speed 12839.78 samples/sec Loss 3.9638 LearningRate 0.0127 Epoch: 30 Global Step: 76900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:43,517-Speed 13086.47 samples/sec Loss 4.0877 LearningRate 0.0127 Epoch: 30 Global Step: 76910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:45,097-Speed 12966.02 samples/sec Loss 4.1110 LearningRate 0.0127 Epoch: 30 Global Step: 76920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:28:46,678-Speed 12965.03 samples/sec Loss 4.0125 LearningRate 0.0127 Epoch: 30 Global Step: 76930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:48,251-Speed 13023.15 samples/sec Loss 4.0827 LearningRate 0.0127 Epoch: 30 Global Step: 76940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:49,829-Speed 12990.52 samples/sec Loss 4.0439 LearningRate 0.0127 Epoch: 30 Global Step: 76950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:51,416-Speed 12905.87 samples/sec Loss 4.0598 LearningRate 0.0127 Epoch: 30 Global Step: 76960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:28:52,977-Speed 13132.05 samples/sec Loss 4.0641 LearningRate 0.0127 Epoch: 30 Global Step: 76970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:54,522-Speed 13261.43 samples/sec Loss 4.0502 LearningRate 0.0127 Epoch: 30 Global Step: 76980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:56,071-Speed 13230.98 samples/sec Loss 3.9835 LearningRate 0.0127 Epoch: 30 Global Step: 76990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:57,629-Speed 13150.09 samples/sec Loss 4.1463 LearningRate 0.0126 Epoch: 30 Global Step: 77000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:28:59,191-Speed 13118.79 samples/sec Loss 4.1871 LearningRate 0.0126 Epoch: 30 Global Step: 77010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:00,748-Speed 13158.30 samples/sec Loss 4.0419 LearningRate 0.0126 Epoch: 30 Global Step: 77020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:02,343-Speed 12854.88 samples/sec Loss 4.1070 LearningRate 0.0126 Epoch: 30 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:29:03,914-Speed 13036.30 samples/sec Loss 4.0274 LearningRate 0.0126 Epoch: 30 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:29:05,485-Speed 13042.30 samples/sec Loss 4.0898 LearningRate 0.0126 Epoch: 30 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:29:07,076-Speed 12886.44 samples/sec Loss 4.0170 LearningRate 0.0126 Epoch: 30 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:29:08,611-Speed 13342.55 samples/sec Loss 4.0946 LearningRate 0.0126 Epoch: 30 Global Step: 77070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:10,197-Speed 12920.83 samples/sec Loss 4.1374 LearningRate 0.0126 Epoch: 30 Global Step: 77080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:11,769-Speed 13037.75 samples/sec 
Loss 4.0160 LearningRate 0.0125 Epoch: 30 Global Step: 77090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:13,333-Speed 13101.17 samples/sec Loss 4.0855 LearningRate 0.0125 Epoch: 30 Global Step: 77100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:14,906-Speed 13028.44 samples/sec Loss 4.0824 LearningRate 0.0125 Epoch: 30 Global Step: 77110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:16,476-Speed 13060.49 samples/sec Loss 4.0042 LearningRate 0.0125 Epoch: 30 Global Step: 77120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:18,036-Speed 13131.69 samples/sec Loss 4.0813 LearningRate 0.0125 Epoch: 30 Global Step: 77130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:19,597-Speed 13124.40 samples/sec Loss 4.0802 LearningRate 0.0125 Epoch: 30 Global Step: 77140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:21,163-Speed 13088.84 samples/sec Loss 4.0630 LearningRate 0.0125 Epoch: 30 Global Step: 77150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:22,743-Speed 12972.85 samples/sec Loss 4.0559 LearningRate 0.0125 Epoch: 30 Global Step: 77160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:24,315-Speed 13033.73 samples/sec Loss 4.0955 LearningRate 0.0125 Epoch: 30 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:29:25,898-Speed 12941.63 samples/sec Loss 4.0529 LearningRate 0.0125 Epoch: 30 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:29:27,456-Speed 13152.05 samples/sec Loss 4.1180 LearningRate 0.0124 Epoch: 30 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:29:29,001-Speed 13266.38 samples/sec Loss 4.0643 LearningRate 0.0124 Epoch: 30 Global Step: 77200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:30,566-Speed 13088.42 samples/sec Loss 4.0297 LearningRate 0.0124 Epoch: 30 
Global Step: 77210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:32,148-Speed 12952.55 samples/sec Loss 4.0376 LearningRate 0.0124 Epoch: 30 Global Step: 77220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:33,723-Speed 13009.17 samples/sec Loss 4.0859 LearningRate 0.0124 Epoch: 30 Global Step: 77230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:35,286-Speed 13110.58 samples/sec Loss 4.0879 LearningRate 0.0124 Epoch: 30 Global Step: 77240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:36,855-Speed 13064.40 samples/sec Loss 4.0577 LearningRate 0.0124 Epoch: 30 Global Step: 77250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:38,430-Speed 13007.25 samples/sec Loss 4.0762 LearningRate 0.0124 Epoch: 30 Global Step: 77260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:40,034-Speed 12773.30 samples/sec Loss 4.1643 LearningRate 0.0124 Epoch: 30 Global Step: 77270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:41,606-Speed 13040.14 samples/sec Loss 4.1128 LearningRate 0.0123 Epoch: 30 Global Step: 77280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:43,167-Speed 13126.76 samples/sec Loss 4.0831 LearningRate 0.0123 Epoch: 30 Global Step: 77290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:44,729-Speed 13116.78 samples/sec Loss 4.0888 LearningRate 0.0123 Epoch: 30 Global Step: 77300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:46,318-Speed 12895.37 samples/sec Loss 4.0920 LearningRate 0.0123 Epoch: 30 Global Step: 77310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:47,872-Speed 13189.28 samples/sec Loss 4.0658 LearningRate 0.0123 Epoch: 30 Global Step: 77320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:49,454-Speed 12950.70 samples/sec Loss 4.1688 LearningRate 0.0123 Epoch: 30 Global Step: 77330 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 17:29:51,005-Speed 13213.47 samples/sec Loss 4.0958 LearningRate 0.0123 Epoch: 30 Global Step: 77340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:52,576-Speed 13046.25 samples/sec Loss 4.1868 LearningRate 0.0123 Epoch: 30 Global Step: 77350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:54,143-Speed 13070.77 samples/sec Loss 4.0346 LearningRate 0.0123 Epoch: 30 Global Step: 77360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:55,713-Speed 13054.47 samples/sec Loss 4.1468 LearningRate 0.0123 Epoch: 30 Global Step: 77370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:57,289-Speed 12998.90 samples/sec Loss 4.1719 LearningRate 0.0122 Epoch: 30 Global Step: 77380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:29:58,849-Speed 13137.21 samples/sec Loss 4.0982 LearningRate 0.0122 Epoch: 30 Global Step: 77390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:00,423-Speed 13023.03 samples/sec Loss 4.1442 LearningRate 0.0122 Epoch: 30 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:30:01,990-Speed 13082.62 samples/sec Loss 4.0806 LearningRate 0.0122 Epoch: 30 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:30:03,552-Speed 13117.52 samples/sec Loss 3.9910 LearningRate 0.0122 Epoch: 30 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:30:05,113-Speed 13126.35 samples/sec Loss 4.0618 LearningRate 0.0122 Epoch: 30 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:30:06,695-Speed 12970.18 samples/sec Loss 4.1452 LearningRate 0.0122 Epoch: 30 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:30:08,262-Speed 13079.53 samples/sec Loss 4.1517 LearningRate 0.0122 Epoch: 30 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 
17:30:09,828-Speed 13084.88 samples/sec Loss 4.1207 LearningRate 0.0122 Epoch: 30 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:30:11,408-Speed 12962.35 samples/sec Loss 4.0996 LearningRate 0.0122 Epoch: 30 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:30:12,972-Speed 13106.17 samples/sec Loss 4.1391 LearningRate 0.0121 Epoch: 30 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:30:14,540-Speed 13063.31 samples/sec Loss 4.0626 LearningRate 0.0121 Epoch: 30 Global Step: 77490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:16,090-Speed 13217.69 samples/sec Loss 4.1218 LearningRate 0.0121 Epoch: 30 Global Step: 77500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:17,653-Speed 13112.77 samples/sec Loss 4.1041 LearningRate 0.0121 Epoch: 30 Global Step: 77510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:19,227-Speed 13020.19 samples/sec Loss 4.1074 LearningRate 0.0121 Epoch: 30 Global Step: 77520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:20,784-Speed 13158.43 samples/sec Loss 4.1900 LearningRate 0.0121 Epoch: 30 Global Step: 77530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:22,362-Speed 12991.28 samples/sec Loss 4.1445 LearningRate 0.0121 Epoch: 30 Global Step: 77540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:23,928-Speed 13081.31 samples/sec Loss 4.1382 LearningRate 0.0121 Epoch: 30 Global Step: 77550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:25,473-Speed 13259.76 samples/sec Loss 4.1086 LearningRate 0.0121 Epoch: 30 Global Step: 77560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:27,053-Speed 12978.02 samples/sec Loss 4.1777 LearningRate 0.0121 Epoch: 30 Global Step: 77570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:28,630-Speed 12994.26 samples/sec 
Loss 4.1132 LearningRate 0.0120 Epoch: 30 Global Step: 77580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:30,191-Speed 13126.68 samples/sec Loss 4.0374 LearningRate 0.0120 Epoch: 30 Global Step: 77590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:31,755-Speed 13097.26 samples/sec Loss 4.2068 LearningRate 0.0120 Epoch: 30 Global Step: 77600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:33,353-Speed 12823.46 samples/sec Loss 4.0916 LearningRate 0.0120 Epoch: 30 Global Step: 77610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:34,919-Speed 13086.86 samples/sec Loss 4.2386 LearningRate 0.0120 Epoch: 30 Global Step: 77620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:36,481-Speed 13111.31 samples/sec Loss 4.1221 LearningRate 0.0120 Epoch: 30 Global Step: 77630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:38,066-Speed 12930.94 samples/sec Loss 4.1238 LearningRate 0.0120 Epoch: 30 Global Step: 77640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:39,640-Speed 13022.77 samples/sec Loss 4.2036 LearningRate 0.0120 Epoch: 30 Global Step: 77650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:41,205-Speed 13087.79 samples/sec Loss 4.1359 LearningRate 0.0120 Epoch: 30 Global Step: 77660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:42,778-Speed 13026.96 samples/sec Loss 4.1829 LearningRate 0.0119 Epoch: 30 Global Step: 77670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:44,359-Speed 12961.53 samples/sec Loss 4.1327 LearningRate 0.0119 Epoch: 30 Global Step: 77680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:45,942-Speed 12947.56 samples/sec Loss 4.1404 LearningRate 0.0119 Epoch: 30 Global Step: 77690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:47,526-Speed 12940.60 samples/sec Loss 4.1297 LearningRate 0.0119 Epoch: 30 
Global Step: 77700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:49,088-Speed 13117.95 samples/sec Loss 4.0812 LearningRate 0.0119 Epoch: 30 Global Step: 77710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:30:50,644-Speed 13163.55 samples/sec Loss 4.1758 LearningRate 0.0119 Epoch: 30 Global Step: 77720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:52,216-Speed 13040.31 samples/sec Loss 4.0928 LearningRate 0.0119 Epoch: 30 Global Step: 77730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:53,778-Speed 13114.93 samples/sec Loss 4.1479 LearningRate 0.0119 Epoch: 30 Global Step: 77740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:55,359-Speed 12965.84 samples/sec Loss 4.2424 LearningRate 0.0119 Epoch: 30 Global Step: 77750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:56,938-Speed 12981.08 samples/sec Loss 4.2032 LearningRate 0.0119 Epoch: 30 Global Step: 77760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:30:58,504-Speed 13078.61 samples/sec Loss 4.1514 LearningRate 0.0118 Epoch: 30 Global Step: 77770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:31:00,091-Speed 12913.60 samples/sec Loss 4.1303 LearningRate 0.0118 Epoch: 30 Global Step: 77780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:31:01,672-Speed 12957.88 samples/sec Loss 4.1930 LearningRate 0.0118 Epoch: 30 Global Step: 77790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:31:03,241-Speed 13064.17 samples/sec Loss 4.1449 LearningRate 0.0118 Epoch: 30 Global Step: 77800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:31:04,827-Speed 12917.74 samples/sec Loss 4.1326 LearningRate 0.0118 Epoch: 30 Global Step: 77810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:31:06,394-Speed 13081.21 samples/sec Loss 4.1366 LearningRate 0.0118 Epoch: 30 Global Step: 77820 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 17:31:07,956-Speed 13119.59 samples/sec Loss 4.1668 LearningRate 0.0118 Epoch: 30 Global Step: 77830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:09,520-Speed 13097.35 samples/sec Loss 4.1444 LearningRate 0.0118 Epoch: 30 Global Step: 77840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:11,076-Speed 13176.21 samples/sec Loss 4.1651 LearningRate 0.0118 Epoch: 30 Global Step: 77850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:12,644-Speed 13065.89 samples/sec Loss 4.1474 LearningRate 0.0118 Epoch: 30 Global Step: 77860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:14,221-Speed 12994.65 samples/sec Loss 4.1349 LearningRate 0.0117 Epoch: 30 Global Step: 77870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:15,765-Speed 13270.40 samples/sec Loss 4.1723 LearningRate 0.0117 Epoch: 30 Global Step: 77880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:17,347-Speed 12951.05 samples/sec Loss 4.2413 LearningRate 0.0117 Epoch: 30 Global Step: 77890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:18,907-Speed 13137.27 samples/sec Loss 4.1829 LearningRate 0.0117 Epoch: 30 Global Step: 77900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:20,477-Speed 13052.39 samples/sec Loss 4.2909 LearningRate 0.0117 Epoch: 30 Global Step: 77910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:22,027-Speed 13218.83 samples/sec Loss 4.1155 LearningRate 0.0117 Epoch: 30 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:31:23,611-Speed 12937.14 samples/sec Loss 4.1450 LearningRate 0.0117 Epoch: 30 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:31:25,174-Speed 13105.39 samples/sec Loss 4.1975 LearningRate 0.0117 Epoch: 30 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 
17:31:26,727-Speed 13190.52 samples/sec Loss 4.0693 LearningRate 0.0117 Epoch: 30 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:31:28,275-Speed 13238.59 samples/sec Loss 4.0769 LearningRate 0.0117 Epoch: 30 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:31:29,815-Speed 13312.73 samples/sec Loss 4.0838 LearningRate 0.0116 Epoch: 30 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:31:31,392-Speed 12989.10 samples/sec Loss 4.1613 LearningRate 0.0116 Epoch: 30 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:31:32,971-Speed 12976.83 samples/sec Loss 4.1340 LearningRate 0.0116 Epoch: 30 Global Step: 77990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:34,563-Speed 12876.96 samples/sec Loss 4.1114 LearningRate 0.0116 Epoch: 30 Global Step: 78000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:36,122-Speed 13140.46 samples/sec Loss 4.1212 LearningRate 0.0116 Epoch: 30 Global Step: 78010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:37,696-Speed 13019.08 samples/sec Loss 4.2155 LearningRate 0.0116 Epoch: 30 Global Step: 78020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:39,263-Speed 13079.72 samples/sec Loss 4.1607 LearningRate 0.0116 Epoch: 30 Global Step: 78030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:40,825-Speed 13113.19 samples/sec Loss 4.1977 LearningRate 0.0116 Epoch: 30 Global Step: 78040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:42,391-Speed 13086.97 samples/sec Loss 4.2497 LearningRate 0.0116 Epoch: 30 Global Step: 78050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:43,949-Speed 13154.28 samples/sec Loss 4.1553 LearningRate 0.0116 Epoch: 30 Global Step: 78060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:45,517-Speed 13063.52 samples/sec 
Loss 4.1966 LearningRate 0.0115 Epoch: 30 Global Step: 78070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:47,077-Speed 13138.45 samples/sec Loss 4.1471 LearningRate 0.0115 Epoch: 30 Global Step: 78080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:48,652-Speed 13008.01 samples/sec Loss 4.2327 LearningRate 0.0115 Epoch: 30 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:31:50,192-Speed 13305.54 samples/sec Loss 4.1846 LearningRate 0.0115 Epoch: 30 Global Step: 78100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:51,759-Speed 13078.27 samples/sec Loss 4.1743 LearningRate 0.0115 Epoch: 30 Global Step: 78110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:53,335-Speed 12999.16 samples/sec Loss 4.2436 LearningRate 0.0115 Epoch: 30 Global Step: 78120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:54,906-Speed 13042.65 samples/sec Loss 4.1971 LearningRate 0.0115 Epoch: 30 Global Step: 78130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:56,488-Speed 12952.74 samples/sec Loss 4.2165 LearningRate 0.0115 Epoch: 30 Global Step: 78140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:58,051-Speed 13112.36 samples/sec Loss 4.0960 LearningRate 0.0115 Epoch: 30 Global Step: 78150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:31:59,616-Speed 13091.65 samples/sec Loss 4.2184 LearningRate 0.0115 Epoch: 30 Global Step: 78160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:01,174-Speed 13152.76 samples/sec Loss 4.1744 LearningRate 0.0114 Epoch: 30 Global Step: 78170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:02,754-Speed 12972.41 samples/sec Loss 4.2025 LearningRate 0.0114 Epoch: 30 Global Step: 78180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:04,319-Speed 13093.74 samples/sec Loss 4.1799 LearningRate 0.0114 Epoch: 30 
Global Step: 78190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:05,900-Speed 12958.31 samples/sec Loss 4.1966 LearningRate 0.0114 Epoch: 30 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:32:07,450-Speed 13225.98 samples/sec Loss 4.1831 LearningRate 0.0114 Epoch: 30 Global Step: 78210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:09,024-Speed 13013.89 samples/sec Loss 4.2056 LearningRate 0.0114 Epoch: 30 Global Step: 78220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:10,625-Speed 12798.43 samples/sec Loss 4.1615 LearningRate 0.0114 Epoch: 30 Global Step: 78230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:12,215-Speed 12892.01 samples/sec Loss 4.2240 LearningRate 0.0114 Epoch: 30 Global Step: 78240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:13,773-Speed 13146.68 samples/sec Loss 4.1901 LearningRate 0.0114 Epoch: 30 Global Step: 78250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:15,377-Speed 12777.80 samples/sec Loss 4.1836 LearningRate 0.0114 Epoch: 30 Global Step: 78260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:16,934-Speed 13161.87 samples/sec Loss 4.2045 LearningRate 0.0113 Epoch: 30 Global Step: 78270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:18,543-Speed 12737.68 samples/sec Loss 4.1963 LearningRate 0.0113 Epoch: 30 Global Step: 78280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:20,102-Speed 13140.23 samples/sec Loss 4.1987 LearningRate 0.0113 Epoch: 30 Global Step: 78290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:21,653-Speed 13206.51 samples/sec Loss 4.2127 LearningRate 0.0113 Epoch: 30 Global Step: 78300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:23,211-Speed 13153.53 samples/sec Loss 4.1927 LearningRate 0.0113 Epoch: 30 Global Step: 78310 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-14 17:32:24,785-Speed 13018.86 samples/sec Loss 4.1101 LearningRate 0.0113 Epoch: 30 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:32:26,338-Speed 13196.14 samples/sec Loss 4.2098 LearningRate 0.0113 Epoch: 30 Global Step: 78330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:27,921-Speed 12946.06 samples/sec Loss 4.1341 LearningRate 0.0113 Epoch: 30 Global Step: 78340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:29,493-Speed 13032.80 samples/sec Loss 4.1681 LearningRate 0.0113 Epoch: 30 Global Step: 78350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:31,044-Speed 13214.43 samples/sec Loss 4.2411 LearningRate 0.0113 Epoch: 30 Global Step: 78360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:32,599-Speed 13180.70 samples/sec Loss 4.1888 LearningRate 0.0112 Epoch: 30 Global Step: 78370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:34,180-Speed 12979.08 samples/sec Loss 4.1785 LearningRate 0.0112 Epoch: 30 Global Step: 78380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:35,753-Speed 13023.54 samples/sec Loss 4.2493 LearningRate 0.0112 Epoch: 30 Global Step: 78390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:51,363-Speed 1312.06 samples/sec Loss 4.0890 LearningRate 0.0112 Epoch: 31 Global Step: 78400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:53,006-Speed 12478.70 samples/sec Loss 3.7909 LearningRate 0.0112 Epoch: 31 Global Step: 78410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:54,612-Speed 12771.42 samples/sec Loss 3.7639 LearningRate 0.0112 Epoch: 31 Global Step: 78420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:56,157-Speed 13267.35 samples/sec Loss 3.7240 LearningRate 0.0112 Epoch: 31 Global Step: 78430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:32:57,737-Speed 12968.32 samples/sec Loss 3.7149 LearningRate 0.0112 Epoch: 31 Global Step: 78440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:32:59,318-Speed 12954.63 samples/sec Loss 3.7517 LearningRate 0.0112 Epoch: 31 Global Step: 78450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:00,905-Speed 12910.86 samples/sec Loss 3.6252 LearningRate 0.0112 Epoch: 31 Global Step: 78460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:02,500-Speed 12849.42 samples/sec Loss 3.6852 LearningRate 0.0111 Epoch: 31 Global Step: 78470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:04,064-Speed 13101.44 samples/sec Loss 3.7231 LearningRate 0.0111 Epoch: 31 Global Step: 78480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:05,645-Speed 12961.82 samples/sec Loss 3.6858 LearningRate 0.0111 Epoch: 31 Global Step: 78490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:07,202-Speed 13154.36 samples/sec Loss 3.7329 LearningRate 0.0111 Epoch: 31 Global Step: 78500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:08,793-Speed 12878.83 samples/sec Loss 3.7352 LearningRate 0.0111 Epoch: 31 Global Step: 78510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:10,366-Speed 13031.46 samples/sec Loss 3.6456 LearningRate 0.0111 Epoch: 31 Global Step: 78520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:11,942-Speed 13018.50 samples/sec Loss 3.7492 LearningRate 0.0111 Epoch: 31 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:13,532-Speed 12897.34 samples/sec Loss 3.7245 LearningRate 0.0111 Epoch: 31 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:15,062-Speed 13391.64 samples/sec Loss 3.8062 LearningRate 0.0111 Epoch: 31 Global Step: 78550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:16,627-Speed 13089.57 samples/sec 
Loss 3.7212 LearningRate 0.0111 Epoch: 31 Global Step: 78560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:18,211-Speed 12937.77 samples/sec Loss 3.7627 LearningRate 0.0111 Epoch: 31 Global Step: 78570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:19,766-Speed 13178.69 samples/sec Loss 3.7429 LearningRate 0.0110 Epoch: 31 Global Step: 78580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:21,325-Speed 13171.20 samples/sec Loss 3.8079 LearningRate 0.0110 Epoch: 31 Global Step: 78590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:22,897-Speed 13031.25 samples/sec Loss 3.6844 LearningRate 0.0110 Epoch: 31 Global Step: 78600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:24,464-Speed 13073.79 samples/sec Loss 3.7216 LearningRate 0.0110 Epoch: 31 Global Step: 78610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:26,046-Speed 12953.24 samples/sec Loss 3.7590 LearningRate 0.0110 Epoch: 31 Global Step: 78620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:27,636-Speed 12888.13 samples/sec Loss 3.7448 LearningRate 0.0110 Epoch: 31 Global Step: 78630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:29,226-Speed 12893.87 samples/sec Loss 3.8020 LearningRate 0.0110 Epoch: 31 Global Step: 78640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:33:30,788-Speed 13109.94 samples/sec Loss 3.7603 LearningRate 0.0110 Epoch: 31 Global Step: 78650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:32,356-Speed 13072.47 samples/sec Loss 3.7618 LearningRate 0.0110 Epoch: 31 Global Step: 78660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:33,925-Speed 13063.58 samples/sec Loss 3.7816 LearningRate 0.0110 Epoch: 31 Global Step: 78670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:35,508-Speed 12949.44 samples/sec Loss 3.8277 LearningRate 0.0109 Epoch: 31 
Global Step: 78680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:37,089-Speed 12965.86 samples/sec Loss 3.8072 LearningRate 0.0109 Epoch: 31 Global Step: 78690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:38,650-Speed 13119.04 samples/sec Loss 3.8536 LearningRate 0.0109 Epoch: 31 Global Step: 78700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:40,206-Speed 13168.64 samples/sec Loss 3.7506 LearningRate 0.0109 Epoch: 31 Global Step: 78710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:41,783-Speed 12992.68 samples/sec Loss 3.7677 LearningRate 0.0109 Epoch: 31 Global Step: 78720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:43,322-Speed 13322.08 samples/sec Loss 3.8064 LearningRate 0.0109 Epoch: 31 Global Step: 78730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:44,898-Speed 12994.49 samples/sec Loss 3.8313 LearningRate 0.0109 Epoch: 31 Global Step: 78740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:33:46,450-Speed 13205.68 samples/sec Loss 3.7730 LearningRate 0.0109 Epoch: 31 Global Step: 78750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:48,006-Speed 13169.11 samples/sec Loss 3.8496 LearningRate 0.0109 Epoch: 31 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:49,585-Speed 12978.06 samples/sec Loss 3.8765 LearningRate 0.0109 Epoch: 31 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:51,152-Speed 13069.90 samples/sec Loss 3.7628 LearningRate 0.0108 Epoch: 31 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:52,724-Speed 13041.09 samples/sec Loss 3.7732 LearningRate 0.0108 Epoch: 31 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:54,293-Speed 13058.64 samples/sec Loss 3.7906 LearningRate 0.0108 Epoch: 31 Global Step: 78800 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-14 17:33:55,868-Speed 13010.87 samples/sec Loss 3.8461 LearningRate 0.0108 Epoch: 31 Global Step: 78810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:57,453-Speed 12930.24 samples/sec Loss 3.8331 LearningRate 0.0108 Epoch: 31 Global Step: 78820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:33:59,011-Speed 13153.81 samples/sec Loss 3.8040 LearningRate 0.0108 Epoch: 31 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:34:00,581-Speed 13050.97 samples/sec Loss 3.8088 LearningRate 0.0108 Epoch: 31 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:34:02,141-Speed 13131.38 samples/sec Loss 3.8588 LearningRate 0.0108 Epoch: 31 Global Step: 78850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 17:34:03,709-Speed 13068.74 samples/sec Loss 3.7835 LearningRate 0.0108 Epoch: 31 Global Step: 78860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:34:05,256-Speed 13245.43 samples/sec Loss 3.8678 LearningRate 0.0108 Epoch: 31 Global Step: 78870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:34:06,828-Speed 13035.29 samples/sec Loss 3.8485 LearningRate 0.0107 Epoch: 31 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:34:08,379-Speed 13209.98 samples/sec Loss 3.8011 LearningRate 0.0107 Epoch: 31 Global Step: 78890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:09,932-Speed 13197.99 samples/sec Loss 3.7737 LearningRate 0.0107 Epoch: 31 Global Step: 78900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:11,504-Speed 13031.42 samples/sec Loss 3.8356 LearningRate 0.0107 Epoch: 31 Global Step: 78910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:13,070-Speed 13114.14 samples/sec Loss 3.8878 LearningRate 0.0107 Epoch: 31 Global Step: 78920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:34:14,646-Speed 13005.32 samples/sec Loss 3.8505 LearningRate 0.0107 Epoch: 31 Global Step: 78930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:16,220-Speed 13020.48 samples/sec Loss 3.8386 LearningRate 0.0107 Epoch: 31 Global Step: 78940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:17,775-Speed 13176.65 samples/sec Loss 3.8438 LearningRate 0.0107 Epoch: 31 Global Step: 78950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:19,333-Speed 13155.74 samples/sec Loss 3.8741 LearningRate 0.0107 Epoch: 31 Global Step: 78960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:20,891-Speed 13151.19 samples/sec Loss 3.8184 LearningRate 0.0107 Epoch: 31 Global Step: 78970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:22,467-Speed 13007.75 samples/sec Loss 3.8734 LearningRate 0.0107 Epoch: 31 Global Step: 78980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:24,036-Speed 13059.43 samples/sec Loss 3.8952 LearningRate 0.0106 Epoch: 31 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:34:25,585-Speed 13230.43 samples/sec Loss 3.8749 LearningRate 0.0106 Epoch: 31 Global Step: 79000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:27,144-Speed 13147.13 samples/sec Loss 3.8408 LearningRate 0.0106 Epoch: 31 Global Step: 79010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:28,719-Speed 13007.99 samples/sec Loss 3.8245 LearningRate 0.0106 Epoch: 31 Global Step: 79020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:30,258-Speed 13311.24 samples/sec Loss 3.8461 LearningRate 0.0106 Epoch: 31 Global Step: 79030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:31,809-Speed 13238.29 samples/sec Loss 3.8849 LearningRate 0.0106 Epoch: 31 Global Step: 79040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:33,365-Speed 13174.84 samples/sec 
Loss 3.8424 LearningRate 0.0106 Epoch: 31 Global Step: 79050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:34,928-Speed 13112.03 samples/sec Loss 3.8887 LearningRate 0.0106 Epoch: 31 Global Step: 79060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:36,481-Speed 13188.29 samples/sec Loss 3.8858 LearningRate 0.0106 Epoch: 31 Global Step: 79070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:38,089-Speed 12750.68 samples/sec Loss 3.8526 LearningRate 0.0106 Epoch: 31 Global Step: 79080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:39,662-Speed 13024.76 samples/sec Loss 3.8481 LearningRate 0.0105 Epoch: 31 Global Step: 79090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:41,244-Speed 12953.23 samples/sec Loss 3.8702 LearningRate 0.0105 Epoch: 31 Global Step: 79100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:42,819-Speed 13014.29 samples/sec Loss 3.9747 LearningRate 0.0105 Epoch: 31 Global Step: 79110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:44,406-Speed 12906.27 samples/sec Loss 3.8101 LearningRate 0.0105 Epoch: 31 Global Step: 79120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:34:46,022-Speed 12681.80 samples/sec Loss 3.8539 LearningRate 0.0105 Epoch: 31 Global Step: 79130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:47,565-Speed 13291.49 samples/sec Loss 3.8359 LearningRate 0.0105 Epoch: 31 Global Step: 79140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:49,126-Speed 13124.01 samples/sec Loss 3.8676 LearningRate 0.0105 Epoch: 31 Global Step: 79150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:50,673-Speed 13242.76 samples/sec Loss 3.8820 LearningRate 0.0105 Epoch: 31 Global Step: 79160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:52,237-Speed 13106.09 samples/sec Loss 3.9407 LearningRate 0.0105 Epoch: 31 
Global Step: 79170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:53,803-Speed 13085.39 samples/sec Loss 3.9291 LearningRate 0.0105 Epoch: 31 Global Step: 79180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:55,366-Speed 13113.48 samples/sec Loss 3.8638 LearningRate 0.0105 Epoch: 31 Global Step: 79190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:56,929-Speed 13106.70 samples/sec Loss 3.8365 LearningRate 0.0104 Epoch: 31 Global Step: 79200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:34:58,520-Speed 12900.51 samples/sec Loss 3.9275 LearningRate 0.0104 Epoch: 31 Global Step: 79210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:00,080-Speed 13156.65 samples/sec Loss 3.8562 LearningRate 0.0104 Epoch: 31 Global Step: 79220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:01,641-Speed 13123.79 samples/sec Loss 3.9315 LearningRate 0.0104 Epoch: 31 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:35:03,211-Speed 13054.51 samples/sec Loss 3.9085 LearningRate 0.0104 Epoch: 31 Global Step: 79240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:04,789-Speed 12989.12 samples/sec Loss 3.9062 LearningRate 0.0104 Epoch: 31 Global Step: 79250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:06,333-Speed 13298.86 samples/sec Loss 3.8339 LearningRate 0.0104 Epoch: 31 Global Step: 79260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:07,896-Speed 13111.26 samples/sec Loss 3.8340 LearningRate 0.0104 Epoch: 31 Global Step: 79270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:09,457-Speed 13126.32 samples/sec Loss 4.0166 LearningRate 0.0104 Epoch: 31 Global Step: 79280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:11,037-Speed 12968.63 samples/sec Loss 3.9287 LearningRate 0.0104 Epoch: 31 Global Step: 79290 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-01-14 17:35:12,614-Speed 12993.42 samples/sec Loss 3.9136 LearningRate 0.0103 Epoch: 31 Global Step: 79300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:14,177-Speed 13110.35 samples/sec Loss 3.8368 LearningRate 0.0103 Epoch: 31 Global Step: 79310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:15,768-Speed 12878.89 samples/sec Loss 3.9518 LearningRate 0.0103 Epoch: 31 Global Step: 79320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:17,314-Speed 13256.31 samples/sec Loss 3.8995 LearningRate 0.0103 Epoch: 31 Global Step: 79330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:18,886-Speed 13028.92 samples/sec Loss 3.9184 LearningRate 0.0103 Epoch: 31 Global Step: 79340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:20,441-Speed 13176.61 samples/sec Loss 3.9081 LearningRate 0.0103 Epoch: 31 Global Step: 79350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:22,008-Speed 13078.50 samples/sec Loss 3.8616 LearningRate 0.0103 Epoch: 31 Global Step: 79360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:23,570-Speed 13120.26 samples/sec Loss 3.8863 LearningRate 0.0103 Epoch: 31 Global Step: 79370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:35:25,147-Speed 12996.27 samples/sec Loss 3.9097 LearningRate 0.0103 Epoch: 31 Global Step: 79380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:26,713-Speed 13079.27 samples/sec Loss 3.9734 LearningRate 0.0103 Epoch: 31 Global Step: 79390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:28,271-Speed 13155.91 samples/sec Loss 3.9328 LearningRate 0.0103 Epoch: 31 Global Step: 79400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:29,838-Speed 13077.05 samples/sec Loss 3.8995 LearningRate 0.0102 Epoch: 31 Global Step: 79410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:35:31,400-Speed 13118.03 samples/sec Loss 3.9543 LearningRate 0.0102 Epoch: 31 Global Step: 79420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:32,980-Speed 12991.68 samples/sec Loss 3.8669 LearningRate 0.0102 Epoch: 31 Global Step: 79430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:34,563-Speed 12944.49 samples/sec Loss 3.9621 LearningRate 0.0102 Epoch: 31 Global Step: 79440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:36,145-Speed 12948.81 samples/sec Loss 3.9177 LearningRate 0.0102 Epoch: 31 Global Step: 79450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:37,715-Speed 13054.86 samples/sec Loss 3.9438 LearningRate 0.0102 Epoch: 31 Global Step: 79460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:39,284-Speed 13060.46 samples/sec Loss 3.8960 LearningRate 0.0102 Epoch: 31 Global Step: 79470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:40,862-Speed 12983.58 samples/sec Loss 3.9302 LearningRate 0.0102 Epoch: 31 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:35:42,427-Speed 13096.97 samples/sec Loss 3.9632 LearningRate 0.0102 Epoch: 31 Global Step: 79490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:35:43,996-Speed 13056.10 samples/sec Loss 3.8737 LearningRate 0.0102 Epoch: 31 Global Step: 79500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:35:45,545-Speed 13226.21 samples/sec Loss 3.9396 LearningRate 0.0102 Epoch: 31 Global Step: 79510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:47,125-Speed 12971.70 samples/sec Loss 3.8980 LearningRate 0.0101 Epoch: 31 Global Step: 79520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:48,685-Speed 13137.56 samples/sec Loss 3.8916 LearningRate 0.0101 Epoch: 31 Global Step: 79530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:50,238-Speed 13196.55 samples/sec 
Loss 3.8993 LearningRate 0.0101 Epoch: 31 Global Step: 79540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:51,809-Speed 13041.82 samples/sec Loss 3.9856 LearningRate 0.0101 Epoch: 31 Global Step: 79550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:53,368-Speed 13146.07 samples/sec Loss 3.9277 LearningRate 0.0101 Epoch: 31 Global Step: 79560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:54,937-Speed 13060.62 samples/sec Loss 3.9188 LearningRate 0.0101 Epoch: 31 Global Step: 79570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:56,496-Speed 13140.14 samples/sec Loss 3.9176 LearningRate 0.0101 Epoch: 31 Global Step: 79580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:58,059-Speed 13115.47 samples/sec Loss 3.9089 LearningRate 0.0101 Epoch: 31 Global Step: 79590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:35:59,612-Speed 13191.40 samples/sec Loss 3.9811 LearningRate 0.0101 Epoch: 31 Global Step: 79600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:01,163-Speed 13207.70 samples/sec Loss 3.9246 LearningRate 0.0101 Epoch: 31 Global Step: 79610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:02,728-Speed 13101.93 samples/sec Loss 3.9053 LearningRate 0.0100 Epoch: 31 Global Step: 79620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:04,289-Speed 13148.73 samples/sec Loss 3.9348 LearningRate 0.0100 Epoch: 31 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:05,854-Speed 13086.54 samples/sec Loss 3.9446 LearningRate 0.0100 Epoch: 31 Global Step: 79640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:07,436-Speed 12960.18 samples/sec Loss 3.9961 LearningRate 0.0100 Epoch: 31 Global Step: 79650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:09,006-Speed 13054.70 samples/sec Loss 3.9444 LearningRate 0.0100 Epoch: 31 
Global Step: 79660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:10,575-Speed 13053.98 samples/sec Loss 3.9255 LearningRate 0.0100 Epoch: 31 Global Step: 79670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:12,114-Speed 13313.36 samples/sec Loss 3.9802 LearningRate 0.0100 Epoch: 31 Global Step: 79680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:13,674-Speed 13143.80 samples/sec Loss 4.0526 LearningRate 0.0100 Epoch: 31 Global Step: 79690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:15,232-Speed 13150.24 samples/sec Loss 4.0366 LearningRate 0.0100 Epoch: 31 Global Step: 79700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:16,780-Speed 13240.52 samples/sec Loss 4.0355 LearningRate 0.0100 Epoch: 31 Global Step: 79710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:18,330-Speed 13215.97 samples/sec Loss 3.9463 LearningRate 0.0100 Epoch: 31 Global Step: 79720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:19,946-Speed 12676.81 samples/sec Loss 3.9966 LearningRate 0.0099 Epoch: 31 Global Step: 79730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:21,531-Speed 12936.54 samples/sec Loss 4.0195 LearningRate 0.0099 Epoch: 31 Global Step: 79740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:23,100-Speed 13064.72 samples/sec Loss 3.9863 LearningRate 0.0099 Epoch: 31 Global Step: 79750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:24,674-Speed 13017.84 samples/sec Loss 3.8994 LearningRate 0.0099 Epoch: 31 Global Step: 79760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:26,236-Speed 13117.59 samples/sec Loss 3.9258 LearningRate 0.0099 Epoch: 31 Global Step: 79770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:27,786-Speed 13222.04 samples/sec Loss 3.9295 LearningRate 0.0099 Epoch: 31 Global Step: 79780 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-01-14 17:36:29,349-Speed 13114.28 samples/sec Loss 3.9404 LearningRate 0.0099 Epoch: 31 Global Step: 79790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:30,919-Speed 13046.22 samples/sec Loss 3.9138 LearningRate 0.0099 Epoch: 31 Global Step: 79800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:32,471-Speed 13204.11 samples/sec Loss 3.9947 LearningRate 0.0099 Epoch: 31 Global Step: 79810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:36:34,026-Speed 13183.58 samples/sec Loss 3.9636 LearningRate 0.0099 Epoch: 31 Global Step: 79820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:35,580-Speed 13185.75 samples/sec Loss 3.8872 LearningRate 0.0099 Epoch: 31 Global Step: 79830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:37,137-Speed 13152.66 samples/sec Loss 3.9907 LearningRate 0.0098 Epoch: 31 Global Step: 79840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:38,700-Speed 13122.08 samples/sec Loss 3.9881 LearningRate 0.0098 Epoch: 31 Global Step: 79850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:40,283-Speed 12939.80 samples/sec Loss 3.9661 LearningRate 0.0098 Epoch: 31 Global Step: 79860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:41,856-Speed 13028.09 samples/sec Loss 3.9501 LearningRate 0.0098 Epoch: 31 Global Step: 79870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:43,418-Speed 13123.57 samples/sec Loss 3.9483 LearningRate 0.0098 Epoch: 31 Global Step: 79880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:44,965-Speed 13235.49 samples/sec Loss 3.9995 LearningRate 0.0098 Epoch: 31 Global Step: 79890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:46,529-Speed 13107.39 samples/sec Loss 3.9277 LearningRate 0.0098 Epoch: 31 Global Step: 79900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:36:48,089-Speed 13133.54 samples/sec Loss 3.9288 LearningRate 0.0098 Epoch: 31 Global Step: 79910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:49,673-Speed 12933.57 samples/sec Loss 3.9622 LearningRate 0.0098 Epoch: 31 Global Step: 79920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:36:51,239-Speed 13083.56 samples/sec Loss 3.9902 LearningRate 0.0098 Epoch: 31 Global Step: 79930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:36:52,817-Speed 12989.41 samples/sec Loss 4.0426 LearningRate 0.0098 Epoch: 31 Global Step: 79940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:36:54,390-Speed 13028.37 samples/sec Loss 3.9403 LearningRate 0.0097 Epoch: 31 Global Step: 79950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:36:55,944-Speed 13182.19 samples/sec Loss 3.9882 LearningRate 0.0097 Epoch: 31 Global Step: 79960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:57,508-Speed 13107.18 samples/sec Loss 3.9745 LearningRate 0.0097 Epoch: 31 Global Step: 79970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:36:59,051-Speed 13279.35 samples/sec Loss 3.9647 LearningRate 0.0097 Epoch: 31 Global Step: 79980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:37:00,635-Speed 12936.44 samples/sec Loss 3.9122 LearningRate 0.0097 Epoch: 31 Global Step: 79990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:37:02,212-Speed 12989.06 samples/sec Loss 3.9570 LearningRate 0.0097 Epoch: 31 Global Step: 80000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:37:24,384-[lfw][80000]XNorm: 7.521472 Training: 2022-01-14 17:37:24,385-[lfw][80000]Accuracy-Flip: 0.99600+-0.00335 Training: 2022-01-14 17:37:24,385-[lfw][80000]Accuracy-Highest: 0.99650 Training: 2022-01-14 17:37:50,057-[cfp_fp][80000]XNorm: 6.385440 Training: 2022-01-14 17:37:50,058-[cfp_fp][80000]Accuracy-Flip: 0.97000+-0.01079 Training: 2022-01-14 
17:37:50,059-[cfp_fp][80000]Accuracy-Highest: 0.97000 Training: 2022-01-14 17:38:12,656-[agedb_30][80000]XNorm: 7.265802 Training: 2022-01-14 17:38:12,657-[agedb_30][80000]Accuracy-Flip: 0.96950+-0.00628 Training: 2022-01-14 17:38:12,658-[agedb_30][80000]Accuracy-Highest: 0.96950 Training: 2022-01-14 17:38:14,241-Speed 284.34 samples/sec Loss 3.9450 LearningRate 0.0097 Epoch: 31 Global Step: 80010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:15,812-Speed 13041.71 samples/sec Loss 3.9879 LearningRate 0.0097 Epoch: 31 Global Step: 80020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:17,367-Speed 13179.12 samples/sec Loss 3.9774 LearningRate 0.0097 Epoch: 31 Global Step: 80030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:18,913-Speed 13259.63 samples/sec Loss 3.8998 LearningRate 0.0097 Epoch: 31 Global Step: 80040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:20,489-Speed 12998.83 samples/sec Loss 4.0147 LearningRate 0.0097 Epoch: 31 Global Step: 80050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:22,032-Speed 13275.96 samples/sec Loss 4.0167 LearningRate 0.0096 Epoch: 31 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:23,604-Speed 13041.36 samples/sec Loss 4.0251 LearningRate 0.0096 Epoch: 31 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:25,188-Speed 12936.84 samples/sec Loss 3.9813 LearningRate 0.0096 Epoch: 31 Global Step: 80080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:26,765-Speed 12987.46 samples/sec Loss 3.8883 LearningRate 0.0096 Epoch: 31 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:28,320-Speed 13187.21 samples/sec Loss 4.0382 LearningRate 0.0096 Epoch: 31 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:29,879-Speed 13143.51 samples/sec Loss 4.0196 LearningRate 
0.0096 Epoch: 31 Global Step: 80110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:31,439-Speed 13128.52 samples/sec Loss 3.9917 LearningRate 0.0096 Epoch: 31 Global Step: 80120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:32,982-Speed 13283.21 samples/sec Loss 3.9824 LearningRate 0.0096 Epoch: 31 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:34,595-Speed 12702.47 samples/sec Loss 3.9932 LearningRate 0.0096 Epoch: 31 Global Step: 80140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:36,136-Speed 13295.99 samples/sec Loss 3.9997 LearningRate 0.0096 Epoch: 31 Global Step: 80150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:37,681-Speed 13271.04 samples/sec Loss 4.0068 LearningRate 0.0096 Epoch: 31 Global Step: 80160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:39,235-Speed 13178.31 samples/sec Loss 4.0200 LearningRate 0.0095 Epoch: 31 Global Step: 80170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:40,782-Speed 13251.96 samples/sec Loss 3.9903 LearningRate 0.0095 Epoch: 31 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:42,338-Speed 13169.38 samples/sec Loss 3.9841 LearningRate 0.0095 Epoch: 31 Global Step: 80190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:38:43,930-Speed 12881.86 samples/sec Loss 3.9918 LearningRate 0.0095 Epoch: 31 Global Step: 80200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:45,503-Speed 13030.94 samples/sec Loss 3.9963 LearningRate 0.0095 Epoch: 31 Global Step: 80210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:47,059-Speed 13171.52 samples/sec Loss 3.9615 LearningRate 0.0095 Epoch: 31 Global Step: 80220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:48,613-Speed 13189.83 samples/sec Loss 4.0472 LearningRate 0.0095 Epoch: 31 Global Step: 80230 Fp16 
Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:50,171-Speed 13144.48 samples/sec Loss 4.0171 LearningRate 0.0095 Epoch: 31 Global Step: 80240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:51,740-Speed 13068.70 samples/sec Loss 3.9646 LearningRate 0.0095 Epoch: 31 Global Step: 80250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:53,287-Speed 13249.60 samples/sec Loss 4.0507 LearningRate 0.0095 Epoch: 31 Global Step: 80260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:54,849-Speed 13110.63 samples/sec Loss 4.0395 LearningRate 0.0095 Epoch: 31 Global Step: 80270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:56,408-Speed 13150.82 samples/sec Loss 4.0534 LearningRate 0.0094 Epoch: 31 Global Step: 80280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:57,962-Speed 13182.25 samples/sec Loss 4.0450 LearningRate 0.0094 Epoch: 31 Global Step: 80290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:38:59,507-Speed 13262.42 samples/sec Loss 4.0290 LearningRate 0.0094 Epoch: 31 Global Step: 80300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:01,055-Speed 13233.24 samples/sec Loss 4.0229 LearningRate 0.0094 Epoch: 31 Global Step: 80310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:02,624-Speed 13063.74 samples/sec Loss 4.1021 LearningRate 0.0094 Epoch: 31 Global Step: 80320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:04,166-Speed 13293.48 samples/sec Loss 4.0374 LearningRate 0.0094 Epoch: 31 Global Step: 80330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:05,711-Speed 13262.39 samples/sec Loss 4.0315 LearningRate 0.0094 Epoch: 31 Global Step: 80340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:07,264-Speed 13188.48 samples/sec Loss 3.9967 LearningRate 0.0094 Epoch: 31 Global Step: 80350 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-01-14 17:39:08,814-Speed 13228.14 samples/sec Loss 4.0718 LearningRate 0.0094 Epoch: 31 Global Step: 80360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:10,397-Speed 12939.19 samples/sec Loss 4.0769 LearningRate 0.0094 Epoch: 31 Global Step: 80370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:11,949-Speed 13204.11 samples/sec Loss 4.0347 LearningRate 0.0094 Epoch: 31 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:13,514-Speed 13099.97 samples/sec Loss 4.0599 LearningRate 0.0093 Epoch: 31 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:15,065-Speed 13204.52 samples/sec Loss 4.0321 LearningRate 0.0093 Epoch: 31 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:16,627-Speed 13116.06 samples/sec Loss 4.0436 LearningRate 0.0093 Epoch: 31 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:18,189-Speed 13121.12 samples/sec Loss 4.0614 LearningRate 0.0093 Epoch: 31 Global Step: 80420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:19,745-Speed 13168.27 samples/sec Loss 4.0545 LearningRate 0.0093 Epoch: 31 Global Step: 80430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:21,309-Speed 13103.77 samples/sec Loss 4.0113 LearningRate 0.0093 Epoch: 31 Global Step: 80440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:22,881-Speed 13041.09 samples/sec Loss 3.9978 LearningRate 0.0093 Epoch: 31 Global Step: 80450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:24,431-Speed 13217.24 samples/sec Loss 4.0233 LearningRate 0.0093 Epoch: 31 Global Step: 80460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:25,998-Speed 13077.58 samples/sec Loss 3.9615 LearningRate 0.0093 Epoch: 31 Global Step: 80470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:27,562-Speed 
13106.00 samples/sec Loss 3.9976 LearningRate 0.0093 Epoch: 31 Global Step: 80480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:29,130-Speed 13064.85 samples/sec Loss 3.9826 LearningRate 0.0093 Epoch: 31 Global Step: 80490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:30,699-Speed 13062.02 samples/sec Loss 4.0174 LearningRate 0.0092 Epoch: 31 Global Step: 80500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:32,247-Speed 13233.89 samples/sec Loss 4.1097 LearningRate 0.0092 Epoch: 31 Global Step: 80510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:33,793-Speed 13258.89 samples/sec Loss 4.0611 LearningRate 0.0092 Epoch: 31 Global Step: 80520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:35,351-Speed 13147.57 samples/sec Loss 4.0813 LearningRate 0.0092 Epoch: 31 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:36,895-Speed 13272.04 samples/sec Loss 4.0985 LearningRate 0.0092 Epoch: 31 Global Step: 80540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:38,474-Speed 12982.17 samples/sec Loss 4.0423 LearningRate 0.0092 Epoch: 31 Global Step: 80550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:40,034-Speed 13130.60 samples/sec Loss 4.0252 LearningRate 0.0092 Epoch: 31 Global Step: 80560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:41,618-Speed 12935.78 samples/sec Loss 4.0081 LearningRate 0.0092 Epoch: 31 Global Step: 80570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:43,162-Speed 13279.02 samples/sec Loss 4.0793 LearningRate 0.0092 Epoch: 31 Global Step: 80580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:44,711-Speed 13232.59 samples/sec Loss 4.0872 LearningRate 0.0092 Epoch: 31 Global Step: 80590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:46,280-Speed 13054.79 samples/sec Loss 4.0302 
LearningRate 0.0092 Epoch: 31 Global Step: 80600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:47,843-Speed 13109.44 samples/sec Loss 4.0736 LearningRate 0.0091 Epoch: 31 Global Step: 80610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:49,382-Speed 13315.26 samples/sec Loss 4.1344 LearningRate 0.0091 Epoch: 31 Global Step: 80620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:50,967-Speed 12951.25 samples/sec Loss 4.0248 LearningRate 0.0091 Epoch: 31 Global Step: 80630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:52,529-Speed 13116.04 samples/sec Loss 4.0875 LearningRate 0.0091 Epoch: 31 Global Step: 80640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:54,087-Speed 13150.85 samples/sec Loss 4.1034 LearningRate 0.0091 Epoch: 31 Global Step: 80650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:39:55,616-Speed 13400.71 samples/sec Loss 4.0465 LearningRate 0.0091 Epoch: 31 Global Step: 80660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:57,194-Speed 12985.38 samples/sec Loss 4.0331 LearningRate 0.0091 Epoch: 31 Global Step: 80670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:39:58,762-Speed 13072.33 samples/sec Loss 4.0925 LearningRate 0.0091 Epoch: 31 Global Step: 80680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:00,337-Speed 13007.56 samples/sec Loss 4.0898 LearningRate 0.0091 Epoch: 31 Global Step: 80690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:01,921-Speed 12937.46 samples/sec Loss 4.0547 LearningRate 0.0091 Epoch: 31 Global Step: 80700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:03,458-Speed 13332.34 samples/sec Loss 4.0681 LearningRate 0.0091 Epoch: 31 Global Step: 80710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:05,006-Speed 13236.62 samples/sec Loss 4.1059 LearningRate 0.0090 Epoch: 31 Global Step: 
80720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:06,560-Speed 13192.04 samples/sec Loss 4.0645 LearningRate 0.0090 Epoch: 31 Global Step: 80730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:08,149-Speed 12904.12 samples/sec Loss 4.0964 LearningRate 0.0090 Epoch: 31 Global Step: 80740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:09,706-Speed 13165.59 samples/sec Loss 4.1074 LearningRate 0.0090 Epoch: 31 Global Step: 80750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:11,247-Speed 13294.92 samples/sec Loss 4.0833 LearningRate 0.0090 Epoch: 31 Global Step: 80760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:12,795-Speed 13242.52 samples/sec Loss 4.0672 LearningRate 0.0090 Epoch: 31 Global Step: 80770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:14,363-Speed 13059.26 samples/sec Loss 3.9667 LearningRate 0.0090 Epoch: 31 Global Step: 80780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:15,930-Speed 13079.59 samples/sec Loss 4.0779 LearningRate 0.0090 Epoch: 31 Global Step: 80790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:17,479-Speed 13231.89 samples/sec Loss 4.1498 LearningRate 0.0090 Epoch: 31 Global Step: 80800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:19,016-Speed 13334.45 samples/sec Loss 4.0624 LearningRate 0.0090 Epoch: 31 Global Step: 80810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:20,573-Speed 13158.33 samples/sec Loss 4.0773 LearningRate 0.0090 Epoch: 31 Global Step: 80820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:22,130-Speed 13154.76 samples/sec Loss 4.0481 LearningRate 0.0090 Epoch: 31 Global Step: 80830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:23,732-Speed 12944.90 samples/sec Loss 4.0838 LearningRate 0.0089 Epoch: 31 Global Step: 80840 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-01-14 17:40:25,304-Speed 13037.12 samples/sec Loss 4.1045 LearningRate 0.0089 Epoch: 31 Global Step: 80850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:26,896-Speed 12867.25 samples/sec Loss 4.0647 LearningRate 0.0089 Epoch: 31 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:40:28,467-Speed 13046.24 samples/sec Loss 4.1009 LearningRate 0.0089 Epoch: 31 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:40:30,018-Speed 13207.47 samples/sec Loss 4.0423 LearningRate 0.0089 Epoch: 31 Global Step: 80880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:31,578-Speed 13144.06 samples/sec Loss 4.0106 LearningRate 0.0089 Epoch: 31 Global Step: 80890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:33,127-Speed 13228.31 samples/sec Loss 4.1293 LearningRate 0.0089 Epoch: 31 Global Step: 80900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:34,711-Speed 12930.02 samples/sec Loss 4.0628 LearningRate 0.0089 Epoch: 31 Global Step: 80910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:36,274-Speed 13112.15 samples/sec Loss 4.0123 LearningRate 0.0089 Epoch: 31 Global Step: 80920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:53,021-Speed 1223.02 samples/sec Loss 3.9535 LearningRate 0.0089 Epoch: 32 Global Step: 80930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:54,603-Speed 12959.36 samples/sec Loss 3.7128 LearningRate 0.0089 Epoch: 32 Global Step: 80940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:56,186-Speed 12943.79 samples/sec Loss 3.6832 LearningRate 0.0088 Epoch: 32 Global Step: 80950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:40:57,758-Speed 13027.68 samples/sec Loss 3.6904 LearningRate 0.0088 Epoch: 32 Global Step: 80960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:40:59,337-Speed 12980.79 samples/sec Loss 3.6572 LearningRate 0.0088 Epoch: 32 Global Step: 80970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:00,905-Speed 13066.60 samples/sec Loss 3.6809 LearningRate 0.0088 Epoch: 32 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:02,493-Speed 12901.59 samples/sec Loss 3.5981 LearningRate 0.0088 Epoch: 32 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:04,074-Speed 12977.04 samples/sec Loss 3.5782 LearningRate 0.0088 Epoch: 32 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:05,624-Speed 13222.96 samples/sec Loss 3.6558 LearningRate 0.0088 Epoch: 32 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:07,155-Speed 13386.48 samples/sec Loss 3.5935 LearningRate 0.0088 Epoch: 32 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:08,736-Speed 12961.91 samples/sec Loss 3.6061 LearningRate 0.0088 Epoch: 32 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:10,313-Speed 12988.10 samples/sec Loss 3.5845 LearningRate 0.0088 Epoch: 32 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:11,877-Speed 13100.30 samples/sec Loss 3.6738 LearningRate 0.0088 Epoch: 32 Global Step: 81050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:13,424-Speed 13260.76 samples/sec Loss 3.6637 LearningRate 0.0087 Epoch: 32 Global Step: 81060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:15,011-Speed 12906.10 samples/sec Loss 3.6621 LearningRate 0.0087 Epoch: 32 Global Step: 81070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:16,587-Speed 13000.58 samples/sec Loss 3.7229 LearningRate 0.0087 Epoch: 32 Global Step: 81080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:18,147-Speed 13140.17 samples/sec 
Loss 3.6829 LearningRate 0.0087 Epoch: 32 Global Step: 81090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:19,712-Speed 13086.94 samples/sec Loss 3.6345 LearningRate 0.0087 Epoch: 32 Global Step: 81100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:21,272-Speed 13137.00 samples/sec Loss 3.6589 LearningRate 0.0087 Epoch: 32 Global Step: 81110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:22,819-Speed 13245.82 samples/sec Loss 3.6340 LearningRate 0.0087 Epoch: 32 Global Step: 81120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:24,395-Speed 13005.86 samples/sec Loss 3.6333 LearningRate 0.0087 Epoch: 32 Global Step: 81130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:25,994-Speed 12807.71 samples/sec Loss 3.7255 LearningRate 0.0087 Epoch: 32 Global Step: 81140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:27,560-Speed 13088.83 samples/sec Loss 3.7086 LearningRate 0.0087 Epoch: 32 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:29,111-Speed 13216.04 samples/sec Loss 3.7277 LearningRate 0.0087 Epoch: 32 Global Step: 81160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:30,698-Speed 12907.96 samples/sec Loss 3.6252 LearningRate 0.0087 Epoch: 32 Global Step: 81170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:32,241-Speed 13276.31 samples/sec Loss 3.8108 LearningRate 0.0086 Epoch: 32 Global Step: 81180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:33,804-Speed 13132.36 samples/sec Loss 3.7601 LearningRate 0.0086 Epoch: 32 Global Step: 81190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:35,384-Speed 12967.56 samples/sec Loss 3.6933 LearningRate 0.0086 Epoch: 32 Global Step: 81200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:36,930-Speed 13249.41 samples/sec Loss 3.6342 LearningRate 0.0086 Epoch: 32 
Global Step: 81210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:38,483-Speed 13201.22 samples/sec Loss 3.6338 LearningRate 0.0086 Epoch: 32 Global Step: 81220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:40,061-Speed 12980.26 samples/sec Loss 3.6836 LearningRate 0.0086 Epoch: 32 Global Step: 81230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:41,615-Speed 13193.79 samples/sec Loss 3.8110 LearningRate 0.0086 Epoch: 32 Global Step: 81240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:43,171-Speed 13163.84 samples/sec Loss 3.7160 LearningRate 0.0086 Epoch: 32 Global Step: 81250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:44,740-Speed 13065.30 samples/sec Loss 3.6894 LearningRate 0.0086 Epoch: 32 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:46,302-Speed 13116.84 samples/sec Loss 3.7756 LearningRate 0.0086 Epoch: 32 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:47,906-Speed 12777.59 samples/sec Loss 3.6853 LearningRate 0.0086 Epoch: 32 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:49,472-Speed 13082.05 samples/sec Loss 3.6948 LearningRate 0.0085 Epoch: 32 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:41:51,045-Speed 13028.02 samples/sec Loss 3.7116 LearningRate 0.0085 Epoch: 32 Global Step: 81300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:52,583-Speed 13321.11 samples/sec Loss 3.7289 LearningRate 0.0085 Epoch: 32 Global Step: 81310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:54,134-Speed 13211.81 samples/sec Loss 3.7205 LearningRate 0.0085 Epoch: 32 Global Step: 81320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:55,704-Speed 13054.60 samples/sec Loss 3.7706 LearningRate 0.0085 Epoch: 32 Global Step: 81330 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 17:41:57,273-Speed 13052.15 samples/sec Loss 3.7786 LearningRate 0.0085 Epoch: 32 Global Step: 81340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:41:58,817-Speed 13277.25 samples/sec Loss 3.8174 LearningRate 0.0085 Epoch: 32 Global Step: 81350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:00,392-Speed 13004.96 samples/sec Loss 3.7316 LearningRate 0.0085 Epoch: 32 Global Step: 81360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:01,971-Speed 12980.65 samples/sec Loss 3.7192 LearningRate 0.0085 Epoch: 32 Global Step: 81370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:03,543-Speed 13038.14 samples/sec Loss 3.7470 LearningRate 0.0085 Epoch: 32 Global Step: 81380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:05,099-Speed 13170.40 samples/sec Loss 3.7386 LearningRate 0.0085 Epoch: 32 Global Step: 81390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:06,661-Speed 13115.71 samples/sec Loss 3.7472 LearningRate 0.0085 Epoch: 32 Global Step: 81400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:42:08,223-Speed 13120.08 samples/sec Loss 3.7116 LearningRate 0.0084 Epoch: 32 Global Step: 81410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:42:09,795-Speed 13034.11 samples/sec Loss 3.7076 LearningRate 0.0084 Epoch: 32 Global Step: 81420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:11,355-Speed 13131.08 samples/sec Loss 3.7530 LearningRate 0.0084 Epoch: 32 Global Step: 81430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:12,905-Speed 13226.24 samples/sec Loss 3.7038 LearningRate 0.0084 Epoch: 32 Global Step: 81440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:14,458-Speed 13189.81 samples/sec Loss 3.6770 LearningRate 0.0084 Epoch: 32 Global Step: 81450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:42:16,031-Speed 13032.46 samples/sec Loss 3.7519 LearningRate 0.0084 Epoch: 32 Global Step: 81460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:17,584-Speed 13193.43 samples/sec Loss 3.8359 LearningRate 0.0084 Epoch: 32 Global Step: 81470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:19,133-Speed 13225.57 samples/sec Loss 3.7706 LearningRate 0.0084 Epoch: 32 Global Step: 81480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:20,685-Speed 13200.04 samples/sec Loss 3.7023 LearningRate 0.0084 Epoch: 32 Global Step: 81490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:22,221-Speed 13339.35 samples/sec Loss 3.7819 LearningRate 0.0084 Epoch: 32 Global Step: 81500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:23,797-Speed 13006.77 samples/sec Loss 3.7975 LearningRate 0.0084 Epoch: 32 Global Step: 81510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:25,373-Speed 12996.71 samples/sec Loss 3.8191 LearningRate 0.0084 Epoch: 32 Global Step: 81520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:42:26,935-Speed 13122.39 samples/sec Loss 3.7329 LearningRate 0.0083 Epoch: 32 Global Step: 81530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:42:28,475-Speed 13311.11 samples/sec Loss 3.7793 LearningRate 0.0083 Epoch: 32 Global Step: 81540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:30,044-Speed 13051.04 samples/sec Loss 3.7327 LearningRate 0.0083 Epoch: 32 Global Step: 81550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:31,635-Speed 12886.05 samples/sec Loss 3.6980 LearningRate 0.0083 Epoch: 32 Global Step: 81560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:33,208-Speed 13030.77 samples/sec Loss 3.7458 LearningRate 0.0083 Epoch: 32 Global Step: 81570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:34,754-Speed 13271.33 samples/sec 
Loss 3.7695 LearningRate 0.0083 Epoch: 32 Global Step: 81580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:36,310-Speed 13162.53 samples/sec Loss 3.7850 LearningRate 0.0083 Epoch: 32 Global Step: 81590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:37,867-Speed 13168.14 samples/sec Loss 3.7875 LearningRate 0.0083 Epoch: 32 Global Step: 81600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:39,436-Speed 13060.38 samples/sec Loss 3.7381 LearningRate 0.0083 Epoch: 32 Global Step: 81610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:41,058-Speed 12631.32 samples/sec Loss 3.8004 LearningRate 0.0083 Epoch: 32 Global Step: 81620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:42,606-Speed 13227.77 samples/sec Loss 3.7916 LearningRate 0.0083 Epoch: 32 Global Step: 81630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:44,175-Speed 13082.38 samples/sec Loss 3.7865 LearningRate 0.0083 Epoch: 32 Global Step: 81640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:45,738-Speed 13104.58 samples/sec Loss 3.7934 LearningRate 0.0082 Epoch: 32 Global Step: 81650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:42:47,316-Speed 12987.34 samples/sec Loss 3.7957 LearningRate 0.0082 Epoch: 32 Global Step: 81660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:48,898-Speed 12957.32 samples/sec Loss 3.7310 LearningRate 0.0082 Epoch: 32 Global Step: 81670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:50,511-Speed 12699.82 samples/sec Loss 3.7722 LearningRate 0.0082 Epoch: 32 Global Step: 81680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:52,078-Speed 13079.93 samples/sec Loss 3.7583 LearningRate 0.0082 Epoch: 32 Global Step: 81690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:53,628-Speed 13222.08 samples/sec Loss 3.8205 LearningRate 0.0082 Epoch: 32 
Global Step: 81700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:55,187-Speed 13142.83 samples/sec Loss 3.8425 LearningRate 0.0082 Epoch: 32 Global Step: 81710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:56,776-Speed 12899.62 samples/sec Loss 3.8143 LearningRate 0.0082 Epoch: 32 Global Step: 81720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:58,341-Speed 13090.78 samples/sec Loss 3.7365 LearningRate 0.0082 Epoch: 32 Global Step: 81730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:42:59,899-Speed 13150.33 samples/sec Loss 3.7491 LearningRate 0.0082 Epoch: 32 Global Step: 81740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:01,449-Speed 13219.71 samples/sec Loss 3.8051 LearningRate 0.0082 Epoch: 32 Global Step: 81750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:02,989-Speed 13305.04 samples/sec Loss 3.8243 LearningRate 0.0082 Epoch: 32 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:04,533-Speed 13276.57 samples/sec Loss 3.8105 LearningRate 0.0081 Epoch: 32 Global Step: 81770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:06,104-Speed 13036.39 samples/sec Loss 3.7393 LearningRate 0.0081 Epoch: 32 Global Step: 81780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:07,662-Speed 13153.27 samples/sec Loss 3.7895 LearningRate 0.0081 Epoch: 32 Global Step: 81790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:09,210-Speed 13239.07 samples/sec Loss 3.8270 LearningRate 0.0081 Epoch: 32 Global Step: 81800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:10,772-Speed 13116.31 samples/sec Loss 3.7583 LearningRate 0.0081 Epoch: 32 Global Step: 81810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:12,361-Speed 12896.89 samples/sec Loss 3.8862 LearningRate 0.0081 Epoch: 32 Global Step: 81820 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 17:43:13,915-Speed 13193.03 samples/sec Loss 3.8375 LearningRate 0.0081 Epoch: 32 Global Step: 81830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:15,491-Speed 12997.89 samples/sec Loss 3.8976 LearningRate 0.0081 Epoch: 32 Global Step: 81840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:17,044-Speed 13199.57 samples/sec Loss 3.7655 LearningRate 0.0081 Epoch: 32 Global Step: 81850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:18,624-Speed 12967.17 samples/sec Loss 3.7787 LearningRate 0.0081 Epoch: 32 Global Step: 81860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:20,204-Speed 12966.05 samples/sec Loss 3.7257 LearningRate 0.0081 Epoch: 32 Global Step: 81870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:21,760-Speed 13172.95 samples/sec Loss 3.7591 LearningRate 0.0080 Epoch: 32 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:23,330-Speed 13048.22 samples/sec Loss 3.8362 LearningRate 0.0080 Epoch: 32 Global Step: 81890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:24,861-Speed 13389.41 samples/sec Loss 3.7083 LearningRate 0.0080 Epoch: 32 Global Step: 81900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:26,395-Speed 13350.06 samples/sec Loss 3.7100 LearningRate 0.0080 Epoch: 32 Global Step: 81910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:27,947-Speed 13208.91 samples/sec Loss 3.7616 LearningRate 0.0080 Epoch: 32 Global Step: 81920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:29,495-Speed 13235.77 samples/sec Loss 3.8044 LearningRate 0.0080 Epoch: 32 Global Step: 81930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:31,064-Speed 13057.66 samples/sec Loss 3.7370 LearningRate 0.0080 Epoch: 32 Global Step: 81940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:43:32,635-Speed 13045.34 samples/sec Loss 3.7842 LearningRate 0.0080 Epoch: 32 Global Step: 81950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:34,213-Speed 12985.15 samples/sec Loss 3.7606 LearningRate 0.0080 Epoch: 32 Global Step: 81960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:35,747-Speed 13360.30 samples/sec Loss 3.8304 LearningRate 0.0080 Epoch: 32 Global Step: 81970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:37,287-Speed 13307.08 samples/sec Loss 3.7622 LearningRate 0.0080 Epoch: 32 Global Step: 81980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:38,833-Speed 13253.66 samples/sec Loss 3.8300 LearningRate 0.0080 Epoch: 32 Global Step: 81990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:40,378-Speed 13265.65 samples/sec Loss 3.8779 LearningRate 0.0079 Epoch: 32 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:41,934-Speed 13167.34 samples/sec Loss 3.7863 LearningRate 0.0079 Epoch: 32 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:43,512-Speed 13008.33 samples/sec Loss 3.8355 LearningRate 0.0079 Epoch: 32 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:45,057-Speed 13253.70 samples/sec Loss 3.8005 LearningRate 0.0079 Epoch: 32 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:46,594-Speed 13333.58 samples/sec Loss 3.8196 LearningRate 0.0079 Epoch: 32 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:48,136-Speed 13291.91 samples/sec Loss 3.8077 LearningRate 0.0079 Epoch: 32 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:49,713-Speed 12993.78 samples/sec Loss 3.7896 LearningRate 0.0079 Epoch: 32 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:51,279-Speed 13084.29 samples/sec 
Loss 3.8690 LearningRate 0.0079 Epoch: 32 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:52,857-Speed 12988.27 samples/sec Loss 3.8415 LearningRate 0.0079 Epoch: 32 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:43:54,405-Speed 13236.21 samples/sec Loss 3.7889 LearningRate 0.0079 Epoch: 32 Global Step: 82090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:55,956-Speed 13205.61 samples/sec Loss 3.9073 LearningRate 0.0079 Epoch: 32 Global Step: 82100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:57,510-Speed 13191.60 samples/sec Loss 3.8549 LearningRate 0.0079 Epoch: 32 Global Step: 82110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:43:59,076-Speed 13089.50 samples/sec Loss 3.9013 LearningRate 0.0079 Epoch: 32 Global Step: 82120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:00,631-Speed 13177.98 samples/sec Loss 3.8632 LearningRate 0.0078 Epoch: 32 Global Step: 82130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:02,188-Speed 13154.79 samples/sec Loss 3.8965 LearningRate 0.0078 Epoch: 32 Global Step: 82140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:03,759-Speed 13044.94 samples/sec Loss 3.8293 LearningRate 0.0078 Epoch: 32 Global Step: 82150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:05,311-Speed 13202.96 samples/sec Loss 3.8163 LearningRate 0.0078 Epoch: 32 Global Step: 82160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:06,870-Speed 13143.69 samples/sec Loss 3.7966 LearningRate 0.0078 Epoch: 32 Global Step: 82170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:08,409-Speed 13318.23 samples/sec Loss 3.7958 LearningRate 0.0078 Epoch: 32 Global Step: 82180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:09,964-Speed 13180.50 samples/sec Loss 3.8168 LearningRate 0.0078 Epoch: 32 
Global Step: 82190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:11,505-Speed 13298.38 samples/sec Loss 3.7565 LearningRate 0.0078 Epoch: 32 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:13,056-Speed 13212.96 samples/sec Loss 3.8924 LearningRate 0.0078 Epoch: 32 Global Step: 82210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:14,625-Speed 13059.00 samples/sec Loss 3.7994 LearningRate 0.0078 Epoch: 32 Global Step: 82220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:16,199-Speed 13019.14 samples/sec Loss 3.8848 LearningRate 0.0078 Epoch: 32 Global Step: 82230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:17,739-Speed 13298.73 samples/sec Loss 3.8498 LearningRate 0.0078 Epoch: 32 Global Step: 82240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:19,292-Speed 13200.02 samples/sec Loss 3.7949 LearningRate 0.0077 Epoch: 32 Global Step: 82250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:20,832-Speed 13308.69 samples/sec Loss 3.8707 LearningRate 0.0077 Epoch: 32 Global Step: 82260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:22,392-Speed 13133.43 samples/sec Loss 3.8579 LearningRate 0.0077 Epoch: 32 Global Step: 82270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:23,934-Speed 13295.52 samples/sec Loss 3.8924 LearningRate 0.0077 Epoch: 32 Global Step: 82280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:25,496-Speed 13115.04 samples/sec Loss 3.7789 LearningRate 0.0077 Epoch: 32 Global Step: 82290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:27,060-Speed 13103.37 samples/sec Loss 3.8101 LearningRate 0.0077 Epoch: 32 Global Step: 82300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:28,612-Speed 13204.10 samples/sec Loss 3.9341 LearningRate 0.0077 Epoch: 32 Global Step: 82310 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-14 17:44:30,171-Speed 13137.48 samples/sec Loss 3.8232 LearningRate 0.0077 Epoch: 32 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:31,729-Speed 13166.30 samples/sec Loss 3.9035 LearningRate 0.0077 Epoch: 32 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:33,308-Speed 13004.82 samples/sec Loss 3.8714 LearningRate 0.0077 Epoch: 32 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:34,869-Speed 13128.38 samples/sec Loss 3.8426 LearningRate 0.0077 Epoch: 32 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:36,427-Speed 13146.56 samples/sec Loss 3.8796 LearningRate 0.0077 Epoch: 32 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:37,980-Speed 13195.27 samples/sec Loss 3.8131 LearningRate 0.0076 Epoch: 32 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:39,555-Speed 13009.06 samples/sec Loss 3.8718 LearningRate 0.0076 Epoch: 32 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:41,119-Speed 13105.72 samples/sec Loss 3.8997 LearningRate 0.0076 Epoch: 32 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:42,684-Speed 13092.01 samples/sec Loss 3.9266 LearningRate 0.0076 Epoch: 32 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:44,234-Speed 13223.23 samples/sec Loss 3.8096 LearningRate 0.0076 Epoch: 32 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:45,809-Speed 13005.84 samples/sec Loss 3.9150 LearningRate 0.0076 Epoch: 32 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:47,366-Speed 13157.86 samples/sec Loss 3.8548 LearningRate 0.0076 Epoch: 32 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 
17:44:48,911-Speed 13268.59 samples/sec Loss 3.8836 LearningRate 0.0076 Epoch: 32 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:50,459-Speed 13239.35 samples/sec Loss 3.8947 LearningRate 0.0076 Epoch: 32 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:52,005-Speed 13248.46 samples/sec Loss 3.7797 LearningRate 0.0076 Epoch: 32 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:44:53,545-Speed 13332.42 samples/sec Loss 3.7867 LearningRate 0.0076 Epoch: 32 Global Step: 82470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:55,119-Speed 13013.50 samples/sec Loss 3.8571 LearningRate 0.0076 Epoch: 32 Global Step: 82480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:56,683-Speed 13097.80 samples/sec Loss 3.8769 LearningRate 0.0075 Epoch: 32 Global Step: 82490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:58,273-Speed 12895.99 samples/sec Loss 3.8430 LearningRate 0.0075 Epoch: 32 Global Step: 82500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:44:59,847-Speed 13010.94 samples/sec Loss 3.9284 LearningRate 0.0075 Epoch: 32 Global Step: 82510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:01,400-Speed 13194.89 samples/sec Loss 3.8971 LearningRate 0.0075 Epoch: 32 Global Step: 82520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:02,960-Speed 13141.35 samples/sec Loss 3.9078 LearningRate 0.0075 Epoch: 32 Global Step: 82530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:04,520-Speed 13133.71 samples/sec Loss 3.8983 LearningRate 0.0075 Epoch: 32 Global Step: 82540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:06,100-Speed 12967.29 samples/sec Loss 3.8190 LearningRate 0.0075 Epoch: 32 Global Step: 82550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:07,630-Speed 13399.25 samples/sec 
Loss 3.9363 LearningRate 0.0075 Epoch: 32 Global Step: 82560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:09,193-Speed 13105.24 samples/sec Loss 3.8938 LearningRate 0.0075 Epoch: 32 Global Step: 82570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:45:10,769-Speed 12999.22 samples/sec Loss 3.8990 LearningRate 0.0075 Epoch: 32 Global Step: 82580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:12,324-Speed 13177.92 samples/sec Loss 3.8776 LearningRate 0.0075 Epoch: 32 Global Step: 82590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:13,887-Speed 13113.17 samples/sec Loss 3.9107 LearningRate 0.0075 Epoch: 32 Global Step: 82600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:15,462-Speed 13011.49 samples/sec Loss 3.8872 LearningRate 0.0075 Epoch: 32 Global Step: 82610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:17,023-Speed 13119.38 samples/sec Loss 3.8462 LearningRate 0.0074 Epoch: 32 Global Step: 82620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:18,562-Speed 13321.45 samples/sec Loss 3.8986 LearningRate 0.0074 Epoch: 32 Global Step: 82630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:20,128-Speed 13080.26 samples/sec Loss 3.8971 LearningRate 0.0074 Epoch: 32 Global Step: 82640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:21,695-Speed 13081.29 samples/sec Loss 3.9074 LearningRate 0.0074 Epoch: 32 Global Step: 82650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:23,244-Speed 13227.34 samples/sec Loss 3.9427 LearningRate 0.0074 Epoch: 32 Global Step: 82660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:24,787-Speed 13283.99 samples/sec Loss 3.8655 LearningRate 0.0074 Epoch: 32 Global Step: 82670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:26,359-Speed 13032.28 samples/sec Loss 3.8504 LearningRate 0.0074 Epoch: 32 
Global Step: 82680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:45:27,931-Speed 13043.86 samples/sec Loss 3.8563 LearningRate 0.0074 Epoch: 32 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:45:29,478-Speed 13237.41 samples/sec Loss 3.8596 LearningRate 0.0074 Epoch: 32 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:45:31,042-Speed 13106.51 samples/sec Loss 3.8941 LearningRate 0.0074 Epoch: 32 Global Step: 82710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:32,606-Speed 13092.92 samples/sec Loss 3.8335 LearningRate 0.0074 Epoch: 32 Global Step: 82720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:34,173-Speed 13095.22 samples/sec Loss 3.8709 LearningRate 0.0074 Epoch: 32 Global Step: 82730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:35,713-Speed 13301.57 samples/sec Loss 3.9265 LearningRate 0.0073 Epoch: 32 Global Step: 82740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:37,291-Speed 12986.39 samples/sec Loss 3.8732 LearningRate 0.0073 Epoch: 32 Global Step: 82750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:38,842-Speed 13213.70 samples/sec Loss 3.9308 LearningRate 0.0073 Epoch: 32 Global Step: 82760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:40,418-Speed 13001.43 samples/sec Loss 3.8332 LearningRate 0.0073 Epoch: 32 Global Step: 82770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:41,976-Speed 13151.44 samples/sec Loss 3.8854 LearningRate 0.0073 Epoch: 32 Global Step: 82780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:43,576-Speed 12808.75 samples/sec Loss 3.9790 LearningRate 0.0073 Epoch: 32 Global Step: 82790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:45,125-Speed 13223.61 samples/sec Loss 3.9316 LearningRate 0.0073 Epoch: 32 Global Step: 82800 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-01-14 17:45:46,692-Speed 13082.27 samples/sec Loss 3.8638 LearningRate 0.0073 Epoch: 32 Global Step: 82810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:48,253-Speed 13127.04 samples/sec Loss 3.8257 LearningRate 0.0073 Epoch: 32 Global Step: 82820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:49,792-Speed 13310.17 samples/sec Loss 3.9177 LearningRate 0.0073 Epoch: 32 Global Step: 82830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:51,339-Speed 13248.99 samples/sec Loss 3.8690 LearningRate 0.0073 Epoch: 32 Global Step: 82840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:52,886-Speed 13250.00 samples/sec Loss 3.8913 LearningRate 0.0073 Epoch: 32 Global Step: 82850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:54,439-Speed 13196.30 samples/sec Loss 3.9947 LearningRate 0.0073 Epoch: 32 Global Step: 82860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:55,989-Speed 13212.64 samples/sec Loss 3.8955 LearningRate 0.0072 Epoch: 32 Global Step: 82870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:45:57,547-Speed 13153.29 samples/sec Loss 3.8604 LearningRate 0.0072 Epoch: 32 Global Step: 82880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:45:59,073-Speed 13429.27 samples/sec Loss 3.9168 LearningRate 0.0072 Epoch: 32 Global Step: 82890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:00,616-Speed 13281.55 samples/sec Loss 3.9460 LearningRate 0.0072 Epoch: 32 Global Step: 82900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:02,166-Speed 13219.84 samples/sec Loss 3.8378 LearningRate 0.0072 Epoch: 32 Global Step: 82910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:03,719-Speed 13196.49 samples/sec Loss 3.9035 LearningRate 0.0072 Epoch: 32 Global Step: 82920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 
17:46:05,295-Speed 13002.88 samples/sec Loss 3.9275 LearningRate 0.0072 Epoch: 32 Global Step: 82930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:06,888-Speed 12857.97 samples/sec Loss 3.8189 LearningRate 0.0072 Epoch: 32 Global Step: 82940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:08,430-Speed 13312.52 samples/sec Loss 3.9215 LearningRate 0.0072 Epoch: 32 Global Step: 82950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:09,995-Speed 13095.64 samples/sec Loss 3.8789 LearningRate 0.0072 Epoch: 32 Global Step: 82960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:11,556-Speed 13130.40 samples/sec Loss 3.8906 LearningRate 0.0072 Epoch: 32 Global Step: 82970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:13,118-Speed 13118.75 samples/sec Loss 3.8439 LearningRate 0.0072 Epoch: 32 Global Step: 82980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:14,693-Speed 13009.64 samples/sec Loss 3.9129 LearningRate 0.0071 Epoch: 32 Global Step: 82990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:46:16,259-Speed 13088.38 samples/sec Loss 3.9158 LearningRate 0.0071 Epoch: 32 Global Step: 83000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:17,816-Speed 13161.99 samples/sec Loss 3.9230 LearningRate 0.0071 Epoch: 32 Global Step: 83010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:19,375-Speed 13144.30 samples/sec Loss 3.9175 LearningRate 0.0071 Epoch: 32 Global Step: 83020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:20,921-Speed 13252.01 samples/sec Loss 3.9929 LearningRate 0.0071 Epoch: 32 Global Step: 83030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:22,472-Speed 13214.94 samples/sec Loss 3.9305 LearningRate 0.0071 Epoch: 32 Global Step: 83040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:24,027-Speed 13180.29 samples/sec 
Loss 3.9444 LearningRate 0.0071 Epoch: 32 Global Step: 83050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:25,584-Speed 13158.25 samples/sec Loss 3.9677 LearningRate 0.0071 Epoch: 32 Global Step: 83060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:27,135-Speed 13212.09 samples/sec Loss 3.9507 LearningRate 0.0071 Epoch: 32 Global Step: 83070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:28,696-Speed 13124.48 samples/sec Loss 3.8758 LearningRate 0.0071 Epoch: 32 Global Step: 83080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:30,254-Speed 13157.29 samples/sec Loss 3.9174 LearningRate 0.0071 Epoch: 32 Global Step: 83090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:31,797-Speed 13278.96 samples/sec Loss 3.9547 LearningRate 0.0071 Epoch: 32 Global Step: 83100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:33,346-Speed 13236.68 samples/sec Loss 3.8871 LearningRate 0.0071 Epoch: 32 Global Step: 83110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:34,911-Speed 13092.50 samples/sec Loss 3.8892 LearningRate 0.0070 Epoch: 32 Global Step: 83120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:36,460-Speed 13223.20 samples/sec Loss 4.0014 LearningRate 0.0070 Epoch: 32 Global Step: 83130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:38,022-Speed 13124.10 samples/sec Loss 3.9921 LearningRate 0.0070 Epoch: 32 Global Step: 83140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:39,554-Speed 13372.84 samples/sec Loss 3.8967 LearningRate 0.0070 Epoch: 32 Global Step: 83150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:41,103-Speed 13227.20 samples/sec Loss 3.9036 LearningRate 0.0070 Epoch: 32 Global Step: 83160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:42,690-Speed 12913.64 samples/sec Loss 3.9142 LearningRate 0.0070 Epoch: 32 
Global Step: 83170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:44,243-Speed 13196.32 samples/sec Loss 3.9500 LearningRate 0.0070 Epoch: 32 Global Step: 83180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:45,780-Speed 13328.71 samples/sec Loss 3.8993 LearningRate 0.0070 Epoch: 32 Global Step: 83190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:47,377-Speed 12829.34 samples/sec Loss 3.8697 LearningRate 0.0070 Epoch: 32 Global Step: 83200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:46:48,964-Speed 12914.55 samples/sec Loss 3.9222 LearningRate 0.0070 Epoch: 32 Global Step: 83210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:46:50,547-Speed 12943.07 samples/sec Loss 3.9187 LearningRate 0.0070 Epoch: 32 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:46:52,107-Speed 13134.58 samples/sec Loss 3.9376 LearningRate 0.0070 Epoch: 32 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:46:53,673-Speed 13087.93 samples/sec Loss 3.8954 LearningRate 0.0070 Epoch: 32 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:46:55,244-Speed 13045.54 samples/sec Loss 3.9344 LearningRate 0.0069 Epoch: 32 Global Step: 83250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:46:56,785-Speed 13297.19 samples/sec Loss 3.9133 LearningRate 0.0069 Epoch: 32 Global Step: 83260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:58,345-Speed 13134.20 samples/sec Loss 3.8778 LearningRate 0.0069 Epoch: 32 Global Step: 83270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:46:59,903-Speed 13149.15 samples/sec Loss 3.8735 LearningRate 0.0069 Epoch: 32 Global Step: 83280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:01,450-Speed 13248.89 samples/sec Loss 3.9319 LearningRate 0.0069 Epoch: 32 Global Step: 83290 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 17:47:02,997-Speed 13243.22 samples/sec Loss 3.8796 LearningRate 0.0069 Epoch: 32 Global Step: 83300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:04,552-Speed 13178.72 samples/sec Loss 3.9277 LearningRate 0.0069 Epoch: 32 Global Step: 83310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:06,094-Speed 13285.71 samples/sec Loss 4.0082 LearningRate 0.0069 Epoch: 32 Global Step: 83320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:07,646-Speed 13202.50 samples/sec Loss 3.9314 LearningRate 0.0069 Epoch: 32 Global Step: 83330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:09,197-Speed 13210.56 samples/sec Loss 3.8608 LearningRate 0.0069 Epoch: 32 Global Step: 83340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:10,749-Speed 13201.37 samples/sec Loss 3.9426 LearningRate 0.0069 Epoch: 32 Global Step: 83350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:12,310-Speed 13132.08 samples/sec Loss 3.9262 LearningRate 0.0069 Epoch: 32 Global Step: 83360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:13,883-Speed 13048.84 samples/sec Loss 3.8832 LearningRate 0.0069 Epoch: 32 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:15,417-Speed 13362.64 samples/sec Loss 4.0088 LearningRate 0.0068 Epoch: 32 Global Step: 83380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:16,959-Speed 13285.50 samples/sec Loss 3.9023 LearningRate 0.0068 Epoch: 32 Global Step: 83390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:18,507-Speed 13239.52 samples/sec Loss 4.0323 LearningRate 0.0068 Epoch: 32 Global Step: 83400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:20,046-Speed 13315.03 samples/sec Loss 3.9737 LearningRate 0.0068 Epoch: 32 Global Step: 83410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:47:21,591-Speed 13258.09 samples/sec Loss 3.9139 LearningRate 0.0068 Epoch: 32 Global Step: 83420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:23,145-Speed 13189.75 samples/sec Loss 3.9610 LearningRate 0.0068 Epoch: 32 Global Step: 83430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:24,699-Speed 13185.19 samples/sec Loss 3.9413 LearningRate 0.0068 Epoch: 32 Global Step: 83440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:26,321-Speed 12633.70 samples/sec Loss 3.8995 LearningRate 0.0068 Epoch: 32 Global Step: 83450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:41,037-Speed 1391.86 samples/sec Loss 3.7878 LearningRate 0.0068 Epoch: 33 Global Step: 83460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:42,642-Speed 12770.35 samples/sec Loss 3.5122 LearningRate 0.0068 Epoch: 33 Global Step: 83470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:47:44,225-Speed 12944.98 samples/sec Loss 3.5034 LearningRate 0.0068 Epoch: 33 Global Step: 83480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:45,799-Speed 13014.36 samples/sec Loss 3.5648 LearningRate 0.0068 Epoch: 33 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:47,373-Speed 13021.47 samples/sec Loss 3.5475 LearningRate 0.0068 Epoch: 33 Global Step: 83500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:48,950-Speed 12993.55 samples/sec Loss 3.6002 LearningRate 0.0067 Epoch: 33 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:50,531-Speed 12962.23 samples/sec Loss 3.5229 LearningRate 0.0067 Epoch: 33 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:52,102-Speed 13047.79 samples/sec Loss 3.5964 LearningRate 0.0067 Epoch: 33 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:53,690-Speed 12924.14 samples/sec Loss 
3.5678 LearningRate 0.0067 Epoch: 33 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:55,237-Speed 13248.16 samples/sec Loss 3.6047 LearningRate 0.0067 Epoch: 33 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:56,786-Speed 13221.92 samples/sec Loss 3.5850 LearningRate 0.0067 Epoch: 33 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:58,356-Speed 13053.13 samples/sec Loss 3.5359 LearningRate 0.0067 Epoch: 33 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:47:59,884-Speed 13413.79 samples/sec Loss 3.6129 LearningRate 0.0067 Epoch: 33 Global Step: 83580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:01,442-Speed 13147.57 samples/sec Loss 3.5911 LearningRate 0.0067 Epoch: 33 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:02,989-Speed 13246.89 samples/sec Loss 3.5997 LearningRate 0.0067 Epoch: 33 Global Step: 83600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:04,536-Speed 13249.94 samples/sec Loss 3.5964 LearningRate 0.0067 Epoch: 33 Global Step: 83610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:06,126-Speed 12882.18 samples/sec Loss 3.6464 LearningRate 0.0067 Epoch: 33 Global Step: 83620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:07,685-Speed 13143.62 samples/sec Loss 3.6128 LearningRate 0.0067 Epoch: 33 Global Step: 83630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:09,261-Speed 13016.29 samples/sec Loss 3.6432 LearningRate 0.0066 Epoch: 33 Global Step: 83640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:10,823-Speed 13124.07 samples/sec Loss 3.6129 LearningRate 0.0066 Epoch: 33 Global Step: 83650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:12,386-Speed 13107.05 samples/sec Loss 3.5747 LearningRate 0.0066 Epoch: 33 Global 
Step: 83660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:13,947-Speed 13130.28 samples/sec Loss 3.6587 LearningRate 0.0066 Epoch: 33 Global Step: 83670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:15,513-Speed 13088.69 samples/sec Loss 3.5418 LearningRate 0.0066 Epoch: 33 Global Step: 83680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:17,066-Speed 13190.31 samples/sec Loss 3.6217 LearningRate 0.0066 Epoch: 33 Global Step: 83690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:18,634-Speed 13073.89 samples/sec Loss 3.6531 LearningRate 0.0066 Epoch: 33 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:20,199-Speed 13093.69 samples/sec Loss 3.6180 LearningRate 0.0066 Epoch: 33 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:21,787-Speed 12902.42 samples/sec Loss 3.5775 LearningRate 0.0066 Epoch: 33 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:23,343-Speed 13163.76 samples/sec Loss 3.6432 LearningRate 0.0066 Epoch: 33 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:24,927-Speed 12936.95 samples/sec Loss 3.6258 LearningRate 0.0066 Epoch: 33 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:26,483-Speed 13167.31 samples/sec Loss 3.5653 LearningRate 0.0066 Epoch: 33 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:28,056-Speed 13023.31 samples/sec Loss 3.5885 LearningRate 0.0066 Epoch: 33 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:29,608-Speed 13209.25 samples/sec Loss 3.6439 LearningRate 0.0065 Epoch: 33 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:31,172-Speed 13107.92 samples/sec Loss 3.5745 LearningRate 0.0065 Epoch: 33 Global Step: 83780 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-14 17:48:32,736-Speed 13094.89 samples/sec Loss 3.5826 LearningRate 0.0065 Epoch: 33 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:34,288-Speed 13205.94 samples/sec Loss 3.6094 LearningRate 0.0065 Epoch: 33 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:35,850-Speed 13122.82 samples/sec Loss 3.6729 LearningRate 0.0065 Epoch: 33 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:37,408-Speed 13150.05 samples/sec Loss 3.5930 LearningRate 0.0065 Epoch: 33 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:38,964-Speed 13171.53 samples/sec Loss 3.6010 LearningRate 0.0065 Epoch: 33 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:40,534-Speed 13048.50 samples/sec Loss 3.5938 LearningRate 0.0065 Epoch: 33 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:42,082-Speed 13234.26 samples/sec Loss 3.5873 LearningRate 0.0065 Epoch: 33 Global Step: 83850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:43,630-Speed 13239.08 samples/sec Loss 3.7173 LearningRate 0.0065 Epoch: 33 Global Step: 83860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:45,211-Speed 12977.01 samples/sec Loss 3.6113 LearningRate 0.0065 Epoch: 33 Global Step: 83870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:46,763-Speed 13204.29 samples/sec Loss 3.6347 LearningRate 0.0065 Epoch: 33 Global Step: 83880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:48,314-Speed 13211.29 samples/sec Loss 3.6971 LearningRate 0.0065 Epoch: 33 Global Step: 83890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:49,969-Speed 12378.93 samples/sec Loss 3.5355 LearningRate 0.0065 Epoch: 33 Global Step: 83900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:48:51,532-Speed 13116.20 samples/sec Loss 3.7291 LearningRate 0.0064 Epoch: 33 Global Step: 83910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:53,080-Speed 13228.99 samples/sec Loss 3.5501 LearningRate 0.0064 Epoch: 33 Global Step: 83920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:54,656-Speed 13005.72 samples/sec Loss 3.5916 LearningRate 0.0064 Epoch: 33 Global Step: 83930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:56,237-Speed 12960.50 samples/sec Loss 3.6749 LearningRate 0.0064 Epoch: 33 Global Step: 83940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:48:57,788-Speed 13206.19 samples/sec Loss 3.6491 LearningRate 0.0064 Epoch: 33 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:48:59,353-Speed 13097.96 samples/sec Loss 3.5540 LearningRate 0.0064 Epoch: 33 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:49:00,937-Speed 12941.98 samples/sec Loss 3.7098 LearningRate 0.0064 Epoch: 33 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:49:02,496-Speed 13136.33 samples/sec Loss 3.6973 LearningRate 0.0064 Epoch: 33 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:49:04,044-Speed 13244.87 samples/sec Loss 3.6561 LearningRate 0.0064 Epoch: 33 Global Step: 83990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:49:05,607-Speed 13111.05 samples/sec Loss 3.6372 LearningRate 0.0064 Epoch: 33 Global Step: 84000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:49:07,169-Speed 13114.01 samples/sec Loss 3.7544 LearningRate 0.0064 Epoch: 33 Global Step: 84010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:08,740-Speed 13045.69 samples/sec Loss 3.7350 LearningRate 0.0064 Epoch: 33 Global Step: 84020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:10,276-Speed 13337.33 samples/sec 
Loss 3.6656 LearningRate 0.0064 Epoch: 33 Global Step: 84030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:11,843-Speed 13073.91 samples/sec Loss 3.6953 LearningRate 0.0063 Epoch: 33 Global Step: 84040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:13,411-Speed 13073.56 samples/sec Loss 3.6729 LearningRate 0.0063 Epoch: 33 Global Step: 84050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:14,974-Speed 13112.25 samples/sec Loss 3.7142 LearningRate 0.0063 Epoch: 33 Global Step: 84060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:16,525-Speed 13210.15 samples/sec Loss 3.7513 LearningRate 0.0063 Epoch: 33 Global Step: 84070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:18,073-Speed 13236.43 samples/sec Loss 3.6362 LearningRate 0.0063 Epoch: 33 Global Step: 84080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:19,644-Speed 13047.24 samples/sec Loss 3.6832 LearningRate 0.0063 Epoch: 33 Global Step: 84090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:21,205-Speed 13126.25 samples/sec Loss 3.6089 LearningRate 0.0063 Epoch: 33 Global Step: 84100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:22,771-Speed 13082.61 samples/sec Loss 3.6659 LearningRate 0.0063 Epoch: 33 Global Step: 84110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:49:24,324-Speed 13195.60 samples/sec Loss 3.6986 LearningRate 0.0063 Epoch: 33 Global Step: 84120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:25,874-Speed 13220.31 samples/sec Loss 3.6434 LearningRate 0.0063 Epoch: 33 Global Step: 84130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:27,433-Speed 13140.60 samples/sec Loss 3.6295 LearningRate 0.0063 Epoch: 33 Global Step: 84140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:29,011-Speed 12988.54 samples/sec Loss 3.6953 LearningRate 0.0063 Epoch: 33 
Global Step: 84150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:30,571-Speed 13134.67 samples/sec Loss 3.6413 LearningRate 0.0063 Epoch: 33 Global Step: 84160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:32,138-Speed 13075.79 samples/sec Loss 3.6799 LearningRate 0.0063 Epoch: 33 Global Step: 84170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:33,731-Speed 12864.12 samples/sec Loss 3.7809 LearningRate 0.0062 Epoch: 33 Global Step: 84180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:35,294-Speed 13110.62 samples/sec Loss 3.6951 LearningRate 0.0062 Epoch: 33 Global Step: 84190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:36,861-Speed 13083.37 samples/sec Loss 3.6958 LearningRate 0.0062 Epoch: 33 Global Step: 84200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:38,418-Speed 13162.92 samples/sec Loss 3.7066 LearningRate 0.0062 Epoch: 33 Global Step: 84210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:39,991-Speed 13022.83 samples/sec Loss 3.6253 LearningRate 0.0062 Epoch: 33 Global Step: 84220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:49:41,581-Speed 12885.94 samples/sec Loss 3.6881 LearningRate 0.0062 Epoch: 33 Global Step: 84230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:49:43,156-Speed 13010.80 samples/sec Loss 3.6096 LearningRate 0.0062 Epoch: 33 Global Step: 84240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:44,708-Speed 13204.14 samples/sec Loss 3.6464 LearningRate 0.0062 Epoch: 33 Global Step: 84250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:46,259-Speed 13214.61 samples/sec Loss 3.8214 LearningRate 0.0062 Epoch: 33 Global Step: 84260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:47,842-Speed 12937.54 samples/sec Loss 3.6221 LearningRate 0.0062 Epoch: 33 Global Step: 84270 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 17:49:49,419-Speed 13001.91 samples/sec Loss 3.6732 LearningRate 0.0062 Epoch: 33 Global Step: 84280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:51,024-Speed 12763.98 samples/sec Loss 3.6621 LearningRate 0.0062 Epoch: 33 Global Step: 84290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:52,594-Speed 13050.40 samples/sec Loss 3.6540 LearningRate 0.0062 Epoch: 33 Global Step: 84300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:54,155-Speed 13124.48 samples/sec Loss 3.6291 LearningRate 0.0061 Epoch: 33 Global Step: 84310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:55,712-Speed 13159.78 samples/sec Loss 3.8034 LearningRate 0.0061 Epoch: 33 Global Step: 84320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:57,280-Speed 13069.42 samples/sec Loss 3.7389 LearningRate 0.0061 Epoch: 33 Global Step: 84330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:49:58,846-Speed 13090.14 samples/sec Loss 3.6912 LearningRate 0.0061 Epoch: 33 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:00,415-Speed 13058.88 samples/sec Loss 3.6284 LearningRate 0.0061 Epoch: 33 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:01,967-Speed 13199.55 samples/sec Loss 3.7067 LearningRate 0.0061 Epoch: 33 Global Step: 84360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:03,540-Speed 13032.85 samples/sec Loss 3.6220 LearningRate 0.0061 Epoch: 33 Global Step: 84370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:05,120-Speed 12969.11 samples/sec Loss 3.6817 LearningRate 0.0061 Epoch: 33 Global Step: 84380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:06,665-Speed 13255.10 samples/sec Loss 3.6582 LearningRate 0.0061 Epoch: 33 Global Step: 84390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
17:50:08,225-Speed 13135.27 samples/sec Loss 3.7366 LearningRate 0.0061 Epoch: 33 Global Step: 84400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:09,818-Speed 12886.25 samples/sec Loss 3.6873 LearningRate 0.0061 Epoch: 33 Global Step: 84410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:11,364-Speed 13252.02 samples/sec Loss 3.8049 LearningRate 0.0061 Epoch: 33 Global Step: 84420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:12,943-Speed 12975.88 samples/sec Loss 3.6634 LearningRate 0.0061 Epoch: 33 Global Step: 84430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:14,518-Speed 13016.44 samples/sec Loss 3.6872 LearningRate 0.0061 Epoch: 33 Global Step: 84440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:16,086-Speed 13060.93 samples/sec Loss 3.7092 LearningRate 0.0060 Epoch: 33 Global Step: 84450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:17,642-Speed 13173.62 samples/sec Loss 3.7469 LearningRate 0.0060 Epoch: 33 Global Step: 84460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:19,240-Speed 12848.64 samples/sec Loss 3.6912 LearningRate 0.0060 Epoch: 33 Global Step: 84470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:20,799-Speed 13144.21 samples/sec Loss 3.7346 LearningRate 0.0060 Epoch: 33 Global Step: 84480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:22,357-Speed 13149.31 samples/sec Loss 3.6693 LearningRate 0.0060 Epoch: 33 Global Step: 84490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:23,931-Speed 13017.52 samples/sec Loss 3.7418 LearningRate 0.0060 Epoch: 33 Global Step: 84500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:25,501-Speed 13052.47 samples/sec Loss 3.6770 LearningRate 0.0060 Epoch: 33 Global Step: 84510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:27,067-Speed 13087.45 samples/sec 
Loss 3.7674 LearningRate 0.0060 Epoch: 33 Global Step: 84520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:28,615-Speed 13239.04 samples/sec Loss 3.6596 LearningRate 0.0060 Epoch: 33 Global Step: 84530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:30,191-Speed 13000.40 samples/sec Loss 3.6983 LearningRate 0.0060 Epoch: 33 Global Step: 84540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:31,761-Speed 13047.05 samples/sec Loss 3.6917 LearningRate 0.0060 Epoch: 33 Global Step: 84550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:33,345-Speed 12938.21 samples/sec Loss 3.7043 LearningRate 0.0060 Epoch: 33 Global Step: 84560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:34,947-Speed 12796.22 samples/sec Loss 3.7199 LearningRate 0.0060 Epoch: 33 Global Step: 84570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:36,480-Speed 13361.86 samples/sec Loss 3.6234 LearningRate 0.0060 Epoch: 33 Global Step: 84580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:38,033-Speed 13200.06 samples/sec Loss 3.7769 LearningRate 0.0059 Epoch: 33 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:39,614-Speed 12956.74 samples/sec Loss 3.7671 LearningRate 0.0059 Epoch: 33 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:41,183-Speed 13064.84 samples/sec Loss 3.7131 LearningRate 0.0059 Epoch: 33 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:42,724-Speed 13319.49 samples/sec Loss 3.7428 LearningRate 0.0059 Epoch: 33 Global Step: 84620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:44,287-Speed 13116.92 samples/sec Loss 3.7314 LearningRate 0.0059 Epoch: 33 Global Step: 84630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:45,859-Speed 13031.85 samples/sec Loss 3.7725 LearningRate 0.0059 Epoch: 33 
Global Step: 84640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:47,418-Speed 13141.49 samples/sec Loss 3.7852 LearningRate 0.0059 Epoch: 33 Global Step: 84650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:48,991-Speed 13026.85 samples/sec Loss 3.7857 LearningRate 0.0059 Epoch: 33 Global Step: 84660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:50,569-Speed 12985.97 samples/sec Loss 3.7476 LearningRate 0.0059 Epoch: 33 Global Step: 84670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:52,143-Speed 13024.75 samples/sec Loss 3.7756 LearningRate 0.0059 Epoch: 33 Global Step: 84680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:53,699-Speed 13170.70 samples/sec Loss 3.7161 LearningRate 0.0059 Epoch: 33 Global Step: 84690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:55,261-Speed 13121.43 samples/sec Loss 3.7953 LearningRate 0.0059 Epoch: 33 Global Step: 84700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:56,828-Speed 13070.37 samples/sec Loss 3.7657 LearningRate 0.0059 Epoch: 33 Global Step: 84710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:50:58,377-Speed 13235.14 samples/sec Loss 3.7452 LearningRate 0.0059 Epoch: 33 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:50:59,950-Speed 13024.86 samples/sec Loss 3.7696 LearningRate 0.0058 Epoch: 33 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:01,512-Speed 13122.59 samples/sec Loss 3.7073 LearningRate 0.0058 Epoch: 33 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:03,058-Speed 13250.13 samples/sec Loss 3.7283 LearningRate 0.0058 Epoch: 33 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:04,611-Speed 13200.83 samples/sec Loss 3.7759 LearningRate 0.0058 Epoch: 33 Global Step: 84760 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-14 17:51:06,167-Speed 13164.79 samples/sec Loss 3.7669 LearningRate 0.0058 Epoch: 33 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:07,724-Speed 13158.58 samples/sec Loss 3.7349 LearningRate 0.0058 Epoch: 33 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:09,325-Speed 12814.89 samples/sec Loss 3.7075 LearningRate 0.0058 Epoch: 33 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:10,896-Speed 13049.35 samples/sec Loss 3.7630 LearningRate 0.0058 Epoch: 33 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:12,453-Speed 13176.46 samples/sec Loss 3.7295 LearningRate 0.0058 Epoch: 33 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:14,022-Speed 13063.92 samples/sec Loss 3.6505 LearningRate 0.0058 Epoch: 33 Global Step: 84820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 17:51:15,565-Speed 13276.70 samples/sec Loss 3.7624 LearningRate 0.0058 Epoch: 33 Global Step: 84830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:17,109-Speed 13270.84 samples/sec Loss 3.7452 LearningRate 0.0058 Epoch: 33 Global Step: 84840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:18,686-Speed 12996.17 samples/sec Loss 3.7749 LearningRate 0.0058 Epoch: 33 Global Step: 84850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:20,250-Speed 13097.42 samples/sec Loss 3.7703 LearningRate 0.0058 Epoch: 33 Global Step: 84860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:21,812-Speed 13117.19 samples/sec Loss 3.6903 LearningRate 0.0057 Epoch: 33 Global Step: 84870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:23,385-Speed 13029.27 samples/sec Loss 3.6930 LearningRate 0.0057 Epoch: 33 Global Step: 84880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 
17:51:24,925-Speed 13305.05 samples/sec Loss 3.7054 LearningRate 0.0057 Epoch: 33 Global Step: 84890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:26,467-Speed 13287.71 samples/sec Loss 3.7530 LearningRate 0.0057 Epoch: 33 Global Step: 84900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:51:28,033-Speed 13084.38 samples/sec Loss 3.7249 LearningRate 0.0057 Epoch: 33 Global Step: 84910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:29,618-Speed 12930.83 samples/sec Loss 3.7272 LearningRate 0.0057 Epoch: 33 Global Step: 84920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:31,167-Speed 13228.84 samples/sec Loss 3.7673 LearningRate 0.0057 Epoch: 33 Global Step: 84930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:32,724-Speed 13157.29 samples/sec Loss 3.7127 LearningRate 0.0057 Epoch: 33 Global Step: 84940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:34,298-Speed 13023.03 samples/sec Loss 3.7407 LearningRate 0.0057 Epoch: 33 Global Step: 84950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:35,846-Speed 13239.93 samples/sec Loss 3.7361 LearningRate 0.0057 Epoch: 33 Global Step: 84960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:37,411-Speed 13089.35 samples/sec Loss 3.7491 LearningRate 0.0057 Epoch: 33 Global Step: 84970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:38,962-Speed 13211.84 samples/sec Loss 3.7687 LearningRate 0.0057 Epoch: 33 Global Step: 84980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:40,531-Speed 13059.11 samples/sec Loss 3.7467 LearningRate 0.0057 Epoch: 33 Global Step: 84990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:51:42,098-Speed 13076.50 samples/sec Loss 3.8192 LearningRate 0.0057 Epoch: 33 Global Step: 85000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:52:04,272-[lfw][85000]XNorm: 7.405321 
Training: 2022-01-14 17:52:04,272-[lfw][85000]Accuracy-Flip: 0.99567+-0.00382 Training: 2022-01-14 17:52:04,273-[lfw][85000]Accuracy-Highest: 0.99650 Training: 2022-01-14 17:52:29,661-[cfp_fp][85000]XNorm: 6.284276 Training: 2022-01-14 17:52:29,662-[cfp_fp][85000]Accuracy-Flip: 0.96929+-0.00886 Training: 2022-01-14 17:52:29,662-[cfp_fp][85000]Accuracy-Highest: 0.97000 Training: 2022-01-14 17:52:52,637-[agedb_30][85000]XNorm: 7.148998 Training: 2022-01-14 17:52:52,638-[agedb_30][85000]Accuracy-Flip: 0.96850+-0.00724 Training: 2022-01-14 17:52:52,638-[agedb_30][85000]Accuracy-Highest: 0.96950 Training: 2022-01-14 17:52:54,211-Speed 284.01 samples/sec Loss 3.7508 LearningRate 0.0056 Epoch: 33 Global Step: 85010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:52:55,767-Speed 13165.18 samples/sec Loss 3.7147 LearningRate 0.0056 Epoch: 33 Global Step: 85020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:52:57,324-Speed 13172.10 samples/sec Loss 3.7294 LearningRate 0.0056 Epoch: 33 Global Step: 85030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:52:58,872-Speed 13238.32 samples/sec Loss 3.7864 LearningRate 0.0056 Epoch: 33 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:53:00,447-Speed 13010.59 samples/sec Loss 3.7301 LearningRate 0.0056 Epoch: 33 Global Step: 85050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:53:01,995-Speed 13234.12 samples/sec Loss 3.7623 LearningRate 0.0056 Epoch: 33 Global Step: 85060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:03,554-Speed 13146.66 samples/sec Loss 3.7820 LearningRate 0.0056 Epoch: 33 Global Step: 85070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:05,135-Speed 12958.75 samples/sec Loss 3.7065 LearningRate 0.0056 Epoch: 33 Global Step: 85080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:06,694-Speed 13149.81 samples/sec Loss 3.7420 LearningRate 0.0056 Epoch: 
33 Global Step: 85090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:08,240-Speed 13254.27 samples/sec Loss 3.8213 LearningRate 0.0056 Epoch: 33 Global Step: 85100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:09,815-Speed 13014.77 samples/sec Loss 3.7523 LearningRate 0.0056 Epoch: 33 Global Step: 85110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:11,356-Speed 13291.43 samples/sec Loss 3.7246 LearningRate 0.0056 Epoch: 33 Global Step: 85120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:12,899-Speed 13288.50 samples/sec Loss 3.7970 LearningRate 0.0056 Epoch: 33 Global Step: 85130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:14,449-Speed 13210.77 samples/sec Loss 3.8341 LearningRate 0.0056 Epoch: 33 Global Step: 85140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:16,000-Speed 13210.30 samples/sec Loss 3.8283 LearningRate 0.0056 Epoch: 33 Global Step: 85150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:17,571-Speed 13046.09 samples/sec Loss 3.8290 LearningRate 0.0055 Epoch: 33 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:53:19,172-Speed 12799.33 samples/sec Loss 3.7404 LearningRate 0.0055 Epoch: 33 Global Step: 85170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:53:20,713-Speed 13298.67 samples/sec Loss 3.6833 LearningRate 0.0055 Epoch: 33 Global Step: 85180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:22,270-Speed 13168.85 samples/sec Loss 3.7612 LearningRate 0.0055 Epoch: 33 Global Step: 85190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:23,823-Speed 13192.61 samples/sec Loss 3.7217 LearningRate 0.0055 Epoch: 33 Global Step: 85200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:25,367-Speed 13272.72 samples/sec Loss 3.7672 LearningRate 0.0055 Epoch: 33 Global Step: 85210 Fp16 Grad Scale: 
32768 Required: 1 hours Training: 2022-01-14 17:53:26,903-Speed 13337.54 samples/sec Loss 3.7856 LearningRate 0.0055 Epoch: 33 Global Step: 85220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:28,503-Speed 12809.60 samples/sec Loss 3.7168 LearningRate 0.0055 Epoch: 33 Global Step: 85230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:30,070-Speed 13078.81 samples/sec Loss 3.7084 LearningRate 0.0055 Epoch: 33 Global Step: 85240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:31,623-Speed 13196.93 samples/sec Loss 3.8490 LearningRate 0.0055 Epoch: 33 Global Step: 85250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:33,192-Speed 13063.05 samples/sec Loss 3.8418 LearningRate 0.0055 Epoch: 33 Global Step: 85260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:34,781-Speed 12893.89 samples/sec Loss 3.8020 LearningRate 0.0055 Epoch: 33 Global Step: 85270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:36,367-Speed 12920.00 samples/sec Loss 3.6737 LearningRate 0.0055 Epoch: 33 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:53:37,913-Speed 13250.82 samples/sec Loss 3.7828 LearningRate 0.0055 Epoch: 33 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:53:39,468-Speed 13184.64 samples/sec Loss 3.7661 LearningRate 0.0054 Epoch: 33 Global Step: 85300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:41,039-Speed 13044.74 samples/sec Loss 3.7653 LearningRate 0.0054 Epoch: 33 Global Step: 85310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:42,638-Speed 12816.13 samples/sec Loss 3.7477 LearningRate 0.0054 Epoch: 33 Global Step: 85320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:44,207-Speed 13060.74 samples/sec Loss 3.8257 LearningRate 0.0054 Epoch: 33 Global Step: 85330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:53:45,778-Speed 13043.02 samples/sec Loss 3.7905 LearningRate 0.0054 Epoch: 33 Global Step: 85340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:47,328-Speed 13222.56 samples/sec Loss 3.7639 LearningRate 0.0054 Epoch: 33 Global Step: 85350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:48,885-Speed 13162.57 samples/sec Loss 3.7901 LearningRate 0.0054 Epoch: 33 Global Step: 85360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:50,438-Speed 13191.28 samples/sec Loss 3.7906 LearningRate 0.0054 Epoch: 33 Global Step: 85370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:52,007-Speed 13061.45 samples/sec Loss 3.7577 LearningRate 0.0054 Epoch: 33 Global Step: 85380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:53,570-Speed 13125.86 samples/sec Loss 3.8212 LearningRate 0.0054 Epoch: 33 Global Step: 85390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:55,119-Speed 13230.76 samples/sec Loss 3.7640 LearningRate 0.0054 Epoch: 33 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:53:56,663-Speed 13272.37 samples/sec Loss 3.8265 LearningRate 0.0054 Epoch: 33 Global Step: 85410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:58,240-Speed 12995.99 samples/sec Loss 3.8103 LearningRate 0.0054 Epoch: 33 Global Step: 85420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:53:59,831-Speed 12875.86 samples/sec Loss 3.7167 LearningRate 0.0054 Epoch: 33 Global Step: 85430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:01,406-Speed 13011.51 samples/sec Loss 3.7566 LearningRate 0.0054 Epoch: 33 Global Step: 85440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:02,974-Speed 13066.89 samples/sec Loss 3.8103 LearningRate 0.0053 Epoch: 33 Global Step: 85450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:04,527-Speed 13199.60 
samples/sec Loss 3.7816 LearningRate 0.0053 Epoch: 33 Global Step: 85460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:06,071-Speed 13262.27 samples/sec Loss 3.7573 LearningRate 0.0053 Epoch: 33 Global Step: 85470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:07,613-Speed 13293.58 samples/sec Loss 3.7856 LearningRate 0.0053 Epoch: 33 Global Step: 85480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:09,182-Speed 13058.05 samples/sec Loss 3.8108 LearningRate 0.0053 Epoch: 33 Global Step: 85490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:10,741-Speed 13139.58 samples/sec Loss 3.7882 LearningRate 0.0053 Epoch: 33 Global Step: 85500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:12,313-Speed 13034.97 samples/sec Loss 3.8649 LearningRate 0.0053 Epoch: 33 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:54:13,870-Speed 13158.99 samples/sec Loss 3.7450 LearningRate 0.0053 Epoch: 33 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:54:15,437-Speed 13082.95 samples/sec Loss 3.8056 LearningRate 0.0053 Epoch: 33 Global Step: 85530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:17,020-Speed 12939.40 samples/sec Loss 3.7573 LearningRate 0.0053 Epoch: 33 Global Step: 85540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:18,593-Speed 13033.50 samples/sec Loss 3.8402 LearningRate 0.0053 Epoch: 33 Global Step: 85550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:20,172-Speed 12981.97 samples/sec Loss 3.8090 LearningRate 0.0053 Epoch: 33 Global Step: 85560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:21,737-Speed 13092.53 samples/sec Loss 3.8199 LearningRate 0.0053 Epoch: 33 Global Step: 85570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:23,299-Speed 13134.97 samples/sec Loss 3.8299 LearningRate 0.0053 
Epoch: 33 Global Step: 85580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:24,866-Speed 13077.22 samples/sec Loss 3.8618 LearningRate 0.0052 Epoch: 33 Global Step: 85590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:26,433-Speed 13071.50 samples/sec Loss 3.7942 LearningRate 0.0052 Epoch: 33 Global Step: 85600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:27,999-Speed 13104.96 samples/sec Loss 3.8739 LearningRate 0.0052 Epoch: 33 Global Step: 85610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:29,560-Speed 13124.28 samples/sec Loss 3.8197 LearningRate 0.0052 Epoch: 33 Global Step: 85620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:31,109-Speed 13227.47 samples/sec Loss 3.8130 LearningRate 0.0052 Epoch: 33 Global Step: 85630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:54:32,667-Speed 13149.63 samples/sec Loss 3.8026 LearningRate 0.0052 Epoch: 33 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:54:34,234-Speed 13076.32 samples/sec Loss 3.8428 LearningRate 0.0052 Epoch: 33 Global Step: 85650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:35,799-Speed 13095.46 samples/sec Loss 3.7923 LearningRate 0.0052 Epoch: 33 Global Step: 85660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:37,346-Speed 13246.54 samples/sec Loss 3.8011 LearningRate 0.0052 Epoch: 33 Global Step: 85670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:38,892-Speed 13250.63 samples/sec Loss 3.8406 LearningRate 0.0052 Epoch: 33 Global Step: 85680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:40,464-Speed 13035.26 samples/sec Loss 3.7551 LearningRate 0.0052 Epoch: 33 Global Step: 85690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:42,009-Speed 13260.56 samples/sec Loss 3.7363 LearningRate 0.0052 Epoch: 33 Global Step: 85700 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:43,569-Speed 13135.65 samples/sec Loss 3.9001 LearningRate 0.0052 Epoch: 33 Global Step: 85710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:45,121-Speed 13208.87 samples/sec Loss 3.8540 LearningRate 0.0052 Epoch: 33 Global Step: 85720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:46,699-Speed 12979.31 samples/sec Loss 3.7619 LearningRate 0.0052 Epoch: 33 Global Step: 85730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:48,267-Speed 13069.48 samples/sec Loss 3.8245 LearningRate 0.0051 Epoch: 33 Global Step: 85740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:49,835-Speed 13071.01 samples/sec Loss 3.7624 LearningRate 0.0051 Epoch: 33 Global Step: 85750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:51,385-Speed 13221.16 samples/sec Loss 3.7321 LearningRate 0.0051 Epoch: 33 Global Step: 85760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:52,909-Speed 13442.35 samples/sec Loss 3.8103 LearningRate 0.0051 Epoch: 33 Global Step: 85770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:54,482-Speed 13024.10 samples/sec Loss 3.8664 LearningRate 0.0051 Epoch: 33 Global Step: 85780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:56,045-Speed 13113.74 samples/sec Loss 3.8657 LearningRate 0.0051 Epoch: 33 Global Step: 85790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:57,653-Speed 12744.47 samples/sec Loss 3.8625 LearningRate 0.0051 Epoch: 33 Global Step: 85800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:54:59,220-Speed 13079.99 samples/sec Loss 3.7121 LearningRate 0.0051 Epoch: 33 Global Step: 85810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:00,760-Speed 13305.05 samples/sec Loss 3.8815 LearningRate 0.0051 Epoch: 33 Global Step: 85820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:55:02,308-Speed 13238.51 samples/sec Loss 3.8239 LearningRate 0.0051 Epoch: 33 Global Step: 85830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:03,889-Speed 12957.53 samples/sec Loss 3.8396 LearningRate 0.0051 Epoch: 33 Global Step: 85840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:05,426-Speed 13330.56 samples/sec Loss 3.7826 LearningRate 0.0051 Epoch: 33 Global Step: 85850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:06,999-Speed 13030.49 samples/sec Loss 3.7799 LearningRate 0.0051 Epoch: 33 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:08,552-Speed 13193.73 samples/sec Loss 3.8043 LearningRate 0.0051 Epoch: 33 Global Step: 85870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:10,110-Speed 13150.04 samples/sec Loss 3.7610 LearningRate 0.0051 Epoch: 33 Global Step: 85880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:11,668-Speed 13177.12 samples/sec Loss 3.9070 LearningRate 0.0050 Epoch: 33 Global Step: 85890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:13,216-Speed 13241.73 samples/sec Loss 3.8120 LearningRate 0.0050 Epoch: 33 Global Step: 85900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:14,808-Speed 12863.94 samples/sec Loss 3.8328 LearningRate 0.0050 Epoch: 33 Global Step: 85910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:16,371-Speed 13115.43 samples/sec Loss 3.8325 LearningRate 0.0050 Epoch: 33 Global Step: 85920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:17,916-Speed 13263.07 samples/sec Loss 3.8168 LearningRate 0.0050 Epoch: 33 Global Step: 85930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:19,477-Speed 13134.06 samples/sec Loss 3.8405 LearningRate 0.0050 Epoch: 33 Global Step: 85940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:21,061-Speed 12930.90 
samples/sec Loss 3.7492 LearningRate 0.0050 Epoch: 33 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:55:22,610-Speed 13235.87 samples/sec Loss 3.8347 LearningRate 0.0050 Epoch: 33 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:55:24,161-Speed 13210.56 samples/sec Loss 3.7194 LearningRate 0.0050 Epoch: 33 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:55:25,744-Speed 12938.38 samples/sec Loss 3.8734 LearningRate 0.0050 Epoch: 33 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:55:40,841-Speed 1356.73 samples/sec Loss 3.7251 LearningRate 0.0050 Epoch: 34 Global Step: 85990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:55:42,420-Speed 12978.53 samples/sec Loss 3.5037 LearningRate 0.0050 Epoch: 34 Global Step: 86000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:55:43,990-Speed 13049.69 samples/sec Loss 3.5256 LearningRate 0.0050 Epoch: 34 Global Step: 86010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:55:45,565-Speed 13011.25 samples/sec Loss 3.5000 LearningRate 0.0050 Epoch: 34 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:55:47,122-Speed 13161.12 samples/sec Loss 3.5117 LearningRate 0.0050 Epoch: 34 Global Step: 86030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:48,668-Speed 13252.94 samples/sec Loss 3.4516 LearningRate 0.0050 Epoch: 34 Global Step: 86040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:50,253-Speed 12926.27 samples/sec Loss 3.5266 LearningRate 0.0049 Epoch: 34 Global Step: 86050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:51,811-Speed 13152.91 samples/sec Loss 3.4561 LearningRate 0.0049 Epoch: 34 Global Step: 86060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:53,371-Speed 13142.26 samples/sec Loss 3.4838 LearningRate 0.0049 
Epoch: 34 Global Step: 86070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:54,920-Speed 13229.37 samples/sec Loss 3.5377 LearningRate 0.0049 Epoch: 34 Global Step: 86080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:56,489-Speed 13059.98 samples/sec Loss 3.5704 LearningRate 0.0049 Epoch: 34 Global Step: 86090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:58,042-Speed 13205.01 samples/sec Loss 3.4668 LearningRate 0.0049 Epoch: 34 Global Step: 86100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:55:59,609-Speed 13069.70 samples/sec Loss 3.5252 LearningRate 0.0049 Epoch: 34 Global Step: 86110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:01,163-Speed 13187.32 samples/sec Loss 3.5511 LearningRate 0.0049 Epoch: 34 Global Step: 86120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:02,719-Speed 13169.73 samples/sec Loss 3.4961 LearningRate 0.0049 Epoch: 34 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:56:04,295-Speed 13005.42 samples/sec Loss 3.5128 LearningRate 0.0049 Epoch: 34 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:56:05,848-Speed 13196.81 samples/sec Loss 3.5838 LearningRate 0.0049 Epoch: 34 Global Step: 86150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:07,420-Speed 13033.70 samples/sec Loss 3.5488 LearningRate 0.0049 Epoch: 34 Global Step: 86160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:09,011-Speed 12882.37 samples/sec Loss 3.5585 LearningRate 0.0049 Epoch: 34 Global Step: 86170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:10,573-Speed 13120.69 samples/sec Loss 3.5619 LearningRate 0.0049 Epoch: 34 Global Step: 86180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:12,122-Speed 13220.96 samples/sec Loss 3.5463 LearningRate 0.0049 Epoch: 34 Global Step: 86190 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:13,692-Speed 13061.83 samples/sec Loss 3.5381 LearningRate 0.0048 Epoch: 34 Global Step: 86200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:15,251-Speed 13142.18 samples/sec Loss 3.6078 LearningRate 0.0048 Epoch: 34 Global Step: 86210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:16,825-Speed 13034.64 samples/sec Loss 3.5102 LearningRate 0.0048 Epoch: 34 Global Step: 86220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:18,385-Speed 13143.07 samples/sec Loss 3.5459 LearningRate 0.0048 Epoch: 34 Global Step: 86230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:19,966-Speed 12956.41 samples/sec Loss 3.4940 LearningRate 0.0048 Epoch: 34 Global Step: 86240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:21,573-Speed 12752.78 samples/sec Loss 3.5417 LearningRate 0.0048 Epoch: 34 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:56:23,163-Speed 12907.01 samples/sec Loss 3.5723 LearningRate 0.0048 Epoch: 34 Global Step: 86260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:56:24,716-Speed 13191.25 samples/sec Loss 3.5777 LearningRate 0.0048 Epoch: 34 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:56:26,300-Speed 12937.33 samples/sec Loss 3.5379 LearningRate 0.0048 Epoch: 34 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:56:27,862-Speed 13115.39 samples/sec Loss 3.5208 LearningRate 0.0048 Epoch: 34 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:56:29,421-Speed 13143.47 samples/sec Loss 3.5365 LearningRate 0.0048 Epoch: 34 Global Step: 86300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:30,982-Speed 13128.08 samples/sec Loss 3.6044 LearningRate 0.0048 Epoch: 34 Global Step: 86310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:56:32,556-Speed 13018.55 samples/sec Loss 3.5161 LearningRate 0.0048 Epoch: 34 Global Step: 86320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:34,136-Speed 12969.59 samples/sec Loss 3.5402 LearningRate 0.0048 Epoch: 34 Global Step: 86330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:35,714-Speed 12979.62 samples/sec Loss 3.6172 LearningRate 0.0048 Epoch: 34 Global Step: 86340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:37,285-Speed 13046.61 samples/sec Loss 3.5424 LearningRate 0.0047 Epoch: 34 Global Step: 86350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:38,866-Speed 12963.81 samples/sec Loss 3.4894 LearningRate 0.0047 Epoch: 34 Global Step: 86360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:40,441-Speed 13008.14 samples/sec Loss 3.6031 LearningRate 0.0047 Epoch: 34 Global Step: 86370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:42,010-Speed 13057.63 samples/sec Loss 3.5335 LearningRate 0.0047 Epoch: 34 Global Step: 86380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:43,566-Speed 13171.18 samples/sec Loss 3.5202 LearningRate 0.0047 Epoch: 34 Global Step: 86390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:45,153-Speed 12911.58 samples/sec Loss 3.5128 LearningRate 0.0047 Epoch: 34 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:56:46,676-Speed 13460.60 samples/sec Loss 3.5554 LearningRate 0.0047 Epoch: 34 Global Step: 86410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:48,250-Speed 13015.22 samples/sec Loss 3.5086 LearningRate 0.0047 Epoch: 34 Global Step: 86420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:49,838-Speed 12899.10 samples/sec Loss 3.5879 LearningRate 0.0047 Epoch: 34 Global Step: 86430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:51,412-Speed 13018.26 
samples/sec Loss 3.5771 LearningRate 0.0047 Epoch: 34 Global Step: 86440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:52,979-Speed 13085.39 samples/sec Loss 3.5836 LearningRate 0.0047 Epoch: 34 Global Step: 86450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:54,583-Speed 12768.66 samples/sec Loss 3.5249 LearningRate 0.0047 Epoch: 34 Global Step: 86460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:56,183-Speed 12804.19 samples/sec Loss 3.5834 LearningRate 0.0047 Epoch: 34 Global Step: 86470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:57,762-Speed 12976.49 samples/sec Loss 3.5779 LearningRate 0.0047 Epoch: 34 Global Step: 86480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:56:59,339-Speed 12996.41 samples/sec Loss 3.6033 LearningRate 0.0047 Epoch: 34 Global Step: 86490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:00,901-Speed 13113.84 samples/sec Loss 3.6305 LearningRate 0.0047 Epoch: 34 Global Step: 86500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:02,450-Speed 13233.39 samples/sec Loss 3.5573 LearningRate 0.0046 Epoch: 34 Global Step: 86510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:04,007-Speed 13166.63 samples/sec Loss 3.4421 LearningRate 0.0046 Epoch: 34 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:05,565-Speed 13155.09 samples/sec Loss 3.5704 LearningRate 0.0046 Epoch: 34 Global Step: 86530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:07,117-Speed 13197.31 samples/sec Loss 3.5229 LearningRate 0.0046 Epoch: 34 Global Step: 86540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:08,684-Speed 13082.73 samples/sec Loss 3.5887 LearningRate 0.0046 Epoch: 34 Global Step: 86550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:10,257-Speed 13018.66 samples/sec Loss 3.6007 LearningRate 0.0046 
Epoch: 34 Global Step: 86560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:11,812-Speed 13180.23 samples/sec Loss 3.5279 LearningRate 0.0046 Epoch: 34 Global Step: 86570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:13,381-Speed 13062.90 samples/sec Loss 3.6352 LearningRate 0.0046 Epoch: 34 Global Step: 86580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:14,948-Speed 13074.40 samples/sec Loss 3.5458 LearningRate 0.0046 Epoch: 34 Global Step: 86590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:16,512-Speed 13122.22 samples/sec Loss 3.5606 LearningRate 0.0046 Epoch: 34 Global Step: 86600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:18,086-Speed 13023.07 samples/sec Loss 3.6182 LearningRate 0.0046 Epoch: 34 Global Step: 86610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:19,685-Speed 12810.72 samples/sec Loss 3.6156 LearningRate 0.0046 Epoch: 34 Global Step: 86620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:21,247-Speed 13120.30 samples/sec Loss 3.6128 LearningRate 0.0046 Epoch: 34 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:22,835-Speed 12907.63 samples/sec Loss 3.5625 LearningRate 0.0046 Epoch: 34 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:24,399-Speed 13099.63 samples/sec Loss 3.6450 LearningRate 0.0046 Epoch: 34 Global Step: 86650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:25,961-Speed 13114.48 samples/sec Loss 3.6500 LearningRate 0.0046 Epoch: 34 Global Step: 86660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:27,540-Speed 12976.92 samples/sec Loss 3.5960 LearningRate 0.0045 Epoch: 34 Global Step: 86670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:29,108-Speed 13072.63 samples/sec Loss 3.6221 LearningRate 0.0045 Epoch: 34 Global Step: 86680 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:30,680-Speed 13035.35 samples/sec Loss 3.5526 LearningRate 0.0045 Epoch: 34 Global Step: 86690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:32,267-Speed 12906.41 samples/sec Loss 3.7244 LearningRate 0.0045 Epoch: 34 Global Step: 86700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:33,846-Speed 12981.71 samples/sec Loss 3.5768 LearningRate 0.0045 Epoch: 34 Global Step: 86710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:35,420-Speed 13013.77 samples/sec Loss 3.6238 LearningRate 0.0045 Epoch: 34 Global Step: 86720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:36,951-Speed 13385.65 samples/sec Loss 3.6330 LearningRate 0.0045 Epoch: 34 Global Step: 86730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:38,510-Speed 13147.33 samples/sec Loss 3.6405 LearningRate 0.0045 Epoch: 34 Global Step: 86740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:40,093-Speed 12940.19 samples/sec Loss 3.6427 LearningRate 0.0045 Epoch: 34 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:41,655-Speed 13119.37 samples/sec Loss 3.6141 LearningRate 0.0045 Epoch: 34 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:43,204-Speed 13236.18 samples/sec Loss 3.6399 LearningRate 0.0045 Epoch: 34 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:44,772-Speed 13075.94 samples/sec Loss 3.6130 LearningRate 0.0045 Epoch: 34 Global Step: 86780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:46,334-Speed 13117.82 samples/sec Loss 3.5638 LearningRate 0.0045 Epoch: 34 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:57:47,883-Speed 13229.81 samples/sec Loss 3.5805 LearningRate 0.0045 Epoch: 34 Global Step: 86800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:57:49,448-Speed 13090.42 samples/sec Loss 3.6130 LearningRate 0.0045 Epoch: 34 Global Step: 86810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:51,031-Speed 12946.13 samples/sec Loss 3.6007 LearningRate 0.0045 Epoch: 34 Global Step: 86820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:52,602-Speed 13040.12 samples/sec Loss 3.6642 LearningRate 0.0044 Epoch: 34 Global Step: 86830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:54,193-Speed 12880.17 samples/sec Loss 3.6454 LearningRate 0.0044 Epoch: 34 Global Step: 86840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:55,758-Speed 13089.39 samples/sec Loss 3.5394 LearningRate 0.0044 Epoch: 34 Global Step: 86850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:57,333-Speed 13012.60 samples/sec Loss 3.5780 LearningRate 0.0044 Epoch: 34 Global Step: 86860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:57:58,925-Speed 12870.34 samples/sec Loss 3.6443 LearningRate 0.0044 Epoch: 34 Global Step: 86870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:00,508-Speed 12945.93 samples/sec Loss 3.5680 LearningRate 0.0044 Epoch: 34 Global Step: 86880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:02,090-Speed 12954.26 samples/sec Loss 3.6625 LearningRate 0.0044 Epoch: 34 Global Step: 86890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:03,682-Speed 12868.57 samples/sec Loss 3.6317 LearningRate 0.0044 Epoch: 34 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:05,257-Speed 13011.88 samples/sec Loss 3.6164 LearningRate 0.0044 Epoch: 34 Global Step: 86910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:06,816-Speed 13146.65 samples/sec Loss 3.6282 LearningRate 0.0044 Epoch: 34 Global Step: 86920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:08,390-Speed 13014.95 
samples/sec Loss 3.5927 LearningRate 0.0044 Epoch: 34 Global Step: 86930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:09,993-Speed 12780.33 samples/sec Loss 3.5636 LearningRate 0.0044 Epoch: 34 Global Step: 86940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:11,559-Speed 13087.84 samples/sec Loss 3.6641 LearningRate 0.0044 Epoch: 34 Global Step: 86950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:13,110-Speed 13212.88 samples/sec Loss 3.5891 LearningRate 0.0044 Epoch: 34 Global Step: 86960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:14,679-Speed 13059.10 samples/sec Loss 3.6972 LearningRate 0.0044 Epoch: 34 Global Step: 86970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:16,229-Speed 13218.22 samples/sec Loss 3.6025 LearningRate 0.0044 Epoch: 34 Global Step: 86980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:17,816-Speed 12913.87 samples/sec Loss 3.6034 LearningRate 0.0043 Epoch: 34 Global Step: 86990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:19,392-Speed 12999.56 samples/sec Loss 3.6420 LearningRate 0.0043 Epoch: 34 Global Step: 87000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:20,969-Speed 12991.48 samples/sec Loss 3.5880 LearningRate 0.0043 Epoch: 34 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:22,564-Speed 12848.34 samples/sec Loss 3.6088 LearningRate 0.0043 Epoch: 34 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:24,138-Speed 13019.57 samples/sec Loss 3.6202 LearningRate 0.0043 Epoch: 34 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:25,712-Speed 13024.46 samples/sec Loss 3.6195 LearningRate 0.0043 Epoch: 34 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:27,264-Speed 13199.96 samples/sec Loss 3.5262 LearningRate 0.0043 
Epoch: 34 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:28,821-Speed 13166.60 samples/sec Loss 3.5932 LearningRate 0.0043 Epoch: 34 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:30,389-Speed 13064.80 samples/sec Loss 3.6394 LearningRate 0.0043 Epoch: 34 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:31,976-Speed 12912.24 samples/sec Loss 3.6075 LearningRate 0.0043 Epoch: 34 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:33,553-Speed 12991.20 samples/sec Loss 3.6638 LearningRate 0.0043 Epoch: 34 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:35,125-Speed 13033.99 samples/sec Loss 3.5992 LearningRate 0.0043 Epoch: 34 Global Step: 87100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:36,689-Speed 13098.13 samples/sec Loss 3.6698 LearningRate 0.0043 Epoch: 34 Global Step: 87110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:38,247-Speed 13160.01 samples/sec Loss 3.6756 LearningRate 0.0043 Epoch: 34 Global Step: 87120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:39,818-Speed 13040.03 samples/sec Loss 3.6725 LearningRate 0.0043 Epoch: 34 Global Step: 87130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:41,409-Speed 12873.02 samples/sec Loss 3.6018 LearningRate 0.0043 Epoch: 34 Global Step: 87140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:42,990-Speed 12966.56 samples/sec Loss 3.6819 LearningRate 0.0043 Epoch: 34 Global Step: 87150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:44,566-Speed 13003.08 samples/sec Loss 3.6313 LearningRate 0.0042 Epoch: 34 Global Step: 87160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:46,103-Speed 13329.27 samples/sec Loss 3.6679 LearningRate 0.0042 Epoch: 34 Global Step: 87170 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:47,645-Speed 13291.19 samples/sec Loss 3.6096 LearningRate 0.0042 Epoch: 34 Global Step: 87180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:49,218-Speed 13027.84 samples/sec Loss 3.6145 LearningRate 0.0042 Epoch: 34 Global Step: 87190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:50,786-Speed 13066.80 samples/sec Loss 3.6336 LearningRate 0.0042 Epoch: 34 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:52,342-Speed 13172.06 samples/sec Loss 3.6566 LearningRate 0.0042 Epoch: 34 Global Step: 87210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:58:53,898-Speed 13164.33 samples/sec Loss 3.6502 LearningRate 0.0042 Epoch: 34 Global Step: 87220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:55,465-Speed 13080.01 samples/sec Loss 3.6687 LearningRate 0.0042 Epoch: 34 Global Step: 87230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:57,021-Speed 13165.80 samples/sec Loss 3.6438 LearningRate 0.0042 Epoch: 34 Global Step: 87240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:58:58,584-Speed 13108.55 samples/sec Loss 3.7161 LearningRate 0.0042 Epoch: 34 Global Step: 87250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:00,145-Speed 13128.08 samples/sec Loss 3.6925 LearningRate 0.0042 Epoch: 34 Global Step: 87260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:01,726-Speed 12956.47 samples/sec Loss 3.6548 LearningRate 0.0042 Epoch: 34 Global Step: 87270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:03,310-Speed 12941.85 samples/sec Loss 3.5625 LearningRate 0.0042 Epoch: 34 Global Step: 87280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:04,897-Speed 12914.02 samples/sec Loss 3.5907 LearningRate 0.0042 Epoch: 34 Global Step: 87290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 17:59:06,445-Speed 13236.01 samples/sec Loss 3.6362 LearningRate 0.0042 Epoch: 34 Global Step: 87300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:08,006-Speed 13124.31 samples/sec Loss 3.6934 LearningRate 0.0042 Epoch: 34 Global Step: 87310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:09,596-Speed 12891.25 samples/sec Loss 3.6145 LearningRate 0.0041 Epoch: 34 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:59:11,156-Speed 13131.95 samples/sec Loss 3.6383 LearningRate 0.0041 Epoch: 34 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:59:12,725-Speed 13062.66 samples/sec Loss 3.7362 LearningRate 0.0041 Epoch: 34 Global Step: 87340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:59:14,306-Speed 12958.79 samples/sec Loss 3.6138 LearningRate 0.0041 Epoch: 34 Global Step: 87350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:59:15,882-Speed 13007.40 samples/sec Loss 3.6692 LearningRate 0.0041 Epoch: 34 Global Step: 87360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:59:17,439-Speed 13154.21 samples/sec Loss 3.6115 LearningRate 0.0041 Epoch: 34 Global Step: 87370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 17:59:18,972-Speed 13369.82 samples/sec Loss 3.6538 LearningRate 0.0041 Epoch: 34 Global Step: 87380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:20,552-Speed 12972.49 samples/sec Loss 3.6251 LearningRate 0.0041 Epoch: 34 Global Step: 87390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:22,119-Speed 13073.00 samples/sec Loss 3.6073 LearningRate 0.0041 Epoch: 34 Global Step: 87400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:23,693-Speed 13016.85 samples/sec Loss 3.6884 LearningRate 0.0041 Epoch: 34 Global Step: 87410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:25,264-Speed 13042.56 
samples/sec Loss 3.5779 LearningRate 0.0041 Epoch: 34 Global Step: 87420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:26,853-Speed 12891.81 samples/sec Loss 3.5562 LearningRate 0.0041 Epoch: 34 Global Step: 87430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:28,398-Speed 13268.40 samples/sec Loss 3.6541 LearningRate 0.0041 Epoch: 34 Global Step: 87440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:29,969-Speed 13037.33 samples/sec Loss 3.6676 LearningRate 0.0041 Epoch: 34 Global Step: 87450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:31,530-Speed 13129.25 samples/sec Loss 3.5803 LearningRate 0.0041 Epoch: 34 Global Step: 87460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:33,087-Speed 13167.48 samples/sec Loss 3.6809 LearningRate 0.0041 Epoch: 34 Global Step: 87470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:34,654-Speed 13075.43 samples/sec Loss 3.6845 LearningRate 0.0041 Epoch: 34 Global Step: 87480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:36,231-Speed 12993.22 samples/sec Loss 3.7095 LearningRate 0.0040 Epoch: 34 Global Step: 87490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:37,804-Speed 13023.48 samples/sec Loss 3.6255 LearningRate 0.0040 Epoch: 34 Global Step: 87500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:39,367-Speed 13108.07 samples/sec Loss 3.6345 LearningRate 0.0040 Epoch: 34 Global Step: 87510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:40,933-Speed 13090.69 samples/sec Loss 3.6826 LearningRate 0.0040 Epoch: 34 Global Step: 87520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:42,504-Speed 13037.19 samples/sec Loss 3.6657 LearningRate 0.0040 Epoch: 34 Global Step: 87530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:44,068-Speed 13102.17 samples/sec Loss 3.6968 LearningRate 0.0040 
Epoch: 34 Global Step: 87540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 17:59:45,635-Speed 13075.27 samples/sec Loss 3.6892 LearningRate 0.0040 Epoch: 34 Global Step: 87550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:47,211-Speed 13002.19 samples/sec Loss 3.6449 LearningRate 0.0040 Epoch: 34 Global Step: 87560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:48,795-Speed 12935.26 samples/sec Loss 3.6342 LearningRate 0.0040 Epoch: 34 Global Step: 87570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:50,369-Speed 13017.44 samples/sec Loss 3.7213 LearningRate 0.0040 Epoch: 34 Global Step: 87580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:51,968-Speed 12817.63 samples/sec Loss 3.7302 LearningRate 0.0040 Epoch: 34 Global Step: 87590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:53,537-Speed 13059.88 samples/sec Loss 3.6553 LearningRate 0.0040 Epoch: 34 Global Step: 87600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:55,104-Speed 13084.62 samples/sec Loss 3.6761 LearningRate 0.0040 Epoch: 34 Global Step: 87610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:56,671-Speed 13071.85 samples/sec Loss 3.6509 LearningRate 0.0040 Epoch: 34 Global Step: 87620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:58,230-Speed 13147.12 samples/sec Loss 3.6666 LearningRate 0.0040 Epoch: 34 Global Step: 87630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 17:59:59,795-Speed 13095.32 samples/sec Loss 3.6922 LearningRate 0.0040 Epoch: 34 Global Step: 87640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:00:01,366-Speed 13040.76 samples/sec Loss 3.6757 LearningRate 0.0040 Epoch: 34 Global Step: 87650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:02,927-Speed 13129.12 samples/sec Loss 3.6918 LearningRate 0.0039 Epoch: 34 Global Step: 87660 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:04,507-Speed 12963.10 samples/sec Loss 3.6426 LearningRate 0.0039 Epoch: 34 Global Step: 87670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:06,088-Speed 12958.02 samples/sec Loss 3.6223 LearningRate 0.0039 Epoch: 34 Global Step: 87680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:07,654-Speed 13086.19 samples/sec Loss 3.6727 LearningRate 0.0039 Epoch: 34 Global Step: 87690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:09,229-Speed 13016.57 samples/sec Loss 3.6816 LearningRate 0.0039 Epoch: 34 Global Step: 87700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:10,803-Speed 13017.75 samples/sec Loss 3.6370 LearningRate 0.0039 Epoch: 34 Global Step: 87710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:12,394-Speed 12875.18 samples/sec Loss 3.6607 LearningRate 0.0039 Epoch: 34 Global Step: 87720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:13,986-Speed 12873.88 samples/sec Loss 3.6210 LearningRate 0.0039 Epoch: 34 Global Step: 87730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:15,533-Speed 13244.69 samples/sec Loss 3.6619 LearningRate 0.0039 Epoch: 34 Global Step: 87740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:17,072-Speed 13311.56 samples/sec Loss 3.6780 LearningRate 0.0039 Epoch: 34 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:18,619-Speed 13254.73 samples/sec Loss 3.6617 LearningRate 0.0039 Epoch: 34 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:20,206-Speed 12908.58 samples/sec Loss 3.7338 LearningRate 0.0039 Epoch: 34 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:21,769-Speed 13108.66 samples/sec Loss 3.7141 LearningRate 0.0039 Epoch: 34 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-01-14 18:00:23,349-Speed 12971.08 samples/sec Loss 3.6492 LearningRate 0.0039 Epoch: 34 Global Step: 87790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:24,932-Speed 12942.07 samples/sec Loss 3.6448 LearningRate 0.0039 Epoch: 34 Global Step: 87800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:26,501-Speed 13056.95 samples/sec Loss 3.6670 LearningRate 0.0039 Epoch: 34 Global Step: 87810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:28,056-Speed 13179.03 samples/sec Loss 3.6701 LearningRate 0.0039 Epoch: 34 Global Step: 87820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:29,632-Speed 13002.59 samples/sec Loss 3.6211 LearningRate 0.0038 Epoch: 34 Global Step: 87830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:31,211-Speed 12972.58 samples/sec Loss 3.6312 LearningRate 0.0038 Epoch: 34 Global Step: 87840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:32,793-Speed 12949.39 samples/sec Loss 3.6761 LearningRate 0.0038 Epoch: 34 Global Step: 87850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:34,375-Speed 12956.52 samples/sec Loss 3.6680 LearningRate 0.0038 Epoch: 34 Global Step: 87860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:35,934-Speed 13146.57 samples/sec Loss 3.6814 LearningRate 0.0038 Epoch: 34 Global Step: 87870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:37,509-Speed 13011.39 samples/sec Loss 3.7479 LearningRate 0.0038 Epoch: 34 Global Step: 87880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:39,073-Speed 13100.42 samples/sec Loss 3.6864 LearningRate 0.0038 Epoch: 34 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:40,630-Speed 13162.52 samples/sec Loss 3.6788 LearningRate 0.0038 Epoch: 34 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:42,194-Speed 13096.49 
samples/sec Loss 3.6972 LearningRate 0.0038 Epoch: 34 Global Step: 87910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:43,773-Speed 12988.62 samples/sec Loss 3.7351 LearningRate 0.0038 Epoch: 34 Global Step: 87920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:45,341-Speed 13070.79 samples/sec Loss 3.6764 LearningRate 0.0038 Epoch: 34 Global Step: 87930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:46,923-Speed 12950.36 samples/sec Loss 3.7112 LearningRate 0.0038 Epoch: 34 Global Step: 87940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:48,477-Speed 13185.56 samples/sec Loss 3.6668 LearningRate 0.0038 Epoch: 34 Global Step: 87950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:50,055-Speed 12983.02 samples/sec Loss 3.6840 LearningRate 0.0038 Epoch: 34 Global Step: 87960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:51,619-Speed 13100.67 samples/sec Loss 3.6188 LearningRate 0.0038 Epoch: 34 Global Step: 87970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:53,178-Speed 13148.12 samples/sec Loss 3.6759 LearningRate 0.0038 Epoch: 34 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:54,723-Speed 13257.19 samples/sec Loss 3.6340 LearningRate 0.0038 Epoch: 34 Global Step: 87990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:00:56,285-Speed 13119.82 samples/sec Loss 3.6121 LearningRate 0.0038 Epoch: 34 Global Step: 88000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:57,849-Speed 13105.54 samples/sec Loss 3.6668 LearningRate 0.0037 Epoch: 34 Global Step: 88010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:00:59,387-Speed 13321.29 samples/sec Loss 3.6927 LearningRate 0.0037 Epoch: 34 Global Step: 88020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:00,981-Speed 12855.96 samples/sec Loss 3.6542 LearningRate 0.0037 
Epoch: 34 Global Step: 88030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:02,564-Speed 12945.98 samples/sec Loss 3.6845 LearningRate 0.0037 Epoch: 34 Global Step: 88040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:04,136-Speed 13035.59 samples/sec Loss 3.7577 LearningRate 0.0037 Epoch: 34 Global Step: 88050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:05,697-Speed 13121.34 samples/sec Loss 3.6242 LearningRate 0.0037 Epoch: 34 Global Step: 88060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:07,249-Speed 13208.65 samples/sec Loss 3.7169 LearningRate 0.0037 Epoch: 34 Global Step: 88070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:08,802-Speed 13195.63 samples/sec Loss 3.7271 LearningRate 0.0037 Epoch: 34 Global Step: 88080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:10,404-Speed 12790.14 samples/sec Loss 3.6640 LearningRate 0.0037 Epoch: 34 Global Step: 88090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:11,965-Speed 13120.03 samples/sec Loss 3.7007 LearningRate 0.0037 Epoch: 34 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:13,587-Speed 12639.83 samples/sec Loss 3.6229 LearningRate 0.0037 Epoch: 34 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:15,164-Speed 12991.32 samples/sec Loss 3.7018 LearningRate 0.0037 Epoch: 34 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:16,735-Speed 13042.72 samples/sec Loss 3.6666 LearningRate 0.0037 Epoch: 34 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:18,314-Speed 12979.79 samples/sec Loss 3.6944 LearningRate 0.0037 Epoch: 34 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:19,883-Speed 13052.48 samples/sec Loss 3.6174 LearningRate 0.0037 Epoch: 34 Global Step: 88150 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:21,439-Speed 13167.63 samples/sec Loss 3.7042 LearningRate 0.0037 Epoch: 34 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:22,993-Speed 13189.64 samples/sec Loss 3.6892 LearningRate 0.0037 Epoch: 34 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:24,571-Speed 12989.83 samples/sec Loss 3.5845 LearningRate 0.0036 Epoch: 34 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:26,124-Speed 13188.42 samples/sec Loss 3.7653 LearningRate 0.0036 Epoch: 34 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:27,676-Speed 13208.89 samples/sec Loss 3.6952 LearningRate 0.0036 Epoch: 34 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:29,243-Speed 13076.71 samples/sec Loss 3.7109 LearningRate 0.0036 Epoch: 34 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:30,800-Speed 13155.59 samples/sec Loss 3.7288 LearningRate 0.0036 Epoch: 34 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:01:32,387-Speed 12915.59 samples/sec Loss 3.6523 LearningRate 0.0036 Epoch: 34 Global Step: 88230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:33,958-Speed 13046.92 samples/sec Loss 3.7238 LearningRate 0.0036 Epoch: 34 Global Step: 88240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:35,505-Speed 13239.67 samples/sec Loss 3.6851 LearningRate 0.0036 Epoch: 34 Global Step: 88250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:37,037-Speed 13369.58 samples/sec Loss 3.6693 LearningRate 0.0036 Epoch: 34 Global Step: 88260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:38,620-Speed 12947.39 samples/sec Loss 3.6966 LearningRate 0.0036 Epoch: 34 Global Step: 88270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-01-14 18:01:40,210-Speed 12885.53 samples/sec Loss 3.7458 LearningRate 0.0036 Epoch: 34 Global Step: 88280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:41,773-Speed 13107.39 samples/sec Loss 3.7228 LearningRate 0.0036 Epoch: 34 Global Step: 88290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:43,352-Speed 12978.90 samples/sec Loss 3.6328 LearningRate 0.0036 Epoch: 34 Global Step: 88300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:44,899-Speed 13244.79 samples/sec Loss 3.6781 LearningRate 0.0036 Epoch: 34 Global Step: 88310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:46,480-Speed 12965.21 samples/sec Loss 3.7342 LearningRate 0.0036 Epoch: 34 Global Step: 88320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:48,040-Speed 13129.82 samples/sec Loss 3.6540 LearningRate 0.0036 Epoch: 34 Global Step: 88330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:49,598-Speed 13155.33 samples/sec Loss 3.7642 LearningRate 0.0036 Epoch: 34 Global Step: 88340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:51,158-Speed 13139.85 samples/sec Loss 3.7432 LearningRate 0.0036 Epoch: 34 Global Step: 88350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:52,708-Speed 13216.81 samples/sec Loss 3.7664 LearningRate 0.0035 Epoch: 34 Global Step: 88360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:54,275-Speed 13078.98 samples/sec Loss 3.7585 LearningRate 0.0035 Epoch: 34 Global Step: 88370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:55,828-Speed 13192.31 samples/sec Loss 3.6398 LearningRate 0.0035 Epoch: 34 Global Step: 88380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:01:57,381-Speed 13190.99 samples/sec Loss 3.7406 LearningRate 0.0035 Epoch: 34 Global Step: 88390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:01:58,963-Speed 12956.15 
samples/sec Loss 3.6845 LearningRate 0.0035 Epoch: 34 Global Step: 88400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:00,528-Speed 13086.54 samples/sec Loss 3.6571 LearningRate 0.0035 Epoch: 34 Global Step: 88410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:02,074-Speed 13259.18 samples/sec Loss 3.7088 LearningRate 0.0035 Epoch: 34 Global Step: 88420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:03,645-Speed 13046.89 samples/sec Loss 3.6236 LearningRate 0.0035 Epoch: 34 Global Step: 88430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:05,217-Speed 13031.66 samples/sec Loss 3.7152 LearningRate 0.0035 Epoch: 34 Global Step: 88440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:06,782-Speed 13092.00 samples/sec Loss 3.7114 LearningRate 0.0035 Epoch: 34 Global Step: 88450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:08,371-Speed 12898.30 samples/sec Loss 3.7306 LearningRate 0.0035 Epoch: 34 Global Step: 88460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:09,949-Speed 12985.83 samples/sec Loss 3.6870 LearningRate 0.0035 Epoch: 34 Global Step: 88470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:11,503-Speed 13188.23 samples/sec Loss 3.7104 LearningRate 0.0035 Epoch: 34 Global Step: 88480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:13,072-Speed 13062.69 samples/sec Loss 3.7138 LearningRate 0.0035 Epoch: 34 Global Step: 88490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:14,711-Speed 12500.47 samples/sec Loss 3.7531 LearningRate 0.0035 Epoch: 34 Global Step: 88500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:16,166-Speed 14085.41 samples/sec Loss 3.7118 LearningRate 0.0035 Epoch: 34 Global Step: 88510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:29,458-Speed 1540.91 samples/sec Loss 3.5983 LearningRate 0.0035 
Epoch: 35 Global Step: 88520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:31,085-Speed 12611.16 samples/sec Loss 3.3973 LearningRate 0.0035 Epoch: 35 Global Step: 88530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:32,775-Speed 12123.56 samples/sec Loss 3.5096 LearningRate 0.0034 Epoch: 35 Global Step: 88540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:34,346-Speed 13044.59 samples/sec Loss 3.4621 LearningRate 0.0034 Epoch: 35 Global Step: 88550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:35,917-Speed 13046.69 samples/sec Loss 3.4505 LearningRate 0.0034 Epoch: 35 Global Step: 88560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:37,453-Speed 13340.46 samples/sec Loss 3.5070 LearningRate 0.0034 Epoch: 35 Global Step: 88570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:39,052-Speed 12811.72 samples/sec Loss 3.4875 LearningRate 0.0034 Epoch: 35 Global Step: 88580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:40,658-Speed 12782.56 samples/sec Loss 3.4404 LearningRate 0.0034 Epoch: 35 Global Step: 88590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:42,216-Speed 13157.34 samples/sec Loss 3.5188 LearningRate 0.0034 Epoch: 35 Global Step: 88600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:43,778-Speed 13118.70 samples/sec Loss 3.4741 LearningRate 0.0034 Epoch: 35 Global Step: 88610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:45,349-Speed 13046.29 samples/sec Loss 3.5294 LearningRate 0.0034 Epoch: 35 Global Step: 88620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:46,903-Speed 13182.80 samples/sec Loss 3.5001 LearningRate 0.0034 Epoch: 35 Global Step: 88630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:48,486-Speed 12945.47 samples/sec Loss 3.5197 LearningRate 0.0034 Epoch: 35 Global Step: 88640 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:50,064-Speed 12986.81 samples/sec Loss 3.4269 LearningRate 0.0034 Epoch: 35 Global Step: 88650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:51,615-Speed 13211.72 samples/sec Loss 3.3606 LearningRate 0.0034 Epoch: 35 Global Step: 88660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:53,202-Speed 12915.76 samples/sec Loss 3.4728 LearningRate 0.0034 Epoch: 35 Global Step: 88670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:02:54,756-Speed 13178.53 samples/sec Loss 3.4359 LearningRate 0.0034 Epoch: 35 Global Step: 88680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:56,322-Speed 13088.10 samples/sec Loss 3.4563 LearningRate 0.0034 Epoch: 35 Global Step: 88690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:57,894-Speed 13034.37 samples/sec Loss 3.4309 LearningRate 0.0034 Epoch: 35 Global Step: 88700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:02:59,449-Speed 13176.28 samples/sec Loss 3.5540 LearningRate 0.0034 Epoch: 35 Global Step: 88710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:03:01,032-Speed 12946.22 samples/sec Loss 3.5512 LearningRate 0.0034 Epoch: 35 Global Step: 88720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:03:02,612-Speed 12963.67 samples/sec Loss 3.5178 LearningRate 0.0033 Epoch: 35 Global Step: 88730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:03:04,197-Speed 12931.98 samples/sec Loss 3.4651 LearningRate 0.0033 Epoch: 35 Global Step: 88740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:03:05,754-Speed 13161.66 samples/sec Loss 3.5578 LearningRate 0.0033 Epoch: 35 Global Step: 88750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:03:07,310-Speed 13170.76 samples/sec Loss 3.5276 LearningRate 0.0033 Epoch: 35 Global Step: 88760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-01-14 18:03:08,864-Speed 13193.98 samples/sec Loss 3.5262 LearningRate 0.0033 Epoch: 35 Global Step: 88770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:03:10,420-Speed 13167.28 samples/sec Loss 3.4394 LearningRate 0.0033 Epoch: 35 Global Step: 88780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:12,021-Speed 12794.55 samples/sec Loss 3.5462 LearningRate 0.0033 Epoch: 35 Global Step: 88790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:13,576-Speed 13183.91 samples/sec Loss 3.4833 LearningRate 0.0033 Epoch: 35 Global Step: 88800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:15,137-Speed 13121.54 samples/sec Loss 3.4437 LearningRate 0.0033 Epoch: 35 Global Step: 88810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:16,698-Speed 13130.23 samples/sec Loss 3.4765 LearningRate 0.0033 Epoch: 35 Global Step: 88820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:18,267-Speed 13062.36 samples/sec Loss 3.3833 LearningRate 0.0033 Epoch: 35 Global Step: 88830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:19,825-Speed 13149.59 samples/sec Loss 3.4178 LearningRate 0.0033 Epoch: 35 Global Step: 88840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:21,402-Speed 12997.05 samples/sec Loss 3.4883 LearningRate 0.0033 Epoch: 35 Global Step: 88850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:22,964-Speed 13110.50 samples/sec Loss 3.5064 LearningRate 0.0033 Epoch: 35 Global Step: 88860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:24,521-Speed 13164.49 samples/sec Loss 3.5190 LearningRate 0.0033 Epoch: 35 Global Step: 88870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:26,119-Speed 12825.70 samples/sec Loss 3.4888 LearningRate 0.0033 Epoch: 35 Global Step: 88880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:03:27,678-Speed 13143.35 
samples/sec Loss 3.5772 LearningRate 0.0033 Epoch: 35 Global Step: 88890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:03:29,209-Speed 13384.59 samples/sec Loss 3.5721 LearningRate 0.0033 Epoch: 35 Global Step: 88900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:30,771-Speed 13115.73 samples/sec Loss 3.5148 LearningRate 0.0032 Epoch: 35 Global Step: 88910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:32,337-Speed 13087.18 samples/sec Loss 3.5226 LearningRate 0.0032 Epoch: 35 Global Step: 88920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:33,892-Speed 13184.28 samples/sec Loss 3.4283 LearningRate 0.0032 Epoch: 35 Global Step: 88930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:35,459-Speed 13090.17 samples/sec Loss 3.4864 LearningRate 0.0032 Epoch: 35 Global Step: 88940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:37,025-Speed 13085.44 samples/sec Loss 3.4854 LearningRate 0.0032 Epoch: 35 Global Step: 88950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:38,572-Speed 13251.72 samples/sec Loss 3.5332 LearningRate 0.0032 Epoch: 35 Global Step: 88960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:40,146-Speed 13012.18 samples/sec Loss 3.5406 LearningRate 0.0032 Epoch: 35 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:41,713-Speed 13078.76 samples/sec Loss 3.5289 LearningRate 0.0032 Epoch: 35 Global Step: 88980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:43,298-Speed 12932.57 samples/sec Loss 3.5021 LearningRate 0.0032 Epoch: 35 Global Step: 88990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:44,856-Speed 13145.47 samples/sec Loss 3.4328 LearningRate 0.0032 Epoch: 35 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:03:46,429-Speed 13027.59 samples/sec Loss 3.6222 LearningRate 0.0032 
Epoch: 35 Global Step: 89010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:03:47,990-Speed 13130.35 samples/sec Loss 3.5228 LearningRate 0.0032 Epoch: 35 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:03:49,533-Speed 13275.77 samples/sec Loss 3.4608 LearningRate 0.0032 Epoch: 35 Global Step: 89030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:51,107-Speed 13017.47 samples/sec Loss 3.4904 LearningRate 0.0032 Epoch: 35 Global Step: 89040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:52,692-Speed 12933.64 samples/sec Loss 3.4412 LearningRate 0.0032 Epoch: 35 Global Step: 89050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:54,259-Speed 13073.30 samples/sec Loss 3.5315 LearningRate 0.0032 Epoch: 35 Global Step: 89060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:55,796-Speed 13333.81 samples/sec Loss 3.4213 LearningRate 0.0032 Epoch: 35 Global Step: 89070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:57,343-Speed 13245.77 samples/sec Loss 3.5057 LearningRate 0.0032 Epoch: 35 Global Step: 89080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:03:58,899-Speed 13168.14 samples/sec Loss 3.4775 LearningRate 0.0032 Epoch: 35 Global Step: 89090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:00,482-Speed 12942.78 samples/sec Loss 3.4933 LearningRate 0.0031 Epoch: 35 Global Step: 89100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:02,046-Speed 13099.81 samples/sec Loss 3.5206 LearningRate 0.0031 Epoch: 35 Global Step: 89110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:03,614-Speed 13071.57 samples/sec Loss 3.4777 LearningRate 0.0031 Epoch: 35 Global Step: 89120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:05,206-Speed 12871.72 samples/sec Loss 3.4496 LearningRate 0.0031 Epoch: 35 Global Step: 89130 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-01-14 18:04:06,761-Speed 13170.45 samples/sec Loss 3.5427 LearningRate 0.0031 Epoch: 35 Global Step: 89140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:04:08,316-Speed 13179.23 samples/sec Loss 3.4934 LearningRate 0.0031 Epoch: 35 Global Step: 89150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:04:09,887-Speed 13048.44 samples/sec Loss 3.5295 LearningRate 0.0031 Epoch: 35 Global Step: 89160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:11,455-Speed 13067.35 samples/sec Loss 3.6167 LearningRate 0.0031 Epoch: 35 Global Step: 89170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:13,042-Speed 12906.43 samples/sec Loss 3.5604 LearningRate 0.0031 Epoch: 35 Global Step: 89180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:14,600-Speed 13157.95 samples/sec Loss 3.5113 LearningRate 0.0031 Epoch: 35 Global Step: 89190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:16,157-Speed 13159.75 samples/sec Loss 3.5735 LearningRate 0.0031 Epoch: 35 Global Step: 89200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:17,722-Speed 13092.29 samples/sec Loss 3.5723 LearningRate 0.0031 Epoch: 35 Global Step: 89210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:19,282-Speed 13133.51 samples/sec Loss 3.5132 LearningRate 0.0031 Epoch: 35 Global Step: 89220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:20,839-Speed 13160.83 samples/sec Loss 3.5670 LearningRate 0.0031 Epoch: 35 Global Step: 89230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:22,409-Speed 13053.75 samples/sec Loss 3.5197 LearningRate 0.0031 Epoch: 35 Global Step: 89240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:23,993-Speed 12934.38 samples/sec Loss 3.5989 LearningRate 0.0031 Epoch: 35 Global Step: 89250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 18:04:25,570-Speed 12990.96 samples/sec Loss 3.5363 LearningRate 0.0031 Epoch: 35 Global Step: 89260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:04:27,155-Speed 12929.00 samples/sec Loss 3.4566 LearningRate 0.0031 Epoch: 35 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:04:28,717-Speed 13119.65 samples/sec Loss 3.5342 LearningRate 0.0031 Epoch: 35 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:04:30,269-Speed 13199.04 samples/sec Loss 3.4887 LearningRate 0.0031 Epoch: 35 Global Step: 89290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:31,834-Speed 13101.22 samples/sec Loss 3.5216 LearningRate 0.0030 Epoch: 35 Global Step: 89300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:33,419-Speed 12932.65 samples/sec Loss 3.5060 LearningRate 0.0030 Epoch: 35 Global Step: 89310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:34,989-Speed 13049.06 samples/sec Loss 3.5688 LearningRate 0.0030 Epoch: 35 Global Step: 89320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:36,551-Speed 13117.47 samples/sec Loss 3.4822 LearningRate 0.0030 Epoch: 35 Global Step: 89330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:38,143-Speed 12870.87 samples/sec Loss 3.4948 LearningRate 0.0030 Epoch: 35 Global Step: 89340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:39,709-Speed 13096.38 samples/sec Loss 3.5108 LearningRate 0.0030 Epoch: 35 Global Step: 89350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:41,261-Speed 13205.92 samples/sec Loss 3.5628 LearningRate 0.0030 Epoch: 35 Global Step: 89360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:42,808-Speed 13246.70 samples/sec Loss 3.5652 LearningRate 0.0030 Epoch: 35 Global Step: 89370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:44,368-Speed 13138.17 
samples/sec Loss 3.5537 LearningRate 0.0030 Epoch: 35 Global Step: 89380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:45,912-Speed 13268.95 samples/sec Loss 3.5622 LearningRate 0.0030 Epoch: 35 Global Step: 89390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:47,481-Speed 13055.49 samples/sec Loss 3.5874 LearningRate 0.0030 Epoch: 35 Global Step: 89400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:49,061-Speed 12975.22 samples/sec Loss 3.5058 LearningRate 0.0030 Epoch: 35 Global Step: 89410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:50,633-Speed 13029.98 samples/sec Loss 3.5727 LearningRate 0.0030 Epoch: 35 Global Step: 89420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:52,238-Speed 12769.92 samples/sec Loss 3.6150 LearningRate 0.0030 Epoch: 35 Global Step: 89430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:53,805-Speed 13080.37 samples/sec Loss 3.5793 LearningRate 0.0030 Epoch: 35 Global Step: 89440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:55,371-Speed 13077.76 samples/sec Loss 3.5605 LearningRate 0.0030 Epoch: 35 Global Step: 89450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:56,958-Speed 12913.58 samples/sec Loss 3.5361 LearningRate 0.0030 Epoch: 35 Global Step: 89460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:04:58,538-Speed 12970.63 samples/sec Loss 3.5086 LearningRate 0.0030 Epoch: 35 Global Step: 89470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:00,097-Speed 13141.87 samples/sec Loss 3.5381 LearningRate 0.0030 Epoch: 35 Global Step: 89480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:01,676-Speed 12976.11 samples/sec Loss 3.5058 LearningRate 0.0029 Epoch: 35 Global Step: 89490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:03,244-Speed 13069.80 samples/sec Loss 3.5206 LearningRate 0.0029 
Epoch: 35 Global Step: 89500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:04,810-Speed 13082.22 samples/sec Loss 3.5124 LearningRate 0.0029 Epoch: 35 Global Step: 89510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:06,380-Speed 13049.95 samples/sec Loss 3.5095 LearningRate 0.0029 Epoch: 35 Global Step: 89520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:07,940-Speed 13137.64 samples/sec Loss 3.5483 LearningRate 0.0029 Epoch: 35 Global Step: 89530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:09,514-Speed 13022.69 samples/sec Loss 3.5484 LearningRate 0.0029 Epoch: 35 Global Step: 89540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:11,110-Speed 12831.65 samples/sec Loss 3.4920 LearningRate 0.0029 Epoch: 35 Global Step: 89550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:12,688-Speed 12983.16 samples/sec Loss 3.5361 LearningRate 0.0029 Epoch: 35 Global Step: 89560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:14,256-Speed 13071.86 samples/sec Loss 3.4962 LearningRate 0.0029 Epoch: 35 Global Step: 89570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:15,850-Speed 12853.37 samples/sec Loss 3.5710 LearningRate 0.0029 Epoch: 35 Global Step: 89580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:05:17,409-Speed 13149.05 samples/sec Loss 3.5989 LearningRate 0.0029 Epoch: 35 Global Step: 89590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:18,992-Speed 12942.58 samples/sec Loss 3.5245 LearningRate 0.0029 Epoch: 35 Global Step: 89600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:20,578-Speed 12920.97 samples/sec Loss 3.5969 LearningRate 0.0029 Epoch: 35 Global Step: 89610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:22,150-Speed 13032.60 samples/sec Loss 3.5192 LearningRate 0.0029 Epoch: 35 Global Step: 89620 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:23,715-Speed 13099.63 samples/sec Loss 3.5189 LearningRate 0.0029 Epoch: 35 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:25,283-Speed 13065.68 samples/sec Loss 3.6395 LearningRate 0.0029 Epoch: 35 Global Step: 89640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:26,859-Speed 12997.03 samples/sec Loss 3.5567 LearningRate 0.0029 Epoch: 35 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:28,413-Speed 13194.73 samples/sec Loss 3.5209 LearningRate 0.0029 Epoch: 35 Global Step: 89660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:29,989-Speed 12996.48 samples/sec Loss 3.6037 LearningRate 0.0029 Epoch: 35 Global Step: 89670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:31,561-Speed 13031.55 samples/sec Loss 3.5837 LearningRate 0.0029 Epoch: 35 Global Step: 89680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:33,137-Speed 12999.92 samples/sec Loss 3.6328 LearningRate 0.0028 Epoch: 35 Global Step: 89690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:05:34,717-Speed 12977.58 samples/sec Loss 3.5968 LearningRate 0.0028 Epoch: 35 Global Step: 89700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:05:36,296-Speed 12979.01 samples/sec Loss 3.5822 LearningRate 0.0028 Epoch: 35 Global Step: 89710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:37,852-Speed 13162.41 samples/sec Loss 3.5401 LearningRate 0.0028 Epoch: 35 Global Step: 89720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:39,413-Speed 13132.30 samples/sec Loss 3.5495 LearningRate 0.0028 Epoch: 35 Global Step: 89730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:40,975-Speed 13114.51 samples/sec Loss 3.5530 LearningRate 0.0028 Epoch: 35 Global Step: 89740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-14 18:05:42,538-Speed 13111.51 samples/sec Loss 3.5479 LearningRate 0.0028 Epoch: 35 Global Step: 89750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:44,125-Speed 12915.85 samples/sec Loss 3.5350 LearningRate 0.0028 Epoch: 35 Global Step: 89760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:45,693-Speed 13066.12 samples/sec Loss 3.6215 LearningRate 0.0028 Epoch: 35 Global Step: 89770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:47,255-Speed 13113.61 samples/sec Loss 3.4713 LearningRate 0.0028 Epoch: 35 Global Step: 89780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:48,798-Speed 13289.03 samples/sec Loss 3.5212 LearningRate 0.0028 Epoch: 35 Global Step: 89790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:50,374-Speed 12997.32 samples/sec Loss 3.5381 LearningRate 0.0028 Epoch: 35 Global Step: 89800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:05:51,929-Speed 13176.99 samples/sec Loss 3.5609 LearningRate 0.0028 Epoch: 35 Global Step: 89810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:05:53,505-Speed 13002.58 samples/sec Loss 3.5472 LearningRate 0.0028 Epoch: 35 Global Step: 89820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:05:55,061-Speed 13165.50 samples/sec Loss 3.5787 LearningRate 0.0028 Epoch: 35 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:05:56,667-Speed 12762.12 samples/sec Loss 3.6113 LearningRate 0.0028 Epoch: 35 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:05:58,249-Speed 12948.60 samples/sec Loss 3.5526 LearningRate 0.0028 Epoch: 35 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:05:59,807-Speed 13154.98 samples/sec Loss 3.5569 LearningRate 0.0028 Epoch: 35 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:06:01,379-Speed 13035.19 
samples/sec Loss 3.6371 LearningRate 0.0028 Epoch: 35 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:06:02,939-Speed 13135.89 samples/sec Loss 3.5876 LearningRate 0.0028 Epoch: 35 Global Step: 89880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:04,522-Speed 12945.37 samples/sec Loss 3.5806 LearningRate 0.0028 Epoch: 35 Global Step: 89890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:06,105-Speed 12941.55 samples/sec Loss 3.5648 LearningRate 0.0027 Epoch: 35 Global Step: 89900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:07,662-Speed 13156.94 samples/sec Loss 3.5387 LearningRate 0.0027 Epoch: 35 Global Step: 89910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:09,224-Speed 13122.38 samples/sec Loss 3.5900 LearningRate 0.0027 Epoch: 35 Global Step: 89920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:10,779-Speed 13180.25 samples/sec Loss 3.6438 LearningRate 0.0027 Epoch: 35 Global Step: 89930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:12,344-Speed 13090.37 samples/sec Loss 3.5456 LearningRate 0.0027 Epoch: 35 Global Step: 89940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:13,939-Speed 12848.93 samples/sec Loss 3.5151 LearningRate 0.0027 Epoch: 35 Global Step: 89950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:15,499-Speed 13129.48 samples/sec Loss 3.5843 LearningRate 0.0027 Epoch: 35 Global Step: 89960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:17,072-Speed 13027.13 samples/sec Loss 3.4325 LearningRate 0.0027 Epoch: 35 Global Step: 89970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:06:18,670-Speed 12828.69 samples/sec Loss 3.5644 LearningRate 0.0027 Epoch: 35 Global Step: 89980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:06:20,243-Speed 13020.88 samples/sec Loss 3.6235 LearningRate 0.0027 
Epoch: 35 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-01-14 18:06:21,821-Speed 12991.33 samples/sec Loss 3.5595 LearningRate 0.0027 Epoch: 35 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 1 hours
Training: 2022-01-14 18:06:45,816-[lfw][90000]XNorm: 7.169914
Training: 2022-01-14 18:06:45,817-[lfw][90000]Accuracy-Flip: 0.99583+-0.00359
Training: 2022-01-14 18:06:45,817-[lfw][90000]Accuracy-Highest: 0.99650
Training: 2022-01-14 18:07:11,357-[cfp_fp][90000]XNorm: 6.104506
Training: 2022-01-14 18:07:11,358-[cfp_fp][90000]Accuracy-Flip: 0.97043+-0.00982
Training: 2022-01-14 18:07:11,358-[cfp_fp][90000]Accuracy-Highest: 0.97043
Training: 2022-01-14 18:07:33,870-[agedb_30][90000]XNorm: 6.944675
Training: 2022-01-14 18:07:33,870-[agedb_30][90000]Accuracy-Flip: 0.97100+-0.00727
Training: 2022-01-14 18:07:33,871-[agedb_30][90000]Accuracy-Highest: 0.97100
Training: 2022-01-14 18:07:35,473-Speed 278.06 samples/sec Loss 3.5746 LearningRate 0.0027 Epoch: 35 Global Step: 90010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-14 18:07:37,067-Speed 12868.28 samples/sec Loss 3.5461 LearningRate 0.0027 Epoch: 35 Global Step: 90020 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-14 18:07:38,623-Speed 13172.26 samples/sec Loss 3.5830 LearningRate 0.0027 Epoch: 35 Global Step: 90030 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-14 18:07:40,210-Speed 12911.48 samples/sec Loss 3.6055 LearningRate 0.0027 Epoch: 35 Global Step: 90040 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-14 18:07:41,806-Speed 12843.42 samples/sec Loss 3.5943 LearningRate 0.0027 Epoch: 35 Global Step: 90050 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-14 18:07:43,381-Speed 13014.16 samples/sec Loss 3.5544 LearningRate 0.0027 Epoch: 35 Global Step: 90060 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-01-14 18:07:44,972-Speed 12879.66 samples/sec Loss 3.5526 LearningRate 0.0027 Epoch: 35 Global Step: 90070 Fp16
Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:07:46,529-Speed 13159.25 samples/sec Loss 3.5169 LearningRate 0.0027 Epoch: 35 Global Step: 90080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:07:48,081-Speed 13209.01 samples/sec Loss 3.6193 LearningRate 0.0027 Epoch: 35 Global Step: 90090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:07:49,658-Speed 13004.96 samples/sec Loss 3.6155 LearningRate 0.0026 Epoch: 35 Global Step: 90100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:07:51,255-Speed 12828.20 samples/sec Loss 3.5596 LearningRate 0.0026 Epoch: 35 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:07:52,858-Speed 12807.77 samples/sec Loss 3.5034 LearningRate 0.0026 Epoch: 35 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:07:54,431-Speed 13031.72 samples/sec Loss 3.6365 LearningRate 0.0026 Epoch: 35 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:07:56,019-Speed 12911.13 samples/sec Loss 3.4977 LearningRate 0.0026 Epoch: 35 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:07:57,608-Speed 12895.10 samples/sec Loss 3.5903 LearningRate 0.0026 Epoch: 35 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:07:59,224-Speed 12682.56 samples/sec Loss 3.5088 LearningRate 0.0026 Epoch: 35 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:08:00,768-Speed 13273.74 samples/sec Loss 3.5684 LearningRate 0.0026 Epoch: 35 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:08:02,336-Speed 13075.86 samples/sec Loss 3.5609 LearningRate 0.0026 Epoch: 35 Global Step: 90180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:03,906-Speed 13058.29 samples/sec Loss 3.6178 LearningRate 0.0026 Epoch: 35 Global Step: 90190 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-01-14 18:08:05,501-Speed 12850.42 samples/sec Loss 3.5654 LearningRate 0.0026 Epoch: 35 Global Step: 90200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:07,064-Speed 13112.06 samples/sec Loss 3.5900 LearningRate 0.0026 Epoch: 35 Global Step: 90210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:08,711-Speed 12451.35 samples/sec Loss 3.6272 LearningRate 0.0026 Epoch: 35 Global Step: 90220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:10,281-Speed 13056.13 samples/sec Loss 3.5916 LearningRate 0.0026 Epoch: 35 Global Step: 90230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:11,852-Speed 13042.99 samples/sec Loss 3.6095 LearningRate 0.0026 Epoch: 35 Global Step: 90240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:13,458-Speed 12770.55 samples/sec Loss 3.6260 LearningRate 0.0026 Epoch: 35 Global Step: 90250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:15,055-Speed 12829.64 samples/sec Loss 3.5651 LearningRate 0.0026 Epoch: 35 Global Step: 90260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:16,662-Speed 12757.57 samples/sec Loss 3.5429 LearningRate 0.0026 Epoch: 35 Global Step: 90270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:18,222-Speed 13145.80 samples/sec Loss 3.5747 LearningRate 0.0026 Epoch: 35 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:08:19,780-Speed 13151.73 samples/sec Loss 3.6176 LearningRate 0.0026 Epoch: 35 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:08:21,346-Speed 13087.95 samples/sec Loss 3.6473 LearningRate 0.0026 Epoch: 35 Global Step: 90300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:22,908-Speed 13124.68 samples/sec Loss 3.5763 LearningRate 0.0025 Epoch: 35 Global Step: 90310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:24,533-Speed 
12613.07 samples/sec Loss 3.6582 LearningRate 0.0025 Epoch: 35 Global Step: 90320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:26,086-Speed 13205.08 samples/sec Loss 3.6202 LearningRate 0.0025 Epoch: 35 Global Step: 90330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:27,674-Speed 12910.61 samples/sec Loss 3.5265 LearningRate 0.0025 Epoch: 35 Global Step: 90340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:29,269-Speed 12853.60 samples/sec Loss 3.5640 LearningRate 0.0025 Epoch: 35 Global Step: 90350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:30,861-Speed 12867.62 samples/sec Loss 3.5639 LearningRate 0.0025 Epoch: 35 Global Step: 90360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:32,508-Speed 12445.66 samples/sec Loss 3.5745 LearningRate 0.0025 Epoch: 35 Global Step: 90370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:34,176-Speed 13132.85 samples/sec Loss 3.5558 LearningRate 0.0025 Epoch: 35 Global Step: 90380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:37,608-Speed 13101.85 samples/sec Loss 3.5975 LearningRate 0.0025 Epoch: 35 Global Step: 90390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:39,189-Speed 12980.82 samples/sec Loss 3.5741 LearningRate 0.0025 Epoch: 35 Global Step: 90400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:40,746-Speed 13158.67 samples/sec Loss 3.5586 LearningRate 0.0025 Epoch: 35 Global Step: 90410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:42,376-Speed 12576.85 samples/sec Loss 3.6214 LearningRate 0.0025 Epoch: 35 Global Step: 90420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:43,947-Speed 13099.92 samples/sec Loss 3.6413 LearningRate 0.0025 Epoch: 35 Global Step: 90430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:45,536-Speed 12893.90 samples/sec Loss 3.6494 
LearningRate 0.0025 Epoch: 35 Global Step: 90440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:47,098-Speed 13128.77 samples/sec Loss 3.5568 LearningRate 0.0025 Epoch: 35 Global Step: 90450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:48,673-Speed 13017.30 samples/sec Loss 3.5280 LearningRate 0.0025 Epoch: 35 Global Step: 90460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:50,281-Speed 12738.73 samples/sec Loss 3.5324 LearningRate 0.0025 Epoch: 35 Global Step: 90470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:51,839-Speed 13166.62 samples/sec Loss 3.6083 LearningRate 0.0025 Epoch: 35 Global Step: 90480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:53,410-Speed 13046.18 samples/sec Loss 3.5242 LearningRate 0.0025 Epoch: 35 Global Step: 90490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:08:55,006-Speed 12838.35 samples/sec Loss 3.6043 LearningRate 0.0025 Epoch: 35 Global Step: 90500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:08:56,584-Speed 13000.45 samples/sec Loss 3.6121 LearningRate 0.0025 Epoch: 35 Global Step: 90510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:08:58,156-Speed 13035.50 samples/sec Loss 3.6496 LearningRate 0.0025 Epoch: 35 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:08:59,712-Speed 13167.66 samples/sec Loss 3.5761 LearningRate 0.0024 Epoch: 35 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:01,281-Speed 13067.40 samples/sec Loss 3.6341 LearningRate 0.0024 Epoch: 35 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:02,876-Speed 12854.17 samples/sec Loss 3.5207 LearningRate 0.0024 Epoch: 35 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:04,446-Speed 13054.24 samples/sec Loss 3.5528 LearningRate 0.0024 Epoch: 35 Global Step: 
90560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:06,037-Speed 12881.43 samples/sec Loss 3.6203 LearningRate 0.0024 Epoch: 35 Global Step: 90570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:07,632-Speed 12852.58 samples/sec Loss 3.6084 LearningRate 0.0024 Epoch: 35 Global Step: 90580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:09,198-Speed 13088.66 samples/sec Loss 3.6476 LearningRate 0.0024 Epoch: 35 Global Step: 90590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:10,759-Speed 13126.99 samples/sec Loss 3.6090 LearningRate 0.0024 Epoch: 35 Global Step: 90600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:12,343-Speed 12943.14 samples/sec Loss 3.5623 LearningRate 0.0024 Epoch: 35 Global Step: 90610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:13,945-Speed 12801.11 samples/sec Loss 3.6119 LearningRate 0.0024 Epoch: 35 Global Step: 90620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:15,527-Speed 12957.80 samples/sec Loss 3.6155 LearningRate 0.0024 Epoch: 35 Global Step: 90630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:17,119-Speed 12873.56 samples/sec Loss 3.5614 LearningRate 0.0024 Epoch: 35 Global Step: 90640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:18,685-Speed 13086.45 samples/sec Loss 3.5711 LearningRate 0.0024 Epoch: 35 Global Step: 90650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:20,237-Speed 13209.23 samples/sec Loss 3.5918 LearningRate 0.0024 Epoch: 35 Global Step: 90660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:21,794-Speed 13159.13 samples/sec Loss 3.6220 LearningRate 0.0024 Epoch: 35 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:23,424-Speed 12572.81 samples/sec Loss 3.6359 LearningRate 0.0024 Epoch: 35 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-01-14 18:09:24,981-Speed 13160.62 samples/sec Loss 3.5339 LearningRate 0.0024 Epoch: 35 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:26,587-Speed 12766.05 samples/sec Loss 3.5599 LearningRate 0.0024 Epoch: 35 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:28,147-Speed 13157.57 samples/sec Loss 3.5747 LearningRate 0.0024 Epoch: 35 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:29,687-Speed 13304.60 samples/sec Loss 3.6742 LearningRate 0.0024 Epoch: 35 Global Step: 90720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:31,270-Speed 12957.43 samples/sec Loss 3.5396 LearningRate 0.0024 Epoch: 35 Global Step: 90730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:32,837-Speed 13083.97 samples/sec Loss 3.5353 LearningRate 0.0024 Epoch: 35 Global Step: 90740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:34,459-Speed 12639.84 samples/sec Loss 3.6233 LearningRate 0.0023 Epoch: 35 Global Step: 90750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:36,015-Speed 13173.39 samples/sec Loss 3.6369 LearningRate 0.0023 Epoch: 35 Global Step: 90760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:37,584-Speed 13061.40 samples/sec Loss 3.5826 LearningRate 0.0023 Epoch: 35 Global Step: 90770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:39,176-Speed 12879.04 samples/sec Loss 3.5396 LearningRate 0.0023 Epoch: 35 Global Step: 90780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:40,772-Speed 12842.52 samples/sec Loss 3.5750 LearningRate 0.0023 Epoch: 35 Global Step: 90790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:42,371-Speed 12821.38 samples/sec Loss 3.6095 LearningRate 0.0023 Epoch: 35 Global Step: 90800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
18:09:43,934-Speed 13109.53 samples/sec Loss 3.5741 LearningRate 0.0023 Epoch: 35 Global Step: 90810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:45,559-Speed 12610.21 samples/sec Loss 3.6155 LearningRate 0.0023 Epoch: 35 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:47,132-Speed 13032.09 samples/sec Loss 3.5919 LearningRate 0.0023 Epoch: 35 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:48,702-Speed 13089.25 samples/sec Loss 3.5999 LearningRate 0.0023 Epoch: 35 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:50,311-Speed 12737.16 samples/sec Loss 3.6714 LearningRate 0.0023 Epoch: 35 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:51,864-Speed 13202.30 samples/sec Loss 3.6045 LearningRate 0.0023 Epoch: 35 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:53,432-Speed 13065.19 samples/sec Loss 3.5581 LearningRate 0.0023 Epoch: 35 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:55,060-Speed 12587.72 samples/sec Loss 3.5824 LearningRate 0.0023 Epoch: 35 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 18:09:56,604-Speed 13284.20 samples/sec Loss 3.6010 LearningRate 0.0023 Epoch: 35 Global Step: 90890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:58,201-Speed 12827.24 samples/sec Loss 3.5927 LearningRate 0.0023 Epoch: 35 Global Step: 90900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:09:59,784-Speed 12948.29 samples/sec Loss 3.6067 LearningRate 0.0023 Epoch: 35 Global Step: 90910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:10:01,414-Speed 12573.60 samples/sec Loss 3.5674 LearningRate 0.0023 Epoch: 35 Global Step: 90920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:10:02,986-Speed 13037.27 samples/sec 
Loss 3.5770 LearningRate 0.0023 Epoch: 35 Global Step: 90930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:10:04,604-Speed 12671.44 samples/sec Loss 3.6203 LearningRate 0.0023 Epoch: 35 Global Step: 90940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:10:06,177-Speed 13029.00 samples/sec Loss 3.5740 LearningRate 0.0023 Epoch: 35 Global Step: 90950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:10:07,768-Speed 12882.74 samples/sec Loss 3.6458 LearningRate 0.0023 Epoch: 35 Global Step: 90960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 18:10:09,329-Speed 13152.86 samples/sec Loss 3.6117 LearningRate 0.0022 Epoch: 35 Global Step: 90970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 18:10:10,902-Speed 13032.07 samples/sec Loss 3.5782 LearningRate 0.0022 Epoch: 35 Global Step: 90980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:12,536-Speed 12541.36 samples/sec Loss 3.6117 LearningRate 0.0022 Epoch: 35 Global Step: 90990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:14,144-Speed 12750.78 samples/sec Loss 3.4969 LearningRate 0.0022 Epoch: 35 Global Step: 91000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:15,713-Speed 13053.01 samples/sec Loss 3.6279 LearningRate 0.0022 Epoch: 35 Global Step: 91010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:17,300-Speed 12913.62 samples/sec Loss 3.6037 LearningRate 0.0022 Epoch: 35 Global Step: 91020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:19,063-Speed 11636.43 samples/sec Loss 3.5758 LearningRate 0.0022 Epoch: 35 Global Step: 91030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:20,537-Speed 13911.45 samples/sec Loss 3.6080 LearningRate 0.0022 Epoch: 35 Global Step: 91040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:34,794-Speed 1436.61 samples/sec Loss 3.4878 LearningRate 0.0022 Epoch: 36 
Global Step: 91050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:36,374-Speed 12972.08 samples/sec Loss 3.3620 LearningRate 0.0022 Epoch: 36 Global Step: 91060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:37,981-Speed 12754.01 samples/sec Loss 3.4865 LearningRate 0.0022 Epoch: 36 Global Step: 91070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:39,574-Speed 12865.64 samples/sec Loss 3.4495 LearningRate 0.0022 Epoch: 36 Global Step: 91080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:41,149-Speed 13006.47 samples/sec Loss 3.4544 LearningRate 0.0022 Epoch: 36 Global Step: 91090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:42,742-Speed 12877.19 samples/sec Loss 3.4861 LearningRate 0.0022 Epoch: 36 Global Step: 91100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:44,322-Speed 12968.01 samples/sec Loss 3.4692 LearningRate 0.0022 Epoch: 36 Global Step: 91110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:45,890-Speed 13075.69 samples/sec Loss 3.4509 LearningRate 0.0022 Epoch: 36 Global Step: 91120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:47,460-Speed 13055.64 samples/sec Loss 3.4889 LearningRate 0.0022 Epoch: 36 Global Step: 91130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:49,042-Speed 12947.72 samples/sec Loss 3.4683 LearningRate 0.0022 Epoch: 36 Global Step: 91140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:50,615-Speed 13023.88 samples/sec Loss 3.5038 LearningRate 0.0022 Epoch: 36 Global Step: 91150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:52,193-Speed 12991.22 samples/sec Loss 3.4361 LearningRate 0.0022 Epoch: 36 Global Step: 91160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:10:53,765-Speed 13035.06 samples/sec Loss 3.4357 LearningRate 0.0022 Epoch: 36 Global Step: 91170 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-01-14 18:10:55,353-Speed 12902.65 samples/sec Loss 3.5292 LearningRate 0.0022 Epoch: 36 Global Step: 91180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:10:56,898-Speed 13270.97 samples/sec Loss 3.4578 LearningRate 0.0022 Epoch: 36 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:10:58,431-Speed 13371.17 samples/sec Loss 3.4969 LearningRate 0.0021 Epoch: 36 Global Step: 91200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:10:59,992-Speed 13125.21 samples/sec Loss 3.4694 LearningRate 0.0021 Epoch: 36 Global Step: 91210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:01,560-Speed 13074.95 samples/sec Loss 3.4981 LearningRate 0.0021 Epoch: 36 Global Step: 91220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:03,114-Speed 13194.81 samples/sec Loss 3.4477 LearningRate 0.0021 Epoch: 36 Global Step: 91230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:04,681-Speed 13075.15 samples/sec Loss 3.4704 LearningRate 0.0021 Epoch: 36 Global Step: 91240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:06,259-Speed 13009.52 samples/sec Loss 3.4538 LearningRate 0.0021 Epoch: 36 Global Step: 91250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:07,848-Speed 12900.71 samples/sec Loss 3.4534 LearningRate 0.0021 Epoch: 36 Global Step: 91260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:09,405-Speed 13160.57 samples/sec Loss 3.4777 LearningRate 0.0021 Epoch: 36 Global Step: 91270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:10,972-Speed 13082.68 samples/sec Loss 3.4900 LearningRate 0.0021 Epoch: 36 Global Step: 91280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:12,546-Speed 13039.71 samples/sec Loss 3.4412 LearningRate 0.0021 Epoch: 36 Global Step: 91290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 
18:11:14,113-Speed 13072.33 samples/sec Loss 3.4285 LearningRate 0.0021 Epoch: 36 Global Step: 91300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:15,696-Speed 12949.18 samples/sec Loss 3.4309 LearningRate 0.0021 Epoch: 36 Global Step: 91310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:17,262-Speed 13083.90 samples/sec Loss 3.4671 LearningRate 0.0021 Epoch: 36 Global Step: 91320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:18,855-Speed 12865.59 samples/sec Loss 3.4966 LearningRate 0.0021 Epoch: 36 Global Step: 91330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:20,425-Speed 13056.64 samples/sec Loss 3.3799 LearningRate 0.0021 Epoch: 36 Global Step: 91340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:21,981-Speed 13172.59 samples/sec Loss 3.5057 LearningRate 0.0021 Epoch: 36 Global Step: 91350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:23,549-Speed 13095.61 samples/sec Loss 3.4663 LearningRate 0.0021 Epoch: 36 Global Step: 91360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:25,128-Speed 12982.95 samples/sec Loss 3.4283 LearningRate 0.0021 Epoch: 36 Global Step: 91370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:26,675-Speed 13271.73 samples/sec Loss 3.4084 LearningRate 0.0021 Epoch: 36 Global Step: 91380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:28,238-Speed 13108.10 samples/sec Loss 3.4797 LearningRate 0.0021 Epoch: 36 Global Step: 91390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:29,820-Speed 12956.63 samples/sec Loss 3.4477 LearningRate 0.0021 Epoch: 36 Global Step: 91400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:31,369-Speed 13262.34 samples/sec Loss 3.4528 LearningRate 0.0021 Epoch: 36 Global Step: 91410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:32,957-Speed 12905.00 samples/sec 
Loss 3.5003 LearningRate 0.0021 Epoch: 36 Global Step: 91420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:34,569-Speed 12711.72 samples/sec Loss 3.4252 LearningRate 0.0021 Epoch: 36 Global Step: 91430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:36,137-Speed 13070.67 samples/sec Loss 3.4793 LearningRate 0.0020 Epoch: 36 Global Step: 91440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:37,705-Speed 13076.08 samples/sec Loss 3.4300 LearningRate 0.0020 Epoch: 36 Global Step: 91450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:39,314-Speed 12734.84 samples/sec Loss 3.4474 LearningRate 0.0020 Epoch: 36 Global Step: 91460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:11:40,890-Speed 13005.12 samples/sec Loss 3.4759 LearningRate 0.0020 Epoch: 36 Global Step: 91470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:42,469-Speed 12987.23 samples/sec Loss 3.4361 LearningRate 0.0020 Epoch: 36 Global Step: 91480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:44,041-Speed 13036.83 samples/sec Loss 3.4951 LearningRate 0.0020 Epoch: 36 Global Step: 91490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:45,614-Speed 13030.35 samples/sec Loss 3.4063 LearningRate 0.0020 Epoch: 36 Global Step: 91500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:47,175-Speed 13129.34 samples/sec Loss 3.4220 LearningRate 0.0020 Epoch: 36 Global Step: 91510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:48,733-Speed 13171.89 samples/sec Loss 3.4075 LearningRate 0.0020 Epoch: 36 Global Step: 91520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:50,295-Speed 13118.52 samples/sec Loss 3.4504 LearningRate 0.0020 Epoch: 36 Global Step: 91530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:51,865-Speed 13050.57 samples/sec Loss 3.5492 LearningRate 0.0020 Epoch: 36 
Global Step: 91540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:53,444-Speed 13003.63 samples/sec Loss 3.4617 LearningRate 0.0020 Epoch: 36 Global Step: 91550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:55,007-Speed 13108.30 samples/sec Loss 3.4205 LearningRate 0.0020 Epoch: 36 Global Step: 91560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:11:56,600-Speed 12869.56 samples/sec Loss 3.3949 LearningRate 0.0020 Epoch: 36 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:11:58,162-Speed 13114.27 samples/sec Loss 3.4266 LearningRate 0.0020 Epoch: 36 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:11:59,740-Speed 12986.98 samples/sec Loss 3.4176 LearningRate 0.0020 Epoch: 36 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:01,314-Speed 13052.54 samples/sec Loss 3.4906 LearningRate 0.0020 Epoch: 36 Global Step: 91600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:02,901-Speed 12911.68 samples/sec Loss 3.3873 LearningRate 0.0020 Epoch: 36 Global Step: 91610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:04,497-Speed 12836.00 samples/sec Loss 3.4866 LearningRate 0.0020 Epoch: 36 Global Step: 91620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:06,048-Speed 13218.07 samples/sec Loss 3.4294 LearningRate 0.0020 Epoch: 36 Global Step: 91630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:07,615-Speed 13074.58 samples/sec Loss 3.3788 LearningRate 0.0020 Epoch: 36 Global Step: 91640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:09,198-Speed 12949.62 samples/sec Loss 3.4446 LearningRate 0.0020 Epoch: 36 Global Step: 91650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:10,776-Speed 12990.54 samples/sec Loss 3.4789 LearningRate 0.0020 Epoch: 36 Global Step: 91660 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 18:12:12,362-Speed 12922.21 samples/sec Loss 3.4204 LearningRate 0.0020 Epoch: 36 Global Step: 91670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:13,950-Speed 12910.54 samples/sec Loss 3.4496 LearningRate 0.0019 Epoch: 36 Global Step: 91680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:15,508-Speed 13144.71 samples/sec Loss 3.4462 LearningRate 0.0019 Epoch: 36 Global Step: 91690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:17,071-Speed 13135.73 samples/sec Loss 3.3578 LearningRate 0.0019 Epoch: 36 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:18,666-Speed 12847.93 samples/sec Loss 3.5361 LearningRate 0.0019 Epoch: 36 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:20,247-Speed 12967.71 samples/sec Loss 3.4665 LearningRate 0.0019 Epoch: 36 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:21,830-Speed 12946.66 samples/sec Loss 3.4651 LearningRate 0.0019 Epoch: 36 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:23,451-Speed 12642.60 samples/sec Loss 3.4736 LearningRate 0.0019 Epoch: 36 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:25,020-Speed 13061.41 samples/sec Loss 3.4728 LearningRate 0.0019 Epoch: 36 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:26,632-Speed 12709.20 samples/sec Loss 3.3616 LearningRate 0.0019 Epoch: 36 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:28,195-Speed 13117.01 samples/sec Loss 3.4777 LearningRate 0.0019 Epoch: 36 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:29,764-Speed 13062.88 samples/sec Loss 3.5341 LearningRate 0.0019 Epoch: 36 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 
18:12:31,323-Speed 13146.93 samples/sec Loss 3.4204 LearningRate 0.0019 Epoch: 36 Global Step: 91790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:32,905-Speed 12957.22 samples/sec Loss 3.5582 LearningRate 0.0019 Epoch: 36 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:34,471-Speed 13092.68 samples/sec Loss 3.4077 LearningRate 0.0019 Epoch: 36 Global Step: 91810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:12:36,018-Speed 13251.45 samples/sec Loss 3.4510 LearningRate 0.0019 Epoch: 36 Global Step: 91820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:37,649-Speed 12566.42 samples/sec Loss 3.4957 LearningRate 0.0019 Epoch: 36 Global Step: 91830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:39,218-Speed 13059.26 samples/sec Loss 3.5694 LearningRate 0.0019 Epoch: 36 Global Step: 91840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:40,787-Speed 13064.26 samples/sec Loss 3.4921 LearningRate 0.0019 Epoch: 36 Global Step: 91850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:42,381-Speed 12864.85 samples/sec Loss 3.4551 LearningRate 0.0019 Epoch: 36 Global Step: 91860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:43,952-Speed 13054.34 samples/sec Loss 3.4421 LearningRate 0.0019 Epoch: 36 Global Step: 91870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:45,534-Speed 12948.17 samples/sec Loss 3.4622 LearningRate 0.0019 Epoch: 36 Global Step: 91880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:47,098-Speed 13113.14 samples/sec Loss 3.5220 LearningRate 0.0019 Epoch: 36 Global Step: 91890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:48,663-Speed 13098.12 samples/sec Loss 3.5432 LearningRate 0.0019 Epoch: 36 Global Step: 91900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:50,231-Speed 13060.67 samples/sec 
Loss 3.4375 LearningRate 0.0019 Epoch: 36 Global Step: 91910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:51,815-Speed 12945.56 samples/sec Loss 3.4647 LearningRate 0.0018 Epoch: 36 Global Step: 91920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:53,390-Speed 13023.53 samples/sec Loss 3.5160 LearningRate 0.0018 Epoch: 36 Global Step: 91930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:54,991-Speed 12799.45 samples/sec Loss 3.5463 LearningRate 0.0018 Epoch: 36 Global Step: 91940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:12:56,557-Speed 13094.47 samples/sec Loss 3.5164 LearningRate 0.0018 Epoch: 36 Global Step: 91950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:58,120-Speed 13110.38 samples/sec Loss 3.4480 LearningRate 0.0018 Epoch: 36 Global Step: 91960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:12:59,717-Speed 12828.41 samples/sec Loss 3.4298 LearningRate 0.0018 Epoch: 36 Global Step: 91970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:01,284-Speed 13087.06 samples/sec Loss 3.4059 LearningRate 0.0018 Epoch: 36 Global Step: 91980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:02,842-Speed 13151.69 samples/sec Loss 3.4720 LearningRate 0.0018 Epoch: 36 Global Step: 91990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:04,451-Speed 12731.13 samples/sec Loss 3.5003 LearningRate 0.0018 Epoch: 36 Global Step: 92000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:06,013-Speed 13121.11 samples/sec Loss 3.4858 LearningRate 0.0018 Epoch: 36 Global Step: 92010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:07,611-Speed 12826.95 samples/sec Loss 3.4634 LearningRate 0.0018 Epoch: 36 Global Step: 92020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:09,194-Speed 12955.05 samples/sec Loss 3.4587 LearningRate 0.0018 Epoch: 36 
Global Step: 92030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:10,772-Speed 12989.72 samples/sec Loss 3.4515 LearningRate 0.0018 Epoch: 36 Global Step: 92040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:12,341-Speed 13061.50 samples/sec Loss 3.4653 LearningRate 0.0018 Epoch: 36 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:13:13,916-Speed 13012.99 samples/sec Loss 3.5462 LearningRate 0.0018 Epoch: 36 Global Step: 92060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:13:15,471-Speed 13182.71 samples/sec Loss 3.4531 LearningRate 0.0018 Epoch: 36 Global Step: 92070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:17,058-Speed 12907.35 samples/sec Loss 3.5062 LearningRate 0.0018 Epoch: 36 Global Step: 92080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:18,621-Speed 13116.20 samples/sec Loss 3.4298 LearningRate 0.0018 Epoch: 36 Global Step: 92090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:20,170-Speed 13228.16 samples/sec Loss 3.5317 LearningRate 0.0018 Epoch: 36 Global Step: 92100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:21,764-Speed 12856.47 samples/sec Loss 3.4526 LearningRate 0.0018 Epoch: 36 Global Step: 92110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:23,315-Speed 13219.57 samples/sec Loss 3.4439 LearningRate 0.0018 Epoch: 36 Global Step: 92120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:24,867-Speed 13198.85 samples/sec Loss 3.5686 LearningRate 0.0018 Epoch: 36 Global Step: 92130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:26,425-Speed 13183.33 samples/sec Loss 3.5140 LearningRate 0.0018 Epoch: 36 Global Step: 92140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:28,006-Speed 12966.56 samples/sec Loss 3.4444 LearningRate 0.0018 Epoch: 36 Global Step: 92150 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-01-14 18:13:29,549-Speed 13276.41 samples/sec Loss 3.5321 LearningRate 0.0018 Epoch: 36 Global Step: 92160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:31,171-Speed 12640.36 samples/sec Loss 3.4560 LearningRate 0.0018 Epoch: 36 Global Step: 92170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:32,757-Speed 12917.75 samples/sec Loss 3.5858 LearningRate 0.0017 Epoch: 36 Global Step: 92180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:34,351-Speed 12872.40 samples/sec Loss 3.4002 LearningRate 0.0017 Epoch: 36 Global Step: 92190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:35,911-Speed 13145.05 samples/sec Loss 3.5465 LearningRate 0.0017 Epoch: 36 Global Step: 92200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:37,474-Speed 13115.42 samples/sec Loss 3.4816 LearningRate 0.0017 Epoch: 36 Global Step: 92210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:39,062-Speed 12899.59 samples/sec Loss 3.5080 LearningRate 0.0017 Epoch: 36 Global Step: 92220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:40,636-Speed 13023.09 samples/sec Loss 3.4067 LearningRate 0.0017 Epoch: 36 Global Step: 92230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:42,196-Speed 13137.84 samples/sec Loss 3.4588 LearningRate 0.0017 Epoch: 36 Global Step: 92240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:43,763-Speed 13073.83 samples/sec Loss 3.4302 LearningRate 0.0017 Epoch: 36 Global Step: 92250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:45,333-Speed 13059.51 samples/sec Loss 3.5520 LearningRate 0.0017 Epoch: 36 Global Step: 92260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:13:46,914-Speed 13000.10 samples/sec Loss 3.5119 LearningRate 0.0017 Epoch: 36 Global Step: 92270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 
18:13:48,527-Speed 12709.95 samples/sec Loss 3.4344 LearningRate 0.0017 Epoch: 36 Global Step: 92280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:50,102-Speed 13007.77 samples/sec Loss 3.5060 LearningRate 0.0017 Epoch: 36 Global Step: 92290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:51,676-Speed 13026.38 samples/sec Loss 3.4400 LearningRate 0.0017 Epoch: 36 Global Step: 92300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:53,262-Speed 12918.12 samples/sec Loss 3.4875 LearningRate 0.0017 Epoch: 36 Global Step: 92310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:54,843-Speed 12964.95 samples/sec Loss 3.4891 LearningRate 0.0017 Epoch: 36 Global Step: 92320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:56,398-Speed 13195.19 samples/sec Loss 3.4205 LearningRate 0.0017 Epoch: 36 Global Step: 92330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:57,970-Speed 13031.85 samples/sec Loss 3.5647 LearningRate 0.0017 Epoch: 36 Global Step: 92340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:13:59,534-Speed 13100.36 samples/sec Loss 3.5096 LearningRate 0.0017 Epoch: 36 Global Step: 92350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:01,102-Speed 13073.77 samples/sec Loss 3.4648 LearningRate 0.0017 Epoch: 36 Global Step: 92360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:02,700-Speed 12829.59 samples/sec Loss 3.4812 LearningRate 0.0017 Epoch: 36 Global Step: 92370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:04,255-Speed 13185.69 samples/sec Loss 3.5098 LearningRate 0.0017 Epoch: 36 Global Step: 92380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:05,841-Speed 12921.26 samples/sec Loss 3.4885 LearningRate 0.0017 Epoch: 36 Global Step: 92390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:07,417-Speed 13010.55 samples/sec 
Loss 3.4917 LearningRate 0.0017 Epoch: 36 Global Step: 92400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:08,985-Speed 13076.68 samples/sec Loss 3.5190 LearningRate 0.0017 Epoch: 36 Global Step: 92410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:10,558-Speed 13029.17 samples/sec Loss 3.4993 LearningRate 0.0017 Epoch: 36 Global Step: 92420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:12,131-Speed 13029.85 samples/sec Loss 3.4946 LearningRate 0.0017 Epoch: 36 Global Step: 92430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:13,679-Speed 13244.08 samples/sec Loss 3.5524 LearningRate 0.0016 Epoch: 36 Global Step: 92440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:15,281-Speed 12795.41 samples/sec Loss 3.4639 LearningRate 0.0016 Epoch: 36 Global Step: 92450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:16,858-Speed 13003.08 samples/sec Loss 3.4869 LearningRate 0.0016 Epoch: 36 Global Step: 92460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:18,436-Speed 12985.51 samples/sec Loss 3.4554 LearningRate 0.0016 Epoch: 36 Global Step: 92470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:20,001-Speed 13099.56 samples/sec Loss 3.5266 LearningRate 0.0016 Epoch: 36 Global Step: 92480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:21,576-Speed 13015.66 samples/sec Loss 3.4873 LearningRate 0.0016 Epoch: 36 Global Step: 92490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:23,157-Speed 12965.98 samples/sec Loss 3.5507 LearningRate 0.0016 Epoch: 36 Global Step: 92500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:24,733-Speed 13003.20 samples/sec Loss 3.5344 LearningRate 0.0016 Epoch: 36 Global Step: 92510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:26,296-Speed 13116.04 samples/sec Loss 3.4366 LearningRate 0.0016 Epoch: 36 
Global Step: 92520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:27,870-Speed 13011.32 samples/sec Loss 3.4828 LearningRate 0.0016 Epoch: 36 Global Step: 92530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:29,443-Speed 13027.89 samples/sec Loss 3.4810 LearningRate 0.0016 Epoch: 36 Global Step: 92540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:31,014-Speed 13046.17 samples/sec Loss 3.5542 LearningRate 0.0016 Epoch: 36 Global Step: 92550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:32,597-Speed 12956.86 samples/sec Loss 3.4923 LearningRate 0.0016 Epoch: 36 Global Step: 92560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:14:34,154-Speed 13165.91 samples/sec Loss 3.4721 LearningRate 0.0016 Epoch: 36 Global Step: 92570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:35,730-Speed 12998.10 samples/sec Loss 3.4933 LearningRate 0.0016 Epoch: 36 Global Step: 92580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:37,303-Speed 13034.53 samples/sec Loss 3.4604 LearningRate 0.0016 Epoch: 36 Global Step: 92590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:38,853-Speed 13222.45 samples/sec Loss 3.4439 LearningRate 0.0016 Epoch: 36 Global Step: 92600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:40,432-Speed 12974.57 samples/sec Loss 3.5228 LearningRate 0.0016 Epoch: 36 Global Step: 92610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:42,018-Speed 12919.92 samples/sec Loss 3.5145 LearningRate 0.0016 Epoch: 36 Global Step: 92620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:43,590-Speed 13043.52 samples/sec Loss 3.5252 LearningRate 0.0016 Epoch: 36 Global Step: 92630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:45,148-Speed 13155.69 samples/sec Loss 3.5177 LearningRate 0.0016 Epoch: 36 Global Step: 92640 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 18:14:46,707-Speed 13144.83 samples/sec Loss 3.4818 LearningRate 0.0016 Epoch: 36 Global Step: 92650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:48,277-Speed 13056.45 samples/sec Loss 3.4853 LearningRate 0.0016 Epoch: 36 Global Step: 92660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:49,828-Speed 13209.23 samples/sec Loss 3.4762 LearningRate 0.0016 Epoch: 36 Global Step: 92670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:51,417-Speed 12902.83 samples/sec Loss 3.4775 LearningRate 0.0016 Epoch: 36 Global Step: 92680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:52,989-Speed 13037.94 samples/sec Loss 3.5414 LearningRate 0.0016 Epoch: 36 Global Step: 92690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:54,564-Speed 13004.63 samples/sec Loss 3.5011 LearningRate 0.0015 Epoch: 36 Global Step: 92700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:56,146-Speed 12961.01 samples/sec Loss 3.5091 LearningRate 0.0015 Epoch: 36 Global Step: 92710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:57,727-Speed 12961.71 samples/sec Loss 3.6184 LearningRate 0.0015 Epoch: 36 Global Step: 92720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:14:59,294-Speed 13084.68 samples/sec Loss 3.5424 LearningRate 0.0015 Epoch: 36 Global Step: 92730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:00,863-Speed 13059.93 samples/sec Loss 3.4447 LearningRate 0.0015 Epoch: 36 Global Step: 92740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:02,453-Speed 12897.72 samples/sec Loss 3.4827 LearningRate 0.0015 Epoch: 36 Global Step: 92750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:04,010-Speed 13164.66 samples/sec Loss 3.5060 LearningRate 0.0015 Epoch: 36 Global Step: 92760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 
18:15:05,552-Speed 13284.66 samples/sec Loss 3.5964 LearningRate 0.0015 Epoch: 36 Global Step: 92770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:07,131-Speed 12975.59 samples/sec Loss 3.4289 LearningRate 0.0015 Epoch: 36 Global Step: 92780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:08,719-Speed 12906.93 samples/sec Loss 3.4961 LearningRate 0.0015 Epoch: 36 Global Step: 92790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:10,309-Speed 12888.44 samples/sec Loss 3.4791 LearningRate 0.0015 Epoch: 36 Global Step: 92800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:11,918-Speed 12750.96 samples/sec Loss 3.5075 LearningRate 0.0015 Epoch: 36 Global Step: 92810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:13,484-Speed 13115.27 samples/sec Loss 3.5410 LearningRate 0.0015 Epoch: 36 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:15,048-Speed 13103.54 samples/sec Loss 3.4697 LearningRate 0.0015 Epoch: 36 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:16,610-Speed 13126.45 samples/sec Loss 3.5212 LearningRate 0.0015 Epoch: 36 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:18,184-Speed 13014.67 samples/sec Loss 3.4615 LearningRate 0.0015 Epoch: 36 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:19,729-Speed 13273.03 samples/sec Loss 3.4392 LearningRate 0.0015 Epoch: 36 Global Step: 92860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:21,298-Speed 13071.31 samples/sec Loss 3.4912 LearningRate 0.0015 Epoch: 36 Global Step: 92870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:15:22,896-Speed 12822.00 samples/sec Loss 3.5328 LearningRate 0.0015 Epoch: 36 Global Step: 92880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:15:24,458-Speed 13125.47 samples/sec 
Loss 3.5226 LearningRate 0.0015 Epoch: 36 Global Step: 92890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:15:26,015-Speed 13161.56 samples/sec Loss 3.5658 LearningRate 0.0015 Epoch: 36 Global Step: 92900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:15:27,629-Speed 12698.32 samples/sec Loss 3.5032 LearningRate 0.0015 Epoch: 36 Global Step: 92910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:15:29,158-Speed 13405.06 samples/sec Loss 3.5108 LearningRate 0.0015 Epoch: 36 Global Step: 92920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:30,758-Speed 12982.55 samples/sec Loss 3.5704 LearningRate 0.0015 Epoch: 36 Global Step: 92930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:32,313-Speed 13180.60 samples/sec Loss 3.5145 LearningRate 0.0015 Epoch: 36 Global Step: 92940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:33,891-Speed 12988.50 samples/sec Loss 3.4966 LearningRate 0.0015 Epoch: 36 Global Step: 92950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:35,454-Speed 13116.76 samples/sec Loss 3.4268 LearningRate 0.0015 Epoch: 36 Global Step: 92960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:37,014-Speed 13178.33 samples/sec Loss 3.4929 LearningRate 0.0015 Epoch: 36 Global Step: 92970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:38,566-Speed 13209.46 samples/sec Loss 3.4430 LearningRate 0.0014 Epoch: 36 Global Step: 92980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:40,166-Speed 12810.76 samples/sec Loss 3.5637 LearningRate 0.0014 Epoch: 36 Global Step: 92990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:41,732-Speed 13083.31 samples/sec Loss 3.5232 LearningRate 0.0014 Epoch: 36 Global Step: 93000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:43,332-Speed 12809.91 samples/sec Loss 3.5111 LearningRate 0.0014 Epoch: 36 
Global Step: 93010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:44,925-Speed 12872.70 samples/sec Loss 3.4698 LearningRate 0.0014 Epoch: 36 Global Step: 93020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:15:46,481-Speed 13192.90 samples/sec Loss 3.5405 LearningRate 0.0014 Epoch: 36 Global Step: 93030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:48,053-Speed 13027.66 samples/sec Loss 3.6498 LearningRate 0.0014 Epoch: 36 Global Step: 93040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:49,617-Speed 13109.18 samples/sec Loss 3.4964 LearningRate 0.0014 Epoch: 36 Global Step: 93050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:51,229-Speed 12709.01 samples/sec Loss 3.5651 LearningRate 0.0014 Epoch: 36 Global Step: 93060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:52,824-Speed 12851.16 samples/sec Loss 3.5110 LearningRate 0.0014 Epoch: 36 Global Step: 93070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:54,388-Speed 13128.98 samples/sec Loss 3.5306 LearningRate 0.0014 Epoch: 36 Global Step: 93080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:55,938-Speed 13213.14 samples/sec Loss 3.5185 LearningRate 0.0014 Epoch: 36 Global Step: 93090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:57,517-Speed 12980.61 samples/sec Loss 3.5516 LearningRate 0.0014 Epoch: 36 Global Step: 93100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:15:59,081-Speed 13105.93 samples/sec Loss 3.5086 LearningRate 0.0014 Epoch: 36 Global Step: 93110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:00,677-Speed 12843.26 samples/sec Loss 3.4977 LearningRate 0.0014 Epoch: 36 Global Step: 93120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:02,233-Speed 13170.90 samples/sec Loss 3.5104 LearningRate 0.0014 Epoch: 36 Global Step: 93130 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-01-14 18:16:03,836-Speed 12782.71 samples/sec Loss 3.4636 LearningRate 0.0014 Epoch: 36 Global Step: 93140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:16:05,415-Speed 12978.89 samples/sec Loss 3.5189 LearningRate 0.0014 Epoch: 36 Global Step: 93150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:16:06,976-Speed 13131.28 samples/sec Loss 3.5111 LearningRate 0.0014 Epoch: 36 Global Step: 93160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:16:08,577-Speed 12794.49 samples/sec Loss 3.5646 LearningRate 0.0014 Epoch: 36 Global Step: 93170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:16:10,116-Speed 13317.13 samples/sec Loss 3.5132 LearningRate 0.0014 Epoch: 36 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:11,670-Speed 13203.33 samples/sec Loss 3.5130 LearningRate 0.0014 Epoch: 36 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:13,261-Speed 12879.17 samples/sec Loss 3.5400 LearningRate 0.0014 Epoch: 36 Global Step: 93200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:14,822-Speed 13124.92 samples/sec Loss 3.4798 LearningRate 0.0014 Epoch: 36 Global Step: 93210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:16,416-Speed 12861.43 samples/sec Loss 3.4720 LearningRate 0.0014 Epoch: 36 Global Step: 93220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:17,967-Speed 13211.05 samples/sec Loss 3.5559 LearningRate 0.0014 Epoch: 36 Global Step: 93230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:19,578-Speed 12725.18 samples/sec Loss 3.5382 LearningRate 0.0014 Epoch: 36 Global Step: 93240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:21,136-Speed 13157.78 samples/sec Loss 3.4828 LearningRate 0.0014 Epoch: 36 Global Step: 93250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 
18:16:22,720-Speed 12938.60 samples/sec Loss 3.5689 LearningRate 0.0014 Epoch: 36 Global Step: 93260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:24,343-Speed 12637.70 samples/sec Loss 3.4342 LearningRate 0.0013 Epoch: 36 Global Step: 93270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:25,900-Speed 13156.56 samples/sec Loss 3.4194 LearningRate 0.0013 Epoch: 36 Global Step: 93280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:16:27,447-Speed 13249.03 samples/sec Loss 3.4984 LearningRate 0.0013 Epoch: 36 Global Step: 93290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:29,040-Speed 12872.76 samples/sec Loss 3.4665 LearningRate 0.0013 Epoch: 36 Global Step: 93300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:30,612-Speed 13041.04 samples/sec Loss 3.5028 LearningRate 0.0013 Epoch: 36 Global Step: 93310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:32,175-Speed 13108.34 samples/sec Loss 3.4683 LearningRate 0.0013 Epoch: 36 Global Step: 93320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:33,813-Speed 12516.78 samples/sec Loss 3.4845 LearningRate 0.0013 Epoch: 36 Global Step: 93330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:35,400-Speed 12920.83 samples/sec Loss 3.5057 LearningRate 0.0013 Epoch: 36 Global Step: 93340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:36,969-Speed 13062.07 samples/sec Loss 3.3840 LearningRate 0.0013 Epoch: 36 Global Step: 93350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:38,534-Speed 13100.64 samples/sec Loss 3.5036 LearningRate 0.0013 Epoch: 36 Global Step: 93360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:40,138-Speed 12778.85 samples/sec Loss 3.5005 LearningRate 0.0013 Epoch: 36 Global Step: 93370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:41,720-Speed 12953.33 samples/sec 
Loss 3.6073 LearningRate 0.0013 Epoch: 36 Global Step: 93380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:16:43,277-Speed 13160.90 samples/sec Loss 3.5247 LearningRate 0.0013 Epoch: 36 Global Step: 93390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:44,867-Speed 12892.24 samples/sec Loss 3.5606 LearningRate 0.0013 Epoch: 36 Global Step: 93400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:46,444-Speed 13034.56 samples/sec Loss 3.4845 LearningRate 0.0013 Epoch: 36 Global Step: 93410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:48,020-Speed 12998.40 samples/sec Loss 3.5097 LearningRate 0.0013 Epoch: 36 Global Step: 93420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:49,603-Speed 12950.52 samples/sec Loss 3.5074 LearningRate 0.0013 Epoch: 36 Global Step: 93430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:51,178-Speed 13011.59 samples/sec Loss 3.4716 LearningRate 0.0013 Epoch: 36 Global Step: 93440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:52,742-Speed 13109.71 samples/sec Loss 3.4481 LearningRate 0.0013 Epoch: 36 Global Step: 93450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:54,302-Speed 13134.12 samples/sec Loss 3.4651 LearningRate 0.0013 Epoch: 36 Global Step: 93460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:55,899-Speed 12832.44 samples/sec Loss 3.4947 LearningRate 0.0013 Epoch: 36 Global Step: 93470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:57,467-Speed 13067.54 samples/sec Loss 3.4970 LearningRate 0.0013 Epoch: 36 Global Step: 93480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:16:59,037-Speed 13058.94 samples/sec Loss 3.5359 LearningRate 0.0013 Epoch: 36 Global Step: 93490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:00,687-Speed 12412.53 samples/sec Loss 3.4840 LearningRate 0.0013 Epoch: 36 
Global Step: 93500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:02,263-Speed 13006.22 samples/sec Loss 3.5467 LearningRate 0.0013 Epoch: 36 Global Step: 93510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:03,821-Speed 13158.87 samples/sec Loss 3.4570 LearningRate 0.0013 Epoch: 36 Global Step: 93520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:05,390-Speed 13064.90 samples/sec Loss 3.5282 LearningRate 0.0013 Epoch: 36 Global Step: 93530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:06,962-Speed 13040.91 samples/sec Loss 3.5111 LearningRate 0.0013 Epoch: 36 Global Step: 93540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:08,519-Speed 13154.49 samples/sec Loss 3.4586 LearningRate 0.0013 Epoch: 36 Global Step: 93550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:10,208-Speed 12136.18 samples/sec Loss 3.5113 LearningRate 0.0013 Epoch: 36 Global Step: 93560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:11,697-Speed 13771.09 samples/sec Loss 3.5765 LearningRate 0.0012 Epoch: 36 Global Step: 93570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:26,418-Speed 1391.33 samples/sec Loss 3.4952 LearningRate 0.0012 Epoch: 37 Global Step: 93580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:27,991-Speed 13025.35 samples/sec Loss 3.4077 LearningRate 0.0012 Epoch: 37 Global Step: 93590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:29,560-Speed 13062.71 samples/sec Loss 3.4167 LearningRate 0.0012 Epoch: 37 Global Step: 93600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:31,157-Speed 12824.13 samples/sec Loss 3.4091 LearningRate 0.0012 Epoch: 37 Global Step: 93610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:32,751-Speed 12885.06 samples/sec Loss 3.3627 LearningRate 0.0012 Epoch: 37 Global Step: 93620 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-01-14 18:17:34,324-Speed 13026.54 samples/sec Loss 3.4109 LearningRate 0.0012 Epoch: 37 Global Step: 93630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:35,886-Speed 13142.01 samples/sec Loss 3.4208 LearningRate 0.0012 Epoch: 37 Global Step: 93640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:37,464-Speed 13021.14 samples/sec Loss 3.4954 LearningRate 0.0012 Epoch: 37 Global Step: 93650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:39,065-Speed 12798.10 samples/sec Loss 3.4524 LearningRate 0.0012 Epoch: 37 Global Step: 93660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:40,637-Speed 13040.28 samples/sec Loss 3.4188 LearningRate 0.0012 Epoch: 37 Global Step: 93670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:42,198-Speed 13131.94 samples/sec Loss 3.4108 LearningRate 0.0012 Epoch: 37 Global Step: 93680 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:43,786-Speed 12902.60 samples/sec Loss 3.4216 LearningRate 0.0012 Epoch: 37 Global Step: 93690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:45,356-Speed 13072.34 samples/sec Loss 3.4448 LearningRate 0.0012 Epoch: 37 Global Step: 93700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:46,917-Speed 13131.66 samples/sec Loss 3.4268 LearningRate 0.0012 Epoch: 37 Global Step: 93710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:48,476-Speed 13170.78 samples/sec Loss 3.4114 LearningRate 0.0012 Epoch: 37 Global Step: 93720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:50,045-Speed 13057.52 samples/sec Loss 3.4432 LearningRate 0.0012 Epoch: 37 Global Step: 93730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:17:51,645-Speed 12815.12 samples/sec Loss 3.4065 LearningRate 0.0012 Epoch: 37 Global Step: 93740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 
18:17:53,211-Speed 13086.80 samples/sec Loss 3.3741 LearningRate 0.0012 Epoch: 37 Global Step: 93750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:54,786-Speed 13008.31 samples/sec Loss 3.4626 LearningRate 0.0012 Epoch: 37 Global Step: 93760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:56,355-Speed 13056.31 samples/sec Loss 3.4722 LearningRate 0.0012 Epoch: 37 Global Step: 93770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:57,969-Speed 12700.67 samples/sec Loss 3.3781 LearningRate 0.0012 Epoch: 37 Global Step: 93780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:17:59,520-Speed 13214.86 samples/sec Loss 3.4483 LearningRate 0.0012 Epoch: 37 Global Step: 93790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:18:01,158-Speed 12513.29 samples/sec Loss 3.4533 LearningRate 0.0012 Epoch: 37 Global Step: 93800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:18:02,719-Speed 13126.66 samples/sec Loss 3.4457 LearningRate 0.0012 Epoch: 37 Global Step: 93810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:18:04,280-Speed 13127.27 samples/sec Loss 3.3964 LearningRate 0.0012 Epoch: 37 Global Step: 93820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:18:05,855-Speed 13013.40 samples/sec Loss 3.3798 LearningRate 0.0012 Epoch: 37 Global Step: 93830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:18:07,487-Speed 12564.15 samples/sec Loss 3.4839 LearningRate 0.0012 Epoch: 37 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:09,067-Speed 12972.33 samples/sec Loss 3.3760 LearningRate 0.0012 Epoch: 37 Global Step: 93850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:10,638-Speed 13040.65 samples/sec Loss 3.3904 LearningRate 0.0012 Epoch: 37 Global Step: 93860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:12,376-Speed 11800.45 samples/sec 
Loss 3.4488 LearningRate 0.0012 Epoch: 37 Global Step: 93870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:13,916-Speed 13311.30 samples/sec Loss 3.4761 LearningRate 0.0011 Epoch: 37 Global Step: 93880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:15,514-Speed 12820.92 samples/sec Loss 3.4082 LearningRate 0.0011 Epoch: 37 Global Step: 93890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:17,089-Speed 13024.96 samples/sec Loss 3.4299 LearningRate 0.0011 Epoch: 37 Global Step: 93900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:18,629-Speed 13303.96 samples/sec Loss 3.3958 LearningRate 0.0011 Epoch: 37 Global Step: 93910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:20,224-Speed 12851.07 samples/sec Loss 3.4546 LearningRate 0.0011 Epoch: 37 Global Step: 93920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:21,780-Speed 13171.42 samples/sec Loss 3.4742 LearningRate 0.0011 Epoch: 37 Global Step: 93930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:23,348-Speed 13100.31 samples/sec Loss 3.3998 LearningRate 0.0011 Epoch: 37 Global Step: 93940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:18:24,926-Speed 12983.56 samples/sec Loss 3.4942 LearningRate 0.0011 Epoch: 37 Global Step: 93950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:18:26,502-Speed 13005.43 samples/sec Loss 3.4059 LearningRate 0.0011 Epoch: 37 Global Step: 93960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:28,081-Speed 12980.07 samples/sec Loss 3.4355 LearningRate 0.0011 Epoch: 37 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:29,667-Speed 12916.99 samples/sec Loss 3.4579 LearningRate 0.0011 Epoch: 37 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:31,227-Speed 13144.25 samples/sec Loss 3.4166 LearningRate 0.0011 Epoch: 37 
Global Step: 93990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:32,792-Speed 13094.73 samples/sec Loss 3.4952 LearningRate 0.0011 Epoch: 37 Global Step: 94000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:34,355-Speed 13115.54 samples/sec Loss 3.4623 LearningRate 0.0011 Epoch: 37 Global Step: 94010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:35,906-Speed 13211.18 samples/sec Loss 3.4333 LearningRate 0.0011 Epoch: 37 Global Step: 94020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:37,479-Speed 13044.97 samples/sec Loss 3.3703 LearningRate 0.0011 Epoch: 37 Global Step: 94030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:39,076-Speed 12840.01 samples/sec Loss 3.4467 LearningRate 0.0011 Epoch: 37 Global Step: 94040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:40,634-Speed 13156.18 samples/sec Loss 3.3752 LearningRate 0.0011 Epoch: 37 Global Step: 94050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:42,239-Speed 12771.55 samples/sec Loss 3.4355 LearningRate 0.0011 Epoch: 37 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:18:43,811-Speed 13039.60 samples/sec Loss 3.4461 LearningRate 0.0011 Epoch: 37 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:18:45,334-Speed 13446.34 samples/sec Loss 3.4070 LearningRate 0.0011 Epoch: 37 Global Step: 94080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:46,921-Speed 12920.37 samples/sec Loss 3.4123 LearningRate 0.0011 Epoch: 37 Global Step: 94090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:48,491-Speed 13053.21 samples/sec Loss 3.3850 LearningRate 0.0011 Epoch: 37 Global Step: 94100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:50,085-Speed 12858.12 samples/sec Loss 3.3618 LearningRate 0.0011 Epoch: 37 Global Step: 94110 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 18:18:51,645-Speed 13136.71 samples/sec Loss 3.4300 LearningRate 0.0011 Epoch: 37 Global Step: 94120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:53,228-Speed 12943.02 samples/sec Loss 3.4195 LearningRate 0.0011 Epoch: 37 Global Step: 94130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:54,788-Speed 13140.96 samples/sec Loss 3.3985 LearningRate 0.0011 Epoch: 37 Global Step: 94140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:56,359-Speed 13071.64 samples/sec Loss 3.4048 LearningRate 0.0011 Epoch: 37 Global Step: 94150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:57,977-Speed 12667.85 samples/sec Loss 3.4458 LearningRate 0.0011 Epoch: 37 Global Step: 94160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:18:59,513-Speed 13342.08 samples/sec Loss 3.4532 LearningRate 0.0011 Epoch: 37 Global Step: 94170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:01,111-Speed 12821.45 samples/sec Loss 3.4386 LearningRate 0.0011 Epoch: 37 Global Step: 94180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:02,691-Speed 12976.64 samples/sec Loss 3.4766 LearningRate 0.0011 Epoch: 37 Global Step: 94190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:04,270-Speed 12975.22 samples/sec Loss 3.4469 LearningRate 0.0010 Epoch: 37 Global Step: 94200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:05,838-Speed 13070.87 samples/sec Loss 3.4094 LearningRate 0.0010 Epoch: 37 Global Step: 94210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:07,412-Speed 13029.77 samples/sec Loss 3.3872 LearningRate 0.0010 Epoch: 37 Global Step: 94220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:08,969-Speed 13160.09 samples/sec Loss 3.4369 LearningRate 0.0010 Epoch: 37 Global Step: 94230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 
18:19:10,580-Speed 12720.02 samples/sec Loss 3.4702 LearningRate 0.0010 Epoch: 37 Global Step: 94240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:12,134-Speed 13219.56 samples/sec Loss 3.3667 LearningRate 0.0010 Epoch: 37 Global Step: 94250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:13,703-Speed 13061.25 samples/sec Loss 3.4621 LearningRate 0.0010 Epoch: 37 Global Step: 94260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:19:15,295-Speed 12873.68 samples/sec Loss 3.4246 LearningRate 0.0010 Epoch: 37 Global Step: 94270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:16,866-Speed 13047.49 samples/sec Loss 3.4126 LearningRate 0.0010 Epoch: 37 Global Step: 94280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:18,440-Speed 13021.51 samples/sec Loss 3.3649 LearningRate 0.0010 Epoch: 37 Global Step: 94290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:20,000-Speed 13142.35 samples/sec Loss 3.4641 LearningRate 0.0010 Epoch: 37 Global Step: 94300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:21,562-Speed 13120.61 samples/sec Loss 3.3814 LearningRate 0.0010 Epoch: 37 Global Step: 94310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:23,155-Speed 12870.86 samples/sec Loss 3.4399 LearningRate 0.0010 Epoch: 37 Global Step: 94320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:24,734-Speed 12981.56 samples/sec Loss 3.4611 LearningRate 0.0010 Epoch: 37 Global Step: 94330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:26,335-Speed 12794.63 samples/sec Loss 3.4076 LearningRate 0.0010 Epoch: 37 Global Step: 94340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:27,893-Speed 13163.27 samples/sec Loss 3.4200 LearningRate 0.0010 Epoch: 37 Global Step: 94350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:29,442-Speed 13227.55 samples/sec 
Loss 3.4383 LearningRate 0.0010 Epoch: 37 Global Step: 94360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:30,995-Speed 13201.95 samples/sec Loss 3.3907 LearningRate 0.0010 Epoch: 37 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:19:32,578-Speed 12946.12 samples/sec Loss 3.3718 LearningRate 0.0010 Epoch: 37 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:19:34,184-Speed 12760.15 samples/sec Loss 3.4676 LearningRate 0.0010 Epoch: 37 Global Step: 94390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:35,754-Speed 13056.70 samples/sec Loss 3.4161 LearningRate 0.0010 Epoch: 37 Global Step: 94400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:37,310-Speed 13166.32 samples/sec Loss 3.5090 LearningRate 0.0010 Epoch: 37 Global Step: 94410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:38,901-Speed 12886.17 samples/sec Loss 3.4149 LearningRate 0.0010 Epoch: 37 Global Step: 94420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:40,465-Speed 13097.23 samples/sec Loss 3.4275 LearningRate 0.0010 Epoch: 37 Global Step: 94430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:42,034-Speed 13070.33 samples/sec Loss 3.3965 LearningRate 0.0010 Epoch: 37 Global Step: 94440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:43,653-Speed 12656.69 samples/sec Loss 3.4035 LearningRate 0.0010 Epoch: 37 Global Step: 94450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:45,220-Speed 13076.39 samples/sec Loss 3.3736 LearningRate 0.0010 Epoch: 37 Global Step: 94460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:46,766-Speed 13261.81 samples/sec Loss 3.3398 LearningRate 0.0010 Epoch: 37 Global Step: 94470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:48,364-Speed 12822.66 samples/sec Loss 3.4236 LearningRate 0.0010 Epoch: 37 
Global Step: 94480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:49,933-Speed 13053.94 samples/sec Loss 3.4595 LearningRate 0.0010 Epoch: 37 Global Step: 94490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:19:51,497-Speed 13101.34 samples/sec Loss 3.3973 LearningRate 0.0010 Epoch: 37 Global Step: 94500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:19:53,064-Speed 13089.90 samples/sec Loss 3.3991 LearningRate 0.0010 Epoch: 37 Global Step: 94510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:19:54,645-Speed 12958.14 samples/sec Loss 3.4675 LearningRate 0.0010 Epoch: 37 Global Step: 94520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:56,230-Speed 12927.56 samples/sec Loss 3.4419 LearningRate 0.0010 Epoch: 37 Global Step: 94530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:57,792-Speed 13120.66 samples/sec Loss 3.4605 LearningRate 0.0009 Epoch: 37 Global Step: 94540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:19:59,389-Speed 12832.00 samples/sec Loss 3.4514 LearningRate 0.0009 Epoch: 37 Global Step: 94550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:00,958-Speed 13062.18 samples/sec Loss 3.4595 LearningRate 0.0009 Epoch: 37 Global Step: 94560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:02,535-Speed 13003.68 samples/sec Loss 3.3968 LearningRate 0.0009 Epoch: 37 Global Step: 94570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:04,087-Speed 13198.20 samples/sec Loss 3.4603 LearningRate 0.0009 Epoch: 37 Global Step: 94580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:05,642-Speed 13175.47 samples/sec Loss 3.4021 LearningRate 0.0009 Epoch: 37 Global Step: 94590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:07,207-Speed 13134.45 samples/sec Loss 3.3920 LearningRate 0.0009 Epoch: 37 Global Step: 94600 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 18:20:08,742-Speed 13346.26 samples/sec Loss 3.3998 LearningRate 0.0009 Epoch: 37 Global Step: 94610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:10,339-Speed 12832.52 samples/sec Loss 3.3568 LearningRate 0.0009 Epoch: 37 Global Step: 94620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:11,898-Speed 13144.25 samples/sec Loss 3.3752 LearningRate 0.0009 Epoch: 37 Global Step: 94630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:13,466-Speed 13067.16 samples/sec Loss 3.4276 LearningRate 0.0009 Epoch: 37 Global Step: 94640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:15,072-Speed 12773.89 samples/sec Loss 3.4633 LearningRate 0.0009 Epoch: 37 Global Step: 94650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:16,618-Speed 13247.25 samples/sec Loss 3.4334 LearningRate 0.0009 Epoch: 37 Global Step: 94660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:18,187-Speed 13071.04 samples/sec Loss 3.4782 LearningRate 0.0009 Epoch: 37 Global Step: 94670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:19,751-Speed 13098.76 samples/sec Loss 3.4222 LearningRate 0.0009 Epoch: 37 Global Step: 94680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:21,324-Speed 13033.12 samples/sec Loss 3.4384 LearningRate 0.0009 Epoch: 37 Global Step: 94690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:22,903-Speed 12979.27 samples/sec Loss 3.4381 LearningRate 0.0009 Epoch: 37 Global Step: 94700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:24,496-Speed 12872.06 samples/sec Loss 3.4799 LearningRate 0.0009 Epoch: 37 Global Step: 94710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:26,046-Speed 13226.68 samples/sec Loss 3.4307 LearningRate 0.0009 Epoch: 37 Global Step: 94720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-14 
18:20:27,599-Speed 13191.54 samples/sec Loss 3.4251 LearningRate 0.0009 Epoch: 37 Global Step: 94730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:29,189-Speed 12892.01 samples/sec Loss 3.3980 LearningRate 0.0009 Epoch: 37 Global Step: 94740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:20:30,776-Speed 12912.08 samples/sec Loss 3.4155 LearningRate 0.0009 Epoch: 37 Global Step: 94750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:32,335-Speed 13149.55 samples/sec Loss 3.4960 LearningRate 0.0009 Epoch: 37 Global Step: 94760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:33,916-Speed 12961.20 samples/sec Loss 3.5087 LearningRate 0.0009 Epoch: 37 Global Step: 94770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:35,482-Speed 13087.47 samples/sec Loss 3.5046 LearningRate 0.0009 Epoch: 37 Global Step: 94780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:37,021-Speed 13317.46 samples/sec Loss 3.4066 LearningRate 0.0009 Epoch: 37 Global Step: 94790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:38,577-Speed 13177.14 samples/sec Loss 3.4151 LearningRate 0.0009 Epoch: 37 Global Step: 94800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:40,141-Speed 13121.32 samples/sec Loss 3.4659 LearningRate 0.0009 Epoch: 37 Global Step: 94810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:41,707-Speed 13091.23 samples/sec Loss 3.4283 LearningRate 0.0009 Epoch: 37 Global Step: 94820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:43,280-Speed 13031.68 samples/sec Loss 3.4312 LearningRate 0.0009 Epoch: 37 Global Step: 94830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:44,857-Speed 12988.11 samples/sec Loss 3.4438 LearningRate 0.0009 Epoch: 37 Global Step: 94840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:46,430-Speed 13034.32 samples/sec 
Loss 3.4470 LearningRate 0.0009 Epoch: 37 Global Step: 94850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:48,000-Speed 13051.55 samples/sec Loss 3.4777 LearningRate 0.0009 Epoch: 37 Global Step: 94860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:20:49,554-Speed 13190.82 samples/sec Loss 3.4002 LearningRate 0.0009 Epoch: 37 Global Step: 94870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:51,123-Speed 13058.59 samples/sec Loss 3.4622 LearningRate 0.0009 Epoch: 37 Global Step: 94880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:52,702-Speed 12979.94 samples/sec Loss 3.4971 LearningRate 0.0009 Epoch: 37 Global Step: 94890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:54,276-Speed 13019.48 samples/sec Loss 3.4566 LearningRate 0.0008 Epoch: 37 Global Step: 94900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:55,839-Speed 13116.26 samples/sec Loss 3.3593 LearningRate 0.0008 Epoch: 37 Global Step: 94910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:57,398-Speed 13143.94 samples/sec Loss 3.2969 LearningRate 0.0008 Epoch: 37 Global Step: 94920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:20:58,979-Speed 12969.90 samples/sec Loss 3.5533 LearningRate 0.0008 Epoch: 37 Global Step: 94930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:21:00,551-Speed 13073.12 samples/sec Loss 3.4518 LearningRate 0.0008 Epoch: 37 Global Step: 94940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:21:02,096-Speed 13261.50 samples/sec Loss 3.4015 LearningRate 0.0008 Epoch: 37 Global Step: 94950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:21:03,687-Speed 12883.28 samples/sec Loss 3.4834 LearningRate 0.0008 Epoch: 37 Global Step: 94960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:21:05,269-Speed 12950.19 samples/sec Loss 3.4016 LearningRate 0.0008 Epoch: 37 
Global Step: 94970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:21:06,836-Speed 13086.54 samples/sec Loss 3.4294 LearningRate 0.0008 Epoch: 37 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:21:08,425-Speed 12896.09 samples/sec Loss 3.3931 LearningRate 0.0008 Epoch: 37 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:21:09,983-Speed 13151.56 samples/sec Loss 3.4336 LearningRate 0.0008 Epoch: 37 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:21:32,343-[lfw][95000]XNorm: 7.089844 Training: 2022-01-14 18:21:32,344-[lfw][95000]Accuracy-Flip: 0.99650+-0.00376 Training: 2022-01-14 18:21:32,344-[lfw][95000]Accuracy-Highest: 0.99650 Training: 2022-01-14 18:21:58,001-[cfp_fp][95000]XNorm: 6.039162 Training: 2022-01-14 18:21:58,002-[cfp_fp][95000]Accuracy-Flip: 0.96943+-0.00886 Training: 2022-01-14 18:21:58,004-[cfp_fp][95000]Accuracy-Highest: 0.97043 Training: 2022-01-14 18:22:20,450-[agedb_30][95000]XNorm: 6.862862 Training: 2022-01-14 18:22:20,451-[agedb_30][95000]Accuracy-Flip: 0.96950+-0.00727 Training: 2022-01-14 18:22:20,452-[agedb_30][95000]Accuracy-Highest: 0.97100 Training: 2022-01-14 18:22:22,000-Speed 284.38 samples/sec Loss 3.4133 LearningRate 0.0008 Epoch: 37 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:23,594-Speed 12859.43 samples/sec Loss 3.4769 LearningRate 0.0008 Epoch: 37 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:25,142-Speed 13247.87 samples/sec Loss 3.4517 LearningRate 0.0008 Epoch: 37 Global Step: 95030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:26,698-Speed 13170.21 samples/sec Loss 3.3550 LearningRate 0.0008 Epoch: 37 Global Step: 95040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:28,296-Speed 12823.86 samples/sec Loss 3.5146 LearningRate 0.0008 Epoch: 37 Global Step: 95050 Fp16 Grad Scale: 
32768 Required: 0 hours Training: 2022-01-14 18:22:29,880-Speed 12944.54 samples/sec Loss 3.4380 LearningRate 0.0008 Epoch: 37 Global Step: 95060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:31,474-Speed 12852.29 samples/sec Loss 3.3907 LearningRate 0.0008 Epoch: 37 Global Step: 95070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:33,059-Speed 12941.30 samples/sec Loss 3.4784 LearningRate 0.0008 Epoch: 37 Global Step: 95080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:34,625-Speed 13083.85 samples/sec Loss 3.5203 LearningRate 0.0008 Epoch: 37 Global Step: 95090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:36,181-Speed 13175.94 samples/sec Loss 3.4495 LearningRate 0.0008 Epoch: 37 Global Step: 95100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:37,771-Speed 12893.50 samples/sec Loss 3.4438 LearningRate 0.0008 Epoch: 37 Global Step: 95110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:39,325-Speed 13180.75 samples/sec Loss 3.4607 LearningRate 0.0008 Epoch: 37 Global Step: 95120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:22:40,894-Speed 13100.96 samples/sec Loss 3.4480 LearningRate 0.0008 Epoch: 37 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:42,471-Speed 12993.13 samples/sec Loss 3.3747 LearningRate 0.0008 Epoch: 37 Global Step: 95140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:44,026-Speed 13185.64 samples/sec Loss 3.4693 LearningRate 0.0008 Epoch: 37 Global Step: 95150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:45,584-Speed 13156.27 samples/sec Loss 3.4323 LearningRate 0.0008 Epoch: 37 Global Step: 95160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:47,146-Speed 13116.47 samples/sec Loss 3.4227 LearningRate 0.0008 Epoch: 37 Global Step: 95170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-01-14 18:22:48,719-Speed 13027.01 samples/sec Loss 3.4594 LearningRate 0.0008 Epoch: 37 Global Step: 95180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:50,313-Speed 12865.46 samples/sec Loss 3.4499 LearningRate 0.0008 Epoch: 37 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:51,907-Speed 12850.74 samples/sec Loss 3.3612 LearningRate 0.0008 Epoch: 37 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:53,464-Speed 13168.87 samples/sec Loss 3.4857 LearningRate 0.0008 Epoch: 37 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:55,015-Speed 13219.05 samples/sec Loss 3.3902 LearningRate 0.0008 Epoch: 37 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:56,564-Speed 13288.16 samples/sec Loss 3.4107 LearningRate 0.0008 Epoch: 37 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:58,154-Speed 12893.88 samples/sec Loss 3.3408 LearningRate 0.0008 Epoch: 37 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:22:59,711-Speed 13164.80 samples/sec Loss 3.4076 LearningRate 0.0008 Epoch: 37 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:23:01,278-Speed 13078.17 samples/sec Loss 3.4515 LearningRate 0.0008 Epoch: 37 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:23:02,858-Speed 12966.69 samples/sec Loss 3.5015 LearningRate 0.0008 Epoch: 37 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:23:04,454-Speed 12868.08 samples/sec Loss 3.3794 LearningRate 0.0007 Epoch: 37 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:23:06,000-Speed 13255.99 samples/sec Loss 3.4174 LearningRate 0.0007 Epoch: 37 Global Step: 95290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:07,594-Speed 12860.40 
samples/sec Loss 3.5363 LearningRate 0.0007 Epoch: 37 Global Step: 95300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:09,154-Speed 13136.20 samples/sec Loss 3.4356 LearningRate 0.0007 Epoch: 37 Global Step: 95310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:10,693-Speed 13317.55 samples/sec Loss 3.4305 LearningRate 0.0007 Epoch: 37 Global Step: 95320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:12,265-Speed 13027.98 samples/sec Loss 3.4198 LearningRate 0.0007 Epoch: 37 Global Step: 95330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:13,851-Speed 12920.12 samples/sec Loss 3.4459 LearningRate 0.0007 Epoch: 37 Global Step: 95340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:15,419-Speed 13077.29 samples/sec Loss 3.4369 LearningRate 0.0007 Epoch: 37 Global Step: 95350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:16,999-Speed 12984.62 samples/sec Loss 3.4619 LearningRate 0.0007 Epoch: 37 Global Step: 95360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:18,560-Speed 13136.40 samples/sec Loss 3.3912 LearningRate 0.0007 Epoch: 37 Global Step: 95370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:20,117-Speed 13161.56 samples/sec Loss 3.4169 LearningRate 0.0007 Epoch: 37 Global Step: 95380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:21,688-Speed 13041.43 samples/sec Loss 3.4895 LearningRate 0.0007 Epoch: 37 Global Step: 95390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:23,248-Speed 13139.04 samples/sec Loss 3.4191 LearningRate 0.0007 Epoch: 37 Global Step: 95400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:24,829-Speed 12959.68 samples/sec Loss 3.4402 LearningRate 0.0007 Epoch: 37 Global Step: 95410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:26,396-Speed 13086.62 samples/sec Loss 3.4569 LearningRate 0.0007 
Epoch: 37 Global Step: 95420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:27,954-Speed 13148.40 samples/sec Loss 3.5201 LearningRate 0.0007 Epoch: 37 Global Step: 95430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:29,531-Speed 12998.59 samples/sec Loss 3.5062 LearningRate 0.0007 Epoch: 37 Global Step: 95440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:31,092-Speed 13131.71 samples/sec Loss 3.3805 LearningRate 0.0007 Epoch: 37 Global Step: 95450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:32,676-Speed 12959.68 samples/sec Loss 3.4416 LearningRate 0.0007 Epoch: 37 Global Step: 95460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:34,262-Speed 12915.71 samples/sec Loss 3.4471 LearningRate 0.0007 Epoch: 37 Global Step: 95470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:35,808-Speed 13265.49 samples/sec Loss 3.4164 LearningRate 0.0007 Epoch: 37 Global Step: 95480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:37,375-Speed 13068.54 samples/sec Loss 3.3660 LearningRate 0.0007 Epoch: 37 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:23:38,947-Speed 13039.33 samples/sec Loss 3.3990 LearningRate 0.0007 Epoch: 37 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:23:40,498-Speed 13214.07 samples/sec Loss 3.4247 LearningRate 0.0007 Epoch: 37 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:23:42,096-Speed 12832.62 samples/sec Loss 3.4004 LearningRate 0.0007 Epoch: 37 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:23:43,635-Speed 13328.25 samples/sec Loss 3.3972 LearningRate 0.0007 Epoch: 37 Global Step: 95530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:45,259-Speed 12622.43 samples/sec Loss 3.4538 LearningRate 0.0007 Epoch: 37 Global Step: 95540 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:46,802-Speed 13286.07 samples/sec Loss 3.4430 LearningRate 0.0007 Epoch: 37 Global Step: 95550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:48,364-Speed 13131.34 samples/sec Loss 3.4629 LearningRate 0.0007 Epoch: 37 Global Step: 95560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:49,917-Speed 13199.65 samples/sec Loss 3.4396 LearningRate 0.0007 Epoch: 37 Global Step: 95570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:51,536-Speed 12661.63 samples/sec Loss 3.4172 LearningRate 0.0007 Epoch: 37 Global Step: 95580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:53,095-Speed 13144.00 samples/sec Loss 3.4383 LearningRate 0.0007 Epoch: 37 Global Step: 95590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:54,678-Speed 12940.39 samples/sec Loss 3.4353 LearningRate 0.0007 Epoch: 37 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:56,245-Speed 13087.43 samples/sec Loss 3.4478 LearningRate 0.0007 Epoch: 37 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:57,795-Speed 13217.69 samples/sec Loss 3.4638 LearningRate 0.0007 Epoch: 37 Global Step: 95620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:23:59,357-Speed 13119.93 samples/sec Loss 3.4764 LearningRate 0.0007 Epoch: 37 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:00,950-Speed 12865.72 samples/sec Loss 3.4343 LearningRate 0.0007 Epoch: 37 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:02,496-Speed 13255.44 samples/sec Loss 3.4601 LearningRate 0.0007 Epoch: 37 Global Step: 95650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:04,045-Speed 13233.30 samples/sec Loss 3.4328 LearningRate 0.0007 Epoch: 37 Global Step: 95660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 18:24:05,598-Speed 13197.26 samples/sec Loss 3.4501 LearningRate 0.0007 Epoch: 37 Global Step: 95670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:07,160-Speed 13113.43 samples/sec Loss 3.4241 LearningRate 0.0007 Epoch: 37 Global Step: 95680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:08,764-Speed 12782.07 samples/sec Loss 3.4052 LearningRate 0.0006 Epoch: 37 Global Step: 95690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:10,329-Speed 13095.86 samples/sec Loss 3.4773 LearningRate 0.0006 Epoch: 37 Global Step: 95700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:11,907-Speed 12982.46 samples/sec Loss 3.4418 LearningRate 0.0006 Epoch: 37 Global Step: 95710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:13,490-Speed 12945.91 samples/sec Loss 3.4587 LearningRate 0.0006 Epoch: 37 Global Step: 95720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:15,061-Speed 13044.12 samples/sec Loss 3.4319 LearningRate 0.0006 Epoch: 37 Global Step: 95730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:16,614-Speed 13197.20 samples/sec Loss 3.3715 LearningRate 0.0006 Epoch: 37 Global Step: 95740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:18,187-Speed 13023.37 samples/sec Loss 3.3811 LearningRate 0.0006 Epoch: 37 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:19,773-Speed 12924.17 samples/sec Loss 3.4866 LearningRate 0.0006 Epoch: 37 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:21,327-Speed 13194.37 samples/sec Loss 3.5466 LearningRate 0.0006 Epoch: 37 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:22,889-Speed 13122.18 samples/sec Loss 3.4390 LearningRate 0.0006 Epoch: 37 Global Step: 95780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:24,447-Speed 13150.79 
samples/sec Loss 3.4455 LearningRate 0.0006 Epoch: 37 Global Step: 95790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:26,021-Speed 13022.50 samples/sec Loss 3.3828 LearningRate 0.0006 Epoch: 37 Global Step: 95800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:27,594-Speed 13036.52 samples/sec Loss 3.5099 LearningRate 0.0006 Epoch: 37 Global Step: 95810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:29,149-Speed 13196.41 samples/sec Loss 3.4285 LearningRate 0.0006 Epoch: 37 Global Step: 95820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:30,777-Speed 12598.31 samples/sec Loss 3.4314 LearningRate 0.0006 Epoch: 37 Global Step: 95830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:32,332-Speed 13183.66 samples/sec Loss 3.4114 LearningRate 0.0006 Epoch: 37 Global Step: 95840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:33,911-Speed 12970.15 samples/sec Loss 3.3911 LearningRate 0.0006 Epoch: 37 Global Step: 95850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:35,490-Speed 12985.07 samples/sec Loss 3.4837 LearningRate 0.0006 Epoch: 37 Global Step: 95860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:37,053-Speed 13114.53 samples/sec Loss 3.4510 LearningRate 0.0006 Epoch: 37 Global Step: 95870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:38,611-Speed 13144.90 samples/sec Loss 3.4737 LearningRate 0.0006 Epoch: 37 Global Step: 95880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:40,200-Speed 12897.21 samples/sec Loss 3.4549 LearningRate 0.0006 Epoch: 37 Global Step: 95890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:41,737-Speed 13343.55 samples/sec Loss 3.3955 LearningRate 0.0006 Epoch: 37 Global Step: 95900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:43,301-Speed 13120.56 samples/sec Loss 3.4964 LearningRate 0.0006 
Epoch: 37 Global Step: 95910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:44,858-Speed 13155.92 samples/sec Loss 3.4831 LearningRate 0.0006 Epoch: 37 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:24:46,414-Speed 13183.26 samples/sec Loss 3.4071 LearningRate 0.0006 Epoch: 37 Global Step: 95930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:47,994-Speed 12977.05 samples/sec Loss 3.4150 LearningRate 0.0006 Epoch: 37 Global Step: 95940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:49,552-Speed 13150.07 samples/sec Loss 3.4156 LearningRate 0.0006 Epoch: 37 Global Step: 95950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:51,131-Speed 12975.66 samples/sec Loss 3.4633 LearningRate 0.0006 Epoch: 37 Global Step: 95960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:52,715-Speed 12955.47 samples/sec Loss 3.5001 LearningRate 0.0006 Epoch: 37 Global Step: 95970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:54,273-Speed 13152.58 samples/sec Loss 3.4940 LearningRate 0.0006 Epoch: 37 Global Step: 95980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:55,849-Speed 13007.69 samples/sec Loss 3.3833 LearningRate 0.0006 Epoch: 37 Global Step: 95990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:57,405-Speed 13167.64 samples/sec Loss 3.4023 LearningRate 0.0006 Epoch: 37 Global Step: 96000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:24:58,990-Speed 12927.71 samples/sec Loss 3.3941 LearningRate 0.0006 Epoch: 37 Global Step: 96010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:00,569-Speed 12984.97 samples/sec Loss 3.4536 LearningRate 0.0006 Epoch: 37 Global Step: 96020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:02,129-Speed 13140.17 samples/sec Loss 3.4488 LearningRate 0.0006 Epoch: 37 Global Step: 96030 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:03,666-Speed 13336.82 samples/sec Loss 3.4327 LearningRate 0.0006 Epoch: 37 Global Step: 96040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:05,245-Speed 12974.13 samples/sec Loss 3.4921 LearningRate 0.0006 Epoch: 37 Global Step: 96050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:06,813-Speed 13080.87 samples/sec Loss 3.5022 LearningRate 0.0006 Epoch: 37 Global Step: 96060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:08,383-Speed 13046.99 samples/sec Loss 3.4547 LearningRate 0.0006 Epoch: 37 Global Step: 96070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:09,972-Speed 12902.68 samples/sec Loss 3.4354 LearningRate 0.0006 Epoch: 37 Global Step: 96080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:11,576-Speed 12778.00 samples/sec Loss 3.5032 LearningRate 0.0006 Epoch: 37 Global Step: 96090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:13,054-Speed 13866.23 samples/sec Loss 3.4156 LearningRate 0.0006 Epoch: 37 Global Step: 96100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:29,498-Speed 1245.56 samples/sec Loss 3.3890 LearningRate 0.0006 Epoch: 38 Global Step: 96110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:31,106-Speed 12748.33 samples/sec Loss 3.2883 LearningRate 0.0006 Epoch: 38 Global Step: 96120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:32,692-Speed 12927.99 samples/sec Loss 3.4015 LearningRate 0.0005 Epoch: 38 Global Step: 96130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:25:34,255-Speed 13113.74 samples/sec Loss 3.3252 LearningRate 0.0005 Epoch: 38 Global Step: 96140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:25:35,895-Speed 12493.11 samples/sec Loss 3.4143 LearningRate 0.0005 Epoch: 38 Global Step: 96150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-01-14 18:25:37,528-Speed 12551.04 samples/sec Loss 3.3666 LearningRate 0.0005 Epoch: 38 Global Step: 96160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:25:39,162-Speed 12536.34 samples/sec Loss 3.3675 LearningRate 0.0005 Epoch: 38 Global Step: 96170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:25:40,730-Speed 13074.49 samples/sec Loss 3.3458 LearningRate 0.0005 Epoch: 38 Global Step: 96180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:42,292-Speed 13118.63 samples/sec Loss 3.3590 LearningRate 0.0005 Epoch: 38 Global Step: 96190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:43,880-Speed 12923.62 samples/sec Loss 3.4090 LearningRate 0.0005 Epoch: 38 Global Step: 96200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:45,451-Speed 13051.39 samples/sec Loss 3.3618 LearningRate 0.0005 Epoch: 38 Global Step: 96210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:47,016-Speed 13098.44 samples/sec Loss 3.4474 LearningRate 0.0005 Epoch: 38 Global Step: 96220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:48,603-Speed 12908.38 samples/sec Loss 3.4246 LearningRate 0.0005 Epoch: 38 Global Step: 96230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:50,196-Speed 12863.93 samples/sec Loss 3.4479 LearningRate 0.0005 Epoch: 38 Global Step: 96240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:51,771-Speed 13018.37 samples/sec Loss 3.3755 LearningRate 0.0005 Epoch: 38 Global Step: 96250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:53,349-Speed 12982.17 samples/sec Loss 3.3584 LearningRate 0.0005 Epoch: 38 Global Step: 96260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:54,914-Speed 13100.19 samples/sec Loss 3.3739 LearningRate 0.0005 Epoch: 38 Global Step: 96270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:25:56,481-Speed 13097.44 
samples/sec Loss 3.4573 LearningRate 0.0005 Epoch: 38 Global Step: 96280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:25:58,065-Speed 12939.60 samples/sec Loss 3.3749 LearningRate 0.0005 Epoch: 38 Global Step: 96290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:25:59,642-Speed 12989.14 samples/sec Loss 3.4359 LearningRate 0.0005 Epoch: 38 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:26:01,246-Speed 12783.45 samples/sec Loss 3.4392 LearningRate 0.0005 Epoch: 38 Global Step: 96310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:26:02,827-Speed 12961.19 samples/sec Loss 3.3647 LearningRate 0.0005 Epoch: 38 Global Step: 96320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:26:04,382-Speed 13184.00 samples/sec Loss 3.3634 LearningRate 0.0005 Epoch: 38 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:26:05,954-Speed 13039.83 samples/sec Loss 3.3776 LearningRate 0.0005 Epoch: 38 Global Step: 96340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:07,503-Speed 13226.98 samples/sec Loss 3.3004 LearningRate 0.0005 Epoch: 38 Global Step: 96350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:09,096-Speed 12856.97 samples/sec Loss 3.3910 LearningRate 0.0005 Epoch: 38 Global Step: 96360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:10,661-Speed 13103.89 samples/sec Loss 3.3833 LearningRate 0.0005 Epoch: 38 Global Step: 96370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:12,208-Speed 13241.01 samples/sec Loss 3.3773 LearningRate 0.0005 Epoch: 38 Global Step: 96380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:13,774-Speed 13090.56 samples/sec Loss 3.4023 LearningRate 0.0005 Epoch: 38 Global Step: 96390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:15,357-Speed 12939.20 samples/sec Loss 3.4456 LearningRate 0.0005 
Epoch: 38 Global Step: 96400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:16,904-Speed 13257.74 samples/sec Loss 3.4066 LearningRate 0.0005 Epoch: 38 Global Step: 96410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:18,469-Speed 13114.02 samples/sec Loss 3.4329 LearningRate 0.0005 Epoch: 38 Global Step: 96420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:20,061-Speed 12878.03 samples/sec Loss 3.4002 LearningRate 0.0005 Epoch: 38 Global Step: 96430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:21,646-Speed 12935.45 samples/sec Loss 3.4089 LearningRate 0.0005 Epoch: 38 Global Step: 96440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:23,229-Speed 12940.17 samples/sec Loss 3.4363 LearningRate 0.0005 Epoch: 38 Global Step: 96450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:24,795-Speed 13091.93 samples/sec Loss 3.3826 LearningRate 0.0005 Epoch: 38 Global Step: 96460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:26,344-Speed 13230.53 samples/sec Loss 3.4525 LearningRate 0.0005 Epoch: 38 Global Step: 96470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:26:27,917-Speed 13064.97 samples/sec Loss 3.3556 LearningRate 0.0005 Epoch: 38 Global Step: 96480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:29,476-Speed 13148.20 samples/sec Loss 3.3800 LearningRate 0.0005 Epoch: 38 Global Step: 96490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:31,043-Speed 13083.07 samples/sec Loss 3.4836 LearningRate 0.0005 Epoch: 38 Global Step: 96500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:32,619-Speed 13001.87 samples/sec Loss 3.4460 LearningRate 0.0005 Epoch: 38 Global Step: 96510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:34,197-Speed 12985.97 samples/sec Loss 3.3625 LearningRate 0.0005 Epoch: 38 Global Step: 96520 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:35,763-Speed 13090.61 samples/sec Loss 3.4639 LearningRate 0.0005 Epoch: 38 Global Step: 96530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:37,335-Speed 13033.70 samples/sec Loss 3.3917 LearningRate 0.0005 Epoch: 38 Global Step: 96540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:38,894-Speed 13151.64 samples/sec Loss 3.3743 LearningRate 0.0005 Epoch: 38 Global Step: 96550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:40,462-Speed 13065.00 samples/sec Loss 3.3614 LearningRate 0.0005 Epoch: 38 Global Step: 96560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:42,025-Speed 13111.30 samples/sec Loss 3.4163 LearningRate 0.0005 Epoch: 38 Global Step: 96570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:43,583-Speed 13158.00 samples/sec Loss 3.3983 LearningRate 0.0005 Epoch: 38 Global Step: 96580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:45,133-Speed 13220.28 samples/sec Loss 3.4814 LearningRate 0.0005 Epoch: 38 Global Step: 96590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:46,759-Speed 12608.98 samples/sec Loss 3.3891 LearningRate 0.0005 Epoch: 38 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:48,327-Speed 13065.18 samples/sec Loss 3.3567 LearningRate 0.0004 Epoch: 38 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:49,920-Speed 12864.60 samples/sec Loss 3.3430 LearningRate 0.0004 Epoch: 38 Global Step: 96620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:51,517-Speed 12836.08 samples/sec Loss 3.3423 LearningRate 0.0004 Epoch: 38 Global Step: 96630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:53,086-Speed 13063.86 samples/sec Loss 3.3776 LearningRate 0.0004 Epoch: 38 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 18:26:54,703-Speed 12670.85 samples/sec Loss 3.3616 LearningRate 0.0004 Epoch: 38 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:56,266-Speed 13121.48 samples/sec Loss 3.3962 LearningRate 0.0004 Epoch: 38 Global Step: 96660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:57,818-Speed 13200.98 samples/sec Loss 3.3656 LearningRate 0.0004 Epoch: 38 Global Step: 96670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:26:59,401-Speed 12946.69 samples/sec Loss 3.3744 LearningRate 0.0004 Epoch: 38 Global Step: 96680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:27:00,958-Speed 13165.85 samples/sec Loss 3.3888 LearningRate 0.0004 Epoch: 38 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:27:02,505-Speed 13245.10 samples/sec Loss 3.4311 LearningRate 0.0004 Epoch: 38 Global Step: 96700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:04,121-Speed 12688.73 samples/sec Loss 3.4202 LearningRate 0.0004 Epoch: 38 Global Step: 96710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:05,701-Speed 12974.57 samples/sec Loss 3.3979 LearningRate 0.0004 Epoch: 38 Global Step: 96720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:07,270-Speed 13062.28 samples/sec Loss 3.4475 LearningRate 0.0004 Epoch: 38 Global Step: 96730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:08,831-Speed 13128.82 samples/sec Loss 3.4292 LearningRate 0.0004 Epoch: 38 Global Step: 96740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:10,412-Speed 12982.48 samples/sec Loss 3.3943 LearningRate 0.0004 Epoch: 38 Global Step: 96750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:11,976-Speed 13110.28 samples/sec Loss 3.4552 LearningRate 0.0004 Epoch: 38 Global Step: 96760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:13,551-Speed 13013.95 
samples/sec Loss 3.3390 LearningRate 0.0004 Epoch: 38 Global Step: 96770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:15,156-Speed 12771.75 samples/sec Loss 3.4157 LearningRate 0.0004 Epoch: 38 Global Step: 96780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:16,712-Speed 13180.38 samples/sec Loss 3.3585 LearningRate 0.0004 Epoch: 38 Global Step: 96790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:18,306-Speed 12856.44 samples/sec Loss 3.3656 LearningRate 0.0004 Epoch: 38 Global Step: 96800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:27:19,873-Speed 13077.88 samples/sec Loss 3.3541 LearningRate 0.0004 Epoch: 38 Global Step: 96810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:27:21,444-Speed 13051.09 samples/sec Loss 3.3605 LearningRate 0.0004 Epoch: 38 Global Step: 96820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:27:22,997-Speed 13192.37 samples/sec Loss 3.3713 LearningRate 0.0004 Epoch: 38 Global Step: 96830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:24,573-Speed 13004.88 samples/sec Loss 3.3151 LearningRate 0.0004 Epoch: 38 Global Step: 96840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:26,140-Speed 13079.82 samples/sec Loss 3.4060 LearningRate 0.0004 Epoch: 38 Global Step: 96850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:27,708-Speed 13074.62 samples/sec Loss 3.4465 LearningRate 0.0004 Epoch: 38 Global Step: 96860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:29,341-Speed 12548.88 samples/sec Loss 3.3943 LearningRate 0.0004 Epoch: 38 Global Step: 96870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:30,902-Speed 13128.66 samples/sec Loss 3.4848 LearningRate 0.0004 Epoch: 38 Global Step: 96880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:32,467-Speed 13100.90 samples/sec Loss 3.2723 LearningRate 0.0004 
Epoch: 38 Global Step: 96890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:34,044-Speed 12993.40 samples/sec Loss 3.3419 LearningRate 0.0004 Epoch: 38 Global Step: 96900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:35,614-Speed 13103.07 samples/sec Loss 3.3631 LearningRate 0.0004 Epoch: 38 Global Step: 96910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:37,178-Speed 13098.57 samples/sec Loss 3.3666 LearningRate 0.0004 Epoch: 38 Global Step: 96920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:27:38,755-Speed 12996.09 samples/sec Loss 3.4082 LearningRate 0.0004 Epoch: 38 Global Step: 96930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:27:40,315-Speed 13136.54 samples/sec Loss 3.3903 LearningRate 0.0004 Epoch: 38 Global Step: 96940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:27:41,877-Speed 13117.46 samples/sec Loss 3.4709 LearningRate 0.0004 Epoch: 38 Global Step: 96950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:27:43,469-Speed 12880.75 samples/sec Loss 3.4161 LearningRate 0.0004 Epoch: 38 Global Step: 96960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:45,029-Speed 13127.81 samples/sec Loss 3.4026 LearningRate 0.0004 Epoch: 38 Global Step: 96970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:46,612-Speed 12946.86 samples/sec Loss 3.3559 LearningRate 0.0004 Epoch: 38 Global Step: 96980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:48,190-Speed 12993.48 samples/sec Loss 3.4039 LearningRate 0.0004 Epoch: 38 Global Step: 96990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:49,760-Speed 13049.89 samples/sec Loss 3.4068 LearningRate 0.0004 Epoch: 38 Global Step: 97000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:51,327-Speed 13082.49 samples/sec Loss 3.4302 LearningRate 0.0004 Epoch: 38 Global Step: 97010 Fp16 Grad 
Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:52,886-Speed 13146.10 samples/sec Loss 3.3982 LearningRate 0.0004 Epoch: 38 Global Step: 97020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:54,454-Speed 13068.15 samples/sec Loss 3.4843 LearningRate 0.0004 Epoch: 38 Global Step: 97030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:56,006-Speed 13209.43 samples/sec Loss 3.4010 LearningRate 0.0004 Epoch: 38 Global Step: 97040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:57,585-Speed 12982.01 samples/sec Loss 3.3167 LearningRate 0.0004 Epoch: 38 Global Step: 97050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:27:59,127-Speed 13288.59 samples/sec Loss 3.3948 LearningRate 0.0004 Epoch: 38 Global Step: 97060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:00,708-Speed 12966.88 samples/sec Loss 3.4311 LearningRate 0.0004 Epoch: 38 Global Step: 97070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:02,278-Speed 13052.02 samples/sec Loss 3.3237 LearningRate 0.0004 Epoch: 38 Global Step: 97080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:03,828-Speed 13218.90 samples/sec Loss 3.4703 LearningRate 0.0004 Epoch: 38 Global Step: 97090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:05,425-Speed 12856.76 samples/sec Loss 3.3733 LearningRate 0.0004 Epoch: 38 Global Step: 97100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:07,019-Speed 12859.15 samples/sec Loss 3.3653 LearningRate 0.0004 Epoch: 38 Global Step: 97110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:08,557-Speed 13324.88 samples/sec Loss 3.4115 LearningRate 0.0004 Epoch: 38 Global Step: 97120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:10,137-Speed 12968.44 samples/sec Loss 3.4104 LearningRate 0.0004 Epoch: 38 Global Step: 97130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 18:28:11,712-Speed 13021.02 samples/sec Loss 3.3975 LearningRate 0.0003 Epoch: 38 Global Step: 97140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:13,295-Speed 12944.79 samples/sec Loss 3.4337 LearningRate 0.0003 Epoch: 38 Global Step: 97150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:14,844-Speed 13232.32 samples/sec Loss 3.3922 LearningRate 0.0003 Epoch: 38 Global Step: 97160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:16,414-Speed 13074.06 samples/sec Loss 3.3912 LearningRate 0.0003 Epoch: 38 Global Step: 97170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:17,993-Speed 12978.55 samples/sec Loss 3.4292 LearningRate 0.0003 Epoch: 38 Global Step: 97180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:19,589-Speed 12840.85 samples/sec Loss 3.4346 LearningRate 0.0003 Epoch: 38 Global Step: 97190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:21,146-Speed 13169.27 samples/sec Loss 3.3942 LearningRate 0.0003 Epoch: 38 Global Step: 97200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:22,727-Speed 12958.28 samples/sec Loss 3.3851 LearningRate 0.0003 Epoch: 38 Global Step: 97210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:24,272-Speed 13270.52 samples/sec Loss 3.4324 LearningRate 0.0003 Epoch: 38 Global Step: 97220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:25,837-Speed 13094.44 samples/sec Loss 3.4360 LearningRate 0.0003 Epoch: 38 Global Step: 97230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:27,407-Speed 13052.23 samples/sec Loss 3.4644 LearningRate 0.0003 Epoch: 38 Global Step: 97240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:28:28,957-Speed 13227.26 samples/sec Loss 3.4005 LearningRate 0.0003 Epoch: 38 Global Step: 97250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:30,511-Speed 13186.03 
samples/sec Loss 3.3455 LearningRate 0.0003 Epoch: 38 Global Step: 97260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:32,081-Speed 13053.62 samples/sec Loss 3.4263 LearningRate 0.0003 Epoch: 38 Global Step: 97270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:33,643-Speed 13124.38 samples/sec Loss 3.3845 LearningRate 0.0003 Epoch: 38 Global Step: 97280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:35,208-Speed 13094.32 samples/sec Loss 3.3967 LearningRate 0.0003 Epoch: 38 Global Step: 97290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:36,768-Speed 13139.64 samples/sec Loss 3.3860 LearningRate 0.0003 Epoch: 38 Global Step: 97300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:38,340-Speed 13038.70 samples/sec Loss 3.4426 LearningRate 0.0003 Epoch: 38 Global Step: 97310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:39,930-Speed 12886.37 samples/sec Loss 3.3970 LearningRate 0.0003 Epoch: 38 Global Step: 97320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:41,527-Speed 12848.75 samples/sec Loss 3.4166 LearningRate 0.0003 Epoch: 38 Global Step: 97330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:43,103-Speed 13014.68 samples/sec Loss 3.4003 LearningRate 0.0003 Epoch: 38 Global Step: 97340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:44,670-Speed 13069.65 samples/sec Loss 3.3056 LearningRate 0.0003 Epoch: 38 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:28:46,231-Speed 13133.33 samples/sec Loss 3.4685 LearningRate 0.0003 Epoch: 38 Global Step: 97360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:47,795-Speed 13107.53 samples/sec Loss 3.4554 LearningRate 0.0003 Epoch: 38 Global Step: 97370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:49,383-Speed 12904.15 samples/sec Loss 3.4322 LearningRate 0.0003 
Epoch: 38 Global Step: 97380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:50,934-Speed 13216.06 samples/sec Loss 3.3976 LearningRate 0.0003 Epoch: 38 Global Step: 97390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:52,593-Speed 12352.57 samples/sec Loss 3.4299 LearningRate 0.0003 Epoch: 38 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:54,138-Speed 13267.86 samples/sec Loss 3.3795 LearningRate 0.0003 Epoch: 38 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:55,706-Speed 13063.55 samples/sec Loss 3.4435 LearningRate 0.0003 Epoch: 38 Global Step: 97420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:57,298-Speed 12874.23 samples/sec Loss 3.3741 LearningRate 0.0003 Epoch: 38 Global Step: 97430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:28:58,854-Speed 13173.10 samples/sec Loss 3.3783 LearningRate 0.0003 Epoch: 38 Global Step: 97440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:00,419-Speed 13097.04 samples/sec Loss 3.4093 LearningRate 0.0003 Epoch: 38 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:02,004-Speed 12926.97 samples/sec Loss 3.4736 LearningRate 0.0003 Epoch: 38 Global Step: 97460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:29:03,567-Speed 13143.48 samples/sec Loss 3.4072 LearningRate 0.0003 Epoch: 38 Global Step: 97470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:05,150-Speed 12939.50 samples/sec Loss 3.4549 LearningRate 0.0003 Epoch: 38 Global Step: 97480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:06,720-Speed 13089.42 samples/sec Loss 3.4115 LearningRate 0.0003 Epoch: 38 Global Step: 97490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:08,299-Speed 12972.97 samples/sec Loss 3.4002 LearningRate 0.0003 Epoch: 38 Global Step: 97500 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:09,861-Speed 13123.63 samples/sec Loss 3.4043 LearningRate 0.0003 Epoch: 38 Global Step: 97510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:11,421-Speed 13164.26 samples/sec Loss 3.3898 LearningRate 0.0003 Epoch: 38 Global Step: 97520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:13,003-Speed 12956.01 samples/sec Loss 3.3849 LearningRate 0.0003 Epoch: 38 Global Step: 97530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:14,577-Speed 13022.28 samples/sec Loss 3.3776 LearningRate 0.0003 Epoch: 38 Global Step: 97540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:16,136-Speed 13145.98 samples/sec Loss 3.4075 LearningRate 0.0003 Epoch: 38 Global Step: 97550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:17,691-Speed 13181.60 samples/sec Loss 3.4617 LearningRate 0.0003 Epoch: 38 Global Step: 97560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:19,272-Speed 12965.02 samples/sec Loss 3.4168 LearningRate 0.0003 Epoch: 38 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:29:20,830-Speed 13161.04 samples/sec Loss 3.3816 LearningRate 0.0003 Epoch: 38 Global Step: 97580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:22,396-Speed 13082.65 samples/sec Loss 3.3627 LearningRate 0.0003 Epoch: 38 Global Step: 97590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:23,941-Speed 13267.06 samples/sec Loss 3.3586 LearningRate 0.0003 Epoch: 38 Global Step: 97600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:25,514-Speed 13024.98 samples/sec Loss 3.4099 LearningRate 0.0003 Epoch: 38 Global Step: 97610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:27,083-Speed 13071.18 samples/sec Loss 3.3823 LearningRate 0.0003 Epoch: 38 Global Step: 97620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 
2022-01-14 18:29:28,648-Speed 13091.27 samples/sec Loss 3.4761 LearningRate 0.0003 Epoch: 38 Global Step: 97630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:30,258-Speed 12723.92 samples/sec Loss 3.4347 LearningRate 0.0003 Epoch: 38 Global Step: 97640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:31,810-Speed 13210.47 samples/sec Loss 3.4841 LearningRate 0.0003 Epoch: 38 Global Step: 97650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:33,368-Speed 13153.94 samples/sec Loss 3.3673 LearningRate 0.0003 Epoch: 38 Global Step: 97660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:34,927-Speed 13145.22 samples/sec Loss 3.3736 LearningRate 0.0003 Epoch: 38 Global Step: 97670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:36,490-Speed 13109.57 samples/sec Loss 3.4176 LearningRate 0.0003 Epoch: 38 Global Step: 97680 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:38,091-Speed 12827.50 samples/sec Loss 3.3340 LearningRate 0.0003 Epoch: 38 Global Step: 97690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:39,671-Speed 12975.34 samples/sec Loss 3.3462 LearningRate 0.0003 Epoch: 38 Global Step: 97700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:41,239-Speed 13071.21 samples/sec Loss 3.4488 LearningRate 0.0003 Epoch: 38 Global Step: 97710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:29:42,816-Speed 12998.30 samples/sec Loss 3.4642 LearningRate 0.0003 Epoch: 38 Global Step: 97720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:44,383-Speed 13075.02 samples/sec Loss 3.3642 LearningRate 0.0003 Epoch: 38 Global Step: 97730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:45,936-Speed 13219.35 samples/sec Loss 3.3511 LearningRate 0.0003 Epoch: 38 Global Step: 97740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:47,537-Speed 12804.12 
samples/sec Loss 3.3967 LearningRate 0.0003 Epoch: 38 Global Step: 97750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:49,128-Speed 12882.02 samples/sec Loss 3.4138 LearningRate 0.0003 Epoch: 38 Global Step: 97760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:50,689-Speed 13121.96 samples/sec Loss 3.3725 LearningRate 0.0002 Epoch: 38 Global Step: 97770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:52,292-Speed 12786.15 samples/sec Loss 3.3981 LearningRate 0.0002 Epoch: 38 Global Step: 97780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:53,843-Speed 13218.24 samples/sec Loss 3.3643 LearningRate 0.0002 Epoch: 38 Global Step: 97790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:55,425-Speed 12955.28 samples/sec Loss 3.3282 LearningRate 0.0002 Epoch: 38 Global Step: 97800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:56,978-Speed 13197.38 samples/sec Loss 3.4228 LearningRate 0.0002 Epoch: 38 Global Step: 97810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:29:58,575-Speed 12826.61 samples/sec Loss 3.3607 LearningRate 0.0002 Epoch: 38 Global Step: 97820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:00,156-Speed 12966.99 samples/sec Loss 3.4053 LearningRate 0.0002 Epoch: 38 Global Step: 97830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:01,746-Speed 13075.12 samples/sec Loss 3.3650 LearningRate 0.0002 Epoch: 38 Global Step: 97840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:03,308-Speed 13122.94 samples/sec Loss 3.4357 LearningRate 0.0002 Epoch: 38 Global Step: 97850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:04,864-Speed 13164.68 samples/sec Loss 3.4424 LearningRate 0.0002 Epoch: 38 Global Step: 97860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:06,431-Speed 13081.78 samples/sec Loss 3.3728 LearningRate 0.0002 
Epoch: 38 Global Step: 97870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:07,993-Speed 13120.68 samples/sec Loss 3.4193 LearningRate 0.0002 Epoch: 38 Global Step: 97880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:09,560-Speed 13084.06 samples/sec Loss 3.3947 LearningRate 0.0002 Epoch: 38 Global Step: 97890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:11,124-Speed 13105.30 samples/sec Loss 3.3452 LearningRate 0.0002 Epoch: 38 Global Step: 97900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:12,772-Speed 12436.90 samples/sec Loss 3.4672 LearningRate 0.0002 Epoch: 38 Global Step: 97910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:14,327-Speed 13183.71 samples/sec Loss 3.4471 LearningRate 0.0002 Epoch: 38 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:30:15,902-Speed 13009.19 samples/sec Loss 3.4044 LearningRate 0.0002 Epoch: 38 Global Step: 97930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:30:17,457-Speed 13178.51 samples/sec Loss 3.4283 LearningRate 0.0002 Epoch: 38 Global Step: 97940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:19,039-Speed 12953.17 samples/sec Loss 3.4245 LearningRate 0.0002 Epoch: 38 Global Step: 97950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:20,612-Speed 13026.61 samples/sec Loss 3.4454 LearningRate 0.0002 Epoch: 38 Global Step: 97960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:22,174-Speed 13127.44 samples/sec Loss 3.3844 LearningRate 0.0002 Epoch: 38 Global Step: 97970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:23,741-Speed 13074.50 samples/sec Loss 3.4740 LearningRate 0.0002 Epoch: 38 Global Step: 97980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:25,353-Speed 12715.94 samples/sec Loss 3.3587 LearningRate 0.0002 Epoch: 38 Global Step: 97990 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:26,894-Speed 13298.46 samples/sec Loss 3.3637 LearningRate 0.0002 Epoch: 38 Global Step: 98000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:28,514-Speed 12644.87 samples/sec Loss 3.3859 LearningRate 0.0002 Epoch: 38 Global Step: 98010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:30,099-Speed 12934.96 samples/sec Loss 3.3789 LearningRate 0.0002 Epoch: 38 Global Step: 98020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:31,658-Speed 13153.84 samples/sec Loss 3.3852 LearningRate 0.0002 Epoch: 38 Global Step: 98030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:33,265-Speed 12754.01 samples/sec Loss 3.3803 LearningRate 0.0002 Epoch: 38 Global Step: 98040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:34,815-Speed 13218.43 samples/sec Loss 3.4296 LearningRate 0.0002 Epoch: 38 Global Step: 98050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:36,386-Speed 13050.82 samples/sec Loss 3.3752 LearningRate 0.0002 Epoch: 38 Global Step: 98060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:37,975-Speed 12895.31 samples/sec Loss 3.4610 LearningRate 0.0002 Epoch: 38 Global Step: 98070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:39,562-Speed 12913.39 samples/sec Loss 3.4237 LearningRate 0.0002 Epoch: 38 Global Step: 98080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:41,133-Speed 13051.97 samples/sec Loss 3.4252 LearningRate 0.0002 Epoch: 38 Global Step: 98090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:30:42,721-Speed 12907.15 samples/sec Loss 3.4033 LearningRate 0.0002 Epoch: 38 Global Step: 98100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:44,291-Speed 13056.99 samples/sec Loss 3.3211 LearningRate 0.0002 Epoch: 38 Global Step: 98110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 18:30:45,846-Speed 13182.24 samples/sec Loss 3.4734 LearningRate 0.0002 Epoch: 38 Global Step: 98120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:47,418-Speed 13030.12 samples/sec Loss 3.4086 LearningRate 0.0002 Epoch: 38 Global Step: 98130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:48,981-Speed 13117.67 samples/sec Loss 3.3891 LearningRate 0.0002 Epoch: 38 Global Step: 98140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:50,582-Speed 12794.55 samples/sec Loss 3.3554 LearningRate 0.0002 Epoch: 38 Global Step: 98150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:52,151-Speed 13072.14 samples/sec Loss 3.3874 LearningRate 0.0002 Epoch: 38 Global Step: 98160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:53,718-Speed 13076.76 samples/sec Loss 3.3920 LearningRate 0.0002 Epoch: 38 Global Step: 98170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:55,352-Speed 12545.85 samples/sec Loss 3.3942 LearningRate 0.0002 Epoch: 38 Global Step: 98180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:56,900-Speed 13251.68 samples/sec Loss 3.4327 LearningRate 0.0002 Epoch: 38 Global Step: 98190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:30:58,478-Speed 12983.66 samples/sec Loss 3.3552 LearningRate 0.0002 Epoch: 38 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:00,036-Speed 13152.50 samples/sec Loss 3.4446 LearningRate 0.0002 Epoch: 38 Global Step: 98210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:01,602-Speed 13089.77 samples/sec Loss 3.3320 LearningRate 0.0002 Epoch: 38 Global Step: 98220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:03,210-Speed 12741.09 samples/sec Loss 3.4029 LearningRate 0.0002 Epoch: 38 Global Step: 98230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:04,777-Speed 13074.56 
samples/sec Loss 3.3211 LearningRate 0.0002 Epoch: 38 Global Step: 98240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:06,338-Speed 13137.86 samples/sec Loss 3.3703 LearningRate 0.0002 Epoch: 38 Global Step: 98250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:07,904-Speed 13084.75 samples/sec Loss 3.4149 LearningRate 0.0002 Epoch: 38 Global Step: 98260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:09,458-Speed 13187.18 samples/sec Loss 3.4441 LearningRate 0.0002 Epoch: 38 Global Step: 98270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:11,029-Speed 13045.51 samples/sec Loss 3.4279 LearningRate 0.0002 Epoch: 38 Global Step: 98280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:12,611-Speed 12952.65 samples/sec Loss 3.4148 LearningRate 0.0002 Epoch: 38 Global Step: 98290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:14,176-Speed 13100.66 samples/sec Loss 3.4718 LearningRate 0.0002 Epoch: 38 Global Step: 98300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:15,754-Speed 12985.93 samples/sec Loss 3.3940 LearningRate 0.0002 Epoch: 38 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:17,314-Speed 13140.73 samples/sec Loss 3.4754 LearningRate 0.0002 Epoch: 38 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:18,874-Speed 13135.49 samples/sec Loss 3.4352 LearningRate 0.0002 Epoch: 38 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:20,479-Speed 12770.52 samples/sec Loss 3.3783 LearningRate 0.0002 Epoch: 38 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:22,044-Speed 13096.52 samples/sec Loss 3.4097 LearningRate 0.0002 Epoch: 38 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:23,580-Speed 13342.32 samples/sec Loss 3.3343 LearningRate 0.0002 
Epoch: 38 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:25,133-Speed 13196.22 samples/sec Loss 3.4247 LearningRate 0.0002 Epoch: 38 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:26,722-Speed 12900.75 samples/sec Loss 3.4187 LearningRate 0.0002 Epoch: 38 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:28,295-Speed 13031.46 samples/sec Loss 3.3195 LearningRate 0.0002 Epoch: 38 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:29,870-Speed 13010.94 samples/sec Loss 3.4046 LearningRate 0.0002 Epoch: 38 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:31,466-Speed 12846.65 samples/sec Loss 3.3742 LearningRate 0.0002 Epoch: 38 Global Step: 98410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:33,013-Speed 13251.35 samples/sec Loss 3.3537 LearningRate 0.0002 Epoch: 38 Global Step: 98420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:34,561-Speed 13234.98 samples/sec Loss 3.3473 LearningRate 0.0002 Epoch: 38 Global Step: 98430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:36,141-Speed 12976.56 samples/sec Loss 3.4130 LearningRate 0.0002 Epoch: 38 Global Step: 98440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:37,713-Speed 13039.95 samples/sec Loss 3.3278 LearningRate 0.0002 Epoch: 38 Global Step: 98450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:39,274-Speed 13126.70 samples/sec Loss 3.3700 LearningRate 0.0002 Epoch: 38 Global Step: 98460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:40,910-Speed 12530.66 samples/sec Loss 3.3870 LearningRate 0.0002 Epoch: 38 Global Step: 98470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:42,478-Speed 13064.15 samples/sec Loss 3.4356 LearningRate 0.0002 Epoch: 38 Global Step: 98480 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:44,063-Speed 12931.36 samples/sec Loss 3.4090 LearningRate 0.0002 Epoch: 38 Global Step: 98490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:45,633-Speed 13057.63 samples/sec Loss 3.4678 LearningRate 0.0002 Epoch: 38 Global Step: 98500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:47,260-Speed 12641.38 samples/sec Loss 3.4043 LearningRate 0.0002 Epoch: 38 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:31:48,799-Speed 13319.40 samples/sec Loss 3.4198 LearningRate 0.0002 Epoch: 38 Global Step: 98520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:50,345-Speed 13262.52 samples/sec Loss 3.4023 LearningRate 0.0001 Epoch: 38 Global Step: 98530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:51,927-Speed 12953.94 samples/sec Loss 3.4226 LearningRate 0.0001 Epoch: 38 Global Step: 98540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:53,494-Speed 13085.67 samples/sec Loss 3.4987 LearningRate 0.0001 Epoch: 38 Global Step: 98550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:55,060-Speed 13087.64 samples/sec Loss 3.4046 LearningRate 0.0001 Epoch: 38 Global Step: 98560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:56,617-Speed 13158.44 samples/sec Loss 3.4013 LearningRate 0.0001 Epoch: 38 Global Step: 98570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:58,209-Speed 12874.28 samples/sec Loss 3.3701 LearningRate 0.0001 Epoch: 38 Global Step: 98580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:31:59,779-Speed 13053.06 samples/sec Loss 3.3646 LearningRate 0.0001 Epoch: 38 Global Step: 98590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:01,344-Speed 13103.01 samples/sec Loss 3.4033 LearningRate 0.0001 Epoch: 38 Global Step: 98600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 18:32:02,905-Speed 13125.49 samples/sec Loss 3.4987 LearningRate 0.0001 Epoch: 38 Global Step: 98610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:04,514-Speed 12738.81 samples/sec Loss 3.3292 LearningRate 0.0001 Epoch: 38 Global Step: 98620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:06,011-Speed 13689.68 samples/sec Loss 3.4465 LearningRate 0.0001 Epoch: 38 Global Step: 98630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:20,347-Speed 1428.76 samples/sec Loss 3.3820 LearningRate 0.0001 Epoch: 39 Global Step: 98640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:21,946-Speed 12823.70 samples/sec Loss 3.3242 LearningRate 0.0001 Epoch: 39 Global Step: 98650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:23,541-Speed 12856.22 samples/sec Loss 3.3915 LearningRate 0.0001 Epoch: 39 Global Step: 98660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:25,133-Speed 12886.88 samples/sec Loss 3.3605 LearningRate 0.0001 Epoch: 39 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:26,700-Speed 13078.75 samples/sec Loss 3.3203 LearningRate 0.0001 Epoch: 39 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:28,292-Speed 12872.36 samples/sec Loss 3.3642 LearningRate 0.0001 Epoch: 39 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:29,847-Speed 13186.06 samples/sec Loss 3.3358 LearningRate 0.0001 Epoch: 39 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:31,419-Speed 13028.67 samples/sec Loss 3.3576 LearningRate 0.0001 Epoch: 39 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:33,024-Speed 12770.02 samples/sec Loss 3.4186 LearningRate 0.0001 Epoch: 39 Global Step: 98720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:32:34,613-Speed 12900.07 
samples/sec Loss 3.4283 LearningRate 0.0001 Epoch: 39 Global Step: 98730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:32:36,197-Speed 12941.73 samples/sec Loss 3.3813 LearningRate 0.0001 Epoch: 39 Global Step: 98740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:37,769-Speed 13037.24 samples/sec Loss 3.4364 LearningRate 0.0001 Epoch: 39 Global Step: 98750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:39,349-Speed 12972.85 samples/sec Loss 3.3935 LearningRate 0.0001 Epoch: 39 Global Step: 98760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:40,935-Speed 12927.72 samples/sec Loss 3.2444 LearningRate 0.0001 Epoch: 39 Global Step: 98770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:42,567-Speed 12559.29 samples/sec Loss 3.3573 LearningRate 0.0001 Epoch: 39 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:44,130-Speed 13112.10 samples/sec Loss 3.3961 LearningRate 0.0001 Epoch: 39 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:45,703-Speed 13028.92 samples/sec Loss 3.3791 LearningRate 0.0001 Epoch: 39 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:47,315-Speed 12712.23 samples/sec Loss 3.3934 LearningRate 0.0001 Epoch: 39 Global Step: 98810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:48,873-Speed 13162.01 samples/sec Loss 3.3261 LearningRate 0.0001 Epoch: 39 Global Step: 98820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:50,460-Speed 12908.80 samples/sec Loss 3.3478 LearningRate 0.0001 Epoch: 39 Global Step: 98830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:32:52,055-Speed 12858.21 samples/sec Loss 3.3081 LearningRate 0.0001 Epoch: 39 Global Step: 98840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:32:53,673-Speed 12660.86 samples/sec Loss 3.4224 LearningRate 0.0001 
Epoch: 39 Global Step: 98850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:32:55,235-Speed 13119.17 samples/sec Loss 3.3913 LearningRate 0.0001 Epoch: 39 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:32:56,811-Speed 13012.96 samples/sec Loss 3.4194 LearningRate 0.0001 Epoch: 39 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:32:58,400-Speed 12900.96 samples/sec Loss 3.4733 LearningRate 0.0001 Epoch: 39 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:32:59,989-Speed 12894.02 samples/sec Loss 3.5026 LearningRate 0.0001 Epoch: 39 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:01,573-Speed 12947.61 samples/sec Loss 3.3182 LearningRate 0.0001 Epoch: 39 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:03,165-Speed 12876.52 samples/sec Loss 3.2583 LearningRate 0.0001 Epoch: 39 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:04,765-Speed 12805.65 samples/sec Loss 3.3868 LearningRate 0.0001 Epoch: 39 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:06,361-Speed 12847.32 samples/sec Loss 3.3680 LearningRate 0.0001 Epoch: 39 Global Step: 98930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:07,940-Speed 12983.74 samples/sec Loss 3.3873 LearningRate 0.0001 Epoch: 39 Global Step: 98940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:09,491-Speed 13216.30 samples/sec Loss 3.2804 LearningRate 0.0001 Epoch: 39 Global Step: 98950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:11,081-Speed 12890.26 samples/sec Loss 3.3638 LearningRate 0.0001 Epoch: 39 Global Step: 98960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:12,651-Speed 13053.33 samples/sec Loss 3.3442 LearningRate 0.0001 Epoch: 39 Global Step: 98970 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:14,219-Speed 13073.68 samples/sec Loss 3.3781 LearningRate 0.0001 Epoch: 39 Global Step: 98980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:15,773-Speed 13185.29 samples/sec Loss 3.4198 LearningRate 0.0001 Epoch: 39 Global Step: 98990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:17,323-Speed 13222.76 samples/sec Loss 3.3938 LearningRate 0.0001 Epoch: 39 Global Step: 99000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:18,945-Speed 12643.12 samples/sec Loss 3.3346 LearningRate 0.0001 Epoch: 39 Global Step: 99010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:20,515-Speed 13046.01 samples/sec Loss 3.3067 LearningRate 0.0001 Epoch: 39 Global Step: 99020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:22,145-Speed 12582.12 samples/sec Loss 3.3156 LearningRate 0.0001 Epoch: 39 Global Step: 99030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:23,734-Speed 12901.10 samples/sec Loss 3.3713 LearningRate 0.0001 Epoch: 39 Global Step: 99040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:25,294-Speed 13134.51 samples/sec Loss 3.3520 LearningRate 0.0001 Epoch: 39 Global Step: 99050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:26,873-Speed 12997.36 samples/sec Loss 3.3538 LearningRate 0.0001 Epoch: 39 Global Step: 99060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:28,439-Speed 13088.41 samples/sec Loss 3.4162 LearningRate 0.0001 Epoch: 39 Global Step: 99070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:30,004-Speed 13094.88 samples/sec Loss 3.4390 LearningRate 0.0001 Epoch: 39 Global Step: 99080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:33:31,607-Speed 12786.94 samples/sec Loss 3.4389 LearningRate 0.0001 Epoch: 39 Global Step: 99090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 
2022-01-14 18:33:33,182-Speed 13016.84 samples/sec Loss 3.3765 LearningRate 0.0001 Epoch: 39 Global Step: 99100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:34,734-Speed 13203.55 samples/sec Loss 3.4271 LearningRate 0.0001 Epoch: 39 Global Step: 99110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:36,330-Speed 12837.65 samples/sec Loss 3.3586 LearningRate 0.0001 Epoch: 39 Global Step: 99120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:37,894-Speed 13110.41 samples/sec Loss 3.4051 LearningRate 0.0001 Epoch: 39 Global Step: 99130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:39,550-Speed 12374.30 samples/sec Loss 3.4864 LearningRate 0.0001 Epoch: 39 Global Step: 99140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:41,171-Speed 12644.10 samples/sec Loss 3.3782 LearningRate 0.0001 Epoch: 39 Global Step: 99150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:42,743-Speed 13036.11 samples/sec Loss 3.3467 LearningRate 0.0001 Epoch: 39 Global Step: 99160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:44,332-Speed 12897.62 samples/sec Loss 3.3874 LearningRate 0.0001 Epoch: 39 Global Step: 99170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:45,897-Speed 13092.05 samples/sec Loss 3.4269 LearningRate 0.0001 Epoch: 39 Global Step: 99180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:47,491-Speed 12864.65 samples/sec Loss 3.3396 LearningRate 0.0001 Epoch: 39 Global Step: 99190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:49,067-Speed 13001.86 samples/sec Loss 3.3687 LearningRate 0.0001 Epoch: 39 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:50,669-Speed 12796.41 samples/sec Loss 3.4524 LearningRate 0.0001 Epoch: 39 Global Step: 99210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:52,268-Speed 12820.55 
samples/sec Loss 3.4180 LearningRate 0.0001 Epoch: 39 Global Step: 99220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:53,825-Speed 13164.56 samples/sec Loss 3.3966 LearningRate 0.0001 Epoch: 39 Global Step: 99230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:55,461-Speed 12526.96 samples/sec Loss 3.3462 LearningRate 0.0001 Epoch: 39 Global Step: 99240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:33:56,996-Speed 13350.55 samples/sec Loss 3.3625 LearningRate 0.0001 Epoch: 39 Global Step: 99250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:33:58,569-Speed 13027.20 samples/sec Loss 3.3973 LearningRate 0.0001 Epoch: 39 Global Step: 99260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:00,176-Speed 12746.68 samples/sec Loss 3.3312 LearningRate 0.0001 Epoch: 39 Global Step: 99270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:01,735-Speed 13157.99 samples/sec Loss 3.3942 LearningRate 0.0001 Epoch: 39 Global Step: 99280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:03,299-Speed 13103.68 samples/sec Loss 3.4603 LearningRate 0.0001 Epoch: 39 Global Step: 99290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:04,905-Speed 12762.43 samples/sec Loss 3.4219 LearningRate 0.0001 Epoch: 39 Global Step: 99300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:06,463-Speed 13169.35 samples/sec Loss 3.3768 LearningRate 0.0001 Epoch: 39 Global Step: 99310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:08,028-Speed 13093.17 samples/sec Loss 3.3608 LearningRate 0.0001 Epoch: 39 Global Step: 99320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:09,671-Speed 12474.31 samples/sec Loss 3.3499 LearningRate 0.0001 Epoch: 39 Global Step: 99330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:11,244-Speed 13031.17 samples/sec Loss 3.4042 LearningRate 0.0001 
Epoch: 39 Global Step: 99340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:12,810-Speed 13085.31 samples/sec Loss 3.2880 LearningRate 0.0001 Epoch: 39 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:34:14,433-Speed 12628.66 samples/sec Loss 3.3463 LearningRate 0.0001 Epoch: 39 Global Step: 99360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:34:16,009-Speed 12998.66 samples/sec Loss 3.3787 LearningRate 0.0001 Epoch: 39 Global Step: 99370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:34:17,576-Speed 13079.59 samples/sec Loss 3.3881 LearningRate 0.0001 Epoch: 39 Global Step: 99380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:19,150-Speed 13025.18 samples/sec Loss 3.3846 LearningRate 0.0001 Epoch: 39 Global Step: 99390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:20,737-Speed 12913.97 samples/sec Loss 3.3096 LearningRate 0.0001 Epoch: 39 Global Step: 99400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:22,351-Speed 12712.85 samples/sec Loss 3.3265 LearningRate 0.0001 Epoch: 39 Global Step: 99410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:23,906-Speed 13199.52 samples/sec Loss 3.4003 LearningRate 0.0001 Epoch: 39 Global Step: 99420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:25,541-Speed 12527.33 samples/sec Loss 3.3563 LearningRate 0.0001 Epoch: 39 Global Step: 99430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:27,100-Speed 13159.35 samples/sec Loss 3.4579 LearningRate 0.0001 Epoch: 39 Global Step: 99440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:28,667-Speed 13080.46 samples/sec Loss 3.3845 LearningRate 0.0001 Epoch: 39 Global Step: 99450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:30,281-Speed 12695.81 samples/sec Loss 3.3032 LearningRate 0.0001 Epoch: 39 Global Step: 99460 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:31,825-Speed 13280.15 samples/sec Loss 3.3533 LearningRate 0.0001 Epoch: 39 Global Step: 99470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:33,383-Speed 13153.38 samples/sec Loss 3.3386 LearningRate 0.0001 Epoch: 39 Global Step: 99480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:34,983-Speed 12803.68 samples/sec Loss 3.3479 LearningRate 0.0001 Epoch: 39 Global Step: 99490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:36,543-Speed 13144.52 samples/sec Loss 3.3634 LearningRate 0.0001 Epoch: 39 Global Step: 99500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:38,130-Speed 12908.41 samples/sec Loss 3.4161 LearningRate 0.0001 Epoch: 39 Global Step: 99510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:39,735-Speed 12770.06 samples/sec Loss 3.3816 LearningRate 0.0001 Epoch: 39 Global Step: 99520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:41,302-Speed 13080.27 samples/sec Loss 3.3591 LearningRate 0.0001 Epoch: 39 Global Step: 99530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:42,836-Speed 13359.60 samples/sec Loss 3.2602 LearningRate 0.0001 Epoch: 39 Global Step: 99540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:44,461-Speed 12611.62 samples/sec Loss 3.4151 LearningRate 0.0001 Epoch: 39 Global Step: 99550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:46,025-Speed 13105.18 samples/sec Loss 3.3959 LearningRate 0.0001 Epoch: 39 Global Step: 99560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:47,641-Speed 12677.05 samples/sec Loss 3.4185 LearningRate 0.0001 Epoch: 39 Global Step: 99570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:49,216-Speed 13019.91 samples/sec Loss 3.4864 LearningRate 0.0001 Epoch: 39 Global Step: 99580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-01-14 18:34:50,773-Speed 13165.54 samples/sec Loss 3.2879 LearningRate 0.0001 Epoch: 39 Global Step: 99590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:34:52,336-Speed 13107.75 samples/sec Loss 3.3631 LearningRate 0.0001 Epoch: 39 Global Step: 99600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:34:53,937-Speed 12819.84 samples/sec Loss 3.4334 LearningRate 0.0001 Epoch: 39 Global Step: 99610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:55,512-Speed 13018.22 samples/sec Loss 3.3937 LearningRate 0.0001 Epoch: 39 Global Step: 99620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:57,067-Speed 13175.35 samples/sec Loss 3.3477 LearningRate 0.0001 Epoch: 39 Global Step: 99630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:34:58,682-Speed 12693.41 samples/sec Loss 3.3604 LearningRate 0.0001 Epoch: 39 Global Step: 99640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:00,268-Speed 12920.53 samples/sec Loss 3.3693 LearningRate 0.0000 Epoch: 39 Global Step: 99650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:01,860-Speed 12874.63 samples/sec Loss 3.4224 LearningRate 0.0000 Epoch: 39 Global Step: 99660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:03,476-Speed 12688.32 samples/sec Loss 3.4022 LearningRate 0.0000 Epoch: 39 Global Step: 99670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:05,063-Speed 12911.76 samples/sec Loss 3.4920 LearningRate 0.0000 Epoch: 39 Global Step: 99680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:06,679-Speed 12689.10 samples/sec Loss 3.3514 LearningRate 0.0000 Epoch: 39 Global Step: 99690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:08,242-Speed 13115.54 samples/sec Loss 3.4584 LearningRate 0.0000 Epoch: 39 Global Step: 99700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:09,804-Speed 13119.97 
samples/sec Loss 3.4882 LearningRate 0.0000 Epoch: 39 Global Step: 99710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:35:11,333-Speed 13409.23 samples/sec Loss 3.4249 LearningRate 0.0000 Epoch: 39 Global Step: 99720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:12,938-Speed 12762.48 samples/sec Loss 3.3454 LearningRate 0.0000 Epoch: 39 Global Step: 99730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:14,510-Speed 13043.10 samples/sec Loss 3.3600 LearningRate 0.0000 Epoch: 39 Global Step: 99740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:16,076-Speed 13083.03 samples/sec Loss 3.3867 LearningRate 0.0000 Epoch: 39 Global Step: 99750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:17,632-Speed 13175.61 samples/sec Loss 3.3354 LearningRate 0.0000 Epoch: 39 Global Step: 99760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:19,265-Speed 12544.24 samples/sec Loss 3.3577 LearningRate 0.0000 Epoch: 39 Global Step: 99770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:20,841-Speed 13002.35 samples/sec Loss 3.2795 LearningRate 0.0000 Epoch: 39 Global Step: 99780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:22,417-Speed 13011.44 samples/sec Loss 3.4040 LearningRate 0.0000 Epoch: 39 Global Step: 99790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:24,009-Speed 12873.33 samples/sec Loss 3.3459 LearningRate 0.0000 Epoch: 39 Global Step: 99800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:25,577-Speed 13065.89 samples/sec Loss 3.2754 LearningRate 0.0000 Epoch: 39 Global Step: 99810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:27,168-Speed 12890.86 samples/sec Loss 3.3042 LearningRate 0.0000 Epoch: 39 Global Step: 99820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:28,741-Speed 13031.57 samples/sec Loss 3.3587 LearningRate 0.0000 
Epoch: 39 Global Step: 99830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:30,310-Speed 13059.03 samples/sec Loss 3.4362 LearningRate 0.0000 Epoch: 39 Global Step: 99840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:31,858-Speed 13243.39 samples/sec Loss 3.3579 LearningRate 0.0000 Epoch: 39 Global Step: 99850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:33,424-Speed 13089.37 samples/sec Loss 3.4511 LearningRate 0.0000 Epoch: 39 Global Step: 99860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:34,993-Speed 13061.93 samples/sec Loss 3.3758 LearningRate 0.0000 Epoch: 39 Global Step: 99870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:36,557-Speed 13105.51 samples/sec Loss 3.3851 LearningRate 0.0000 Epoch: 39 Global Step: 99880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:38,141-Speed 12933.25 samples/sec Loss 3.3782 LearningRate 0.0000 Epoch: 39 Global Step: 99890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:39,731-Speed 12893.04 samples/sec Loss 3.3324 LearningRate 0.0000 Epoch: 39 Global Step: 99900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:41,352-Speed 12654.32 samples/sec Loss 3.4271 LearningRate 0.0000 Epoch: 39 Global Step: 99910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:42,906-Speed 13194.53 samples/sec Loss 3.4183 LearningRate 0.0000 Epoch: 39 Global Step: 99920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:44,509-Speed 12783.01 samples/sec Loss 3.4136 LearningRate 0.0000 Epoch: 39 Global Step: 99930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:35:46,049-Speed 13306.37 samples/sec Loss 3.3694 LearningRate 0.0000 Epoch: 39 Global Step: 99940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:35:47,607-Speed 13156.63 samples/sec Loss 3.4085 LearningRate 0.0000 Epoch: 39 Global Step: 99950 Fp16 Grad 
Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:49,252-Speed 12453.31 samples/sec Loss 3.4189 LearningRate 0.0000 Epoch: 39 Global Step: 99960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:50,826-Speed 13023.57 samples/sec Loss 3.3679 LearningRate 0.0000 Epoch: 39 Global Step: 99970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:52,360-Speed 13363.99 samples/sec Loss 3.3176 LearningRate 0.0000 Epoch: 39 Global Step: 99980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:53,980-Speed 12657.72 samples/sec Loss 3.4782 LearningRate 0.0000 Epoch: 39 Global Step: 99990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:35:55,557-Speed 12999.64 samples/sec Loss 3.4124 LearningRate 0.0000 Epoch: 39 Global Step: 100000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:36:19,049-[lfw][100000]XNorm: 7.120917 Training: 2022-01-14 18:36:19,050-[lfw][100000]Accuracy-Flip: 0.99583+-0.00367 Training: 2022-01-14 18:36:19,051-[lfw][100000]Accuracy-Highest: 0.99650 Training: 2022-01-14 18:36:46,353-[cfp_fp][100000]XNorm: 6.061482 Training: 2022-01-14 18:36:46,354-[cfp_fp][100000]Accuracy-Flip: 0.97086+-0.00911 Training: 2022-01-14 18:36:46,355-[cfp_fp][100000]Accuracy-Highest: 0.97086 Training: 2022-01-14 18:37:08,607-[agedb_30][100000]XNorm: 6.889985 Training: 2022-01-14 18:37:08,608-[agedb_30][100000]Accuracy-Flip: 0.96933+-0.00629 Training: 2022-01-14 18:37:08,608-[agedb_30][100000]Accuracy-Highest: 0.97100 Training: 2022-01-14 18:37:10,173-Speed 274.48 samples/sec Loss 3.3821 LearningRate 0.0000 Epoch: 39 Global Step: 100010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:37:11,736-Speed 13116.50 samples/sec Loss 3.3874 LearningRate 0.0000 Epoch: 39 Global Step: 100020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:37:13,274-Speed 13328.91 samples/sec Loss 3.4122 LearningRate 0.0000 Epoch: 39 Global Step: 100030 Fp16 Grad Scale: 16384 Required: 0 
hours Training: 2022-01-14 18:37:14,884-Speed 12733.65 samples/sec Loss 3.4327 LearningRate 0.0000 Epoch: 39 Global Step: 100040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:37:16,441-Speed 13163.16 samples/sec Loss 3.4110 LearningRate 0.0000 Epoch: 39 Global Step: 100050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:18,004-Speed 13118.75 samples/sec Loss 3.3991 LearningRate 0.0000 Epoch: 39 Global Step: 100060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:19,602-Speed 12843.22 samples/sec Loss 3.3787 LearningRate 0.0000 Epoch: 39 Global Step: 100070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:21,138-Speed 13358.42 samples/sec Loss 3.3351 LearningRate 0.0000 Epoch: 39 Global Step: 100080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:22,740-Speed 12816.81 samples/sec Loss 3.4869 LearningRate 0.0000 Epoch: 39 Global Step: 100090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:24,308-Speed 13071.81 samples/sec Loss 3.3916 LearningRate 0.0000 Epoch: 39 Global Step: 100100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:25,909-Speed 12803.51 samples/sec Loss 3.4235 LearningRate 0.0000 Epoch: 39 Global Step: 100110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:27,493-Speed 12936.70 samples/sec Loss 3.3664 LearningRate 0.0000 Epoch: 39 Global Step: 100120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:29,044-Speed 13217.54 samples/sec Loss 3.3503 LearningRate 0.0000 Epoch: 39 Global Step: 100130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:30,623-Speed 12981.95 samples/sec Loss 3.3696 LearningRate 0.0000 Epoch: 39 Global Step: 100140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:32,191-Speed 13072.04 samples/sec Loss 3.4100 LearningRate 0.0000 Epoch: 39 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 
18:37:33,775-Speed 12934.30 samples/sec Loss 3.3604 LearningRate 0.0000 Epoch: 39 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:37:35,365-Speed 12890.97 samples/sec Loss 3.3359 LearningRate 0.0000 Epoch: 39 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:37:36,917-Speed 13210.07 samples/sec Loss 3.3578 LearningRate 0.0000 Epoch: 39 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:37:38,509-Speed 12872.74 samples/sec Loss 3.4133 LearningRate 0.0000 Epoch: 39 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:37:40,102-Speed 12865.41 samples/sec Loss 3.3632 LearningRate 0.0000 Epoch: 39 Global Step: 100200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:41,692-Speed 12887.46 samples/sec Loss 3.3704 LearningRate 0.0000 Epoch: 39 Global Step: 100210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:43,257-Speed 13101.02 samples/sec Loss 3.4079 LearningRate 0.0000 Epoch: 39 Global Step: 100220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:44,843-Speed 12923.45 samples/sec Loss 3.3242 LearningRate 0.0000 Epoch: 39 Global Step: 100230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:46,422-Speed 12976.21 samples/sec Loss 3.3547 LearningRate 0.0000 Epoch: 39 Global Step: 100240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:47,972-Speed 13224.78 samples/sec Loss 3.4025 LearningRate 0.0000 Epoch: 39 Global Step: 100250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:49,573-Speed 12799.09 samples/sec Loss 3.4284 LearningRate 0.0000 Epoch: 39 Global Step: 100260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:51,144-Speed 13047.04 samples/sec Loss 3.4080 LearningRate 0.0000 Epoch: 39 Global Step: 100270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:52,730-Speed 12920.75 
samples/sec Loss 3.3438 LearningRate 0.0000 Epoch: 39 Global Step: 100280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:54,275-Speed 13272.55 samples/sec Loss 3.3504 LearningRate 0.0000 Epoch: 39 Global Step: 100290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:55,839-Speed 13097.35 samples/sec Loss 3.4773 LearningRate 0.0000 Epoch: 39 Global Step: 100300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:37:57,440-Speed 12798.29 samples/sec Loss 3.4227 LearningRate 0.0000 Epoch: 39 Global Step: 100310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:37:59,006-Speed 13095.56 samples/sec Loss 3.3968 LearningRate 0.0000 Epoch: 39 Global Step: 100320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:00,577-Speed 13039.35 samples/sec Loss 3.3758 LearningRate 0.0000 Epoch: 39 Global Step: 100330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:02,191-Speed 12700.42 samples/sec Loss 3.4331 LearningRate 0.0000 Epoch: 39 Global Step: 100340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:03,760-Speed 13057.37 samples/sec Loss 3.3208 LearningRate 0.0000 Epoch: 39 Global Step: 100350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:05,347-Speed 12923.41 samples/sec Loss 3.4244 LearningRate 0.0000 Epoch: 39 Global Step: 100360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:06,901-Speed 13188.02 samples/sec Loss 3.4073 LearningRate 0.0000 Epoch: 39 Global Step: 100370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:08,487-Speed 12929.30 samples/sec Loss 3.4169 LearningRate 0.0000 Epoch: 39 Global Step: 100380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:10,039-Speed 13202.75 samples/sec Loss 3.4362 LearningRate 0.0000 Epoch: 39 Global Step: 100390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:11,612-Speed 13023.53 samples/sec Loss 3.4486 
LearningRate 0.0000 Epoch: 39 Global Step: 100400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:13,236-Speed 12621.62 samples/sec Loss 3.3593 LearningRate 0.0000 Epoch: 39 Global Step: 100410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:38:14,784-Speed 13238.38 samples/sec Loss 3.3481 LearningRate 0.0000 Epoch: 39 Global Step: 100420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:16,373-Speed 12899.30 samples/sec Loss 3.3912 LearningRate 0.0000 Epoch: 39 Global Step: 100430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:17,967-Speed 12849.20 samples/sec Loss 3.4194 LearningRate 0.0000 Epoch: 39 Global Step: 100440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:19,538-Speed 13056.59 samples/sec Loss 3.3215 LearningRate 0.0000 Epoch: 39 Global Step: 100450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:21,154-Speed 12678.30 samples/sec Loss 3.3674 LearningRate 0.0000 Epoch: 39 Global Step: 100460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:22,717-Speed 13118.78 samples/sec Loss 3.3489 LearningRate 0.0000 Epoch: 39 Global Step: 100470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:24,316-Speed 12820.57 samples/sec Loss 3.3487 LearningRate 0.0000 Epoch: 39 Global Step: 100480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:25,885-Speed 13058.46 samples/sec Loss 3.3376 LearningRate 0.0000 Epoch: 39 Global Step: 100490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:27,479-Speed 12851.55 samples/sec Loss 3.3982 LearningRate 0.0000 Epoch: 39 Global Step: 100500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:29,041-Speed 13122.40 samples/sec Loss 3.3057 LearningRate 0.0000 Epoch: 39 Global Step: 100510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:30,590-Speed 13236.86 samples/sec Loss 3.4197 LearningRate 0.0000 Epoch: 39 
Global Step: 100520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:32,179-Speed 12898.06 samples/sec Loss 3.4143 LearningRate 0.0000 Epoch: 39 Global Step: 100530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:33,738-Speed 13148.88 samples/sec Loss 3.4357 LearningRate 0.0000 Epoch: 39 Global Step: 100540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:35,343-Speed 12765.68 samples/sec Loss 3.4178 LearningRate 0.0000 Epoch: 39 Global Step: 100550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:36,911-Speed 13076.34 samples/sec Loss 3.3648 LearningRate 0.0000 Epoch: 39 Global Step: 100560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:38,485-Speed 13028.30 samples/sec Loss 3.4187 LearningRate 0.0000 Epoch: 39 Global Step: 100570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:40,078-Speed 12864.05 samples/sec Loss 3.3371 LearningRate 0.0000 Epoch: 39 Global Step: 100580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:41,659-Speed 12955.91 samples/sec Loss 3.3078 LearningRate 0.0000 Epoch: 39 Global Step: 100590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:43,241-Speed 12969.64 samples/sec Loss 3.3622 LearningRate 0.0000 Epoch: 39 Global Step: 100600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:44,807-Speed 13087.32 samples/sec Loss 3.2923 LearningRate 0.0000 Epoch: 39 Global Step: 100610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:46,380-Speed 13027.68 samples/sec Loss 3.3746 LearningRate 0.0000 Epoch: 39 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:38:47,935-Speed 13174.66 samples/sec Loss 3.3900 LearningRate 0.0000 Epoch: 39 Global Step: 100630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:49,506-Speed 13046.93 samples/sec Loss 3.3867 LearningRate 0.0000 Epoch: 39 Global Step: 100640 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:51,087-Speed 12968.40 samples/sec Loss 3.4059 LearningRate 0.0000 Epoch: 39 Global Step: 100650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:52,667-Speed 12970.50 samples/sec Loss 3.4514 LearningRate 0.0000 Epoch: 39 Global Step: 100660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:54,276-Speed 12742.85 samples/sec Loss 3.3952 LearningRate 0.0000 Epoch: 39 Global Step: 100670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:55,850-Speed 13016.20 samples/sec Loss 3.4362 LearningRate 0.0000 Epoch: 39 Global Step: 100680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:57,427-Speed 12995.51 samples/sec Loss 3.4443 LearningRate 0.0000 Epoch: 39 Global Step: 100690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:38:59,047-Speed 12653.94 samples/sec Loss 3.3964 LearningRate 0.0000 Epoch: 39 Global Step: 100700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:00,606-Speed 13150.88 samples/sec Loss 3.3937 LearningRate 0.0000 Epoch: 39 Global Step: 100710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:02,200-Speed 12856.41 samples/sec Loss 3.3629 LearningRate 0.0000 Epoch: 39 Global Step: 100720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:03,781-Speed 12964.90 samples/sec Loss 3.2976 LearningRate 0.0000 Epoch: 39 Global Step: 100730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:39:05,356-Speed 13017.31 samples/sec Loss 3.4095 LearningRate 0.0000 Epoch: 39 Global Step: 100740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:39:06,905-Speed 13226.68 samples/sec Loss 3.4072 LearningRate 0.0000 Epoch: 39 Global Step: 100750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:08,469-Speed 13105.79 samples/sec Loss 3.3590 LearningRate 0.0000 Epoch: 39 Global Step: 100760 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-01-14 18:39:10,048-Speed 12982.29 samples/sec Loss 3.4009 LearningRate 0.0000 Epoch: 39 Global Step: 100770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:11,706-Speed 12357.51 samples/sec Loss 3.4328 LearningRate 0.0000 Epoch: 39 Global Step: 100780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:13,270-Speed 13104.34 samples/sec Loss 3.3727 LearningRate 0.0000 Epoch: 39 Global Step: 100790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:14,828-Speed 13165.76 samples/sec Loss 3.3920 LearningRate 0.0000 Epoch: 39 Global Step: 100800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:16,381-Speed 13199.68 samples/sec Loss 3.3965 LearningRate 0.0000 Epoch: 39 Global Step: 100810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:17,977-Speed 12841.21 samples/sec Loss 3.3738 LearningRate 0.0000 Epoch: 39 Global Step: 100820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:19,539-Speed 13126.95 samples/sec Loss 3.3905 LearningRate 0.0000 Epoch: 39 Global Step: 100830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:21,121-Speed 12952.64 samples/sec Loss 3.3994 LearningRate 0.0000 Epoch: 39 Global Step: 100840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:22,690-Speed 13060.82 samples/sec Loss 3.4499 LearningRate 0.0000 Epoch: 39 Global Step: 100850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:24,262-Speed 13040.87 samples/sec Loss 3.3462 LearningRate 0.0000 Epoch: 39 Global Step: 100860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:25,843-Speed 12966.72 samples/sec Loss 3.3882 LearningRate 0.0000 Epoch: 39 Global Step: 100870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 18:39:27,401-Speed 13150.32 samples/sec Loss 3.4614 LearningRate 0.0000 Epoch: 39 Global Step: 100880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 
18:39:29,014-Speed 12715.90 samples/sec Loss 3.4208 LearningRate 0.0000 Epoch: 39 Global Step: 100890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:30,583-Speed 13057.93 samples/sec Loss 3.2467 LearningRate 0.0000 Epoch: 39 Global Step: 100900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:32,166-Speed 12941.49 samples/sec Loss 3.3950 LearningRate 0.0000 Epoch: 39 Global Step: 100910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:33,737-Speed 13052.09 samples/sec Loss 3.3929 LearningRate 0.0000 Epoch: 39 Global Step: 100920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:35,287-Speed 13225.12 samples/sec Loss 3.3120 LearningRate 0.0000 Epoch: 39 Global Step: 100930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:36,858-Speed 13039.48 samples/sec Loss 3.3829 LearningRate 0.0000 Epoch: 39 Global Step: 100940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:38,480-Speed 12639.52 samples/sec Loss 3.2934 LearningRate 0.0000 Epoch: 39 Global Step: 100950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:40,054-Speed 13025.80 samples/sec Loss 3.3749 LearningRate 0.0000 Epoch: 39 Global Step: 100960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:41,649-Speed 12842.24 samples/sec Loss 3.3887 LearningRate 0.0000 Epoch: 39 Global Step: 100970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:43,224-Speed 13013.55 samples/sec Loss 3.4573 LearningRate 0.0000 Epoch: 39 Global Step: 100980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:44,788-Speed 13106.31 samples/sec Loss 3.3910 LearningRate 0.0000 Epoch: 39 Global Step: 100990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:39:46,376-Speed 12915.25 samples/sec Loss 3.3960 LearningRate 0.0000 Epoch: 39 Global Step: 101000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:39:47,939-Speed 13123.49 
samples/sec Loss 3.3433 LearningRate 0.0000 Epoch: 39 Global Step: 101010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:39:49,528-Speed 12898.97 samples/sec Loss 3.3548 LearningRate 0.0000 Epoch: 39 Global Step: 101020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:39:51,087-Speed 13142.41 samples/sec Loss 3.4161 LearningRate 0.0000 Epoch: 39 Global Step: 101030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:39:52,646-Speed 13158.41 samples/sec Loss 3.4161 LearningRate 0.0000 Epoch: 39 Global Step: 101040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:54,263-Speed 12683.44 samples/sec Loss 3.3488 LearningRate 0.0000 Epoch: 39 Global Step: 101050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:55,832-Speed 13057.82 samples/sec Loss 3.3456 LearningRate 0.0000 Epoch: 39 Global Step: 101060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:57,427-Speed 12850.12 samples/sec Loss 3.3990 LearningRate 0.0000 Epoch: 39 Global Step: 101070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:39:58,977-Speed 13239.45 samples/sec Loss 3.2698 LearningRate 0.0000 Epoch: 39 Global Step: 101080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:40:00,561-Speed 12936.49 samples/sec Loss 3.3362 LearningRate 0.0000 Epoch: 39 Global Step: 101090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:40:02,161-Speed 12827.66 samples/sec Loss 3.4092 LearningRate 0.0000 Epoch: 39 Global Step: 101100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:40:03,717-Speed 13172.69 samples/sec Loss 3.3617 LearningRate 0.0000 Epoch: 39 Global Step: 101110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:40:05,300-Speed 12962.81 samples/sec Loss 3.4490 LearningRate 0.0000 Epoch: 39 Global Step: 101120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:40:06,885-Speed 12926.46 samples/sec Loss 3.3807 
LearningRate 0.0000 Epoch: 39 Global Step: 101130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 18:40:08,483-Speed 12829.03 samples/sec Loss 3.3623 LearningRate 0.0000 Epoch: 39 Global Step: 101140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:40:10,069-Speed 12925.17 samples/sec Loss 3.3933 LearningRate 0.0000 Epoch: 39 Global Step: 101150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 18:40:11,517-Speed 14151.84 samples/sec Loss 3.3929 LearningRate 0.0000 Epoch: 39 Global Step: 101160 Fp16 Grad Scale: 32768 Required: -0 hours