Training: 2022-01-16 21:35:48,806-rank_id: 0
Training: 2022-01-16 21:36:18,835-: loss cosface
Training: 2022-01-16 21:36:18,836-: network r100
Training: 2022-01-16 21:36:18,836-: resume False
Training: 2022-01-16 21:36:18,836-: output work_dirs/glint360k_r100_lr02_bs4k_16gpus
Training: 2022-01-16 21:36:18,836-: embedding_size 512
Training: 2022-01-16 21:36:18,836-: sample_rate 1.0
Training: 2022-01-16 21:36:18,837-: fp16 True
Training: 2022-01-16 21:36:18,837-: momentum 0.9
Training: 2022-01-16 21:36:18,837-: weight_decay 0.0005
Training: 2022-01-16 21:36:18,837-: batch_size 256
Training: 2022-01-16 21:36:18,837-: lr 0.4
Training: 2022-01-16 21:36:18,837-: dali False
Training: 2022-01-16 21:36:18,837-: verbose 5000
Training: 2022-01-16 21:36:18,837-: frequent 10
Training: 2022-01-16 21:36:18,837-: score None
Training: 2022-01-16 21:36:18,837-: rec /train_tmp/glint360k
Training: 2022-01-16 21:36:18,837-: num_classes 360232
Training: 2022-01-16 21:36:18,837-: num_image 17091657
Training: 2022-01-16 21:36:18,837-: num_epoch 20
Training: 2022-01-16 21:36:18,837-: warmup_epoch 2
Training: 2022-01-16 21:36:18,837-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-16 21:36:18,837-: warmup_step 8344
Training: 2022-01-16 21:36:18,837-: total_step 83440
Training: 2022-01-16 21:37:47,588-Reducer buckets have been rebuilt in this iteration.
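Note on the derived values: warmup_step and total_step in the configuration above, and the LearningRate column in the records that follow, appear consistent with a simple calculation. The sketch below is only an illustration, not the training script's own code. It assumes 16 processes (inferred solely from the "16gpus" suffix of the output directory) and a linear warmup from 0 to lr; the names world_size and warmup_lr are hypothetical.

```python
# Values copied from the configuration dump in this log.
num_image = 17_091_657
batch_size = 256      # per-process batch size
num_epoch = 20
warmup_epoch = 2
base_lr = 0.4

# Assumption: 16 processes, guessed from "bs4k_16gpus" in the output path.
world_size = 16

total_batch_size = batch_size * world_size        # global batch per step
steps_per_epoch = num_image // total_batch_size   # partial last batch dropped
total_step = steps_per_epoch * num_epoch
warmup_step = steps_per_epoch * warmup_epoch


def warmup_lr(step: int) -> float:
    """Assumed linear warmup: scale base_lr by step / warmup_step."""
    return base_lr * step / warmup_step


print(total_step, warmup_step)
for step in (100, 1000, 2090):
    print(step, f"{warmup_lr(step):.4f}")
```

Under these assumptions the computed step counts match the logged warmup_step and total_step, and the warmup values agree with the LearningRate logged at global steps 100, 1000 and 2090.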
Training: 2022-01-16 21:38:03,643-Speed 4834.26 samples/sec Loss 42.3598 LearningRate 0.0010 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-01-16 21:38:12,096-Speed 4846.83 samples/sec Loss 42.3379 LearningRate 0.0014 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-01-16 21:38:20,326-Speed 4978.24 samples/sec Loss 42.3521 LearningRate 0.0019 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-01-16 21:38:28,642-Speed 4926.09 samples/sec Loss 42.3252 LearningRate 0.0024 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-01-16 21:38:37,021-Speed 4889.62 samples/sec Loss 42.2611 LearningRate 0.0029 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-01-16 21:38:45,231-Speed 4989.86 samples/sec Loss 42.2046 LearningRate 0.0034 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-01-16 21:38:53,457-Speed 4980.60 samples/sec Loss 42.1659 LearningRate 0.0038 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-01-16 21:39:01,595-Speed 5034.66 samples/sec Loss 42.1064 LearningRate 0.0043 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-01-16 21:39:09,978-Speed 4887.11 samples/sec Loss 42.0092 LearningRate 0.0048 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-01-16 21:39:18,290-Speed 4928.63 samples/sec Loss 41.8919 LearningRate 0.0053 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-16 21:39:26,585-Speed 4938.70 samples/sec Loss 41.7167 LearningRate 0.0058 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-01-16 21:39:34,727-Speed 5031.31 samples/sec Loss 41.4520 LearningRate 0.0062 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-16 21:39:42,979-Speed 4964.64 samples/sec Loss 41.1694 
LearningRate 0.0067 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-16 21:39:51,124-Speed 5029.95 samples/sec Loss 40.8227 LearningRate 0.0072 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-01-16 21:39:59,231-Speed 5053.17 samples/sec Loss 40.4952 LearningRate 0.0077 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-16 21:40:07,408-Speed 5009.78 samples/sec Loss 40.1500 LearningRate 0.0081 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-16 21:40:15,555-Speed 5028.51 samples/sec Loss 39.8512 LearningRate 0.0086 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-16 21:40:23,829-Speed 4951.39 samples/sec Loss 39.5800 LearningRate 0.0091 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-01-16 21:40:31,941-Speed 5050.22 samples/sec Loss 39.3429 LearningRate 0.0096 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-16 21:40:40,193-Speed 4964.43 samples/sec Loss 39.1648 LearningRate 0.0101 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-16 21:40:48,497-Speed 4933.61 samples/sec Loss 38.9826 LearningRate 0.0105 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-16 21:40:56,700-Speed 4994.12 samples/sec Loss 38.7781 LearningRate 0.0110 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-16 21:41:04,833-Speed 5036.83 samples/sec Loss 38.6530 LearningRate 0.0115 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-16 21:41:13,081-Speed 4967.05 samples/sec Loss 38.5239 LearningRate 0.0120 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-16 21:41:21,245-Speed 5018.01 samples/sec Loss 38.4202 LearningRate 0.0125 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 
32768 Required: 22 hours Training: 2022-01-16 21:41:29,401-Speed 5023.10 samples/sec Loss 38.3199 LearningRate 0.0129 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:41:37,579-Speed 5008.91 samples/sec Loss 38.2390 LearningRate 0.0134 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:41:45,806-Speed 4979.72 samples/sec Loss 38.1545 LearningRate 0.0139 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:41:53,932-Speed 5041.56 samples/sec Loss 38.0808 LearningRate 0.0144 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:42:02,109-Speed 5010.20 samples/sec Loss 37.9930 LearningRate 0.0149 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:42:10,259-Speed 5026.11 samples/sec Loss 37.9548 LearningRate 0.0153 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:42:18,367-Speed 5052.82 samples/sec Loss 37.8730 LearningRate 0.0158 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:42:26,749-Speed 4887.38 samples/sec Loss 37.8595 LearningRate 0.0163 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:42:35,331-Speed 4773.39 samples/sec Loss 37.7946 LearningRate 0.0168 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:42:43,714-Speed 4886.87 samples/sec Loss 37.7501 LearningRate 0.0173 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:42:52,199-Speed 4828.46 samples/sec Loss 37.7153 LearningRate 0.0177 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:43:00,527-Speed 4918.84 samples/sec Loss 37.6671 LearningRate 0.0182 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:43:08,811-Speed 
4945.02 samples/sec Loss 37.6244 LearningRate 0.0187 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:43:17,185-Speed 4891.89 samples/sec Loss 37.5715 LearningRate 0.0192 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-16 21:43:25,568-Speed 4887.01 samples/sec Loss 37.5309 LearningRate 0.0197 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-16 21:43:34,209-Speed 4740.19 samples/sec Loss 37.4967 LearningRate 0.0201 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-16 21:43:43,142-Speed 4586.17 samples/sec Loss 37.4433 LearningRate 0.0206 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-16 21:43:52,076-Speed 4585.93 samples/sec Loss 37.3920 LearningRate 0.0211 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-16 21:44:00,664-Speed 4769.83 samples/sec Loss 37.3686 LearningRate 0.0216 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-16 21:44:08,816-Speed 5025.37 samples/sec Loss 37.3318 LearningRate 0.0221 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-16 21:44:17,003-Speed 5003.77 samples/sec Loss 37.2828 LearningRate 0.0225 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-16 21:44:25,345-Speed 4910.27 samples/sec Loss 37.2487 LearningRate 0.0230 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:44:33,762-Speed 4866.92 samples/sec Loss 37.2001 LearningRate 0.0235 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:44:41,881-Speed 5045.83 samples/sec Loss 37.1601 LearningRate 0.0240 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:44:49,992-Speed 5050.97 samples/sec Loss 37.1145 LearningRate 0.0244 Epoch: 0 
Global Step: 510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:44:58,042-Speed 5088.69 samples/sec Loss 37.0903 LearningRate 0.0249 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:45:06,283-Speed 4971.30 samples/sec Loss 37.0200 LearningRate 0.0254 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:45:14,385-Speed 5056.40 samples/sec Loss 36.9918 LearningRate 0.0259 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:45:22,529-Speed 5029.98 samples/sec Loss 36.9457 LearningRate 0.0264 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:45:30,695-Speed 5016.28 samples/sec Loss 36.8602 LearningRate 0.0268 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:45:38,850-Speed 5023.82 samples/sec Loss 36.7954 LearningRate 0.0273 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:45:47,157-Speed 4931.65 samples/sec Loss 36.7233 LearningRate 0.0278 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:45:55,468-Speed 4929.00 samples/sec Loss 36.6769 LearningRate 0.0283 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:46:03,886-Speed 4866.64 samples/sec Loss 36.6102 LearningRate 0.0288 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:46:12,083-Speed 4997.31 samples/sec Loss 36.5377 LearningRate 0.0292 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:46:20,339-Speed 4962.02 samples/sec Loss 36.4571 LearningRate 0.0297 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:46:28,501-Speed 5018.68 samples/sec Loss 36.4004 LearningRate 0.0302 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-01-16 21:46:36,718-Speed 4985.69 samples/sec Loss 36.3468 LearningRate 0.0307 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:46:44,988-Speed 4953.48 samples/sec Loss 36.2428 LearningRate 0.0312 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:46:53,147-Speed 5021.59 samples/sec Loss 36.1746 LearningRate 0.0316 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:47:01,324-Speed 5009.78 samples/sec Loss 36.0855 LearningRate 0.0321 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:47:09,630-Speed 4931.81 samples/sec Loss 36.0429 LearningRate 0.0326 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:47:17,800-Speed 5013.80 samples/sec Loss 35.9046 LearningRate 0.0331 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:47:26,096-Speed 4938.08 samples/sec Loss 35.8652 LearningRate 0.0336 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:47:34,239-Speed 5030.98 samples/sec Loss 35.7652 LearningRate 0.0340 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:47:42,399-Speed 5020.12 samples/sec Loss 35.6646 LearningRate 0.0345 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:47:50,543-Speed 5030.45 samples/sec Loss 35.5185 LearningRate 0.0350 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:47:58,711-Speed 5014.99 samples/sec Loss 35.3987 LearningRate 0.0355 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:48:07,038-Speed 4919.77 samples/sec Loss 35.3237 LearningRate 0.0360 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:48:15,370-Speed 4916.94 samples/sec Loss 
35.2523 LearningRate 0.0364 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:48:23,530-Speed 5020.15 samples/sec Loss 35.1140 LearningRate 0.0369 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:48:31,813-Speed 4945.45 samples/sec Loss 35.0306 LearningRate 0.0374 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:48:40,372-Speed 4786.38 samples/sec Loss 34.9250 LearningRate 0.0379 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:48:48,675-Speed 4933.75 samples/sec Loss 34.8609 LearningRate 0.0384 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:48:56,868-Speed 5000.10 samples/sec Loss 34.7451 LearningRate 0.0388 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:49:05,270-Speed 4875.65 samples/sec Loss 34.5928 LearningRate 0.0393 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:49:13,469-Speed 4996.50 samples/sec Loss 34.5335 LearningRate 0.0398 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:49:21,896-Speed 4861.35 samples/sec Loss 34.3727 LearningRate 0.0403 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:49:30,216-Speed 4923.98 samples/sec Loss 34.3175 LearningRate 0.0407 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:49:38,555-Speed 4912.27 samples/sec Loss 34.1208 LearningRate 0.0412 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:49:46,889-Speed 4915.30 samples/sec Loss 34.0163 LearningRate 0.0417 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:49:55,144-Speed 4962.27 samples/sec Loss 33.9206 LearningRate 0.0422 Epoch: 0 Global Step: 880 Fp16 Grad 
Scale: 32768 Required: 20 hours Training: 2022-01-16 21:50:03,480-Speed 4914.43 samples/sec Loss 33.7373 LearningRate 0.0427 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:50:11,720-Speed 4971.90 samples/sec Loss 33.6189 LearningRate 0.0431 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:50:19,938-Speed 4984.55 samples/sec Loss 33.5059 LearningRate 0.0436 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:50:28,219-Speed 4947.15 samples/sec Loss 33.3199 LearningRate 0.0441 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:50:36,309-Speed 5064.07 samples/sec Loss 33.2516 LearningRate 0.0446 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:50:44,477-Speed 5014.62 samples/sec Loss 33.1493 LearningRate 0.0451 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-01-16 21:50:52,604-Speed 5040.53 samples/sec Loss 33.0069 LearningRate 0.0455 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:51:00,779-Speed 5011.43 samples/sec Loss 32.8917 LearningRate 0.0460 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:51:09,177-Speed 4878.26 samples/sec Loss 32.7576 LearningRate 0.0465 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:51:17,337-Speed 5020.29 samples/sec Loss 32.6143 LearningRate 0.0470 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:51:25,474-Speed 5034.25 samples/sec Loss 32.3868 LearningRate 0.0475 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:51:33,587-Speed 5049.33 samples/sec Loss 32.2796 LearningRate 0.0479 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 
21:51:41,787-Speed 4995.86 samples/sec Loss 32.1530 LearningRate 0.0484 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:51:50,019-Speed 4976.39 samples/sec Loss 31.9311 LearningRate 0.0489 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:51:58,275-Speed 4962.26 samples/sec Loss 31.8882 LearningRate 0.0494 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:52:06,405-Speed 5039.19 samples/sec Loss 31.7123 LearningRate 0.0499 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:52:14,689-Speed 4944.75 samples/sec Loss 31.6548 LearningRate 0.0503 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:52:22,769-Speed 5070.28 samples/sec Loss 31.4818 LearningRate 0.0508 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:52:30,912-Speed 5030.70 samples/sec Loss 31.2750 LearningRate 0.0513 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:52:39,054-Speed 5031.41 samples/sec Loss 31.2020 LearningRate 0.0518 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-16 21:52:47,459-Speed 4873.89 samples/sec Loss 30.9998 LearningRate 0.0523 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:52:55,675-Speed 4986.22 samples/sec Loss 30.8290 LearningRate 0.0527 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:53:03,843-Speed 5015.24 samples/sec Loss 30.7409 LearningRate 0.0532 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:53:12,138-Speed 4938.70 samples/sec Loss 30.5584 LearningRate 0.0537 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:53:20,247-Speed 5051.90 samples/sec Loss 30.4590 
LearningRate 0.0542 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:53:28,394-Speed 5028.42 samples/sec Loss 30.2364 LearningRate 0.0547 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:53:36,662-Speed 4954.34 samples/sec Loss 30.0776 LearningRate 0.0551 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:53:44,916-Speed 4963.14 samples/sec Loss 29.8886 LearningRate 0.0556 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:53:53,069-Speed 5025.12 samples/sec Loss 29.7635 LearningRate 0.0561 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:54:01,394-Speed 4920.32 samples/sec Loss 29.6980 LearningRate 0.0566 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:54:09,833-Speed 4854.64 samples/sec Loss 29.5140 LearningRate 0.0570 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:54:18,004-Speed 5013.43 samples/sec Loss 29.3197 LearningRate 0.0575 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:54:26,277-Speed 4951.93 samples/sec Loss 29.1883 LearningRate 0.0580 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:54:34,457-Speed 5007.66 samples/sec Loss 29.0412 LearningRate 0.0585 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:54:42,760-Speed 4934.06 samples/sec Loss 28.9125 LearningRate 0.0590 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:54:51,007-Speed 4967.11 samples/sec Loss 28.7392 LearningRate 0.0594 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:54:59,354-Speed 4907.55 samples/sec Loss 28.5431 LearningRate 0.0599 Epoch: 0 Global Step: 1250 Fp16 
Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:55:07,521-Speed 5016.32 samples/sec Loss 28.4012 LearningRate 0.0604 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:55:15,671-Speed 5026.76 samples/sec Loss 28.2825 LearningRate 0.0609 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:55:23,946-Speed 4950.29 samples/sec Loss 28.1665 LearningRate 0.0614 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:55:32,138-Speed 5001.02 samples/sec Loss 27.9887 LearningRate 0.0618 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:55:40,465-Speed 4919.67 samples/sec Loss 27.8234 LearningRate 0.0623 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:55:48,787-Speed 4922.73 samples/sec Loss 27.6696 LearningRate 0.0628 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:55:56,985-Speed 4996.92 samples/sec Loss 27.4806 LearningRate 0.0633 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:56:05,286-Speed 4934.93 samples/sec Loss 27.3248 LearningRate 0.0638 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:56:13,472-Speed 5004.15 samples/sec Loss 27.1269 LearningRate 0.0642 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:56:21,692-Speed 4983.84 samples/sec Loss 26.9021 LearningRate 0.0647 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:56:29,866-Speed 5011.72 samples/sec Loss 26.9248 LearningRate 0.0652 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:56:38,805-Speed 4582.84 samples/sec Loss 26.7098 LearningRate 0.0657 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-01-16 21:56:47,685-Speed 4613.20 samples/sec Loss 26.5880 LearningRate 0.0662 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:56:55,902-Speed 4985.42 samples/sec Loss 26.3728 LearningRate 0.0666 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:57:04,163-Speed 4958.75 samples/sec Loss 26.2446 LearningRate 0.0671 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:57:12,583-Speed 4865.57 samples/sec Loss 26.0873 LearningRate 0.0676 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:57:20,879-Speed 4938.14 samples/sec Loss 25.9997 LearningRate 0.0681 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:57:29,047-Speed 5015.11 samples/sec Loss 25.7740 LearningRate 0.0686 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:57:37,176-Speed 5039.40 samples/sec Loss 25.6851 LearningRate 0.0690 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:57:45,444-Speed 4954.80 samples/sec Loss 25.5735 LearningRate 0.0695 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:57:53,647-Speed 4994.23 samples/sec Loss 25.3888 LearningRate 0.0700 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:58:02,067-Speed 4865.45 samples/sec Loss 25.2423 LearningRate 0.0705 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:58:10,454-Speed 4883.98 samples/sec Loss 25.1456 LearningRate 0.0709 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:58:18,625-Speed 5013.47 samples/sec Loss 24.9345 LearningRate 0.0714 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:58:26,890-Speed 4956.95 
samples/sec Loss 24.7267 LearningRate 0.0719 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:58:35,135-Speed 4968.81 samples/sec Loss 24.6332 LearningRate 0.0724 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:58:43,386-Speed 4964.15 samples/sec Loss 24.4571 LearningRate 0.0729 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:58:51,548-Speed 5019.07 samples/sec Loss 24.2938 LearningRate 0.0733 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:58:59,793-Speed 4968.77 samples/sec Loss 24.1337 LearningRate 0.0738 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 21:59:08,038-Speed 4968.26 samples/sec Loss 24.0089 LearningRate 0.0743 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:59:16,555-Speed 4809.97 samples/sec Loss 23.8815 LearningRate 0.0748 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:59:24,896-Speed 4911.89 samples/sec Loss 23.6897 LearningRate 0.0753 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:59:33,049-Speed 5024.51 samples/sec Loss 23.5113 LearningRate 0.0757 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:59:41,206-Speed 5021.63 samples/sec Loss 23.4685 LearningRate 0.0762 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:59:49,399-Speed 5000.18 samples/sec Loss 23.3367 LearningRate 0.0767 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 21:59:57,711-Speed 4928.45 samples/sec Loss 23.2095 LearningRate 0.0772 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:00:06,065-Speed 4903.69 samples/sec Loss 23.0791 LearningRate 0.0777 
Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:00:14,342-Speed 4949.11 samples/sec Loss 22.9124 LearningRate 0.0781 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:00:22,966-Speed 4750.30 samples/sec Loss 22.6971 LearningRate 0.0786 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:00:31,258-Speed 4940.41 samples/sec Loss 22.5844 LearningRate 0.0791 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:00:39,561-Speed 4934.05 samples/sec Loss 22.4688 LearningRate 0.0796 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:00:47,928-Speed 4895.83 samples/sec Loss 22.4530 LearningRate 0.0801 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:00:56,195-Speed 4955.27 samples/sec Loss 22.3219 LearningRate 0.0805 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:01:04,331-Speed 5034.96 samples/sec Loss 22.0713 LearningRate 0.0810 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:01:12,459-Speed 5039.69 samples/sec Loss 21.9351 LearningRate 0.0815 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:01:20,697-Speed 4972.91 samples/sec Loss 21.8716 LearningRate 0.0820 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:01:28,914-Speed 4985.56 samples/sec Loss 21.7223 LearningRate 0.0825 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:01:37,360-Speed 4850.15 samples/sec Loss 21.6826 LearningRate 0.0829 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:01:45,664-Speed 4933.37 samples/sec Loss 21.4516 LearningRate 0.0834 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 
131072 Required: 19 hours Training: 2022-01-16 22:01:53,972-Speed 4930.79 samples/sec Loss 21.4658 LearningRate 0.0839 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:02:02,391-Speed 4866.01 samples/sec Loss 21.2986 LearningRate 0.0844 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:02:10,702-Speed 4929.11 samples/sec Loss 21.1724 LearningRate 0.0849 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:02:18,849-Speed 5027.90 samples/sec Loss 21.0257 LearningRate 0.0853 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:02:26,979-Speed 5038.86 samples/sec Loss 20.9709 LearningRate 0.0858 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:02:35,146-Speed 5016.41 samples/sec Loss 20.7826 LearningRate 0.0863 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:02:43,274-Speed 5039.86 samples/sec Loss 20.6862 LearningRate 0.0868 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:02:51,542-Speed 4955.22 samples/sec Loss 20.6006 LearningRate 0.0872 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:02:59,734-Speed 5000.07 samples/sec Loss 20.5349 LearningRate 0.0877 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:03:07,961-Speed 4979.86 samples/sec Loss 20.3657 LearningRate 0.0882 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:03:16,076-Speed 5047.76 samples/sec Loss 20.2035 LearningRate 0.0887 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:03:24,173-Speed 5059.42 samples/sec Loss 20.1603 LearningRate 0.0892 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-01-16 22:03:32,318-Speed 5029.41 samples/sec Loss 19.9967 LearningRate 0.0896 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:03:40,405-Speed 5066.02 samples/sec Loss 19.9293 LearningRate 0.0901 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:03:48,556-Speed 5025.79 samples/sec Loss 19.8982 LearningRate 0.0906 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:03:56,737-Speed 5007.59 samples/sec Loss 19.7541 LearningRate 0.0911 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:04:04,907-Speed 5013.89 samples/sec Loss 19.5327 LearningRate 0.0916 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:04:13,398-Speed 4825.22 samples/sec Loss 19.5247 LearningRate 0.0920 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:04:21,961-Speed 4783.77 samples/sec Loss 19.2866 LearningRate 0.0925 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:04:30,503-Speed 4795.44 samples/sec Loss 19.3377 LearningRate 0.0930 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:04:38,645-Speed 5031.19 samples/sec Loss 19.1654 LearningRate 0.0935 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:04:46,778-Speed 5037.20 samples/sec Loss 19.0661 LearningRate 0.0940 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:04:54,915-Speed 5034.17 samples/sec Loss 19.0985 LearningRate 0.0944 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:05:03,071-Speed 5023.04 samples/sec Loss 18.8547 LearningRate 0.0949 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:05:11,421-Speed 4905.95 
samples/sec Loss 18.7522 LearningRate 0.0954 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:05:19,557-Speed 5034.90 samples/sec Loss 18.6880 LearningRate 0.0959 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:05:27,826-Speed 4954.50 samples/sec Loss 18.5571 LearningRate 0.0964 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:05:36,053-Speed 4979.35 samples/sec Loss 18.5208 LearningRate 0.0968 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:05:44,144-Speed 5063.36 samples/sec Loss 18.4338 LearningRate 0.0973 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:05:52,397-Speed 4963.50 samples/sec Loss 18.3452 LearningRate 0.0978 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:06:00,896-Speed 4819.55 samples/sec Loss 18.3139 LearningRate 0.0983 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:06:09,258-Speed 4899.39 samples/sec Loss 18.2651 LearningRate 0.0988 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:06:17,478-Speed 4983.80 samples/sec Loss 18.1821 LearningRate 0.0992 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:06:25,825-Speed 4907.78 samples/sec Loss 17.9896 LearningRate 0.0997 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:06:33,901-Speed 5072.21 samples/sec Loss 17.8825 LearningRate 0.1002 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:06:42,040-Speed 5033.68 samples/sec Loss 17.8816 LearningRate 0.1007 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:06:50,182-Speed 5031.30 samples/sec Loss 17.7457 LearningRate 
0.1012 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:06:58,405-Speed 4981.40 samples/sec Loss 17.7346 LearningRate 0.1016 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:07:06,954-Speed 4792.28 samples/sec Loss 17.5962 LearningRate 0.1021 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:07:15,368-Speed 4868.30 samples/sec Loss 17.4839 LearningRate 0.1026 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:07:23,582-Speed 4987.26 samples/sec Loss 17.4174 LearningRate 0.1031 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:07:31,678-Speed 5059.83 samples/sec Loss 17.3327 LearningRate 0.1035 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:07:40,383-Speed 4706.27 samples/sec Loss 17.2869 LearningRate 0.1040 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:07:48,774-Speed 4881.91 samples/sec Loss 17.2258 LearningRate 0.1045 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:07:57,075-Speed 4934.92 samples/sec Loss 17.0574 LearningRate 0.1050 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:08:05,372-Speed 4937.66 samples/sec Loss 17.1282 LearningRate 0.1055 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:08:13,592-Speed 4983.54 samples/sec Loss 17.0076 LearningRate 0.1059 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:08:21,865-Speed 4951.56 samples/sec Loss 16.8732 LearningRate 0.1064 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:08:30,165-Speed 4935.47 samples/sec Loss 16.7699 LearningRate 0.1069 Epoch: 0 Global Step: 2230 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-01-16 22:08:38,368-Speed 4994.34 samples/sec Loss 16.8538 LearningRate 0.1074 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:08:46,662-Speed 4939.32 samples/sec Loss 16.7010 LearningRate 0.1079 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:08:55,340-Speed 4720.61 samples/sec Loss 16.6733 LearningRate 0.1083 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:09:03,451-Speed 5050.42 samples/sec Loss 16.6112 LearningRate 0.1088 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:09:11,607-Speed 5023.05 samples/sec Loss 16.5260 LearningRate 0.1093 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:09:19,723-Speed 5047.49 samples/sec Loss 16.4477 LearningRate 0.1098 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:09:27,828-Speed 5054.15 samples/sec Loss 16.4372 LearningRate 0.1103 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:09:35,987-Speed 5020.67 samples/sec Loss 16.2977 LearningRate 0.1107 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:09:44,298-Speed 4929.29 samples/sec Loss 16.2453 LearningRate 0.1112 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:09:52,472-Speed 5012.02 samples/sec Loss 16.2183 LearningRate 0.1117 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:10:00,691-Speed 4983.87 samples/sec Loss 16.1771 LearningRate 0.1122 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:10:08,892-Speed 4995.42 samples/sec Loss 16.1491 LearningRate 0.1127 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-01-16 22:10:17,015-Speed 5043.38 samples/sec Loss 15.8892 LearningRate 0.1131 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:10:25,320-Speed 4931.95 samples/sec Loss 15.8577 LearningRate 0.1136 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:10:33,994-Speed 4723.10 samples/sec Loss 15.9396 LearningRate 0.1141 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:10:42,700-Speed 4705.05 samples/sec Loss 15.8920 LearningRate 0.1146 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:10:51,263-Speed 4784.47 samples/sec Loss 15.7884 LearningRate 0.1151 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:10:59,762-Speed 4819.80 samples/sec Loss 15.7291 LearningRate 0.1155 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:11:08,008-Speed 4967.93 samples/sec Loss 15.6805 LearningRate 0.1160 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:11:16,191-Speed 5007.19 samples/sec Loss 15.6026 LearningRate 0.1165 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:11:24,397-Speed 4991.93 samples/sec Loss 15.5509 LearningRate 0.1170 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:11:32,655-Speed 4960.85 samples/sec Loss 15.4965 LearningRate 0.1174 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:11:40,892-Speed 4973.68 samples/sec Loss 15.4301 LearningRate 0.1179 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:11:49,057-Speed 5016.92 samples/sec Loss 15.4001 LearningRate 0.1184 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:11:57,285-Speed 4978.77 
samples/sec Loss 15.3611 LearningRate 0.1189 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:12:05,484-Speed 4996.27 samples/sec Loss 15.2732 LearningRate 0.1194 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:12:13,587-Speed 5056.13 samples/sec Loss 15.2135 LearningRate 0.1198 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:12:21,754-Speed 5015.74 samples/sec Loss 15.1041 LearningRate 0.1203 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:12:29,961-Speed 4991.44 samples/sec Loss 15.0685 LearningRate 0.1208 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:12:38,182-Speed 4983.10 samples/sec Loss 15.0893 LearningRate 0.1213 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:12:46,363-Speed 5007.78 samples/sec Loss 15.0674 LearningRate 0.1218 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:12:54,450-Speed 5065.79 samples/sec Loss 14.9497 LearningRate 0.1222 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:13:02,581-Speed 5037.70 samples/sec Loss 14.8894 LearningRate 0.1227 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:13:10,683-Speed 5056.15 samples/sec Loss 14.8328 LearningRate 0.1232 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:13:18,728-Speed 5091.76 samples/sec Loss 14.7859 LearningRate 0.1237 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:13:26,973-Speed 4968.98 samples/sec Loss 14.7780 LearningRate 0.1242 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:13:35,230-Speed 4961.19 samples/sec Loss 14.7019 LearningRate 
0.1246 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:13:43,386-Speed 5022.47 samples/sec Loss 14.5623 LearningRate 0.1251 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:13:51,616-Speed 4977.80 samples/sec Loss 14.5339 LearningRate 0.1256 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:13:59,811-Speed 4998.99 samples/sec Loss 14.5251 LearningRate 0.1261 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:14:08,066-Speed 4962.39 samples/sec Loss 14.5157 LearningRate 0.1266 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:14:16,354-Speed 4942.77 samples/sec Loss 14.4885 LearningRate 0.1270 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:14:25,258-Speed 4600.50 samples/sec Loss 14.3332 LearningRate 0.1275 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:14:34,416-Speed 4473.44 samples/sec Loss 14.3308 LearningRate 0.1280 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:14:43,010-Speed 4766.31 samples/sec Loss 14.2487 LearningRate 0.1285 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:14:51,532-Speed 4807.41 samples/sec Loss 14.3078 LearningRate 0.1290 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:14:59,928-Speed 4879.03 samples/sec Loss 14.2348 LearningRate 0.1294 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:15:07,981-Speed 5086.74 samples/sec Loss 14.1805 LearningRate 0.1299 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:15:16,248-Speed 4955.08 samples/sec Loss 14.1010 LearningRate 0.1304 Epoch: 0 Global Step: 2720 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-01-16 22:15:24,373-Speed 5042.65 samples/sec Loss 14.0890 LearningRate 0.1309 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:15:32,489-Speed 5047.09 samples/sec Loss 14.0290 LearningRate 0.1314 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:15:40,597-Speed 5052.72 samples/sec Loss 13.9701 LearningRate 0.1318 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:15:48,706-Speed 5051.45 samples/sec Loss 14.0432 LearningRate 0.1323 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:15:56,930-Speed 4981.70 samples/sec Loss 13.9617 LearningRate 0.1328 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:16:05,205-Speed 4950.90 samples/sec Loss 13.8213 LearningRate 0.1333 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:16:13,377-Speed 5012.38 samples/sec Loss 13.8585 LearningRate 0.1337 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:16:21,545-Speed 5015.58 samples/sec Loss 13.7872 LearningRate 0.1342 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:16:29,687-Speed 5031.33 samples/sec Loss 13.7412 LearningRate 0.1347 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:16:37,906-Speed 4984.32 samples/sec Loss 13.6822 LearningRate 0.1352 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:16:46,172-Speed 4955.96 samples/sec Loss 13.7927 LearningRate 0.1357 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:16:54,346-Speed 5011.44 samples/sec Loss 13.6895 LearningRate 0.1361 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-01-16 22:17:02,640-Speed 4938.93 samples/sec Loss 13.6576 LearningRate 0.1366 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:17:10,755-Speed 5048.76 samples/sec Loss 13.5819 LearningRate 0.1371 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:17:18,923-Speed 5015.14 samples/sec Loss 13.5438 LearningRate 0.1376 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:17:27,316-Speed 4880.27 samples/sec Loss 13.5480 LearningRate 0.1381 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:17:35,524-Speed 4991.49 samples/sec Loss 13.4148 LearningRate 0.1385 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:17:43,867-Speed 4910.30 samples/sec Loss 13.4454 LearningRate 0.1390 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:17:52,044-Speed 5009.55 samples/sec Loss 13.3492 LearningRate 0.1395 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:18:00,506-Speed 4840.86 samples/sec Loss 13.3993 LearningRate 0.1400 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:18:08,824-Speed 4925.00 samples/sec Loss 13.3015 LearningRate 0.1405 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:18:16,961-Speed 5034.94 samples/sec Loss 13.3401 LearningRate 0.1409 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:18:25,038-Speed 5071.80 samples/sec Loss 13.2972 LearningRate 0.1414 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:18:33,225-Speed 5003.43 samples/sec Loss 13.1764 LearningRate 0.1419 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:18:41,448-Speed 4982.50 
samples/sec Loss 13.1724 LearningRate 0.1424 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:18:49,578-Speed 5038.46 samples/sec Loss 13.0745 LearningRate 0.1429 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:18:57,782-Speed 4993.07 samples/sec Loss 13.1559 LearningRate 0.1433 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:19:06,205-Speed 4863.57 samples/sec Loss 13.0919 LearningRate 0.1438 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:19:14,539-Speed 4915.75 samples/sec Loss 13.0541 LearningRate 0.1443 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:19:22,696-Speed 5022.40 samples/sec Loss 12.9794 LearningRate 0.1448 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:19:30,895-Speed 4996.68 samples/sec Loss 13.0392 LearningRate 0.1453 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:19:39,152-Speed 4961.21 samples/sec Loss 12.8754 LearningRate 0.1457 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:19:47,665-Speed 4811.66 samples/sec Loss 12.8917 LearningRate 0.1462 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:19:55,881-Speed 4986.47 samples/sec Loss 12.9503 LearningRate 0.1467 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:20:04,085-Speed 4993.54 samples/sec Loss 12.8584 LearningRate 0.1472 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:20:12,190-Speed 5054.24 samples/sec Loss 12.8006 LearningRate 0.1477 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:20:20,486-Speed 4937.56 samples/sec Loss 12.7971 LearningRate 
0.1481 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:20:28,811-Speed 4921.10 samples/sec Loss 12.7016 LearningRate 0.1486 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:20:36,976-Speed 5017.62 samples/sec Loss 12.7326 LearningRate 0.1491 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:20:45,146-Speed 5013.80 samples/sec Loss 12.6789 LearningRate 0.1496 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:20:53,438-Speed 4940.31 samples/sec Loss 12.6347 LearningRate 0.1500 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:21:01,654-Speed 4986.01 samples/sec Loss 12.5944 LearningRate 0.1505 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:21:09,823-Speed 5015.27 samples/sec Loss 12.5664 LearningRate 0.1510 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:21:18,039-Speed 4985.94 samples/sec Loss 12.5917 LearningRate 0.1515 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:21:26,217-Speed 5008.86 samples/sec Loss 12.6674 LearningRate 0.1520 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:21:34,451-Speed 4975.22 samples/sec Loss 12.5474 LearningRate 0.1524 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:21:42,577-Speed 5041.73 samples/sec Loss 12.5789 LearningRate 0.1529 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:21:50,722-Speed 5029.70 samples/sec Loss 12.5482 LearningRate 0.1534 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:21:58,929-Speed 4991.74 samples/sec Loss 12.4477 LearningRate 0.1539 Epoch: 0 Global Step: 3210 Fp16 Grad 
Scale: 65536 Required: 19 hours Training: 2022-01-16 22:22:07,358-Speed 4859.70 samples/sec Loss 12.4549 LearningRate 0.1544 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:22:15,524-Speed 5016.82 samples/sec Loss 12.4161 LearningRate 0.1548 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:22:23,815-Speed 4941.11 samples/sec Loss 12.4185 LearningRate 0.1553 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:22:32,023-Speed 4990.68 samples/sec Loss 12.3670 LearningRate 0.1558 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:22:40,272-Speed 4965.99 samples/sec Loss 12.2455 LearningRate 0.1563 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:22:48,456-Speed 5005.39 samples/sec Loss 12.2639 LearningRate 0.1568 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:22:56,653-Speed 4997.48 samples/sec Loss 12.1214 LearningRate 0.1572 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 22:23:04,857-Speed 4993.38 samples/sec Loss 12.1641 LearningRate 0.1577 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:23:13,086-Speed 4978.16 samples/sec Loss 12.1502 LearningRate 0.1582 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:23:21,287-Speed 4995.48 samples/sec Loss 12.2031 LearningRate 0.1587 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:23:29,379-Speed 5062.33 samples/sec Loss 12.1495 LearningRate 0.1592 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:23:37,552-Speed 5011.82 samples/sec Loss 12.1068 LearningRate 0.1596 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-01-16 22:23:45,740-Speed 5003.10 samples/sec Loss 12.0941 LearningRate 0.1601 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:23:53,971-Speed 4977.17 samples/sec Loss 12.0419 LearningRate 0.1606 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:24:02,360-Speed 4883.16 samples/sec Loss 11.9535 LearningRate 0.1611 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:24:10,700-Speed 4912.21 samples/sec Loss 11.9618 LearningRate 0.1616 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:24:19,017-Speed 4925.02 samples/sec Loss 11.9050 LearningRate 0.1620 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:24:27,355-Speed 4913.45 samples/sec Loss 12.0094 LearningRate 0.1625 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:24:35,558-Speed 4993.63 samples/sec Loss 11.9909 LearningRate 0.1630 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:24:43,760-Speed 4994.63 samples/sec Loss 11.8981 LearningRate 0.1635 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:24:51,931-Speed 5013.96 samples/sec Loss 11.8704 LearningRate 0.1640 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:25:00,140-Speed 4989.99 samples/sec Loss 11.9976 LearningRate 0.1644 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:25:08,479-Speed 4912.56 samples/sec Loss 11.8776 LearningRate 0.1649 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:25:16,687-Speed 4991.24 samples/sec Loss 11.7961 LearningRate 0.1654 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:25:24,888-Speed 4995.16 
samples/sec Loss 11.8549 LearningRate 0.1659 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:25:33,357-Speed 4836.72 samples/sec Loss 11.7974 LearningRate 0.1663 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:25:41,825-Speed 4837.63 samples/sec Loss 11.8136 LearningRate 0.1668 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:25:50,005-Speed 5008.32 samples/sec Loss 11.7381 LearningRate 0.1673 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:25:58,256-Speed 4964.74 samples/sec Loss 11.6860 LearningRate 0.1678 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:26:06,445-Speed 5002.48 samples/sec Loss 11.6740 LearningRate 0.1683 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:26:14,734-Speed 4941.91 samples/sec Loss 11.6839 LearningRate 0.1687 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:26:22,955-Speed 4983.65 samples/sec Loss 11.7213 LearningRate 0.1692 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:26:31,208-Speed 4963.63 samples/sec Loss 11.6605 LearningRate 0.1697 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:26:39,506-Speed 4937.17 samples/sec Loss 11.6327 LearningRate 0.1702 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:26:47,667-Speed 5019.03 samples/sec Loss 11.6611 LearningRate 0.1707 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:26:55,817-Speed 5026.95 samples/sec Loss 11.5816 LearningRate 0.1711 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:27:03,980-Speed 5018.30 samples/sec Loss 11.5136 LearningRate 
0.1716 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:27:12,296-Speed 4926.20 samples/sec Loss 11.5676 LearningRate 0.1721 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:27:20,984-Speed 4715.31 samples/sec Loss 11.5564 LearningRate 0.1726 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:27:29,361-Speed 4890.17 samples/sec Loss 11.5324 LearningRate 0.1731 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:27:37,457-Speed 5060.06 samples/sec Loss 11.5380 LearningRate 0.1735 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:27:45,800-Speed 4910.03 samples/sec Loss 11.5312 LearningRate 0.1740 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:27:53,949-Speed 5027.29 samples/sec Loss 11.4526 LearningRate 0.1745 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:28:02,158-Speed 4990.26 samples/sec Loss 11.3834 LearningRate 0.1750 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:28:10,398-Speed 4971.72 samples/sec Loss 11.4085 LearningRate 0.1755 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:28:18,666-Speed 4954.65 samples/sec Loss 11.4172 LearningRate 0.1759 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:28:26,980-Speed 4927.30 samples/sec Loss 11.4271 LearningRate 0.1764 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:28:35,477-Speed 4821.87 samples/sec Loss 11.4623 LearningRate 0.1769 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:28:43,685-Speed 4991.00 samples/sec Loss 11.3656 LearningRate 0.1774 Epoch: 0 Global Step: 3700 Fp16 Grad 
Scale: 65536 Required: 18 hours Training: 2022-01-16 22:28:51,912-Speed 4979.31 samples/sec Loss 11.3316 LearningRate 0.1779 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:29:00,122-Speed 4989.82 samples/sec Loss 11.3855 LearningRate 0.1783 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:29:08,435-Speed 4927.87 samples/sec Loss 11.3646 LearningRate 0.1788 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:29:16,702-Speed 4954.92 samples/sec Loss 11.2844 LearningRate 0.1793 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:29:24,956-Speed 4963.15 samples/sec Loss 11.2679 LearningRate 0.1798 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:29:33,241-Speed 4944.70 samples/sec Loss 11.2887 LearningRate 0.1802 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:29:41,532-Speed 4941.71 samples/sec Loss 11.2738 LearningRate 0.1807 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:29:49,900-Speed 4895.38 samples/sec Loss 11.1755 LearningRate 0.1812 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:29:58,242-Speed 4910.69 samples/sec Loss 11.1624 LearningRate 0.1817 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:30:06,490-Speed 4966.41 samples/sec Loss 11.0844 LearningRate 0.1822 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:30:14,665-Speed 5011.42 samples/sec Loss 11.1195 LearningRate 0.1826 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:30:22,808-Speed 5031.07 samples/sec Loss 11.1204 LearningRate 0.1831 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 
2022-01-16 22:30:30,913-Speed 5054.06 samples/sec Loss 11.1601 LearningRate 0.1836 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:30:39,091-Speed 5009.28 samples/sec Loss 11.1316 LearningRate 0.1841 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:30:47,296-Speed 4992.78 samples/sec Loss 10.9913 LearningRate 0.1846 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 22:30:55,450-Speed 5023.75 samples/sec Loss 11.0974 LearningRate 0.1850 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 22:31:03,642-Speed 5000.84 samples/sec Loss 11.1658 LearningRate 0.1855 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:31:11,941-Speed 4936.20 samples/sec Loss 10.9609 LearningRate 0.1860 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:31:20,103-Speed 5018.94 samples/sec Loss 10.8907 LearningRate 0.1865 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:31:28,471-Speed 4895.83 samples/sec Loss 10.9837 LearningRate 0.1870 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:31:36,658-Speed 5003.60 samples/sec Loss 10.9712 LearningRate 0.1874 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:31:44,991-Speed 4916.35 samples/sec Loss 10.9870 LearningRate 0.1879 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:31:53,537-Speed 4793.42 samples/sec Loss 10.9272 LearningRate 0.1884 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:32:01,782-Speed 4968.52 samples/sec Loss 10.9542 LearningRate 0.1889 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:32:10,005-Speed 4981.80 
samples/sec Loss 10.9283 LearningRate 0.1894 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:32:18,146-Speed 5031.96 samples/sec Loss 10.8906 LearningRate 0.1898 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:32:26,357-Speed 4989.21 samples/sec Loss 10.8396 LearningRate 0.1903 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:32:34,725-Speed 4895.45 samples/sec Loss 10.8897 LearningRate 0.1908 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:32:43,470-Speed 4684.47 samples/sec Loss 10.8428 LearningRate 0.1913 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:32:51,874-Speed 4874.20 samples/sec Loss 10.8105 LearningRate 0.1918 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:33:00,002-Speed 5040.28 samples/sec Loss 10.8154 LearningRate 0.1922 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:33:08,276-Speed 4951.26 samples/sec Loss 10.8426 LearningRate 0.1927 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:33:16,520-Speed 4969.12 samples/sec Loss 10.8263 LearningRate 0.1932 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:33:24,770-Speed 4965.59 samples/sec Loss 10.8471 LearningRate 0.1937 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:33:33,043-Speed 4951.54 samples/sec Loss 10.7199 LearningRate 0.1942 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:33:41,329-Speed 4944.39 samples/sec Loss 10.8153 LearningRate 0.1946 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:33:49,752-Speed 4863.81 samples/sec Loss 10.7178 LearningRate 
0.1951 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:33:57,879-Speed 5041.02 samples/sec Loss 10.7150 LearningRate 0.1956 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:34:05,997-Speed 5046.26 samples/sec Loss 10.6387 LearningRate 0.1961 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:34:14,178-Speed 5007.38 samples/sec Loss 10.6429 LearningRate 0.1965 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:34:22,292-Speed 5048.59 samples/sec Loss 10.6712 LearningRate 0.1970 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:34:30,443-Speed 5025.41 samples/sec Loss 10.7216 LearningRate 0.1975 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:34:39,031-Speed 4770.30 samples/sec Loss 10.6880 LearningRate 0.1980 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:34:47,136-Speed 5054.75 samples/sec Loss 10.7008 LearningRate 0.1985 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:34:55,328-Speed 5000.15 samples/sec Loss 10.6140 LearningRate 0.1989 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:35:03,519-Speed 5001.45 samples/sec Loss 10.6419 LearningRate 0.1994 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:35:11,853-Speed 4915.57 samples/sec Loss 10.6494 LearningRate 0.1999 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:35:48,809-Speed 1108.41 samples/sec Loss 10.0757 LearningRate 0.2004 Epoch: 1 Global Step: 4180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:35:56,972-Speed 5018.55 samples/sec Loss 9.8241 LearningRate 0.2009 Epoch: 1 Global Step: 4190 Fp16 Grad Scale: 
65536 Required: 18 hours Training: 2022-01-16 22:36:05,285-Speed 4928.05 samples/sec Loss 9.8612 LearningRate 0.2013 Epoch: 1 Global Step: 4200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:36:13,514-Speed 4978.17 samples/sec Loss 9.8862 LearningRate 0.2018 Epoch: 1 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:36:21,805-Speed 4941.15 samples/sec Loss 9.9971 LearningRate 0.2023 Epoch: 1 Global Step: 4220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:36:30,165-Speed 4900.12 samples/sec Loss 9.9703 LearningRate 0.2028 Epoch: 1 Global Step: 4230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:36:38,348-Speed 5006.35 samples/sec Loss 9.9053 LearningRate 0.2033 Epoch: 1 Global Step: 4240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:36:46,688-Speed 4911.83 samples/sec Loss 9.9260 LearningRate 0.2037 Epoch: 1 Global Step: 4250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:36:55,030-Speed 4911.20 samples/sec Loss 9.9240 LearningRate 0.2042 Epoch: 1 Global Step: 4260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:37:03,368-Speed 4912.67 samples/sec Loss 9.8782 LearningRate 0.2047 Epoch: 1 Global Step: 4270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:37:11,472-Speed 5054.77 samples/sec Loss 9.9922 LearningRate 0.2052 Epoch: 1 Global Step: 4280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:37:19,629-Speed 5022.41 samples/sec Loss 9.9864 LearningRate 0.2057 Epoch: 1 Global Step: 4290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:37:27,770-Speed 5031.80 samples/sec Loss 9.9892 LearningRate 0.2061 Epoch: 1 Global Step: 4300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:37:35,915-Speed 5029.75 samples/sec Loss 9.9583 LearningRate 0.2066 Epoch: 1 Global Step: 4310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:37:44,196-Speed 
4946.95 samples/sec Loss 10.0587 LearningRate 0.2071 Epoch: 1 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:37:52,354-Speed 5022.73 samples/sec Loss 10.1213 LearningRate 0.2076 Epoch: 1 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:38:00,788-Speed 4856.72 samples/sec Loss 9.9560 LearningRate 0.2081 Epoch: 1 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:38:09,069-Speed 4946.61 samples/sec Loss 10.0172 LearningRate 0.2085 Epoch: 1 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:38:17,570-Speed 4818.85 samples/sec Loss 9.9217 LearningRate 0.2090 Epoch: 1 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:38:25,816-Speed 4968.37 samples/sec Loss 10.0915 LearningRate 0.2095 Epoch: 1 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:38:34,103-Speed 4943.49 samples/sec Loss 9.9744 LearningRate 0.2100 Epoch: 1 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:38:42,443-Speed 4912.05 samples/sec Loss 10.1370 LearningRate 0.2105 Epoch: 1 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:38:50,761-Speed 4924.48 samples/sec Loss 10.0148 LearningRate 0.2109 Epoch: 1 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:38:59,260-Speed 4820.28 samples/sec Loss 10.0199 LearningRate 0.2114 Epoch: 1 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:39:07,618-Speed 4901.09 samples/sec Loss 10.1006 LearningRate 0.2119 Epoch: 1 Global Step: 4420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 22:39:15,985-Speed 4896.72 samples/sec Loss 10.0150 LearningRate 0.2124 Epoch: 1 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:39:24,184-Speed 4996.27 samples/sec Loss 10.0262 LearningRate 
0.2128 Epoch: 1 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:39:32,449-Speed 4956.23 samples/sec Loss 10.0538 LearningRate 0.2133 Epoch: 1 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:39:40,769-Speed 4923.57 samples/sec Loss 10.0809 LearningRate 0.2138 Epoch: 1 Global Step: 4460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:39:48,959-Speed 5001.83 samples/sec Loss 10.0892 LearningRate 0.2143 Epoch: 1 Global Step: 4470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:39:57,105-Speed 5029.38 samples/sec Loss 10.1042 LearningRate 0.2148 Epoch: 1 Global Step: 4480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:40:05,320-Speed 4986.24 samples/sec Loss 10.0882 LearningRate 0.2152 Epoch: 1 Global Step: 4490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:40:13,718-Speed 4877.94 samples/sec Loss 10.0365 LearningRate 0.2157 Epoch: 1 Global Step: 4500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:40:21,826-Speed 5053.05 samples/sec Loss 10.0966 LearningRate 0.2162 Epoch: 1 Global Step: 4510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:40:30,224-Speed 4878.17 samples/sec Loss 10.0901 LearningRate 0.2167 Epoch: 1 Global Step: 4520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:40:38,415-Speed 5000.83 samples/sec Loss 10.0241 LearningRate 0.2172 Epoch: 1 Global Step: 4530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:40:46,708-Speed 4939.54 samples/sec Loss 10.0361 LearningRate 0.2176 Epoch: 1 Global Step: 4540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:40:54,930-Speed 4982.52 samples/sec Loss 10.0494 LearningRate 0.2181 Epoch: 1 Global Step: 4550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:41:03,078-Speed 5027.79 samples/sec Loss 9.9896 LearningRate 0.2186 Epoch: 1 Global Step: 4560 Fp16 Grad Scale: 
131072 Required: 18 hours Training: 2022-01-16 22:41:11,250-Speed 5013.01 samples/sec Loss 10.1053 LearningRate 0.2191 Epoch: 1 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:41:19,642-Speed 4881.60 samples/sec Loss 10.1201 LearningRate 0.2196 Epoch: 1 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:41:28,029-Speed 4884.50 samples/sec Loss 10.1106 LearningRate 0.2200 Epoch: 1 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:41:36,256-Speed 4979.75 samples/sec Loss 10.0807 LearningRate 0.2205 Epoch: 1 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:41:44,513-Speed 4961.10 samples/sec Loss 10.0250 LearningRate 0.2210 Epoch: 1 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:41:52,794-Speed 4946.94 samples/sec Loss 10.0263 LearningRate 0.2215 Epoch: 1 Global Step: 4620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:42:01,041-Speed 4967.06 samples/sec Loss 10.0432 LearningRate 0.2220 Epoch: 1 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:42:09,216-Speed 5011.19 samples/sec Loss 10.1131 LearningRate 0.2224 Epoch: 1 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:42:17,537-Speed 4922.90 samples/sec Loss 10.0478 LearningRate 0.2229 Epoch: 1 Global Step: 4650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:42:25,826-Speed 4942.62 samples/sec Loss 10.0358 LearningRate 0.2234 Epoch: 1 Global Step: 4660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:42:33,955-Speed 5039.37 samples/sec Loss 10.0322 LearningRate 0.2239 Epoch: 1 Global Step: 4670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:42:42,112-Speed 5021.83 samples/sec Loss 10.0602 LearningRate 0.2244 Epoch: 1 Global Step: 4680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 
22:42:50,886-Speed 4669.31 samples/sec Loss 10.0112 LearningRate 0.2248 Epoch: 1 Global Step: 4690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:42:59,915-Speed 4537.21 samples/sec Loss 10.0510 LearningRate 0.2253 Epoch: 1 Global Step: 4700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:43:08,225-Speed 4929.48 samples/sec Loss 10.0608 LearningRate 0.2258 Epoch: 1 Global Step: 4710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:43:16,466-Speed 4970.94 samples/sec Loss 10.1652 LearningRate 0.2263 Epoch: 1 Global Step: 4720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:43:24,636-Speed 5013.76 samples/sec Loss 10.1112 LearningRate 0.2267 Epoch: 1 Global Step: 4730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:43:32,813-Speed 5010.63 samples/sec Loss 10.0685 LearningRate 0.2272 Epoch: 1 Global Step: 4740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:43:41,061-Speed 4966.40 samples/sec Loss 10.0828 LearningRate 0.2277 Epoch: 1 Global Step: 4750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:43:49,318-Speed 4961.39 samples/sec Loss 10.0709 LearningRate 0.2282 Epoch: 1 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:43:57,481-Speed 5018.78 samples/sec Loss 10.0053 LearningRate 0.2287 Epoch: 1 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:44:05,575-Speed 5060.94 samples/sec Loss 10.0344 LearningRate 0.2291 Epoch: 1 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:44:13,853-Speed 4948.96 samples/sec Loss 10.1159 LearningRate 0.2296 Epoch: 1 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:44:22,014-Speed 5019.68 samples/sec Loss 10.1660 LearningRate 0.2301 Epoch: 1 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:44:30,191-Speed 5009.87 samples/sec Loss 
10.0719 LearningRate 0.2306 Epoch: 1 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:44:38,344-Speed 5024.83 samples/sec Loss 10.0175 LearningRate 0.2311 Epoch: 1 Global Step: 4820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:44:46,527-Speed 5005.73 samples/sec Loss 10.0063 LearningRate 0.2315 Epoch: 1 Global Step: 4830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:44:55,014-Speed 4827.46 samples/sec Loss 10.0563 LearningRate 0.2320 Epoch: 1 Global Step: 4840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:45:03,406-Speed 4881.28 samples/sec Loss 10.0427 LearningRate 0.2325 Epoch: 1 Global Step: 4850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:45:11,701-Speed 4938.51 samples/sec Loss 9.9769 LearningRate 0.2330 Epoch: 1 Global Step: 4860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:45:19,912-Speed 4989.19 samples/sec Loss 9.9814 LearningRate 0.2335 Epoch: 1 Global Step: 4870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:45:28,159-Speed 4967.10 samples/sec Loss 10.0591 LearningRate 0.2339 Epoch: 1 Global Step: 4880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:45:36,467-Speed 4931.14 samples/sec Loss 10.0068 LearningRate 0.2344 Epoch: 1 Global Step: 4890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:45:44,591-Speed 5042.80 samples/sec Loss 9.9754 LearningRate 0.2349 Epoch: 1 Global Step: 4900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:45:52,747-Speed 5022.25 samples/sec Loss 9.9673 LearningRate 0.2354 Epoch: 1 Global Step: 4910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 22:46:00,813-Speed 5078.46 samples/sec Loss 9.9321 LearningRate 0.2359 Epoch: 1 Global Step: 4920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:46:09,036-Speed 4982.55 samples/sec Loss 10.0467 LearningRate 0.2363 Epoch: 1 Global Step: 4930 
Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:46:17,211-Speed 5011.35 samples/sec Loss 10.0342 LearningRate 0.2368 Epoch: 1 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:46:25,453-Speed 4970.16 samples/sec Loss 9.9848 LearningRate 0.2373 Epoch: 1 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:46:33,751-Speed 4936.64 samples/sec Loss 9.9760 LearningRate 0.2378 Epoch: 1 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:46:42,091-Speed 4912.27 samples/sec Loss 9.9762 LearningRate 0.2383 Epoch: 1 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:46:50,331-Speed 4971.28 samples/sec Loss 9.9440 LearningRate 0.2387 Epoch: 1 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:46:58,442-Speed 5050.78 samples/sec Loss 9.9521 LearningRate 0.2392 Epoch: 1 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:47:06,544-Speed 5056.05 samples/sec Loss 9.9480 LearningRate 0.2397 Epoch: 1 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 22:47:54,831-[lfw][5000]XNorm: 20.365733 Training: 2022-01-16 22:47:54,832-[lfw][5000]Accuracy-Flip: 0.99583+-0.00335 Training: 2022-01-16 22:47:54,832-[lfw][5000]Accuracy-Highest: 0.99583 Training: 2022-01-16 22:48:50,810-[cfp_fp][5000]XNorm: 18.044106 Training: 2022-01-16 22:48:50,811-[cfp_fp][5000]Accuracy-Flip: 0.94557+-0.01113 Training: 2022-01-16 22:48:50,812-[cfp_fp][5000]Accuracy-Highest: 0.94557 Training: 2022-01-16 22:49:38,938-[agedb_30][5000]XNorm: 19.914426 Training: 2022-01-16 22:49:38,939-[agedb_30][5000]Accuracy-Flip: 0.95567+-0.01068 Training: 2022-01-16 22:49:38,939-[agedb_30][5000]Accuracy-Highest: 0.95567 Training: 2022-01-16 22:49:47,080-Speed 255.15 samples/sec Loss 9.9223 LearningRate 0.2402 Epoch: 1 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 19 hours 
Training: 2022-01-16 22:49:55,275-Speed 4998.53 samples/sec Loss 10.0471 LearningRate 0.2407 Epoch: 1 Global Step: 5020 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:50:03,553-Speed 4949.34 samples/sec Loss 9.9719 LearningRate 0.2411 Epoch: 1 Global Step: 5030 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:50:11,766-Speed 4987.58 samples/sec Loss 9.9264 LearningRate 0.2416 Epoch: 1 Global Step: 5040 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:50:19,950-Speed 5005.06 samples/sec Loss 10.0270 LearningRate 0.2421 Epoch: 1 Global Step: 5050 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:50:28,105-Speed 5023.96 samples/sec Loss 10.0171 LearningRate 0.2426 Epoch: 1 Global Step: 5060 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:50:36,222-Speed 5046.52 samples/sec Loss 9.9957 LearningRate 0.2430 Epoch: 1 Global Step: 5070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:50:44,502-Speed 4947.41 samples/sec Loss 9.9123 LearningRate 0.2435 Epoch: 1 Global Step: 5080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:50:52,975-Speed 4835.22 samples/sec Loss 9.9952 LearningRate 0.2440 Epoch: 1 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:51:01,185-Speed 4989.51 samples/sec Loss 9.9019 LearningRate 0.2445 Epoch: 1 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:51:09,307-Speed 5043.99 samples/sec Loss 9.9331 LearningRate 0.2450 Epoch: 1 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:51:17,483-Speed 5010.25 samples/sec Loss 10.0110 LearningRate 0.2454 Epoch: 1 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:51:25,871-Speed 4884.16 samples/sec Loss 10.0327 LearningRate 0.2459 Epoch: 1 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:51:34,709-Speed 4635.19 
samples/sec Loss 9.8656 LearningRate 0.2464 Epoch: 1 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:51:42,883-Speed 5011.53 samples/sec Loss 9.9706 LearningRate 0.2469 Epoch: 1 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:51:51,491-Speed 4758.85 samples/sec Loss 9.9388 LearningRate 0.2474 Epoch: 1 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:51:59,925-Speed 4857.54 samples/sec Loss 9.9672 LearningRate 0.2478 Epoch: 1 Global Step: 5170 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:52:08,042-Speed 5046.93 samples/sec Loss 9.9421 LearningRate 0.2483 Epoch: 1 Global Step: 5180 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:52:16,182-Speed 5032.42 samples/sec Loss 9.9111 LearningRate 0.2488 Epoch: 1 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:52:24,364-Speed 5007.55 samples/sec Loss 9.9033 LearningRate 0.2493 Epoch: 1 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:52:32,737-Speed 4892.49 samples/sec Loss 9.9520 LearningRate 0.2498 Epoch: 1 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:52:41,071-Speed 4915.01 samples/sec Loss 10.0002 LearningRate 0.2502 Epoch: 1 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:52:49,328-Speed 4961.54 samples/sec Loss 9.8503 LearningRate 0.2507 Epoch: 1 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:52:57,639-Speed 4928.97 samples/sec Loss 9.8605 LearningRate 0.2512 Epoch: 1 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:53:05,860-Speed 4982.77 samples/sec Loss 9.8617 LearningRate 0.2517 Epoch: 1 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:53:14,037-Speed 5010.20 samples/sec Loss 9.9059 LearningRate 0.2522 Epoch: 1 
Global Step: 5260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:53:22,191-Speed 5023.41 samples/sec Loss 9.8844 LearningRate 0.2526 Epoch: 1 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:53:30,363-Speed 5013.50 samples/sec Loss 10.0270 LearningRate 0.2531 Epoch: 1 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:53:38,769-Speed 4873.33 samples/sec Loss 9.8174 LearningRate 0.2536 Epoch: 1 Global Step: 5290 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:53:47,031-Speed 4957.92 samples/sec Loss 9.7955 LearningRate 0.2541 Epoch: 1 Global Step: 5300 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:53:56,018-Speed 4558.57 samples/sec Loss 9.8661 LearningRate 0.2546 Epoch: 1 Global Step: 5310 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:54:04,883-Speed 4620.90 samples/sec Loss 9.9671 LearningRate 0.2550 Epoch: 1 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:54:13,953-Speed 4517.04 samples/sec Loss 9.8344 LearningRate 0.2555 Epoch: 1 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:54:22,330-Speed 4890.00 samples/sec Loss 9.8543 LearningRate 0.2560 Epoch: 1 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:54:30,588-Speed 4960.20 samples/sec Loss 9.9133 LearningRate 0.2565 Epoch: 1 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:54:39,082-Speed 4823.07 samples/sec Loss 9.8630 LearningRate 0.2570 Epoch: 1 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:54:47,695-Speed 4756.54 samples/sec Loss 9.8540 LearningRate 0.2574 Epoch: 1 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:54:55,847-Speed 5025.10 samples/sec Loss 9.8629 LearningRate 0.2579 Epoch: 1 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 19 
hours Training: 2022-01-16 22:55:03,956-Speed 5051.48 samples/sec Loss 9.8427 LearningRate 0.2584 Epoch: 1 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:55:12,134-Speed 5009.22 samples/sec Loss 9.8718 LearningRate 0.2589 Epoch: 1 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:55:20,309-Speed 5011.16 samples/sec Loss 9.9007 LearningRate 0.2593 Epoch: 1 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:55:29,171-Speed 4622.67 samples/sec Loss 9.7584 LearningRate 0.2598 Epoch: 1 Global Step: 5420 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:55:37,326-Speed 5023.62 samples/sec Loss 9.8513 LearningRate 0.2603 Epoch: 1 Global Step: 5430 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:55:45,493-Speed 5016.35 samples/sec Loss 9.8043 LearningRate 0.2608 Epoch: 1 Global Step: 5440 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:55:53,746-Speed 4963.13 samples/sec Loss 9.7977 LearningRate 0.2613 Epoch: 1 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:56:01,950-Speed 4993.63 samples/sec Loss 9.9720 LearningRate 0.2617 Epoch: 1 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:56:10,111-Speed 5019.80 samples/sec Loss 9.9319 LearningRate 0.2622 Epoch: 1 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:56:18,223-Speed 5049.99 samples/sec Loss 9.8725 LearningRate 0.2627 Epoch: 1 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:56:26,401-Speed 5008.84 samples/sec Loss 9.7983 LearningRate 0.2632 Epoch: 1 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:56:34,548-Speed 5028.78 samples/sec Loss 9.8542 LearningRate 0.2637 Epoch: 1 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:56:42,860-Speed 4928.58 
samples/sec Loss 9.8458 LearningRate 0.2641 Epoch: 1 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:56:50,955-Speed 5060.38 samples/sec Loss 9.8665 LearningRate 0.2646 Epoch: 1 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:56:59,123-Speed 5015.70 samples/sec Loss 9.9908 LearningRate 0.2651 Epoch: 1 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:57:07,205-Speed 5068.90 samples/sec Loss 9.9735 LearningRate 0.2656 Epoch: 1 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:57:15,470-Speed 4956.81 samples/sec Loss 9.9058 LearningRate 0.2661 Epoch: 1 Global Step: 5550 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:57:23,651-Speed 5007.44 samples/sec Loss 9.9443 LearningRate 0.2665 Epoch: 1 Global Step: 5560 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:57:31,963-Speed 4928.63 samples/sec Loss 9.7892 LearningRate 0.2670 Epoch: 1 Global Step: 5570 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:57:40,376-Speed 4869.02 samples/sec Loss 9.7221 LearningRate 0.2675 Epoch: 1 Global Step: 5580 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:57:48,574-Speed 4997.15 samples/sec Loss 9.7510 LearningRate 0.2680 Epoch: 1 Global Step: 5590 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:57:56,728-Speed 5023.81 samples/sec Loss 9.7941 LearningRate 0.2685 Epoch: 1 Global Step: 5600 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:58:04,976-Speed 4966.76 samples/sec Loss 9.7338 LearningRate 0.2689 Epoch: 1 Global Step: 5610 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:58:13,210-Speed 4975.38 samples/sec Loss 9.8486 LearningRate 0.2694 Epoch: 1 Global Step: 5620 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:58:21,596-Speed 4885.11 samples/sec Loss 9.8238 LearningRate 0.2699 Epoch: 1 
Global Step: 5630 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:58:29,938-Speed 4910.51 samples/sec Loss 9.7919 LearningRate 0.2704 Epoch: 1 Global Step: 5640 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:58:38,234-Speed 4938.05 samples/sec Loss 9.8436 LearningRate 0.2709 Epoch: 1 Global Step: 5650 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 22:58:46,610-Speed 4890.59 samples/sec Loss 9.8350 LearningRate 0.2713 Epoch: 1 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:58:54,842-Speed 4976.72 samples/sec Loss 9.7162 LearningRate 0.2718 Epoch: 1 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:59:03,097-Speed 4962.52 samples/sec Loss 9.7602 LearningRate 0.2723 Epoch: 1 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:59:11,228-Speed 5037.95 samples/sec Loss 9.6925 LearningRate 0.2728 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:59:19,356-Speed 5040.07 samples/sec Loss 9.7012 LearningRate 0.2733 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:59:27,483-Speed 5040.67 samples/sec Loss 9.7739 LearningRate 0.2737 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:59:35,610-Speed 5040.96 samples/sec Loss 9.9040 LearningRate 0.2742 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:59:43,795-Speed 5005.05 samples/sec Loss 9.8717 LearningRate 0.2747 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 22:59:52,038-Speed 4969.52 samples/sec Loss 9.7998 LearningRate 0.2752 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:00:00,434-Speed 4879.06 samples/sec Loss 9.7456 LearningRate 0.2756 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 19 
hours Training: 2022-01-16 23:00:08,522-Speed 5065.13 samples/sec Loss 9.7670 LearningRate 0.2761 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 23:00:16,644-Speed 5044.28 samples/sec Loss 9.7840 LearningRate 0.2766 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 23:00:24,837-Speed 5000.27 samples/sec Loss 9.7008 LearningRate 0.2771 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 23:00:33,064-Speed 4978.90 samples/sec Loss 9.8575 LearningRate 0.2776 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:00:41,264-Speed 4996.21 samples/sec Loss 9.8516 LearningRate 0.2780 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:00:49,543-Speed 4947.94 samples/sec Loss 9.7422 LearningRate 0.2785 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:00:57,840-Speed 4937.60 samples/sec Loss 9.7254 LearningRate 0.2790 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:01:05,952-Speed 5050.01 samples/sec Loss 9.7870 LearningRate 0.2795 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:01:14,116-Speed 5017.84 samples/sec Loss 9.8415 LearningRate 0.2800 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:01:22,309-Speed 4999.80 samples/sec Loss 9.8611 LearningRate 0.2804 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:01:30,606-Speed 4937.70 samples/sec Loss 9.7994 LearningRate 0.2809 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:01:38,884-Speed 4948.31 samples/sec Loss 9.7573 LearningRate 0.2814 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:01:47,248-Speed 4897.55 
samples/sec Loss 9.7897 LearningRate 0.2819 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:01:55,587-Speed 4912.88 samples/sec Loss 9.7839 LearningRate 0.2824 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:02:03,779-Speed 5001.04 samples/sec Loss 9.8314 LearningRate 0.2828 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:02:11,929-Speed 5026.04 samples/sec Loss 9.7410 LearningRate 0.2833 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:02:20,052-Speed 5043.37 samples/sec Loss 9.7393 LearningRate 0.2838 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:02:28,280-Speed 4979.12 samples/sec Loss 9.7898 LearningRate 0.2843 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:02:36,417-Speed 5034.42 samples/sec Loss 9.6952 LearningRate 0.2848 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:02:44,526-Speed 5051.52 samples/sec Loss 9.8754 LearningRate 0.2852 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:02:52,902-Speed 4890.86 samples/sec Loss 9.7668 LearningRate 0.2857 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:03:01,423-Speed 4807.59 samples/sec Loss 9.7664 LearningRate 0.2862 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:03:09,878-Speed 4844.89 samples/sec Loss 9.8546 LearningRate 0.2867 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:03:18,299-Speed 4864.87 samples/sec Loss 9.7275 LearningRate 0.2872 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:03:26,473-Speed 5011.65 samples/sec Loss 9.7536 LearningRate 0.2876 Epoch: 1 Global 
Step: 6000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:03:34,558-Speed 5066.50 samples/sec Loss 9.7392 LearningRate 0.2881 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:03:42,911-Speed 4904.79 samples/sec Loss 9.7910 LearningRate 0.2886 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:03:51,104-Speed 5000.18 samples/sec Loss 9.7403 LearningRate 0.2891 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:03:59,339-Speed 4974.69 samples/sec Loss 9.7399 LearningRate 0.2895 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-16 23:04:07,492-Speed 5023.89 samples/sec Loss 9.8315 LearningRate 0.2900 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-16 23:04:15,794-Speed 4934.52 samples/sec Loss 9.7457 LearningRate 0.2905 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:04:24,019-Speed 4980.89 samples/sec Loss 9.7503 LearningRate 0.2910 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-16 23:04:32,165-Speed 5028.39 samples/sec Loss 9.7754 LearningRate 0.2915 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 23:04:40,277-Speed 5050.37 samples/sec Loss 9.7921 LearningRate 0.2919 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 23:04:48,467-Speed 5001.89 samples/sec Loss 9.7807 LearningRate 0.2924 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 23:04:56,692-Speed 4980.58 samples/sec Loss 9.8300 LearningRate 0.2929 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 23:05:04,853-Speed 5019.52 samples/sec Loss 9.7536 LearningRate 0.2934 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-01-16 23:05:12,955-Speed 5056.45 samples/sec Loss 9.7603 LearningRate 0.2939 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 23:05:21,110-Speed 5023.20 samples/sec Loss 9.8816 LearningRate 0.2943 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 23:05:29,322-Speed 4988.22 samples/sec Loss 9.7938 LearningRate 0.2948 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-16 23:05:37,458-Speed 5035.31 samples/sec Loss 9.7322 LearningRate 0.2953 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:05:45,677-Speed 4984.52 samples/sec Loss 9.7333 LearningRate 0.2958 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:05:53,863-Speed 5003.93 samples/sec Loss 9.8089 LearningRate 0.2963 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:06:02,044-Speed 5007.93 samples/sec Loss 9.7996 LearningRate 0.2967 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:06:10,426-Speed 4887.37 samples/sec Loss 9.6751 LearningRate 0.2972 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:06:18,985-Speed 4786.19 samples/sec Loss 9.7572 LearningRate 0.2977 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:06:27,358-Speed 4892.56 samples/sec Loss 9.8231 LearningRate 0.2982 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:06:35,611-Speed 4963.78 samples/sec Loss 9.8268 LearningRate 0.2987 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:06:43,762-Speed 5025.97 samples/sec Loss 9.7373 LearningRate 0.2991 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:06:52,034-Speed 4951.99 
samples/sec Loss 9.8164 LearningRate 0.2996 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:07:00,388-Speed 4903.56 samples/sec Loss 9.7794 LearningRate 0.3001 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:07:08,724-Speed 4914.27 samples/sec Loss 9.7443 LearningRate 0.3006 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:07:16,925-Speed 4995.09 samples/sec Loss 9.6562 LearningRate 0.3011 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:07:25,002-Speed 5071.92 samples/sec Loss 9.8454 LearningRate 0.3015 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:07:33,347-Speed 4909.32 samples/sec Loss 9.7735 LearningRate 0.3020 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:07:41,436-Speed 5064.67 samples/sec Loss 9.7930 LearningRate 0.3025 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:07:49,571-Speed 5035.44 samples/sec Loss 9.7089 LearningRate 0.3030 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:07:57,805-Speed 4975.03 samples/sec Loss 9.6912 LearningRate 0.3035 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:08:06,052-Speed 4967.25 samples/sec Loss 9.8562 LearningRate 0.3039 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:08:15,028-Speed 4563.96 samples/sec Loss 9.6810 LearningRate 0.3044 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:08:23,422-Speed 4880.11 samples/sec Loss 9.7333 LearningRate 0.3049 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:08:31,552-Speed 5039.26 samples/sec Loss 9.7293 LearningRate 0.3054 Epoch: 1 
Global Step: 6370 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:08:39,993-Speed 4853.18 samples/sec Loss 9.7428 LearningRate 0.3058 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:08:48,199-Speed 4991.87 samples/sec Loss 9.8648 LearningRate 0.3063 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:08:56,498-Speed 4936.68 samples/sec Loss 9.7416 LearningRate 0.3068 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:09:05,027-Speed 4803.32 samples/sec Loss 9.8376 LearningRate 0.3073 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:09:13,387-Speed 4899.63 samples/sec Loss 9.7398 LearningRate 0.3078 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:09:21,662-Speed 4950.90 samples/sec Loss 9.7189 LearningRate 0.3082 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:09:29,749-Speed 5065.14 samples/sec Loss 9.7447 LearningRate 0.3087 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:09:37,983-Speed 4975.37 samples/sec Loss 9.8415 LearningRate 0.3092 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:09:46,320-Speed 4913.71 samples/sec Loss 9.7508 LearningRate 0.3097 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:09:56,511-Speed 4019.73 samples/sec Loss 9.7437 LearningRate 0.3102 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:10:05,075-Speed 4783.60 samples/sec Loss 9.8261 LearningRate 0.3106 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:10:13,475-Speed 4877.30 samples/sec Loss 9.7754 LearningRate 0.3111 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 262144 Required: 18 
hours Training: 2022-01-16 23:10:25,318-Speed 3458.90 samples/sec Loss 9.7778 LearningRate 0.3116 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:10:33,786-Speed 4837.71 samples/sec Loss 9.7363 LearningRate 0.3121 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:10:41,928-Speed 5031.67 samples/sec Loss 9.7632 LearningRate 0.3126 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:10:50,065-Speed 5034.27 samples/sec Loss 9.6915 LearningRate 0.3130 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:10:58,249-Speed 5005.53 samples/sec Loss 9.8376 LearningRate 0.3135 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:11:06,547-Speed 4936.97 samples/sec Loss 9.8593 LearningRate 0.3140 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:11:15,313-Speed 4672.89 samples/sec Loss 9.6810 LearningRate 0.3145 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:11:24,163-Speed 4629.01 samples/sec Loss 9.7878 LearningRate 0.3150 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:11:32,881-Speed 4698.41 samples/sec Loss 9.7225 LearningRate 0.3154 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:11:41,624-Speed 4686.06 samples/sec Loss 9.7709 LearningRate 0.3159 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:11:50,382-Speed 4677.10 samples/sec Loss 9.7864 LearningRate 0.3164 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:11:59,117-Speed 4690.06 samples/sec Loss 9.7697 LearningRate 0.3169 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:12:07,866-Speed 4682.07 
samples/sec Loss 9.8232 LearningRate 0.3174 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:12:16,622-Speed 4678.53 samples/sec Loss 9.7505 LearningRate 0.3178 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:12:25,122-Speed 4820.01 samples/sec Loss 9.8181 LearningRate 0.3183 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:12:33,254-Speed 5036.92 samples/sec Loss 9.8230 LearningRate 0.3188 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:12:41,424-Speed 5014.06 samples/sec Loss 9.7301 LearningRate 0.3193 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:12:49,597-Speed 5012.81 samples/sec Loss 9.8011 LearningRate 0.3198 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:12:57,761-Speed 5017.72 samples/sec Loss 9.8463 LearningRate 0.3202 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:13:06,206-Speed 4850.81 samples/sec Loss 9.8670 LearningRate 0.3207 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:13:14,305-Speed 5058.18 samples/sec Loss 9.8120 LearningRate 0.3212 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:13:22,427-Speed 5044.03 samples/sec Loss 9.8388 LearningRate 0.3217 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:13:30,678-Speed 4964.97 samples/sec Loss 9.8096 LearningRate 0.3221 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:13:38,845-Speed 5015.40 samples/sec Loss 9.7683 LearningRate 0.3226 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:13:47,362-Speed 4809.99 samples/sec Loss 9.7058 LearningRate 0.3231 Epoch: 1 
Global Step: 6740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:13:55,505-Speed 5030.99 samples/sec Loss 9.9139 LearningRate 0.3236 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:14:03,645-Speed 5032.31 samples/sec Loss 9.7575 LearningRate 0.3241 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:14:11,794-Speed 5027.42 samples/sec Loss 9.7419 LearningRate 0.3245 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:14:20,063-Speed 4953.82 samples/sec Loss 9.7529 LearningRate 0.3250 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:14:28,200-Speed 5034.75 samples/sec Loss 9.7894 LearningRate 0.3255 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:14:36,298-Speed 5058.53 samples/sec Loss 9.9798 LearningRate 0.3260 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:14:44,399-Speed 5056.98 samples/sec Loss 9.8896 LearningRate 0.3265 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:14:52,567-Speed 5015.90 samples/sec Loss 9.7689 LearningRate 0.3269 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:15:00,704-Speed 5034.64 samples/sec Loss 9.7905 LearningRate 0.3274 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:15:08,850-Speed 5028.63 samples/sec Loss 9.8285 LearningRate 0.3279 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:15:17,002-Speed 5025.36 samples/sec Loss 9.7322 LearningRate 0.3284 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:15:25,286-Speed 4944.65 samples/sec Loss 9.8050 LearningRate 0.3289 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 262144 Required: 18 
hours Training: 2022-01-16 23:15:33,461-Speed 5011.93 samples/sec Loss 9.8257 LearningRate 0.3293 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:15:41,592-Speed 5037.77 samples/sec Loss 9.8636 LearningRate 0.3298 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:15:49,747-Speed 5023.17 samples/sec Loss 9.8871 LearningRate 0.3303 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:15:58,224-Speed 4833.11 samples/sec Loss 9.8260 LearningRate 0.3308 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:16:06,695-Speed 4835.84 samples/sec Loss 9.8112 LearningRate 0.3313 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:16:14,930-Speed 4974.26 samples/sec Loss 9.8527 LearningRate 0.3317 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:16:23,172-Speed 4970.56 samples/sec Loss 9.6881 LearningRate 0.3322 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:16:31,325-Speed 5024.00 samples/sec Loss 9.7886 LearningRate 0.3327 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:16:39,445-Speed 5045.54 samples/sec Loss 9.8250 LearningRate 0.3332 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:16:47,586-Speed 5031.97 samples/sec Loss 9.8561 LearningRate 0.3337 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:16:55,723-Speed 5034.15 samples/sec Loss 9.8775 LearningRate 0.3341 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:17:03,950-Speed 4979.71 samples/sec Loss 9.8064 LearningRate 0.3346 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:17:12,082-Speed 5037.95 
samples/sec Loss 9.9173 LearningRate 0.3351 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:17:20,369-Speed 4943.01 samples/sec Loss 9.7748 LearningRate 0.3356 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:17:28,584-Speed 4986.89 samples/sec Loss 9.8251 LearningRate 0.3360 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:17:36,733-Speed 5026.88 samples/sec Loss 9.6942 LearningRate 0.3365 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:17:45,038-Speed 4932.37 samples/sec Loss 9.7443 LearningRate 0.3370 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:17:53,207-Speed 5015.42 samples/sec Loss 9.7997 LearningRate 0.3375 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:18:01,337-Speed 5038.43 samples/sec Loss 9.7733 LearningRate 0.3380 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:18:09,575-Speed 4973.38 samples/sec Loss 9.7921 LearningRate 0.3384 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:18:17,760-Speed 5005.18 samples/sec Loss 9.7844 LearningRate 0.3389 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:18:25,933-Speed 5012.01 samples/sec Loss 9.9559 LearningRate 0.3394 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:18:34,130-Speed 4997.06 samples/sec Loss 9.7697 LearningRate 0.3399 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:18:42,353-Speed 4982.30 samples/sec Loss 9.7838 LearningRate 0.3404 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:18:50,530-Speed 5009.69 samples/sec Loss 9.8546 LearningRate 0.3408 Epoch: 1 
Global Step: 7110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:18:58,682-Speed 5025.38 samples/sec Loss 9.8146 LearningRate 0.3413 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:19:06,983-Speed 4934.73 samples/sec Loss 9.8155 LearningRate 0.3418 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:19:15,181-Speed 4996.83 samples/sec Loss 9.9897 LearningRate 0.3423 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:19:23,360-Speed 5008.70 samples/sec Loss 9.8275 LearningRate 0.3428 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:19:31,866-Speed 4815.67 samples/sec Loss 9.9264 LearningRate 0.3432 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:19:40,339-Speed 4834.90 samples/sec Loss 9.7454 LearningRate 0.3437 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:19:48,474-Speed 5035.96 samples/sec Loss 9.8044 LearningRate 0.3442 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:19:56,638-Speed 5017.48 samples/sec Loss 9.8325 LearningRate 0.3447 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:20:04,784-Speed 5029.11 samples/sec Loss 9.8461 LearningRate 0.3452 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:20:12,900-Speed 5047.76 samples/sec Loss 9.8275 LearningRate 0.3456 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:20:21,005-Speed 5054.12 samples/sec Loss 9.8531 LearningRate 0.3461 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:20:29,173-Speed 5015.33 samples/sec Loss 9.8544 LearningRate 0.3466 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 262144 Required: 18 
hours Training: 2022-01-16 23:20:37,339-Speed 5016.41 samples/sec Loss 9.8268 LearningRate 0.3471 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:20:45,462-Speed 5043.76 samples/sec Loss 9.7438 LearningRate 0.3476 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:20:53,612-Speed 5026.13 samples/sec Loss 10.0000 LearningRate 0.3480 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:21:01,830-Speed 4985.18 samples/sec Loss 9.9096 LearningRate 0.3485 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:21:09,957-Speed 5040.94 samples/sec Loss 9.8712 LearningRate 0.3490 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:21:18,124-Speed 5015.99 samples/sec Loss 9.7898 LearningRate 0.3495 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:21:26,295-Speed 5013.21 samples/sec Loss 9.8282 LearningRate 0.3500 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:21:34,509-Speed 4987.32 samples/sec Loss 9.8030 LearningRate 0.3504 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:21:42,734-Speed 4980.79 samples/sec Loss 9.9569 LearningRate 0.3509 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:21:50,888-Speed 5023.96 samples/sec Loss 10.0445 LearningRate 0.3514 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:21:59,223-Speed 4914.60 samples/sec Loss 9.9180 LearningRate 0.3519 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:22:07,384-Speed 5019.83 samples/sec Loss 9.9145 LearningRate 0.3523 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:22:15,692-Speed 4930.93 
samples/sec Loss 9.8147 LearningRate 0.3528 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:22:24,201-Speed 4814.39 samples/sec Loss 9.9359 LearningRate 0.3533 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:22:32,393-Speed 5000.54 samples/sec Loss 9.9129 LearningRate 0.3538 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:22:40,613-Speed 4983.15 samples/sec Loss 9.9963 LearningRate 0.3543 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:22:48,764-Speed 5026.45 samples/sec Loss 9.8483 LearningRate 0.3547 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:22:56,962-Speed 4997.16 samples/sec Loss 9.9440 LearningRate 0.3552 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:23:05,113-Speed 5025.32 samples/sec Loss 9.8265 LearningRate 0.3557 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:23:13,368-Speed 4962.98 samples/sec Loss 9.8764 LearningRate 0.3562 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:23:21,509-Speed 5032.30 samples/sec Loss 9.9166 LearningRate 0.3567 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:23:29,634-Speed 5042.09 samples/sec Loss 10.0692 LearningRate 0.3571 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:23:37,883-Speed 4966.20 samples/sec Loss 9.8692 LearningRate 0.3576 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:23:46,034-Speed 5025.92 samples/sec Loss 9.9661 LearningRate 0.3581 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:23:54,252-Speed 4984.18 samples/sec Loss 9.8870 LearningRate 0.3586 Epoch: 1 
Global Step: 7480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:24:02,347-Speed 5061.06 samples/sec Loss 9.8035 LearningRate 0.3591 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:24:10,909-Speed 4784.82 samples/sec Loss 9.9499 LearningRate 0.3595 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:24:19,461-Speed 4790.04 samples/sec Loss 9.8243 LearningRate 0.3600 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:24:27,724-Speed 4958.25 samples/sec Loss 9.8144 LearningRate 0.3605 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:24:36,012-Speed 4942.64 samples/sec Loss 9.9107 LearningRate 0.3610 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:24:44,559-Speed 4792.57 samples/sec Loss 9.9022 LearningRate 0.3615 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:24:53,279-Speed 4697.73 samples/sec Loss 9.8414 LearningRate 0.3619 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:25:01,840-Speed 4785.60 samples/sec Loss 9.8587 LearningRate 0.3624 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:25:10,136-Speed 4938.17 samples/sec Loss 9.9209 LearningRate 0.3629 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:25:18,349-Speed 4987.72 samples/sec Loss 10.0144 LearningRate 0.3634 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:25:26,954-Speed 4760.98 samples/sec Loss 9.9728 LearningRate 0.3639 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:25:35,739-Speed 4662.86 samples/sec Loss 9.8640 LearningRate 0.3643 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 18 
hours Training: 2022-01-16 23:25:44,657-Speed 4593.36 samples/sec Loss 9.9244 LearningRate 0.3648 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:25:53,669-Speed 4545.58 samples/sec Loss 9.9602 LearningRate 0.3653 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:26:02,695-Speed 4538.33 samples/sec Loss 9.8200 LearningRate 0.3658 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:26:12,126-Speed 4343.81 samples/sec Loss 9.9075 LearningRate 0.3663 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:26:21,465-Speed 4386.50 samples/sec Loss 9.9312 LearningRate 0.3667 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:26:30,847-Speed 4366.46 samples/sec Loss 9.9441 LearningRate 0.3672 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:26:39,873-Speed 4538.31 samples/sec Loss 9.9311 LearningRate 0.3677 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:26:48,135-Speed 4958.47 samples/sec Loss 10.0925 LearningRate 0.3682 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:26:56,342-Speed 4991.23 samples/sec Loss 10.0710 LearningRate 0.3686 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:27:04,410-Speed 5077.83 samples/sec Loss 10.0226 LearningRate 0.3691 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:27:12,554-Speed 5030.53 samples/sec Loss 9.9020 LearningRate 0.3696 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:27:20,881-Speed 4919.25 samples/sec Loss 9.8968 LearningRate 0.3701 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:27:29,451-Speed 
4780.29 samples/sec Loss 9.8472 LearningRate 0.3706 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:27:38,108-Speed 4731.82 samples/sec Loss 10.0181 LearningRate 0.3710 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:27:46,906-Speed 4656.50 samples/sec Loss 9.9511 LearningRate 0.3715 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:27:55,929-Speed 4540.04 samples/sec Loss 9.9515 LearningRate 0.3720 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:28:04,740-Speed 4648.92 samples/sec Loss 9.8227 LearningRate 0.3725 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:28:13,509-Speed 4671.85 samples/sec Loss 9.8915 LearningRate 0.3730 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:28:22,229-Speed 4698.53 samples/sec Loss 9.8807 LearningRate 0.3734 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:28:30,638-Speed 4871.15 samples/sec Loss 10.0459 LearningRate 0.3739 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:28:38,737-Speed 5058.16 samples/sec Loss 9.9286 LearningRate 0.3744 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:28:46,809-Speed 5075.75 samples/sec Loss 9.9988 LearningRate 0.3749 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:28:54,901-Speed 5062.71 samples/sec Loss 9.9706 LearningRate 0.3754 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:29:03,150-Speed 4966.12 samples/sec Loss 9.9899 LearningRate 0.3758 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:29:11,295-Speed 5029.47 samples/sec Loss 9.9502 LearningRate 0.3763 
Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 524288 Required: 18 hours Training: 2022-01-16 23:29:19,451-Speed 5022.82 samples/sec Loss 9.9488 LearningRate 0.3768 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:29:27,324-Speed 5203.48 samples/sec Loss 10.0000 LearningRate 0.3773 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:29:35,507-Speed 5006.30 samples/sec Loss 9.9405 LearningRate 0.3778 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:29:43,637-Speed 5038.74 samples/sec Loss 10.0185 LearningRate 0.3782 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:29:51,748-Speed 5050.65 samples/sec Loss 10.0525 LearningRate 0.3787 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:29:59,853-Speed 5054.07 samples/sec Loss 10.0102 LearningRate 0.3792 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:30:07,960-Speed 5053.36 samples/sec Loss 9.9189 LearningRate 0.3797 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:30:16,074-Speed 5048.97 samples/sec Loss 9.9507 LearningRate 0.3802 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:30:24,169-Speed 5060.11 samples/sec Loss 9.9387 LearningRate 0.3806 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:30:32,588-Speed 4866.03 samples/sec Loss 9.9166 LearningRate 0.3811 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:30:40,948-Speed 4900.00 samples/sec Loss 9.9434 LearningRate 0.3816 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:30:49,394-Speed 4850.30 samples/sec Loss 10.0600 LearningRate 0.3821 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 262144 
Required: 18 hours Training: 2022-01-16 23:30:57,561-Speed 5016.52 samples/sec Loss 10.1023 LearningRate 0.3826 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:31:05,673-Speed 5049.91 samples/sec Loss 10.0685 LearningRate 0.3830 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:31:13,825-Speed 5025.05 samples/sec Loss 10.0463 LearningRate 0.3835 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:31:21,923-Speed 5058.61 samples/sec Loss 10.0241 LearningRate 0.3840 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:31:30,033-Speed 5051.47 samples/sec Loss 10.0311 LearningRate 0.3845 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:31:38,232-Speed 4996.10 samples/sec Loss 10.0694 LearningRate 0.3849 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:31:46,493-Speed 4959.11 samples/sec Loss 10.0310 LearningRate 0.3854 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:31:54,620-Speed 5040.43 samples/sec Loss 9.9311 LearningRate 0.3859 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:32:02,848-Speed 4979.55 samples/sec Loss 10.0133 LearningRate 0.3864 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:32:11,021-Speed 5011.76 samples/sec Loss 10.0304 LearningRate 0.3869 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 524288 Required: 18 hours Training: 2022-01-16 23:32:19,285-Speed 4956.82 samples/sec Loss 9.9218 LearningRate 0.3873 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:32:27,511-Speed 4980.60 samples/sec Loss 10.0625 LearningRate 0.3878 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 
23:32:35,637-Speed 5041.00 samples/sec Loss 10.0509 LearningRate 0.3883 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:32:43,873-Speed 4973.81 samples/sec Loss 10.0882 LearningRate 0.3888 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:32:52,152-Speed 4948.11 samples/sec Loss 10.1851 LearningRate 0.3893 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:33:00,382-Speed 4977.47 samples/sec Loss 10.0914 LearningRate 0.3897 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:33:08,552-Speed 5014.21 samples/sec Loss 9.9578 LearningRate 0.3902 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:33:16,711-Speed 5021.18 samples/sec Loss 10.0143 LearningRate 0.3907 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:33:24,858-Speed 5028.30 samples/sec Loss 10.0550 LearningRate 0.3912 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:33:33,084-Speed 4979.69 samples/sec Loss 10.0520 LearningRate 0.3917 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:33:41,258-Speed 5012.29 samples/sec Loss 10.1287 LearningRate 0.3921 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:33:49,406-Speed 5027.18 samples/sec Loss 9.9512 LearningRate 0.3926 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:33:58,078-Speed 4724.35 samples/sec Loss 10.0176 LearningRate 0.3931 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:34:06,861-Speed 4663.79 samples/sec Loss 9.9897 LearningRate 0.3936 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:34:15,667-Speed 4652.27 samples/sec Loss 
10.0142 LearningRate 0.3941 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:34:24,462-Speed 4657.43 samples/sec Loss 10.0282 LearningRate 0.3945 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:34:33,273-Speed 4649.69 samples/sec Loss 9.9598 LearningRate 0.3950 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:34:42,050-Speed 4667.04 samples/sec Loss 10.1191 LearningRate 0.3955 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:34:50,773-Speed 4696.54 samples/sec Loss 10.0562 LearningRate 0.3960 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:34:59,559-Speed 4662.27 samples/sec Loss 10.1419 LearningRate 0.3965 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:35:08,365-Speed 4651.73 samples/sec Loss 10.1427 LearningRate 0.3969 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:35:16,667-Speed 4935.05 samples/sec Loss 10.0325 LearningRate 0.3974 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:35:24,818-Speed 5025.55 samples/sec Loss 10.0845 LearningRate 0.3979 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:35:32,973-Speed 5023.70 samples/sec Loss 10.0206 LearningRate 0.3984 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:35:41,158-Speed 5004.58 samples/sec Loss 10.1022 LearningRate 0.3988 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:35:49,341-Speed 5006.17 samples/sec Loss 10.1082 LearningRate 0.3993 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:35:58,039-Speed 4709.66 samples/sec Loss 10.0676 LearningRate 0.3998 Epoch: 1 Global 
Step: 8340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:36:34,031-Speed 1138.09 samples/sec Loss 9.7822 LearningRate 0.3999 Epoch: 2 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:36:42,407-Speed 4891.12 samples/sec Loss 9.5873 LearningRate 0.3998 Epoch: 2 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:36:50,756-Speed 4906.48 samples/sec Loss 9.5136 LearningRate 0.3997 Epoch: 2 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:36:58,994-Speed 4972.69 samples/sec Loss 9.5716 LearningRate 0.3996 Epoch: 2 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:37:07,970-Speed 4563.78 samples/sec Loss 9.5764 LearningRate 0.3995 Epoch: 2 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:37:16,472-Speed 4818.89 samples/sec Loss 9.6033 LearningRate 0.3994 Epoch: 2 Global Step: 8400 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:37:24,601-Speed 5039.17 samples/sec Loss 9.6344 LearningRate 0.3993 Epoch: 2 Global Step: 8410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:37:32,914-Speed 4927.71 samples/sec Loss 9.5614 LearningRate 0.3992 Epoch: 2 Global Step: 8420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:37:41,091-Speed 5011.18 samples/sec Loss 9.5793 LearningRate 0.3991 Epoch: 2 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:37:49,173-Speed 5068.90 samples/sec Loss 9.7396 LearningRate 0.3990 Epoch: 2 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:37:57,194-Speed 5107.63 samples/sec Loss 9.6882 LearningRate 0.3989 Epoch: 2 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:38:05,318-Speed 5042.89 samples/sec Loss 9.7981 LearningRate 0.3988 Epoch: 2 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-01-16 23:38:13,424-Speed 5054.01 samples/sec Loss 9.7985 LearningRate 0.3987 Epoch: 2 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:38:21,544-Speed 5045.19 samples/sec Loss 9.7776 LearningRate 0.3986 Epoch: 2 Global Step: 8480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:38:29,688-Speed 5030.10 samples/sec Loss 9.7421 LearningRate 0.3984 Epoch: 2 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:38:37,758-Speed 5075.59 samples/sec Loss 9.6693 LearningRate 0.3983 Epoch: 2 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:38:45,875-Speed 5047.56 samples/sec Loss 9.7848 LearningRate 0.3982 Epoch: 2 Global Step: 8510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:38:53,946-Speed 5076.03 samples/sec Loss 9.7555 LearningRate 0.3981 Epoch: 2 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:39:02,078-Speed 5037.65 samples/sec Loss 9.7788 LearningRate 0.3980 Epoch: 2 Global Step: 8530 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:39:10,194-Speed 5047.32 samples/sec Loss 9.7268 LearningRate 0.3979 Epoch: 2 Global Step: 8540 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:39:18,559-Speed 4897.33 samples/sec Loss 9.7908 LearningRate 0.3978 Epoch: 2 Global Step: 8550 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:39:26,760-Speed 4995.64 samples/sec Loss 9.9481 LearningRate 0.3977 Epoch: 2 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:39:34,932-Speed 5013.05 samples/sec Loss 10.2003 LearningRate 0.3976 Epoch: 2 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:39:43,012-Speed 5069.81 samples/sec Loss 10.0130 LearningRate 0.3975 Epoch: 2 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:39:51,236-Speed 4981.25 
samples/sec Loss 9.8388 LearningRate 0.3974 Epoch: 2 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:39:59,190-Speed 5150.29 samples/sec Loss 9.9290 LearningRate 0.3973 Epoch: 2 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:40:07,184-Speed 5124.78 samples/sec Loss 9.8288 LearningRate 0.3972 Epoch: 2 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:40:15,474-Speed 4940.86 samples/sec Loss 9.8861 LearningRate 0.3971 Epoch: 2 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:40:23,697-Speed 4982.36 samples/sec Loss 9.8112 LearningRate 0.3970 Epoch: 2 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:40:31,819-Speed 5043.80 samples/sec Loss 9.8671 LearningRate 0.3969 Epoch: 2 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:40:39,990-Speed 5013.27 samples/sec Loss 9.8257 LearningRate 0.3967 Epoch: 2 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:40:47,851-Speed 5211.20 samples/sec Loss 9.7413 LearningRate 0.3966 Epoch: 2 Global Step: 8660 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:40:56,201-Speed 4906.36 samples/sec Loss 9.8638 LearningRate 0.3965 Epoch: 2 Global Step: 8670 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:41:04,301-Speed 5057.55 samples/sec Loss 9.8275 LearningRate 0.3964 Epoch: 2 Global Step: 8680 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:41:12,518-Speed 4985.82 samples/sec Loss 9.9017 LearningRate 0.3963 Epoch: 2 Global Step: 8690 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:41:20,634-Speed 5046.99 samples/sec Loss 9.7736 LearningRate 0.3962 Epoch: 2 Global Step: 8700 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:41:28,752-Speed 5046.11 samples/sec Loss 9.7545 LearningRate 0.3961 Epoch: 2 
Global Step: 8710 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:41:36,838-Speed 5067.03 samples/sec Loss 9.9371 LearningRate 0.3960 Epoch: 2 Global Step: 8720 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:41:44,930-Speed 5061.70 samples/sec Loss 9.8609 LearningRate 0.3959 Epoch: 2 Global Step: 8730 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:41:53,075-Speed 5029.96 samples/sec Loss 9.7362 LearningRate 0.3958 Epoch: 2 Global Step: 8740 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:42:01,494-Speed 4865.81 samples/sec Loss 9.8226 LearningRate 0.3957 Epoch: 2 Global Step: 8750 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:42:09,553-Speed 5083.32 samples/sec Loss 9.7463 LearningRate 0.3956 Epoch: 2 Global Step: 8760 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:42:17,961-Speed 4872.00 samples/sec Loss 9.8714 LearningRate 0.3955 Epoch: 2 Global Step: 8770 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:42:26,001-Speed 5095.58 samples/sec Loss 9.8190 LearningRate 0.3954 Epoch: 2 Global Step: 8780 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:42:34,133-Speed 5037.69 samples/sec Loss 9.8505 LearningRate 0.3953 Epoch: 2 Global Step: 8790 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:42:42,284-Speed 5025.61 samples/sec Loss 9.8336 LearningRate 0.3952 Epoch: 2 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:42:50,665-Speed 4888.38 samples/sec Loss 9.7979 LearningRate 0.3951 Epoch: 2 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:42:58,869-Speed 4992.99 samples/sec Loss 9.7203 LearningRate 0.3949 Epoch: 2 Global Step: 8820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:43:07,126-Speed 4961.61 samples/sec Loss 9.8126 LearningRate 0.3948 Epoch: 2 Global Step: 8830 Fp16 Grad Scale: 131072 Required: 18 
hours Training: 2022-01-16 23:43:15,220-Speed 5060.91 samples/sec Loss 9.8137 LearningRate 0.3947 Epoch: 2 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:43:23,308-Speed 5065.06 samples/sec Loss 9.9448 LearningRate 0.3946 Epoch: 2 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:43:31,434-Speed 5041.10 samples/sec Loss 9.9011 LearningRate 0.3945 Epoch: 2 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:43:39,569-Speed 5035.99 samples/sec Loss 9.7891 LearningRate 0.3944 Epoch: 2 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:43:47,723-Speed 5023.56 samples/sec Loss 9.8606 LearningRate 0.3943 Epoch: 2 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:43:56,016-Speed 4939.74 samples/sec Loss 9.8362 LearningRate 0.3942 Epoch: 2 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:44:04,123-Speed 5053.13 samples/sec Loss 9.8692 LearningRate 0.3941 Epoch: 2 Global Step: 8900 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:44:12,309-Speed 5004.92 samples/sec Loss 9.7901 LearningRate 0.3940 Epoch: 2 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:44:20,401-Speed 5062.13 samples/sec Loss 10.4584 LearningRate 0.3939 Epoch: 2 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:44:28,459-Speed 5083.85 samples/sec Loss 10.4817 LearningRate 0.3938 Epoch: 2 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:44:36,513-Speed 5086.27 samples/sec Loss 10.2627 LearningRate 0.3937 Epoch: 2 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:44:44,601-Speed 5064.95 samples/sec Loss 9.9072 LearningRate 0.3936 Epoch: 2 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:44:52,708-Speed 
5052.95 samples/sec Loss 9.8149 LearningRate 0.3935 Epoch: 2 Global Step: 8960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:45:00,884-Speed 5010.59 samples/sec Loss 9.7774 LearningRate 0.3934 Epoch: 2 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:45:09,040-Speed 5022.73 samples/sec Loss 9.8204 LearningRate 0.3933 Epoch: 2 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:45:17,231-Speed 5001.04 samples/sec Loss 9.8169 LearningRate 0.3931 Epoch: 2 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:45:25,395-Speed 5018.08 samples/sec Loss 9.8287 LearningRate 0.3930 Epoch: 2 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:45:33,461-Speed 5078.45 samples/sec Loss 9.7416 LearningRate 0.3929 Epoch: 2 Global Step: 9010 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:45:41,573-Speed 5050.36 samples/sec Loss 9.7395 LearningRate 0.3928 Epoch: 2 Global Step: 9020 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:45:49,702-Speed 5039.26 samples/sec Loss 9.8145 LearningRate 0.3927 Epoch: 2 Global Step: 9030 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:45:57,881-Speed 5008.46 samples/sec Loss 9.7257 LearningRate 0.3926 Epoch: 2 Global Step: 9040 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:46:06,021-Speed 5032.42 samples/sec Loss 9.7930 LearningRate 0.3925 Epoch: 2 Global Step: 9050 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:46:14,118-Speed 5059.66 samples/sec Loss 9.8136 LearningRate 0.3924 Epoch: 2 Global Step: 9060 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:46:22,252-Speed 5036.27 samples/sec Loss 9.8011 LearningRate 0.3923 Epoch: 2 Global Step: 9070 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:46:30,049-Speed 5253.89 samples/sec Loss 9.7717 LearningRate 0.3922 
Epoch: 2 Global Step: 9080 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:46:38,177-Speed 5039.86 samples/sec Loss 9.9278 LearningRate 0.3921 Epoch: 2 Global Step: 9090 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:46:46,308-Speed 5038.08 samples/sec Loss 9.7567 LearningRate 0.3920 Epoch: 2 Global Step: 9100 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:46:54,400-Speed 5062.68 samples/sec Loss 9.7050 LearningRate 0.3919 Epoch: 2 Global Step: 9110 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:47:02,496-Speed 5060.33 samples/sec Loss 9.8165 LearningRate 0.3918 Epoch: 2 Global Step: 9120 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:47:10,598-Speed 5056.07 samples/sec Loss 9.6287 LearningRate 0.3917 Epoch: 2 Global Step: 9130 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:47:18,732-Speed 5036.72 samples/sec Loss 9.7019 LearningRate 0.3916 Epoch: 2 Global Step: 9140 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:47:26,951-Speed 4983.69 samples/sec Loss 9.6925 LearningRate 0.3915 Epoch: 2 Global Step: 9150 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:47:35,174-Speed 4981.71 samples/sec Loss 10.0150 LearningRate 0.3914 Epoch: 2 Global Step: 9160 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:47:43,355-Speed 5007.29 samples/sec Loss 9.8057 LearningRate 0.3912 Epoch: 2 Global Step: 9170 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:47:51,464-Speed 5051.73 samples/sec Loss 9.7580 LearningRate 0.3911 Epoch: 2 Global Step: 9180 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:47:59,534-Speed 5076.54 samples/sec Loss 9.7454 LearningRate 0.3910 Epoch: 2 Global Step: 9190 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:48:07,940-Speed 4873.40 samples/sec Loss 9.7642 LearningRate 0.3909 Epoch: 2 Global Step: 9200 Fp16 Grad Scale: 262144 
Required: 18 hours Training: 2022-01-16 23:48:16,167-Speed 4979.18 samples/sec Loss 9.7910 LearningRate 0.3908 Epoch: 2 Global Step: 9210 Fp16 Grad Scale: 524288 Required: 18 hours Training: 2022-01-16 23:48:24,244-Speed 5072.01 samples/sec Loss 9.7993 LearningRate 0.3907 Epoch: 2 Global Step: 9220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:48:32,381-Speed 5034.49 samples/sec Loss 9.7810 LearningRate 0.3906 Epoch: 2 Global Step: 9230 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:48:40,574-Speed 5000.17 samples/sec Loss 9.7566 LearningRate 0.3905 Epoch: 2 Global Step: 9240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:48:48,734-Speed 5020.12 samples/sec Loss 9.6595 LearningRate 0.3904 Epoch: 2 Global Step: 9250 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:48:56,937-Speed 4994.10 samples/sec Loss 9.7172 LearningRate 0.3903 Epoch: 2 Global Step: 9260 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:49:05,038-Speed 5056.78 samples/sec Loss 9.6965 LearningRate 0.3902 Epoch: 2 Global Step: 9270 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:49:13,161-Speed 5042.92 samples/sec Loss 9.7987 LearningRate 0.3901 Epoch: 2 Global Step: 9280 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:49:21,316-Speed 5023.47 samples/sec Loss 9.7055 LearningRate 0.3900 Epoch: 2 Global Step: 9290 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:49:29,454-Speed 5033.86 samples/sec Loss 9.8106 LearningRate 0.3899 Epoch: 2 Global Step: 9300 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:49:37,560-Speed 5053.32 samples/sec Loss 9.7702 LearningRate 0.3898 Epoch: 2 Global Step: 9310 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:49:45,670-Speed 5051.77 samples/sec Loss 9.7623 LearningRate 0.3897 Epoch: 2 Global Step: 9320 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 
23:49:53,829-Speed 5021.08 samples/sec Loss 9.7869 LearningRate 0.3896 Epoch: 2 Global Step: 9330 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:50:02,041-Speed 4988.58 samples/sec Loss 9.7169 LearningRate 0.3895 Epoch: 2 Global Step: 9340 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:50:10,251-Speed 4989.40 samples/sec Loss 9.7414 LearningRate 0.3894 Epoch: 2 Global Step: 9350 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:50:18,403-Speed 5025.07 samples/sec Loss 9.8141 LearningRate 0.3892 Epoch: 2 Global Step: 9360 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:50:26,734-Speed 4917.86 samples/sec Loss 9.7753 LearningRate 0.3891 Epoch: 2 Global Step: 9370 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:50:35,143-Speed 4871.45 samples/sec Loss 9.7899 LearningRate 0.3890 Epoch: 2 Global Step: 9380 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:50:43,464-Speed 4922.82 samples/sec Loss 9.6814 LearningRate 0.3889 Epoch: 2 Global Step: 9390 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:50:51,687-Speed 4981.75 samples/sec Loss 9.6462 LearningRate 0.3888 Epoch: 2 Global Step: 9400 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:50:59,788-Speed 5057.21 samples/sec Loss 9.7345 LearningRate 0.3887 Epoch: 2 Global Step: 9410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:51:07,526-Speed 5294.20 samples/sec Loss 9.7052 LearningRate 0.3886 Epoch: 2 Global Step: 9420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:51:15,703-Speed 5010.42 samples/sec Loss 9.6251 LearningRate 0.3885 Epoch: 2 Global Step: 9430 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:51:23,880-Speed 5009.75 samples/sec Loss 9.6201 LearningRate 0.3884 Epoch: 2 Global Step: 9440 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:51:32,139-Speed 4960.14 samples/sec Loss 9.7737 
LearningRate 0.3883 Epoch: 2 Global Step: 9450 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:51:40,242-Speed 5055.42 samples/sec Loss 9.6441 LearningRate 0.3882 Epoch: 2 Global Step: 9460 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:51:48,384-Speed 5031.03 samples/sec Loss 9.6760 LearningRate 0.3881 Epoch: 2 Global Step: 9470 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-16 23:51:56,536-Speed 5025.20 samples/sec Loss 9.6520 LearningRate 0.3880 Epoch: 2 Global Step: 9480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:52:04,652-Speed 5047.98 samples/sec Loss 9.6142 LearningRate 0.3879 Epoch: 2 Global Step: 9490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-16 23:52:12,742-Speed 5063.73 samples/sec Loss 9.5928 LearningRate 0.3878 Epoch: 2 Global Step: 9500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:52:20,853-Speed 5050.81 samples/sec Loss 9.6450 LearningRate 0.3877 Epoch: 2 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:52:29,020-Speed 5016.19 samples/sec Loss 9.6490 LearningRate 0.3876 Epoch: 2 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:52:37,206-Speed 5004.14 samples/sec Loss 9.7151 LearningRate 0.3875 Epoch: 2 Global Step: 9530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:52:45,426-Speed 4983.88 samples/sec Loss 9.6123 LearningRate 0.3874 Epoch: 2 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:52:53,568-Speed 5031.08 samples/sec Loss 9.6441 LearningRate 0.3873 Epoch: 2 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:53:01,658-Speed 5064.06 samples/sec Loss 9.7161 LearningRate 0.3872 Epoch: 2 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:53:09,777-Speed 5045.58 samples/sec Loss 9.6947 LearningRate 0.3870 Epoch: 2 Global Step: 9570 Fp16 
Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:53:18,002-Speed 4980.11 samples/sec Loss 9.6013 LearningRate 0.3869 Epoch: 2 Global Step: 9580 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:53:26,160-Speed 5021.83 samples/sec Loss 9.6312 LearningRate 0.3868 Epoch: 2 Global Step: 9590 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:53:34,434-Speed 4950.78 samples/sec Loss 9.6416 LearningRate 0.3867 Epoch: 2 Global Step: 9600 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:53:42,638-Speed 4993.34 samples/sec Loss 9.5953 LearningRate 0.3866 Epoch: 2 Global Step: 9610 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:53:50,895-Speed 4961.28 samples/sec Loss 9.6505 LearningRate 0.3865 Epoch: 2 Global Step: 9620 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:53:59,012-Speed 5047.55 samples/sec Loss 9.6545 LearningRate 0.3864 Epoch: 2 Global Step: 9630 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:54:07,125-Speed 5048.89 samples/sec Loss 9.7296 LearningRate 0.3863 Epoch: 2 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:54:15,234-Speed 5051.93 samples/sec Loss 9.5644 LearningRate 0.3862 Epoch: 2 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:54:23,551-Speed 4925.38 samples/sec Loss 9.6304 LearningRate 0.3861 Epoch: 2 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:54:31,679-Speed 5040.40 samples/sec Loss 9.6765 LearningRate 0.3860 Epoch: 2 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:54:39,827-Speed 5027.58 samples/sec Loss 9.5983 LearningRate 0.3859 Epoch: 2 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:54:48,000-Speed 5012.22 samples/sec Loss 9.5709 LearningRate 0.3858 Epoch: 2 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-01-16 23:54:56,057-Speed 5084.60 samples/sec Loss 9.6188 LearningRate 0.3857 Epoch: 2 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:55:04,205-Speed 5027.88 samples/sec Loss 9.6044 LearningRate 0.3856 Epoch: 2 Global Step: 9710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:55:12,319-Speed 5048.92 samples/sec Loss 9.6062 LearningRate 0.3855 Epoch: 2 Global Step: 9720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:55:20,490-Speed 5013.31 samples/sec Loss 9.5698 LearningRate 0.3854 Epoch: 2 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-16 23:55:28,802-Speed 4928.85 samples/sec Loss 9.5885 LearningRate 0.3853 Epoch: 2 Global Step: 9740 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:55:37,001-Speed 4996.24 samples/sec Loss 9.6147 LearningRate 0.3852 Epoch: 2 Global Step: 9750 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:55:45,167-Speed 5016.75 samples/sec Loss 9.5754 LearningRate 0.3851 Epoch: 2 Global Step: 9760 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:55:53,293-Speed 5041.46 samples/sec Loss 9.5754 LearningRate 0.3850 Epoch: 2 Global Step: 9770 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:56:01,457-Speed 5017.20 samples/sec Loss 9.5179 LearningRate 0.3848 Epoch: 2 Global Step: 9780 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:56:09,725-Speed 4954.82 samples/sec Loss 9.4864 LearningRate 0.3847 Epoch: 2 Global Step: 9790 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:56:17,895-Speed 5014.23 samples/sec Loss 9.6146 LearningRate 0.3846 Epoch: 2 Global Step: 9800 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:56:26,091-Speed 4998.11 samples/sec Loss 9.5912 LearningRate 0.3845 Epoch: 2 Global Step: 9810 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:56:34,414-Speed 4922.23 samples/sec Loss 
9.6465 LearningRate 0.3844 Epoch: 2 Global Step: 9820 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:56:42,720-Speed 4931.70 samples/sec Loss 9.6109 LearningRate 0.3843 Epoch: 2 Global Step: 9830 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:56:50,948-Speed 4979.01 samples/sec Loss 9.5069 LearningRate 0.3842 Epoch: 2 Global Step: 9840 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:56:59,108-Speed 5019.96 samples/sec Loss 9.5349 LearningRate 0.3841 Epoch: 2 Global Step: 9850 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:57:07,458-Speed 4905.79 samples/sec Loss 9.5622 LearningRate 0.3840 Epoch: 2 Global Step: 9860 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:57:15,650-Speed 5000.95 samples/sec Loss 9.5289 LearningRate 0.3839 Epoch: 2 Global Step: 9870 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:57:23,753-Speed 5055.21 samples/sec Loss 9.5622 LearningRate 0.3838 Epoch: 2 Global Step: 9880 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:57:31,925-Speed 5013.14 samples/sec Loss 9.4817 LearningRate 0.3837 Epoch: 2 Global Step: 9890 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:57:40,171-Speed 4968.15 samples/sec Loss 9.3856 LearningRate 0.3836 Epoch: 2 Global Step: 9900 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-16 23:57:48,322-Speed 5025.02 samples/sec Loss 9.5610 LearningRate 0.3835 Epoch: 2 Global Step: 9910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-16 23:57:56,437-Speed 5048.71 samples/sec Loss 9.4139 LearningRate 0.3834 Epoch: 2 Global Step: 9920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-16 23:58:04,696-Speed 4959.83 samples/sec Loss 9.5136 LearningRate 0.3833 Epoch: 2 Global Step: 9930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-16 23:58:12,901-Speed 4993.46 samples/sec Loss 9.6454 LearningRate 0.3832 Epoch: 2 Global Step: 9940 Fp16 
Grad Scale: 65536 Required: 17 hours
Training: 2022-01-16 23:58:21,057-Speed 5021.94 samples/sec Loss 9.5839 LearningRate 0.3831 Epoch: 2 Global Step: 9950 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-01-16 23:58:29,506-Speed 4848.33 samples/sec Loss 9.5685 LearningRate 0.3830 Epoch: 2 Global Step: 9960 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-01-16 23:58:37,759-Speed 4964.32 samples/sec Loss 9.5584 LearningRate 0.3829 Epoch: 2 Global Step: 9970 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-01-16 23:58:45,873-Speed 5048.37 samples/sec Loss 9.5737 LearningRate 0.3828 Epoch: 2 Global Step: 9980 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-01-16 23:58:54,091-Speed 4984.86 samples/sec Loss 9.4237 LearningRate 0.3827 Epoch: 2 Global Step: 9990 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-01-16 23:59:02,135-Speed 5092.72 samples/sec Loss 9.5065 LearningRate 0.3826 Epoch: 2 Global Step: 10000 Fp16 Grad Scale: 65536 Required: 17 hours
Training: 2022-01-16 23:59:48,884-[lfw][10000]XNorm: 20.740598
Training: 2022-01-16 23:59:48,884-[lfw][10000]Accuracy-Flip: 0.99750+-0.00271
Training: 2022-01-16 23:59:48,885-[lfw][10000]Accuracy-Highest: 0.99750
Training: 2022-01-17 00:00:43,912-[cfp_fp][10000]XNorm: 18.147787
Training: 2022-01-17 00:00:43,913-[cfp_fp][10000]Accuracy-Flip: 0.95943+-0.01092
Training: 2022-01-17 00:00:43,913-[cfp_fp][10000]Accuracy-Highest: 0.95943
Training: 2022-01-17 00:01:31,217-[agedb_30][10000]XNorm: 20.103572
Training: 2022-01-17 00:01:31,218-[agedb_30][10000]Accuracy-Flip: 0.96283+-0.00986
Training: 2022-01-17 00:01:31,218-[agedb_30][10000]Accuracy-Highest: 0.96283
Training: 2022-01-17 00:01:39,259-Speed 260.69 samples/sec Loss 9.4067 LearningRate 0.3824 Epoch: 2 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-01-17 00:01:47,372-Speed 5049.01 samples/sec Loss 9.5706 LearningRate 0.3823 Epoch: 2 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 18 hours
Training: 2022-01-17 00:01:55,412-Speed 5095.43 samples/sec Loss 9.4542 LearningRate 0.3822 Epoch: 2 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:02:03,520-Speed 5052.72 samples/sec Loss 9.4118 LearningRate 0.3821 Epoch: 2 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:02:11,691-Speed 5013.43 samples/sec Loss 9.3801 LearningRate 0.3820 Epoch: 2 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:02:19,703-Speed 5112.95 samples/sec Loss 9.5339 LearningRate 0.3819 Epoch: 2 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:02:27,793-Speed 5064.10 samples/sec Loss 9.4696 LearningRate 0.3818 Epoch: 2 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:02:35,907-Speed 5048.82 samples/sec Loss 9.4384 LearningRate 0.3817 Epoch: 2 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:02:43,968-Speed 5081.50 samples/sec Loss 9.5262 LearningRate 0.3816 Epoch: 2 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:02:52,071-Speed 5055.92 samples/sec Loss 9.5301 LearningRate 0.3815 Epoch: 2 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:03:00,161-Speed 5063.94 samples/sec Loss 9.4244 LearningRate 0.3814 Epoch: 2 Global Step: 10110 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:03:08,285-Speed 5042.70 samples/sec Loss 9.4893 LearningRate 0.3813 Epoch: 2 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:03:16,441-Speed 5022.52 samples/sec Loss 9.7338 LearningRate 0.3812 Epoch: 2 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:03:24,556-Speed 5047.71 samples/sec Loss 9.5887 LearningRate 0.3811 Epoch: 2 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:03:32,640-Speed 
5067.64 samples/sec Loss 9.4320 LearningRate 0.3810 Epoch: 2 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:03:40,771-Speed 5038.77 samples/sec Loss 9.5262 LearningRate 0.3809 Epoch: 2 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:03:48,851-Speed 5069.56 samples/sec Loss 9.4350 LearningRate 0.3808 Epoch: 2 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:03:56,955-Speed 5054.79 samples/sec Loss 9.5191 LearningRate 0.3807 Epoch: 2 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:04:05,195-Speed 4971.73 samples/sec Loss 9.4536 LearningRate 0.3806 Epoch: 2 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:04:13,325-Speed 5039.19 samples/sec Loss 9.3261 LearningRate 0.3805 Epoch: 2 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:04:21,434-Speed 5051.94 samples/sec Loss 9.5253 LearningRate 0.3804 Epoch: 2 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:04:29,695-Speed 4958.31 samples/sec Loss 9.4041 LearningRate 0.3803 Epoch: 2 Global Step: 10220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:04:37,802-Speed 5053.42 samples/sec Loss 9.4174 LearningRate 0.3802 Epoch: 2 Global Step: 10230 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:04:45,880-Speed 5071.18 samples/sec Loss 9.4710 LearningRate 0.3801 Epoch: 2 Global Step: 10240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:04:53,908-Speed 5102.78 samples/sec Loss 9.3877 LearningRate 0.3800 Epoch: 2 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:05:02,181-Speed 4951.87 samples/sec Loss 9.3891 LearningRate 0.3798 Epoch: 2 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:05:10,251-Speed 5076.74 samples/sec Loss 9.4956 
LearningRate 0.3797 Epoch: 2 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:05:18,423-Speed 5012.79 samples/sec Loss 9.4039 LearningRate 0.3796 Epoch: 2 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:05:26,654-Speed 4977.18 samples/sec Loss 9.4117 LearningRate 0.3795 Epoch: 2 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:05:34,899-Speed 4968.96 samples/sec Loss 9.5569 LearningRate 0.3794 Epoch: 2 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:05:43,122-Speed 4981.60 samples/sec Loss 9.3990 LearningRate 0.3793 Epoch: 2 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:05:51,299-Speed 5009.30 samples/sec Loss 9.4608 LearningRate 0.3792 Epoch: 2 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:05:59,527-Speed 4979.15 samples/sec Loss 9.4769 LearningRate 0.3791 Epoch: 2 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:06:07,931-Speed 4874.10 samples/sec Loss 9.4926 LearningRate 0.3790 Epoch: 2 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-17 00:06:16,105-Speed 5012.07 samples/sec Loss 9.6106 LearningRate 0.3789 Epoch: 2 Global Step: 10350 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:06:24,232-Speed 5040.47 samples/sec Loss 9.4026 LearningRate 0.3788 Epoch: 2 Global Step: 10360 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:06:32,376-Speed 5030.33 samples/sec Loss 9.3757 LearningRate 0.3787 Epoch: 2 Global Step: 10370 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:06:40,609-Speed 4975.83 samples/sec Loss 9.4522 LearningRate 0.3786 Epoch: 2 Global Step: 10380 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:06:48,770-Speed 5019.53 samples/sec Loss 9.3368 LearningRate 0.3785 Epoch: 2 Global Step: 
10390 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:06:56,943-Speed 5012.03 samples/sec Loss 9.3706 LearningRate 0.3784 Epoch: 2 Global Step: 10400 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:07:05,197-Speed 4962.80 samples/sec Loss 9.3637 LearningRate 0.3783 Epoch: 2 Global Step: 10410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:07:13,538-Speed 4911.42 samples/sec Loss 9.4158 LearningRate 0.3782 Epoch: 2 Global Step: 10420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:07:21,742-Speed 4993.73 samples/sec Loss 9.4383 LearningRate 0.3781 Epoch: 2 Global Step: 10430 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:07:29,849-Speed 5052.89 samples/sec Loss 9.3795 LearningRate 0.3780 Epoch: 2 Global Step: 10440 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-17 00:07:37,965-Speed 5047.49 samples/sec Loss 9.2379 LearningRate 0.3779 Epoch: 2 Global Step: 10450 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:07:46,122-Speed 5022.40 samples/sec Loss 9.3307 LearningRate 0.3778 Epoch: 2 Global Step: 10460 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:07:54,271-Speed 5027.20 samples/sec Loss 9.4307 LearningRate 0.3777 Epoch: 2 Global Step: 10470 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:08:02,419-Speed 5027.57 samples/sec Loss 9.3355 LearningRate 0.3776 Epoch: 2 Global Step: 10480 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:08:10,592-Speed 5012.44 samples/sec Loss 9.4253 LearningRate 0.3775 Epoch: 2 Global Step: 10490 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:08:18,728-Speed 5035.21 samples/sec Loss 9.4135 LearningRate 0.3774 Epoch: 2 Global Step: 10500 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:08:27,000-Speed 4951.91 samples/sec Loss 9.4350 LearningRate 0.3773 Epoch: 2 Global Step: 10510 Fp16 Grad Scale: 262144 Required: 17 
hours Training: 2022-01-17 00:08:35,112-Speed 5050.22 samples/sec Loss 9.3969 LearningRate 0.3772 Epoch: 2 Global Step: 10520 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:08:43,272-Speed 5019.83 samples/sec Loss 9.3522 LearningRate 0.3771 Epoch: 2 Global Step: 10530 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:08:51,389-Speed 5047.22 samples/sec Loss 9.3684 LearningRate 0.3769 Epoch: 2 Global Step: 10540 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:08:59,512-Speed 5043.18 samples/sec Loss 9.3971 LearningRate 0.3768 Epoch: 2 Global Step: 10550 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:09:07,664-Speed 5025.09 samples/sec Loss 9.4513 LearningRate 0.3767 Epoch: 2 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:09:15,743-Speed 5070.87 samples/sec Loss 9.3397 LearningRate 0.3766 Epoch: 2 Global Step: 10570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:09:23,879-Speed 5035.35 samples/sec Loss 9.3313 LearningRate 0.3765 Epoch: 2 Global Step: 10580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:09:31,965-Speed 5066.10 samples/sec Loss 9.2938 LearningRate 0.3764 Epoch: 2 Global Step: 10590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:09:40,139-Speed 5011.63 samples/sec Loss 9.3538 LearningRate 0.3763 Epoch: 2 Global Step: 10600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:09:48,261-Speed 5042.97 samples/sec Loss 9.3424 LearningRate 0.3762 Epoch: 2 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:09:56,364-Speed 5056.28 samples/sec Loss 9.2489 LearningRate 0.3761 Epoch: 2 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:10:04,605-Speed 4970.60 samples/sec Loss 9.3448 LearningRate 0.3760 Epoch: 2 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 
00:10:12,710-Speed 5053.95 samples/sec Loss 9.2744 LearningRate 0.3759 Epoch: 2 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:10:20,800-Speed 5064.30 samples/sec Loss 9.2380 LearningRate 0.3758 Epoch: 2 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:10:28,950-Speed 5026.39 samples/sec Loss 9.3659 LearningRate 0.3757 Epoch: 2 Global Step: 10660 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:10:37,267-Speed 4925.29 samples/sec Loss 9.2267 LearningRate 0.3756 Epoch: 2 Global Step: 10670 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:10:45,468-Speed 4995.40 samples/sec Loss 9.2878 LearningRate 0.3755 Epoch: 2 Global Step: 10680 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:10:53,724-Speed 4962.13 samples/sec Loss 9.2843 LearningRate 0.3754 Epoch: 2 Global Step: 10690 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:11:01,963-Speed 4972.33 samples/sec Loss 9.2140 LearningRate 0.3753 Epoch: 2 Global Step: 10700 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:11:10,103-Speed 5032.21 samples/sec Loss 9.3217 LearningRate 0.3752 Epoch: 2 Global Step: 10710 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:11:18,204-Speed 5056.18 samples/sec Loss 9.2598 LearningRate 0.3751 Epoch: 2 Global Step: 10720 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:11:26,378-Speed 5012.41 samples/sec Loss 9.2880 LearningRate 0.3750 Epoch: 2 Global Step: 10730 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:11:34,760-Speed 4887.11 samples/sec Loss 9.2633 LearningRate 0.3749 Epoch: 2 Global Step: 10740 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:11:42,897-Speed 5034.42 samples/sec Loss 9.3283 LearningRate 0.3748 Epoch: 2 Global Step: 10750 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:11:51,055-Speed 5021.45 samples/sec Loss 
9.3445 LearningRate 0.3747 Epoch: 2 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:11:59,305-Speed 4965.63 samples/sec Loss 9.2382 LearningRate 0.3746 Epoch: 2 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:12:07,436-Speed 5038.15 samples/sec Loss 9.2267 LearningRate 0.3745 Epoch: 2 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:12:15,584-Speed 5027.84 samples/sec Loss 9.1626 LearningRate 0.3744 Epoch: 2 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:12:23,706-Speed 5043.37 samples/sec Loss 9.2615 LearningRate 0.3743 Epoch: 2 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:12:31,811-Speed 5054.75 samples/sec Loss 9.2537 LearningRate 0.3742 Epoch: 2 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:12:39,917-Speed 5053.44 samples/sec Loss 9.2479 LearningRate 0.3741 Epoch: 2 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:12:48,106-Speed 5003.08 samples/sec Loss 9.3422 LearningRate 0.3740 Epoch: 2 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:12:56,296-Speed 5001.58 samples/sec Loss 9.3530 LearningRate 0.3739 Epoch: 2 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:13:04,569-Speed 4951.30 samples/sec Loss 9.2636 LearningRate 0.3737 Epoch: 2 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:13:12,747-Speed 5009.85 samples/sec Loss 9.2273 LearningRate 0.3736 Epoch: 2 Global Step: 10860 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:13:20,941-Speed 4999.26 samples/sec Loss 9.2376 LearningRate 0.3735 Epoch: 2 Global Step: 10870 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:13:29,132-Speed 5001.03 samples/sec Loss 9.2642 LearningRate 0.3734 Epoch: 2 Global 
Step: 10880 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:13:37,359-Speed 4979.70 samples/sec Loss 9.2084 LearningRate 0.3733 Epoch: 2 Global Step: 10890 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:13:45,710-Speed 4905.02 samples/sec Loss 9.1832 LearningRate 0.3732 Epoch: 2 Global Step: 10900 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:13:53,839-Speed 5039.73 samples/sec Loss 9.3230 LearningRate 0.3731 Epoch: 2 Global Step: 10910 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:14:02,309-Speed 4836.84 samples/sec Loss 9.3241 LearningRate 0.3730 Epoch: 2 Global Step: 10920 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:14:10,435-Speed 5041.25 samples/sec Loss 9.2409 LearningRate 0.3729 Epoch: 2 Global Step: 10930 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:14:18,544-Speed 5051.45 samples/sec Loss 9.3126 LearningRate 0.3728 Epoch: 2 Global Step: 10940 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:14:26,616-Speed 5075.19 samples/sec Loss 9.2774 LearningRate 0.3727 Epoch: 2 Global Step: 10950 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:14:34,726-Speed 5051.18 samples/sec Loss 9.4204 LearningRate 0.3726 Epoch: 2 Global Step: 10960 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:14:42,823-Speed 5059.31 samples/sec Loss 9.2030 LearningRate 0.3725 Epoch: 2 Global Step: 10970 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:14:50,562-Speed 5293.47 samples/sec Loss 9.2226 LearningRate 0.3724 Epoch: 2 Global Step: 10980 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:14:58,702-Speed 5032.26 samples/sec Loss 9.2339 LearningRate 0.3723 Epoch: 2 Global Step: 10990 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:15:06,860-Speed 5021.69 samples/sec Loss 9.1792 LearningRate 0.3722 Epoch: 2 Global Step: 11000 Fp16 Grad Scale: 262144 
Required: 17 hours Training: 2022-01-17 00:15:15,046-Speed 5004.41 samples/sec Loss 9.2258 LearningRate 0.3721 Epoch: 2 Global Step: 11010 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:15:23,219-Speed 5012.19 samples/sec Loss 9.1782 LearningRate 0.3720 Epoch: 2 Global Step: 11020 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:15:31,311-Speed 5062.32 samples/sec Loss 9.2078 LearningRate 0.3719 Epoch: 2 Global Step: 11030 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:15:39,474-Speed 5018.78 samples/sec Loss 9.2321 LearningRate 0.3718 Epoch: 2 Global Step: 11040 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:15:47,548-Speed 5073.35 samples/sec Loss 9.1892 LearningRate 0.3717 Epoch: 2 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:15:55,674-Speed 5041.60 samples/sec Loss 9.2365 LearningRate 0.3716 Epoch: 2 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:16:03,799-Speed 5041.62 samples/sec Loss 9.2844 LearningRate 0.3715 Epoch: 2 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:16:11,940-Speed 5032.35 samples/sec Loss 9.2969 LearningRate 0.3714 Epoch: 2 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:16:20,089-Speed 5026.93 samples/sec Loss 9.2600 LearningRate 0.3713 Epoch: 2 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:16:28,286-Speed 4997.59 samples/sec Loss 9.2079 LearningRate 0.3712 Epoch: 2 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:16:36,467-Speed 5007.74 samples/sec Loss 9.1906 LearningRate 0.3711 Epoch: 2 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:16:44,683-Speed 4985.69 samples/sec Loss 9.1663 LearningRate 0.3710 Epoch: 2 Global Step: 11120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 
00:16:52,824-Speed 5032.17 samples/sec Loss 9.1783 LearningRate 0.3709 Epoch: 2 Global Step: 11130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:17:00,963-Speed 5033.28 samples/sec Loss 9.1909 LearningRate 0.3708 Epoch: 2 Global Step: 11140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:17:09,142-Speed 5008.21 samples/sec Loss 9.1590 LearningRate 0.3707 Epoch: 2 Global Step: 11150 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:17:17,364-Speed 4983.19 samples/sec Loss 9.1766 LearningRate 0.3706 Epoch: 2 Global Step: 11160 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:17:25,540-Speed 5010.07 samples/sec Loss 9.2075 LearningRate 0.3705 Epoch: 2 Global Step: 11170 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:17:33,690-Speed 5026.30 samples/sec Loss 9.1605 LearningRate 0.3704 Epoch: 2 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:17:41,803-Speed 5049.59 samples/sec Loss 9.1020 LearningRate 0.3703 Epoch: 2 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:17:50,024-Speed 4982.64 samples/sec Loss 9.0257 LearningRate 0.3702 Epoch: 2 Global Step: 11200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:17:58,159-Speed 5035.71 samples/sec Loss 9.2051 LearningRate 0.3701 Epoch: 2 Global Step: 11210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:18:06,306-Speed 5028.57 samples/sec Loss 9.1479 LearningRate 0.3699 Epoch: 2 Global Step: 11220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:18:14,443-Speed 5034.53 samples/sec Loss 9.0769 LearningRate 0.3698 Epoch: 2 Global Step: 11230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:18:22,682-Speed 4972.09 samples/sec Loss 9.9356 LearningRate 0.3697 Epoch: 2 Global Step: 11240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:18:30,872-Speed 5002.09 samples/sec Loss 
9.7256 LearningRate 0.3696 Epoch: 2 Global Step: 11250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:18:39,140-Speed 4954.68 samples/sec Loss 9.6398 LearningRate 0.3695 Epoch: 2 Global Step: 11260 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:18:47,307-Speed 5015.95 samples/sec Loss 10.2462 LearningRate 0.3694 Epoch: 2 Global Step: 11270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:18:55,440-Speed 5036.47 samples/sec Loss 10.8485 LearningRate 0.3693 Epoch: 2 Global Step: 11280 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:19:03,787-Speed 4908.17 samples/sec Loss 9.9949 LearningRate 0.3692 Epoch: 2 Global Step: 11290 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:19:12,039-Speed 4964.01 samples/sec Loss 9.6109 LearningRate 0.3691 Epoch: 2 Global Step: 11300 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:19:20,259-Speed 4983.41 samples/sec Loss 9.3828 LearningRate 0.3690 Epoch: 2 Global Step: 11310 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:19:28,484-Speed 4980.88 samples/sec Loss 9.2549 LearningRate 0.3689 Epoch: 2 Global Step: 11320 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:19:36,681-Speed 4997.58 samples/sec Loss 9.1888 LearningRate 0.3688 Epoch: 2 Global Step: 11330 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:19:44,937-Speed 4961.63 samples/sec Loss 9.1823 LearningRate 0.3687 Epoch: 2 Global Step: 11340 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:19:53,185-Speed 4967.11 samples/sec Loss 9.2388 LearningRate 0.3686 Epoch: 2 Global Step: 11350 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:20:01,451-Speed 4955.36 samples/sec Loss 9.1239 LearningRate 0.3685 Epoch: 2 Global Step: 11360 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:20:09,687-Speed 4973.80 samples/sec Loss 9.1548 LearningRate 0.3684 Epoch: 2 Global Step: 
11370 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-01-17 00:20:17,785-Speed 5059.19 samples/sec Loss 9.0963 LearningRate 0.3683 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:20:25,894-Speed 5051.64 samples/sec Loss 9.1281 LearningRate 0.3682 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:20:34,078-Speed 5005.69 samples/sec Loss 9.0678 LearningRate 0.3681 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:20:42,254-Speed 5010.04 samples/sec Loss 9.0897 LearningRate 0.3680 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:20:50,492-Speed 4972.86 samples/sec Loss 9.0864 LearningRate 0.3679 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:20:58,769-Speed 4948.95 samples/sec Loss 9.1022 LearningRate 0.3678 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:21:07,427-Speed 4731.95 samples/sec Loss 9.0395 LearningRate 0.3677 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:21:15,630-Speed 4993.88 samples/sec Loss 9.0341 LearningRate 0.3676 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:21:23,812-Speed 5006.88 samples/sec Loss 9.0599 LearningRate 0.3675 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:21:32,015-Speed 4993.92 samples/sec Loss 9.0801 LearningRate 0.3674 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-01-17 00:21:40,480-Speed 4839.34 samples/sec Loss 9.0612 LearningRate 0.3673 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:21:48,951-Speed 4835.55 samples/sec Loss 9.1947 LearningRate 0.3672 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-01-17 00:21:57,156-Speed 4993.12 samples/sec Loss 9.1688 LearningRate 0.3671 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:22:05,341-Speed 5004.52 samples/sec Loss 9.0287 LearningRate 0.3670 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:22:13,447-Speed 5053.99 samples/sec Loss 9.0629 LearningRate 0.3669 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:22:21,661-Speed 4987.53 samples/sec Loss 9.0679 LearningRate 0.3668 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:22:29,829-Speed 5015.59 samples/sec Loss 9.0016 LearningRate 0.3667 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:22:37,980-Speed 5025.18 samples/sec Loss 9.0394 LearningRate 0.3666 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:22:46,112-Speed 5038.01 samples/sec Loss 9.1017 LearningRate 0.3665 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:22:54,436-Speed 4920.97 samples/sec Loss 9.0572 LearningRate 0.3664 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-17 00:23:02,788-Speed 4905.22 samples/sec Loss 9.0436 LearningRate 0.3663 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:23:10,941-Speed 5024.26 samples/sec Loss 8.9881 LearningRate 0.3662 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:23:19,272-Speed 4917.15 samples/sec Loss 9.0680 LearningRate 0.3661 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:23:27,612-Speed 4912.38 samples/sec Loss 9.0078 LearningRate 0.3660 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:23:35,805-Speed 5000.05 
samples/sec Loss 8.9897 LearningRate 0.3659 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:23:44,175-Speed 4893.88 samples/sec Loss 8.9864 LearningRate 0.3658 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:23:52,470-Speed 4938.57 samples/sec Loss 9.1255 LearningRate 0.3657 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:24:00,784-Speed 4927.35 samples/sec Loss 9.0017 LearningRate 0.3656 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:24:09,235-Speed 4847.50 samples/sec Loss 8.9774 LearningRate 0.3655 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:24:17,393-Speed 5021.63 samples/sec Loss 9.0555 LearningRate 0.3654 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:24:25,573-Speed 5007.90 samples/sec Loss 9.0293 LearningRate 0.3653 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:24:33,723-Speed 5026.39 samples/sec Loss 9.0180 LearningRate 0.3651 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:24:41,962-Speed 4971.98 samples/sec Loss 9.1182 LearningRate 0.3650 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:24:50,116-Speed 5024.34 samples/sec Loss 9.0735 LearningRate 0.3649 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:24:58,279-Speed 5018.20 samples/sec Loss 9.0117 LearningRate 0.3648 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:25:06,489-Speed 4990.17 samples/sec Loss 9.0598 LearningRate 0.3647 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:25:14,656-Speed 5015.85 samples/sec Loss 9.0332 LearningRate 0.3646 
Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:25:22,821-Speed 5016.56 samples/sec Loss 9.0291 LearningRate 0.3645 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:25:31,081-Speed 4959.70 samples/sec Loss 8.9966 LearningRate 0.3644 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:25:39,324-Speed 4969.87 samples/sec Loss 8.9650 LearningRate 0.3643 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:25:47,623-Speed 4936.35 samples/sec Loss 8.9987 LearningRate 0.3642 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:25:56,072-Speed 4848.53 samples/sec Loss 8.9557 LearningRate 0.3641 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:26:04,352-Speed 4946.94 samples/sec Loss 8.9609 LearningRate 0.3640 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:26:12,532-Speed 5007.93 samples/sec Loss 9.0841 LearningRate 0.3639 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:26:20,833-Speed 4935.31 samples/sec Loss 9.0694 LearningRate 0.3638 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:26:28,987-Speed 5024.17 samples/sec Loss 9.0493 LearningRate 0.3637 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:26:37,730-Speed 4685.40 samples/sec Loss 9.1190 LearningRate 0.3636 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:26:46,659-Speed 4587.73 samples/sec Loss 9.0732 LearningRate 0.3635 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:26:55,430-Speed 4670.82 samples/sec Loss 9.0452 LearningRate 0.3634 Epoch: 2 Global Step: 11860 Fp16 Grad 
Scale: 262144 Required: 17 hours Training: 2022-01-17 00:27:04,254-Speed 4642.77 samples/sec Loss 9.0563 LearningRate 0.3633 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:27:12,735-Speed 4830.04 samples/sec Loss 9.0136 LearningRate 0.3632 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:27:21,013-Speed 4948.83 samples/sec Loss 8.9182 LearningRate 0.3631 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:27:29,144-Speed 5038.05 samples/sec Loss 9.0321 LearningRate 0.3630 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:27:37,373-Speed 4978.43 samples/sec Loss 8.9768 LearningRate 0.3629 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:27:45,715-Speed 4911.02 samples/sec Loss 9.0225 LearningRate 0.3628 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:27:53,920-Speed 4992.45 samples/sec Loss 9.0354 LearningRate 0.3627 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:28:02,085-Speed 5016.64 samples/sec Loss 9.0247 LearningRate 0.3626 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:28:10,253-Speed 5015.82 samples/sec Loss 9.0849 LearningRate 0.3625 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:28:18,380-Speed 5040.36 samples/sec Loss 9.0092 LearningRate 0.3624 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:28:26,625-Speed 4968.47 samples/sec Loss 9.0431 LearningRate 0.3623 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:28:34,723-Speed 5059.33 samples/sec Loss 8.9965 LearningRate 0.3622 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-01-17 00:28:42,949-Speed 4979.84 samples/sec Loss 8.9301 LearningRate 0.3621 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:28:51,045-Speed 5059.73 samples/sec Loss 8.8427 LearningRate 0.3620 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:28:59,245-Speed 4996.34 samples/sec Loss 8.9634 LearningRate 0.3619 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:29:07,381-Speed 5035.20 samples/sec Loss 8.8641 LearningRate 0.3618 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:29:15,651-Speed 4953.06 samples/sec Loss 8.9417 LearningRate 0.3617 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:29:23,806-Speed 5023.48 samples/sec Loss 8.9843 LearningRate 0.3616 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:29:31,945-Speed 5032.91 samples/sec Loss 8.9370 LearningRate 0.3615 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:29:40,196-Speed 4965.07 samples/sec Loss 8.9077 LearningRate 0.3614 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:29:48,376-Speed 5008.38 samples/sec Loss 9.0527 LearningRate 0.3613 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:29:56,523-Speed 5027.61 samples/sec Loss 8.9807 LearningRate 0.3612 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:30:04,765-Speed 4970.76 samples/sec Loss 8.8769 LearningRate 0.3611 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:30:12,986-Speed 4982.52 samples/sec Loss 8.9405 LearningRate 0.3610 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:30:21,162-Speed 5010.66 
samples/sec Loss 8.8403 LearningRate 0.3609 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:30:29,377-Speed 4986.83 samples/sec Loss 8.9076 LearningRate 0.3608 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:30:37,652-Speed 4950.46 samples/sec Loss 8.9074 LearningRate 0.3607 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:30:45,797-Speed 5029.17 samples/sec Loss 8.9021 LearningRate 0.3606 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:30:53,930-Speed 5037.50 samples/sec Loss 8.8537 LearningRate 0.3605 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:31:02,047-Speed 5046.21 samples/sec Loss 9.0325 LearningRate 0.3604 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:31:10,295-Speed 4966.89 samples/sec Loss 8.9588 LearningRate 0.3603 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:31:18,562-Speed 4955.14 samples/sec Loss 8.9277 LearningRate 0.3602 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:31:26,840-Speed 4948.58 samples/sec Loss 8.8633 LearningRate 0.3601 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:31:35,092-Speed 4964.21 samples/sec Loss 8.9124 LearningRate 0.3600 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:31:43,013-Speed 5171.79 samples/sec Loss 8.9172 LearningRate 0.3599 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:31:51,034-Speed 5107.77 samples/sec Loss 8.9010 LearningRate 0.3598 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:31:59,186-Speed 5025.33 samples/sec Loss 8.9039 LearningRate 0.3597 
Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:32:07,424-Speed 4972.53 samples/sec Loss 8.8383 LearningRate 0.3596 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:32:15,680-Speed 4962.31 samples/sec Loss 8.9427 LearningRate 0.3595 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:32:23,832-Speed 5024.70 samples/sec Loss 8.8613 LearningRate 0.3594 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:32:32,012-Speed 5008.02 samples/sec Loss 8.8574 LearningRate 0.3593 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:32:40,165-Speed 5024.74 samples/sec Loss 8.9057 LearningRate 0.3592 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:32:48,297-Speed 5037.72 samples/sec Loss 8.9103 LearningRate 0.3591 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:32:56,498-Speed 4995.25 samples/sec Loss 8.9111 LearningRate 0.3590 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:33:04,645-Speed 5028.41 samples/sec Loss 8.8720 LearningRate 0.3589 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:33:12,770-Speed 5041.58 samples/sec Loss 8.9006 LearningRate 0.3588 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:33:21,020-Speed 4965.55 samples/sec Loss 8.8045 LearningRate 0.3587 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:33:29,272-Speed 4964.24 samples/sec Loss 8.8687 LearningRate 0.3586 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:33:37,445-Speed 5012.36 samples/sec Loss 8.9206 LearningRate 0.3585 Epoch: 2 Global Step: 12350 Fp16 Grad 
Scale: 262144 Required: 17 hours Training: 2022-01-17 00:33:45,762-Speed 4925.94 samples/sec Loss 8.9240 LearningRate 0.3584 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:33:53,847-Speed 5067.06 samples/sec Loss 8.9096 LearningRate 0.3583 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:34:01,966-Speed 5045.47 samples/sec Loss 8.9142 LearningRate 0.3582 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:34:10,128-Speed 5018.92 samples/sec Loss 8.8763 LearningRate 0.3581 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:34:18,294-Speed 5016.43 samples/sec Loss 8.8521 LearningRate 0.3580 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:34:26,556-Speed 4958.76 samples/sec Loss 8.8500 LearningRate 0.3579 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:34:34,843-Speed 4942.98 samples/sec Loss 8.8698 LearningRate 0.3578 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:34:43,232-Speed 4883.40 samples/sec Loss 8.9059 LearningRate 0.3577 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:34:51,440-Speed 4991.27 samples/sec Loss 8.9383 LearningRate 0.3576 Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:34:59,564-Speed 5042.33 samples/sec Loss 8.9005 LearningRate 0.3575 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:35:07,741-Speed 5009.85 samples/sec Loss 8.8397 LearningRate 0.3574 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:35:15,896-Speed 5023.24 samples/sec Loss 8.8394 LearningRate 0.3573 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 262144 Required: 17 hours Training: 
2022-01-17 00:35:23,978-Speed 5068.95 samples/sec Loss 8.8348 LearningRate 0.3572 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:35:32,164-Speed 5004.43 samples/sec Loss 8.9522 LearningRate 0.3571 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:35:40,292-Speed 5039.72 samples/sec Loss 8.9137 LearningRate 0.3570 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:35:48,714-Speed 4864.09 samples/sec Loss 8.9542 LearningRate 0.3569 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:36:23,672-Speed 1171.76 samples/sec Loss 8.5697 LearningRate 0.3567 Epoch: 3 Global Step: 12520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:36:31,887-Speed 4986.96 samples/sec Loss 8.2461 LearningRate 0.3566 Epoch: 3 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:36:40,162-Speed 4950.65 samples/sec Loss 8.2814 LearningRate 0.3565 Epoch: 3 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:36:48,588-Speed 4861.49 samples/sec Loss 8.2548 LearningRate 0.3564 Epoch: 3 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:36:56,914-Speed 4920.20 samples/sec Loss 8.3143 LearningRate 0.3563 Epoch: 3 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:37:05,117-Speed 4994.47 samples/sec Loss 8.2864 LearningRate 0.3562 Epoch: 3 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:37:13,271-Speed 5024.55 samples/sec Loss 8.3272 LearningRate 0.3561 Epoch: 3 Global Step: 12580 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:37:21,491-Speed 4983.26 samples/sec Loss 8.4879 LearningRate 0.3560 Epoch: 3 Global Step: 12590 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:37:29,678-Speed 5003.73 
samples/sec Loss 8.4093 LearningRate 0.3559 Epoch: 3 Global Step: 12600 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:37:37,924-Speed 4967.68 samples/sec Loss 8.4362 LearningRate 0.3558 Epoch: 3 Global Step: 12610 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:37:46,431-Speed 4815.40 samples/sec Loss 8.3449 LearningRate 0.3557 Epoch: 3 Global Step: 12620 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:37:54,945-Speed 4811.76 samples/sec Loss 8.5374 LearningRate 0.3556 Epoch: 3 Global Step: 12630 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:38:03,264-Speed 4924.27 samples/sec Loss 8.5217 LearningRate 0.3555 Epoch: 3 Global Step: 12640 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:38:11,439-Speed 5010.93 samples/sec Loss 8.4556 LearningRate 0.3554 Epoch: 3 Global Step: 12650 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:38:19,713-Speed 4951.13 samples/sec Loss 8.4556 LearningRate 0.3553 Epoch: 3 Global Step: 12660 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:38:27,955-Speed 4970.13 samples/sec Loss 8.3887 LearningRate 0.3552 Epoch: 3 Global Step: 12670 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:38:36,446-Speed 4824.85 samples/sec Loss 8.5069 LearningRate 0.3551 Epoch: 3 Global Step: 12680 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:38:44,673-Speed 4979.33 samples/sec Loss 8.5646 LearningRate 0.3550 Epoch: 3 Global Step: 12690 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:38:52,761-Speed 5064.66 samples/sec Loss 8.7603 LearningRate 0.3549 Epoch: 3 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:39:00,918-Speed 5022.30 samples/sec Loss 8.5602 LearningRate 0.3548 Epoch: 3 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:39:09,021-Speed 5055.70 samples/sec Loss 8.5719 LearningRate 0.3547 
Epoch: 3 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:39:17,215-Speed 4999.50 samples/sec Loss 8.6091 LearningRate 0.3546 Epoch: 3 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:39:25,758-Speed 4795.06 samples/sec Loss 8.6177 LearningRate 0.3545 Epoch: 3 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:39:34,280-Speed 4807.31 samples/sec Loss 8.5347 LearningRate 0.3544 Epoch: 3 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:39:42,621-Speed 4911.17 samples/sec Loss 8.5081 LearningRate 0.3543 Epoch: 3 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:39:50,819-Speed 4996.71 samples/sec Loss 8.5706 LearningRate 0.3542 Epoch: 3 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:39:59,194-Speed 4891.83 samples/sec Loss 8.5991 LearningRate 0.3541 Epoch: 3 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:40:07,444-Speed 4965.41 samples/sec Loss 8.5988 LearningRate 0.3540 Epoch: 3 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:40:15,580-Speed 5035.06 samples/sec Loss 8.6332 LearningRate 0.3539 Epoch: 3 Global Step: 12800 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:40:23,707-Speed 5040.58 samples/sec Loss 8.5955 LearningRate 0.3538 Epoch: 3 Global Step: 12810 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:40:31,815-Speed 5052.30 samples/sec Loss 8.5680 LearningRate 0.3537 Epoch: 3 Global Step: 12820 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:40:39,977-Speed 5019.23 samples/sec Loss 8.4481 LearningRate 0.3536 Epoch: 3 Global Step: 12830 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:40:48,868-Speed 4607.94 samples/sec Loss 8.7305 LearningRate 0.3535 Epoch: 3 Global Step: 12840 Fp16 Grad 
Scale: 262144 Required: 17 hours Training: 2022-01-17 00:40:57,740-Speed 4616.75 samples/sec Loss 8.7043 LearningRate 0.3534 Epoch: 3 Global Step: 12850 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:41:05,842-Speed 5056.10 samples/sec Loss 8.7421 LearningRate 0.3533 Epoch: 3 Global Step: 12860 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:41:13,964-Speed 5044.01 samples/sec Loss 8.6681 LearningRate 0.3532 Epoch: 3 Global Step: 12870 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:41:22,121-Speed 5022.32 samples/sec Loss 8.6426 LearningRate 0.3531 Epoch: 3 Global Step: 12880 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:41:30,295-Speed 5011.40 samples/sec Loss 8.6194 LearningRate 0.3530 Epoch: 3 Global Step: 12890 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:41:38,464-Speed 5015.29 samples/sec Loss 8.7349 LearningRate 0.3529 Epoch: 3 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:41:46,753-Speed 4942.10 samples/sec Loss 8.8557 LearningRate 0.3528 Epoch: 3 Global Step: 12910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:41:54,875-Speed 5043.60 samples/sec Loss 8.7343 LearningRate 0.3527 Epoch: 3 Global Step: 12920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:42:03,008-Speed 5036.73 samples/sec Loss 8.6569 LearningRate 0.3526 Epoch: 3 Global Step: 12930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:42:11,219-Speed 4989.34 samples/sec Loss 8.6187 LearningRate 0.3525 Epoch: 3 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:42:19,394-Speed 5011.32 samples/sec Loss 8.6360 LearningRate 0.3524 Epoch: 3 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:42:27,511-Speed 5047.00 samples/sec Loss 8.6036 LearningRate 0.3523 Epoch: 3 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-01-17 00:42:35,768-Speed 4960.82 samples/sec Loss 8.6237 LearningRate 0.3522 Epoch: 3 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:42:43,914-Speed 5028.92 samples/sec Loss 8.6406 LearningRate 0.3521 Epoch: 3 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:42:52,087-Speed 5012.46 samples/sec Loss 8.6251 LearningRate 0.3520 Epoch: 3 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:43:00,209-Speed 5043.93 samples/sec Loss 8.6454 LearningRate 0.3519 Epoch: 3 Global Step: 13000 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:43:08,347-Speed 5033.89 samples/sec Loss 8.6395 LearningRate 0.3518 Epoch: 3 Global Step: 13010 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:43:16,588-Speed 4971.10 samples/sec Loss 8.6583 LearningRate 0.3517 Epoch: 3 Global Step: 13020 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:43:24,815-Speed 4979.62 samples/sec Loss 8.6167 LearningRate 0.3516 Epoch: 3 Global Step: 13030 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:43:33,064-Speed 4965.71 samples/sec Loss 8.6488 LearningRate 0.3515 Epoch: 3 Global Step: 13040 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:43:41,214-Speed 5026.32 samples/sec Loss 8.6702 LearningRate 0.3514 Epoch: 3 Global Step: 13050 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:43:49,367-Speed 5024.38 samples/sec Loss 8.6178 LearningRate 0.3513 Epoch: 3 Global Step: 13060 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:43:57,533-Speed 5017.15 samples/sec Loss 8.6452 LearningRate 0.3512 Epoch: 3 Global Step: 13070 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:44:05,655-Speed 5043.75 samples/sec Loss 8.6266 LearningRate 0.3511 Epoch: 3 Global Step: 13080 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:44:13,753-Speed 5058.79 
samples/sec Loss 8.6672 LearningRate 0.3510 Epoch: 3 Global Step: 13090 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:44:21,908-Speed 5023.00 samples/sec Loss 8.5792 LearningRate 0.3509 Epoch: 3 Global Step: 13100 Fp16 Grad Scale: 524288 Required: 17 hours Training: 2022-01-17 00:44:30,058-Speed 5026.83 samples/sec Loss 8.6402 LearningRate 0.3508 Epoch: 3 Global Step: 13110 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:44:38,194-Speed 5034.61 samples/sec Loss 8.6385 LearningRate 0.3507 Epoch: 3 Global Step: 13120 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:44:46,332-Speed 5034.09 samples/sec Loss 8.6210 LearningRate 0.3506 Epoch: 3 Global Step: 13130 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:44:54,868-Speed 4799.21 samples/sec Loss 8.6379 LearningRate 0.3505 Epoch: 3 Global Step: 13140 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:45:03,045-Speed 5010.17 samples/sec Loss 8.6660 LearningRate 0.3504 Epoch: 3 Global Step: 13150 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:45:11,188-Speed 5030.53 samples/sec Loss 8.6574 LearningRate 0.3503 Epoch: 3 Global Step: 13160 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:45:19,314-Speed 5041.25 samples/sec Loss 8.7103 LearningRate 0.3502 Epoch: 3 Global Step: 13170 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:45:27,467-Speed 5024.45 samples/sec Loss 8.6581 LearningRate 0.3501 Epoch: 3 Global Step: 13180 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:45:35,560-Speed 5062.02 samples/sec Loss 8.5944 LearningRate 0.3500 Epoch: 3 Global Step: 13190 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:45:43,658-Speed 5058.84 samples/sec Loss 8.6383 LearningRate 0.3499 Epoch: 3 Global Step: 13200 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:45:52,094-Speed 4855.65 samples/sec Loss 8.6771 LearningRate 0.3498 
Epoch: 3 Global Step: 13210 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:46:00,445-Speed 4906.06 samples/sec Loss 8.6444 LearningRate 0.3497 Epoch: 3 Global Step: 13220 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:46:08,717-Speed 4952.20 samples/sec Loss 8.7590 LearningRate 0.3496 Epoch: 3 Global Step: 13230 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:46:16,914-Speed 4997.08 samples/sec Loss 8.6199 LearningRate 0.3495 Epoch: 3 Global Step: 13240 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:46:25,125-Speed 4989.15 samples/sec Loss 8.5773 LearningRate 0.3494 Epoch: 3 Global Step: 13250 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:46:33,342-Speed 4985.62 samples/sec Loss 8.5799 LearningRate 0.3493 Epoch: 3 Global Step: 13260 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:46:41,490-Speed 5027.92 samples/sec Loss 8.5340 LearningRate 0.3492 Epoch: 3 Global Step: 13270 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:46:49,825-Speed 4914.60 samples/sec Loss 8.6978 LearningRate 0.3491 Epoch: 3 Global Step: 13280 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:46:57,950-Speed 5041.86 samples/sec Loss 8.6676 LearningRate 0.3490 Epoch: 3 Global Step: 13290 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:47:06,047-Speed 5059.64 samples/sec Loss 8.5582 LearningRate 0.3489 Epoch: 3 Global Step: 13300 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:47:14,200-Speed 5023.91 samples/sec Loss 8.6198 LearningRate 0.3488 Epoch: 3 Global Step: 13310 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:47:22,335-Speed 5036.34 samples/sec Loss 8.6610 LearningRate 0.3487 Epoch: 3 Global Step: 13320 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:47:30,482-Speed 5027.84 samples/sec Loss 8.6700 LearningRate 0.3486 Epoch: 3 Global Step: 13330 Fp16 Grad 
Scale: 262144 Required: 17 hours Training: 2022-01-17 00:47:38,688-Speed 4992.19 samples/sec Loss 8.6747 LearningRate 0.3485 Epoch: 3 Global Step: 13340 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:47:46,834-Speed 5029.09 samples/sec Loss 8.7143 LearningRate 0.3484 Epoch: 3 Global Step: 13350 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:47:55,109-Speed 4950.59 samples/sec Loss 8.6773 LearningRate 0.3483 Epoch: 3 Global Step: 13360 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:48:03,236-Speed 5040.51 samples/sec Loss 8.6470 LearningRate 0.3482 Epoch: 3 Global Step: 13370 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:48:11,340-Speed 5054.91 samples/sec Loss 8.7298 LearningRate 0.3482 Epoch: 3 Global Step: 13380 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:48:19,477-Speed 5034.77 samples/sec Loss 8.6365 LearningRate 0.3481 Epoch: 3 Global Step: 13390 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:48:27,819-Speed 4910.63 samples/sec Loss 8.6640 LearningRate 0.3480 Epoch: 3 Global Step: 13400 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:48:36,002-Speed 5005.76 samples/sec Loss 8.6856 LearningRate 0.3479 Epoch: 3 Global Step: 13410 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:48:44,535-Speed 4800.70 samples/sec Loss 8.5981 LearningRate 0.3478 Epoch: 3 Global Step: 13420 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:48:52,765-Speed 4977.51 samples/sec Loss 8.6885 LearningRate 0.3477 Epoch: 3 Global Step: 13430 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:49:01,232-Speed 4838.05 samples/sec Loss 8.5795 LearningRate 0.3476 Epoch: 3 Global Step: 13440 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:49:09,487-Speed 4963.19 samples/sec Loss 8.5797 LearningRate 0.3475 Epoch: 3 Global Step: 13450 Fp16 Grad Scale: 262144 Required: 17 hours Training: 
2022-01-17 00:49:17,657-Speed 5013.94 samples/sec Loss 8.6072 LearningRate 0.3474 Epoch: 3 Global Step: 13460 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:49:25,785-Speed 5040.62 samples/sec Loss 8.6282 LearningRate 0.3473 Epoch: 3 Global Step: 13470 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:49:33,917-Speed 5037.49 samples/sec Loss 8.5431 LearningRate 0.3472 Epoch: 3 Global Step: 13480 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:49:42,143-Speed 4980.27 samples/sec Loss 8.5459 LearningRate 0.3471 Epoch: 3 Global Step: 13490 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:49:50,467-Speed 4921.46 samples/sec Loss 8.6028 LearningRate 0.3470 Epoch: 3 Global Step: 13500 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:49:58,603-Speed 5035.15 samples/sec Loss 8.6452 LearningRate 0.3469 Epoch: 3 Global Step: 13510 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:50:06,899-Speed 4937.41 samples/sec Loss 8.5993 LearningRate 0.3468 Epoch: 3 Global Step: 13520 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:50:15,065-Speed 5016.73 samples/sec Loss 8.6138 LearningRate 0.3467 Epoch: 3 Global Step: 13530 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:50:23,276-Speed 4988.94 samples/sec Loss 8.5937 LearningRate 0.3466 Epoch: 3 Global Step: 13540 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:50:31,448-Speed 5013.11 samples/sec Loss 8.5242 LearningRate 0.3465 Epoch: 3 Global Step: 13550 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:50:39,776-Speed 4918.67 samples/sec Loss 8.6400 LearningRate 0.3464 Epoch: 3 Global Step: 13560 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:50:47,958-Speed 5006.97 samples/sec Loss 8.5532 LearningRate 0.3463 Epoch: 3 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:50:56,106-Speed 5027.42 
samples/sec Loss 8.5874 LearningRate 0.3462 Epoch: 3 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:51:04,349-Speed 4969.84 samples/sec Loss 8.5693 LearningRate 0.3461 Epoch: 3 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:51:12,540-Speed 5001.31 samples/sec Loss 8.5368 LearningRate 0.3460 Epoch: 3 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:51:20,727-Speed 5003.89 samples/sec Loss 8.6135 LearningRate 0.3459 Epoch: 3 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:51:29,018-Speed 4940.68 samples/sec Loss 8.5119 LearningRate 0.3458 Epoch: 3 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:51:37,433-Speed 4868.37 samples/sec Loss 8.5677 LearningRate 0.3457 Epoch: 3 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:51:45,811-Speed 4889.32 samples/sec Loss 8.5196 LearningRate 0.3456 Epoch: 3 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:51:54,054-Speed 4969.70 samples/sec Loss 8.6213 LearningRate 0.3455 Epoch: 3 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:52:02,245-Speed 5001.21 samples/sec Loss 8.5891 LearningRate 0.3454 Epoch: 3 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:52:10,416-Speed 5013.84 samples/sec Loss 8.5250 LearningRate 0.3453 Epoch: 3 Global Step: 13670 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:52:18,580-Speed 5018.03 samples/sec Loss 8.5947 LearningRate 0.3452 Epoch: 3 Global Step: 13680 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:52:26,828-Speed 4966.45 samples/sec Loss 8.5728 LearningRate 0.3451 Epoch: 3 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:52:35,110-Speed 4946.39 samples/sec Loss 8.5369 LearningRate 0.3450 
Epoch: 3 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:52:43,233-Speed 5042.95 samples/sec Loss 8.5109 LearningRate 0.3449 Epoch: 3 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:52:51,423-Speed 5002.09 samples/sec Loss 8.5797 LearningRate 0.3448 Epoch: 3 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:52:59,571-Speed 5027.94 samples/sec Loss 8.4922 LearningRate 0.3447 Epoch: 3 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:53:07,703-Speed 5037.27 samples/sec Loss 8.5835 LearningRate 0.3446 Epoch: 3 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:53:15,982-Speed 4948.18 samples/sec Loss 8.5333 LearningRate 0.3445 Epoch: 3 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:53:24,391-Speed 4871.59 samples/sec Loss 8.4719 LearningRate 0.3444 Epoch: 3 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:53:32,747-Speed 4902.29 samples/sec Loss 8.5704 LearningRate 0.3443 Epoch: 3 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:53:40,980-Speed 4975.85 samples/sec Loss 8.6703 LearningRate 0.3442 Epoch: 3 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:53:49,189-Speed 4990.34 samples/sec Loss 8.5218 LearningRate 0.3441 Epoch: 3 Global Step: 13790 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:53:57,399-Speed 4989.81 samples/sec Loss 8.5947 LearningRate 0.3440 Epoch: 3 Global Step: 13800 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:54:05,602-Speed 4993.47 samples/sec Loss 8.6134 LearningRate 0.3439 Epoch: 3 Global Step: 13810 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:54:13,891-Speed 4942.73 samples/sec Loss 8.5985 LearningRate 0.3438 Epoch: 3 Global Step: 13820 Fp16 Grad 
Scale: 262144 Required: 17 hours Training: 2022-01-17 00:54:22,102-Speed 4988.86 samples/sec Loss 8.6806 LearningRate 0.3437 Epoch: 3 Global Step: 13830 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:54:30,395-Speed 4939.54 samples/sec Loss 8.6201 LearningRate 0.3436 Epoch: 3 Global Step: 13840 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-17 00:54:38,677-Speed 4946.58 samples/sec Loss 8.5223 LearningRate 0.3435 Epoch: 3 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:54:46,889-Speed 4988.23 samples/sec Loss 8.5952 LearningRate 0.3434 Epoch: 3 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:54:55,247-Speed 4901.48 samples/sec Loss 8.5291 LearningRate 0.3433 Epoch: 3 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:55:03,468-Speed 4982.91 samples/sec Loss 8.4859 LearningRate 0.3432 Epoch: 3 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:55:11,627-Speed 5021.20 samples/sec Loss 8.5018 LearningRate 0.3431 Epoch: 3 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:55:20,026-Speed 4877.59 samples/sec Loss 8.5932 LearningRate 0.3430 Epoch: 3 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:55:28,476-Speed 4847.70 samples/sec Loss 8.5594 LearningRate 0.3429 Epoch: 3 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:55:36,839-Speed 4898.30 samples/sec Loss 8.5335 LearningRate 0.3428 Epoch: 3 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:55:45,257-Speed 4866.86 samples/sec Loss 8.5255 LearningRate 0.3427 Epoch: 3 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-17 00:55:53,674-Speed 4867.00 samples/sec Loss 8.4693 LearningRate 0.3426 Epoch: 3 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-01-17 00:56:02,016-Speed 4910.41 samples/sec Loss 8.4858 LearningRate 0.3425 Epoch: 3 Global Step: 13950 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:56:10,290-Speed 4951.34 samples/sec Loss 8.5339 LearningRate 0.3424 Epoch: 3 Global Step: 13960 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:56:18,684-Speed 4880.24 samples/sec Loss 8.6284 LearningRate 0.3423 Epoch: 3 Global Step: 13970 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:56:26,960-Speed 4950.29 samples/sec Loss 8.5157 LearningRate 0.3422 Epoch: 3 Global Step: 13980 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:56:35,204-Speed 4968.92 samples/sec Loss 8.5076 LearningRate 0.3421 Epoch: 3 Global Step: 13990 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:56:43,447-Speed 4969.32 samples/sec Loss 8.4860 LearningRate 0.3420 Epoch: 3 Global Step: 14000 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:56:51,697-Speed 4966.30 samples/sec Loss 8.5522 LearningRate 0.3419 Epoch: 3 Global Step: 14010 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:57:00,056-Speed 4900.98 samples/sec Loss 8.4890 LearningRate 0.3418 Epoch: 3 Global Step: 14020 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:57:08,650-Speed 4766.66 samples/sec Loss 8.5025 LearningRate 0.3417 Epoch: 3 Global Step: 14030 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:57:17,161-Speed 4812.96 samples/sec Loss 8.3907 LearningRate 0.3416 Epoch: 3 Global Step: 14040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:57:25,433-Speed 4952.79 samples/sec Loss 8.4741 LearningRate 0.3415 Epoch: 3 Global Step: 14050 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:57:33,692-Speed 4959.72 samples/sec Loss 8.4238 LearningRate 0.3414 Epoch: 3 Global Step: 14060 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:57:41,912-Speed 4983.96 
samples/sec Loss 8.5125 LearningRate 0.3413 Epoch: 3 Global Step: 14070 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:57:50,075-Speed 5018.59 samples/sec Loss 8.5113 LearningRate 0.3412 Epoch: 3 Global Step: 14080 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:57:58,299-Speed 4980.94 samples/sec Loss 8.4811 LearningRate 0.3411 Epoch: 3 Global Step: 14090 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:58:06,729-Speed 4859.58 samples/sec Loss 8.4645 LearningRate 0.3410 Epoch: 3 Global Step: 14100 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:58:14,858-Speed 5039.04 samples/sec Loss 8.5031 LearningRate 0.3409 Epoch: 3 Global Step: 14110 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:58:23,019-Speed 5019.61 samples/sec Loss 8.4959 LearningRate 0.3408 Epoch: 3 Global Step: 14120 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:58:31,373-Speed 4904.22 samples/sec Loss 8.4557 LearningRate 0.3407 Epoch: 3 Global Step: 14130 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:58:39,595-Speed 4982.21 samples/sec Loss 8.5659 LearningRate 0.3406 Epoch: 3 Global Step: 14140 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:58:47,779-Speed 5005.12 samples/sec Loss 8.4397 LearningRate 0.3405 Epoch: 3 Global Step: 14150 Fp16 Grad Scale: 524288 Required: 16 hours Training: 2022-01-17 00:58:56,047-Speed 4955.07 samples/sec Loss 8.5456 LearningRate 0.3404 Epoch: 3 Global Step: 14160 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:59:04,249-Speed 4994.12 samples/sec Loss 8.5102 LearningRate 0.3403 Epoch: 3 Global Step: 14170 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:59:12,582-Speed 4915.93 samples/sec Loss 8.5037 LearningRate 0.3402 Epoch: 3 Global Step: 14180 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:59:20,734-Speed 5025.64 samples/sec Loss 8.4686 LearningRate 0.3401 
Epoch: 3 Global Step: 14190 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:59:28,953-Speed 4984.29 samples/sec Loss 8.4494 LearningRate 0.3400 Epoch: 3 Global Step: 14200 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:59:37,168-Speed 4986.67 samples/sec Loss 8.5007 LearningRate 0.3399 Epoch: 3 Global Step: 14210 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:59:45,353-Speed 5004.55 samples/sec Loss 8.5337 LearningRate 0.3399 Epoch: 3 Global Step: 14220 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 00:59:53,644-Speed 4940.96 samples/sec Loss 8.4481 LearningRate 0.3398 Epoch: 3 Global Step: 14230 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:00:01,811-Speed 5016.44 samples/sec Loss 8.5214 LearningRate 0.3397 Epoch: 3 Global Step: 14240 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:00:10,148-Speed 4914.36 samples/sec Loss 8.4596 LearningRate 0.3396 Epoch: 3 Global Step: 14250 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:00:18,421-Speed 4951.87 samples/sec Loss 8.4745 LearningRate 0.3395 Epoch: 3 Global Step: 14260 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:00:26,605-Speed 5005.24 samples/sec Loss 8.4239 LearningRate 0.3394 Epoch: 3 Global Step: 14270 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:00:35,236-Speed 4746.34 samples/sec Loss 8.4760 LearningRate 0.3393 Epoch: 3 Global Step: 14280 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:00:43,527-Speed 4940.87 samples/sec Loss 8.4859 LearningRate 0.3392 Epoch: 3 Global Step: 14290 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:00:51,701-Speed 5011.53 samples/sec Loss 8.4534 LearningRate 0.3391 Epoch: 3 Global Step: 14300 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:00:59,981-Speed 4947.74 samples/sec Loss 8.3797 LearningRate 0.3390 Epoch: 3 Global Step: 14310 Fp16 Grad 
Scale: 262144 Required: 16 hours Training: 2022-01-17 01:01:08,180-Speed 4996.61 samples/sec Loss 8.5151 LearningRate 0.3389 Epoch: 3 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:01:16,645-Speed 4839.29 samples/sec Loss 8.5172 LearningRate 0.3388 Epoch: 3 Global Step: 14330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:01:25,051-Speed 4872.99 samples/sec Loss 8.4007 LearningRate 0.3387 Epoch: 3 Global Step: 14340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:01:33,276-Speed 4981.11 samples/sec Loss 8.4799 LearningRate 0.3386 Epoch: 3 Global Step: 14350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:01:41,456-Speed 5008.19 samples/sec Loss 8.3727 LearningRate 0.3385 Epoch: 3 Global Step: 14360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:01:49,874-Speed 4865.71 samples/sec Loss 8.3863 LearningRate 0.3384 Epoch: 3 Global Step: 14370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:01:58,172-Speed 4936.99 samples/sec Loss 8.3847 LearningRate 0.3383 Epoch: 3 Global Step: 14380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:02:06,320-Speed 5027.86 samples/sec Loss 8.4151 LearningRate 0.3382 Epoch: 3 Global Step: 14390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:02:14,551-Speed 4976.87 samples/sec Loss 8.4853 LearningRate 0.3381 Epoch: 3 Global Step: 14400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:02:22,784-Speed 4975.92 samples/sec Loss 8.4159 LearningRate 0.3380 Epoch: 3 Global Step: 14410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:02:31,048-Speed 4956.65 samples/sec Loss 8.5198 LearningRate 0.3379 Epoch: 3 Global Step: 14420 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:02:39,363-Speed 4926.89 samples/sec Loss 8.4515 LearningRate 0.3378 Epoch: 3 Global Step: 14430 Fp16 Grad Scale: 262144 Required: 16 hours Training: 
2022-01-17 01:02:47,587-Speed 4981.11 samples/sec Loss 8.4046 LearningRate 0.3377 Epoch: 3 Global Step: 14440 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:02:55,806-Speed 4984.26 samples/sec Loss 8.3563 LearningRate 0.3376 Epoch: 3 Global Step: 14450 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:03:04,047-Speed 4970.98 samples/sec Loss 8.4383 LearningRate 0.3375 Epoch: 3 Global Step: 14460 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:03:12,249-Speed 4994.89 samples/sec Loss 8.3572 LearningRate 0.3374 Epoch: 3 Global Step: 14470 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:03:20,622-Speed 4892.59 samples/sec Loss 8.3445 LearningRate 0.3373 Epoch: 3 Global Step: 14480 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:03:28,990-Speed 4895.19 samples/sec Loss 8.3744 LearningRate 0.3372 Epoch: 3 Global Step: 14490 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:03:37,303-Speed 4928.37 samples/sec Loss 8.3351 LearningRate 0.3371 Epoch: 3 Global Step: 14500 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:03:45,541-Speed 4972.66 samples/sec Loss 8.4187 LearningRate 0.3370 Epoch: 3 Global Step: 14510 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:03:53,840-Speed 4936.53 samples/sec Loss 8.4656 LearningRate 0.3369 Epoch: 3 Global Step: 14520 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:04:02,124-Speed 4945.23 samples/sec Loss 8.4898 LearningRate 0.3368 Epoch: 3 Global Step: 14530 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:04:10,555-Speed 4858.56 samples/sec Loss 8.4001 LearningRate 0.3367 Epoch: 3 Global Step: 14540 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:04:18,801-Speed 4968.25 samples/sec Loss 8.4442 LearningRate 0.3366 Epoch: 3 Global Step: 14550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:04:27,277-Speed 4832.90 
samples/sec Loss 8.3814 LearningRate 0.3365 Epoch: 3 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:04:35,548-Speed 4952.92 samples/sec Loss 8.3539 LearningRate 0.3364 Epoch: 3 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:04:43,855-Speed 4931.70 samples/sec Loss 8.5016 LearningRate 0.3363 Epoch: 3 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:04:52,150-Speed 4938.08 samples/sec Loss 8.3829 LearningRate 0.3362 Epoch: 3 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:05:00,382-Speed 4976.40 samples/sec Loss 8.3873 LearningRate 0.3361 Epoch: 3 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:05:08,649-Speed 4955.03 samples/sec Loss 8.4161 LearningRate 0.3360 Epoch: 3 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:05:16,885-Speed 4974.30 samples/sec Loss 8.3219 LearningRate 0.3359 Epoch: 3 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:05:25,116-Speed 4976.52 samples/sec Loss 8.3667 LearningRate 0.3358 Epoch: 3 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:05:33,428-Speed 4929.70 samples/sec Loss 8.3597 LearningRate 0.3357 Epoch: 3 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:05:41,594-Speed 5016.06 samples/sec Loss 8.2661 LearningRate 0.3356 Epoch: 3 Global Step: 14650 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:05:49,823-Speed 4978.62 samples/sec Loss 8.4245 LearningRate 0.3355 Epoch: 3 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:05:58,176-Speed 4904.12 samples/sec Loss 8.4020 LearningRate 0.3354 Epoch: 3 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:06:06,533-Speed 4902.13 samples/sec Loss 8.3967 LearningRate 0.3353 
Epoch: 3 Global Step: 14680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:06:14,769-Speed 4974.41 samples/sec Loss 8.3769 LearningRate 0.3353 Epoch: 3 Global Step: 14690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:06:23,046-Speed 4948.95 samples/sec Loss 8.3801 LearningRate 0.3352 Epoch: 3 Global Step: 14700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:06:31,340-Speed 4939.41 samples/sec Loss 8.3449 LearningRate 0.3351 Epoch: 3 Global Step: 14710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:06:39,697-Speed 4901.95 samples/sec Loss 8.3651 LearningRate 0.3350 Epoch: 3 Global Step: 14720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:06:48,282-Speed 4771.59 samples/sec Loss 8.3583 LearningRate 0.3349 Epoch: 3 Global Step: 14730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:06:56,784-Speed 4818.48 samples/sec Loss 8.3139 LearningRate 0.3348 Epoch: 3 Global Step: 14740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:07:05,050-Speed 4955.51 samples/sec Loss 8.3204 LearningRate 0.3347 Epoch: 3 Global Step: 14750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:07:13,382-Speed 4916.95 samples/sec Loss 8.3456 LearningRate 0.3346 Epoch: 3 Global Step: 14760 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:07:21,574-Speed 5000.57 samples/sec Loss 8.4048 LearningRate 0.3345 Epoch: 3 Global Step: 14770 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:07:29,972-Speed 4877.76 samples/sec Loss 8.3362 LearningRate 0.3344 Epoch: 3 Global Step: 14780 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:07:38,277-Speed 4932.69 samples/sec Loss 8.3260 LearningRate 0.3343 Epoch: 3 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:07:46,587-Speed 4929.51 samples/sec Loss 8.4311 LearningRate 0.3342 Epoch: 3 Global Step: 14800 Fp16 Grad 
Scale: 131072 Required: 16 hours Training: 2022-01-17 01:07:55,123-Speed 4799.25 samples/sec Loss 8.3312 LearningRate 0.3341 Epoch: 3 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:08:03,504-Speed 4887.85 samples/sec Loss 8.4211 LearningRate 0.3340 Epoch: 3 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:08:11,755-Speed 4965.25 samples/sec Loss 8.4198 LearningRate 0.3339 Epoch: 3 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:08:20,102-Speed 4907.97 samples/sec Loss 8.3218 LearningRate 0.3338 Epoch: 3 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:08:28,465-Speed 4898.40 samples/sec Loss 8.3543 LearningRate 0.3337 Epoch: 3 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:08:36,682-Speed 4985.41 samples/sec Loss 8.2874 LearningRate 0.3336 Epoch: 3 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:08:45,090-Speed 4871.94 samples/sec Loss 8.3084 LearningRate 0.3335 Epoch: 3 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:08:54,226-Speed 4483.74 samples/sec Loss 8.4737 LearningRate 0.3334 Epoch: 3 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:09:03,100-Speed 4616.86 samples/sec Loss 8.3161 LearningRate 0.3333 Epoch: 3 Global Step: 14890 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:09:12,120-Speed 4541.35 samples/sec Loss 8.3759 LearningRate 0.3332 Epoch: 3 Global Step: 14900 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:09:21,543-Speed 4347.45 samples/sec Loss 8.2730 LearningRate 0.3331 Epoch: 3 Global Step: 14910 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:09:30,892-Speed 4381.88 samples/sec Loss 8.3678 LearningRate 0.3330 Epoch: 3 Global Step: 14920 Fp16 Grad Scale: 262144 Required: 16 hours Training: 
2022-01-17 01:09:40,121-Speed 4438.51 samples/sec Loss 8.2562 LearningRate 0.3329 Epoch: 3 Global Step: 14930 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:09:48,960-Speed 4634.61 samples/sec Loss 8.2780 LearningRate 0.3328 Epoch: 3 Global Step: 14940 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:09:57,837-Speed 4614.69 samples/sec Loss 8.3061 LearningRate 0.3327 Epoch: 3 Global Step: 14950 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:10:06,673-Speed 4636.11 samples/sec Loss 8.3588 LearningRate 0.3326 Epoch: 3 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:10:15,506-Speed 4638.03 samples/sec Loss 8.2815 LearningRate 0.3325 Epoch: 3 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:10:24,336-Speed 4639.06 samples/sec Loss 8.2816 LearningRate 0.3324 Epoch: 3 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:10:33,232-Speed 4604.73 samples/sec Loss 8.3862 LearningRate 0.3323 Epoch: 3 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:10:42,080-Speed 4630.13 samples/sec Loss 8.3175 LearningRate 0.3322 Epoch: 3 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:11:30,104-[lfw][15000]XNorm: 21.191953 Training: 2022-01-17 01:11:30,105-[lfw][15000]Accuracy-Flip: 0.99683+-0.00320 Training: 2022-01-17 01:11:30,106-[lfw][15000]Accuracy-Highest: 0.99750 Training: 2022-01-17 01:12:27,706-[cfp_fp][15000]XNorm: 17.730968 Training: 2022-01-17 01:12:27,706-[cfp_fp][15000]Accuracy-Flip: 0.94657+-0.00668 Training: 2022-01-17 01:12:27,707-[cfp_fp][15000]Accuracy-Highest: 0.95943 Training: 2022-01-17 01:13:15,436-[agedb_30][15000]XNorm: 20.542435 Training: 2022-01-17 01:13:15,437-[agedb_30][15000]Accuracy-Flip: 0.97033+-0.00609 Training: 2022-01-17 01:13:15,438-[agedb_30][15000]Accuracy-Highest: 0.97033 Training: 2022-01-17 01:13:23,857-Speed 
253.19 samples/sec Loss 8.2501 LearningRate 0.3321 Epoch: 3 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:13:32,174-Speed 4925.30 samples/sec Loss 8.2783 LearningRate 0.3320 Epoch: 3 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:13:40,340-Speed 5016.49 samples/sec Loss 8.2416 LearningRate 0.3319 Epoch: 3 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:13:48,506-Speed 5016.89 samples/sec Loss 8.3311 LearningRate 0.3318 Epoch: 3 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:13:56,689-Speed 5005.58 samples/sec Loss 8.2145 LearningRate 0.3318 Epoch: 3 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:14:04,819-Speed 5039.45 samples/sec Loss 8.2602 LearningRate 0.3317 Epoch: 3 Global Step: 15060 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:14:12,971-Speed 5025.20 samples/sec Loss 8.2248 LearningRate 0.3316 Epoch: 3 Global Step: 15070 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:14:21,198-Speed 4979.05 samples/sec Loss 8.2441 LearningRate 0.3315 Epoch: 3 Global Step: 15080 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:14:29,393-Speed 4999.14 samples/sec Loss 8.3298 LearningRate 0.3314 Epoch: 3 Global Step: 15090 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:14:37,557-Speed 5018.36 samples/sec Loss 8.2650 LearningRate 0.3313 Epoch: 3 Global Step: 15100 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:14:45,781-Speed 4980.43 samples/sec Loss 8.1772 LearningRate 0.3312 Epoch: 3 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:14:53,961-Speed 5008.37 samples/sec Loss 8.2643 LearningRate 0.3311 Epoch: 3 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:15:02,159-Speed 4996.80 samples/sec Loss 8.2976 LearningRate 
0.3310 Epoch: 3 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:15:10,358-Speed 4997.26 samples/sec Loss 8.2427 LearningRate 0.3309 Epoch: 3 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:15:18,684-Speed 4919.78 samples/sec Loss 8.3546 LearningRate 0.3308 Epoch: 3 Global Step: 15150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:15:27,224-Speed 4797.19 samples/sec Loss 8.2556 LearningRate 0.3307 Epoch: 3 Global Step: 15160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:15:35,482-Speed 4960.83 samples/sec Loss 8.2324 LearningRate 0.3306 Epoch: 3 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:15:43,719-Speed 4973.06 samples/sec Loss 8.2810 LearningRate 0.3305 Epoch: 3 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:15:52,000-Speed 4946.89 samples/sec Loss 8.2733 LearningRate 0.3304 Epoch: 3 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:16:00,231-Speed 4977.12 samples/sec Loss 8.2446 LearningRate 0.3303 Epoch: 3 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:16:08,497-Speed 4955.66 samples/sec Loss 8.2326 LearningRate 0.3302 Epoch: 3 Global Step: 15210 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:16:16,795-Speed 4936.69 samples/sec Loss 8.2892 LearningRate 0.3301 Epoch: 3 Global Step: 15220 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:16:25,092-Speed 4937.12 samples/sec Loss 8.1587 LearningRate 0.3300 Epoch: 3 Global Step: 15230 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:16:33,378-Speed 4944.42 samples/sec Loss 8.2796 LearningRate 0.3299 Epoch: 3 Global Step: 15240 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:16:41,691-Speed 4927.92 samples/sec Loss 8.1944 LearningRate 0.3298 Epoch: 3 Global Step: 15250 Fp16 
Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:16:50,314-Speed 4750.38 samples/sec Loss 8.2184 LearningRate 0.3297 Epoch: 3 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:16:59,010-Speed 4711.26 samples/sec Loss 8.2624 LearningRate 0.3296 Epoch: 3 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:17:07,606-Speed 4765.89 samples/sec Loss 8.2993 LearningRate 0.3295 Epoch: 3 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:17:16,009-Speed 4874.89 samples/sec Loss 8.2073 LearningRate 0.3294 Epoch: 3 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:17:24,263-Speed 4962.72 samples/sec Loss 8.2251 LearningRate 0.3293 Epoch: 3 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:17:32,503-Speed 4971.90 samples/sec Loss 8.2421 LearningRate 0.3292 Epoch: 3 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:17:40,887-Speed 4886.07 samples/sec Loss 8.1476 LearningRate 0.3291 Epoch: 3 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:17:49,130-Speed 4970.10 samples/sec Loss 8.3070 LearningRate 0.3290 Epoch: 3 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:17:57,419-Speed 4941.72 samples/sec Loss 8.2542 LearningRate 0.3289 Epoch: 3 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:18:05,708-Speed 4942.08 samples/sec Loss 8.2086 LearningRate 0.3288 Epoch: 3 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:18:13,939-Speed 4977.11 samples/sec Loss 8.2384 LearningRate 0.3287 Epoch: 3 Global Step: 15360 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:18:22,220-Speed 4946.78 samples/sec Loss 8.1740 LearningRate 0.3287 Epoch: 3 Global Step: 15370 Fp16 Grad Scale: 262144 Required: 16 hours 
Training: 2022-01-17 01:18:30,709-Speed 4825.95 samples/sec Loss 8.2382 LearningRate 0.3286 Epoch: 3 Global Step: 15380 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:18:39,306-Speed 4764.61 samples/sec Loss 8.2057 LearningRate 0.3285 Epoch: 3 Global Step: 15390 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:18:47,688-Speed 4887.24 samples/sec Loss 8.1984 LearningRate 0.3284 Epoch: 3 Global Step: 15400 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:18:55,934-Speed 4968.38 samples/sec Loss 8.2811 LearningRate 0.3283 Epoch: 3 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:19:04,161-Speed 4978.98 samples/sec Loss 8.2787 LearningRate 0.3282 Epoch: 3 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:19:12,359-Speed 4997.43 samples/sec Loss 8.1941 LearningRate 0.3281 Epoch: 3 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:19:20,576-Speed 4985.84 samples/sec Loss 8.4229 LearningRate 0.3280 Epoch: 3 Global Step: 15440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:19:28,974-Speed 4877.41 samples/sec Loss 8.2764 LearningRate 0.3279 Epoch: 3 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:19:37,257-Speed 4946.02 samples/sec Loss 8.2156 LearningRate 0.3278 Epoch: 3 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:19:45,535-Speed 4948.62 samples/sec Loss 8.2886 LearningRate 0.3277 Epoch: 3 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:19:53,768-Speed 4976.15 samples/sec Loss 8.2043 LearningRate 0.3276 Epoch: 3 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:20:02,130-Speed 4898.92 samples/sec Loss 8.1113 LearningRate 0.3275 Epoch: 3 Global Step: 15490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:20:10,604-Speed 
4834.00 samples/sec Loss 8.1636 LearningRate 0.3274 Epoch: 3 Global Step: 15500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:20:19,102-Speed 4820.90 samples/sec Loss 8.1038 LearningRate 0.3273 Epoch: 3 Global Step: 15510 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:20:27,276-Speed 5011.61 samples/sec Loss 8.1102 LearningRate 0.3272 Epoch: 3 Global Step: 15520 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:20:35,556-Speed 4947.35 samples/sec Loss 8.2032 LearningRate 0.3271 Epoch: 3 Global Step: 15530 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:20:43,860-Speed 4933.24 samples/sec Loss 8.1227 LearningRate 0.3270 Epoch: 3 Global Step: 15540 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:20:52,325-Speed 4838.97 samples/sec Loss 8.1442 LearningRate 0.3269 Epoch: 3 Global Step: 15550 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:21:00,598-Speed 4951.64 samples/sec Loss 8.3057 LearningRate 0.3268 Epoch: 3 Global Step: 15560 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:21:08,888-Speed 4941.39 samples/sec Loss 8.2969 LearningRate 0.3267 Epoch: 3 Global Step: 15570 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:21:17,178-Speed 4941.91 samples/sec Loss 8.2120 LearningRate 0.3266 Epoch: 3 Global Step: 15580 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:21:25,700-Speed 4806.81 samples/sec Loss 8.3087 LearningRate 0.3265 Epoch: 3 Global Step: 15590 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:21:34,040-Speed 4912.09 samples/sec Loss 8.1919 LearningRate 0.3264 Epoch: 3 Global Step: 15600 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:21:42,321-Speed 4947.14 samples/sec Loss 8.1773 LearningRate 0.3263 Epoch: 3 Global Step: 15610 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:21:50,464-Speed 5030.76 samples/sec Loss 8.2229 
LearningRate 0.3262 Epoch: 3 Global Step: 15620 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:21:58,735-Speed 4952.63 samples/sec Loss 8.2060 LearningRate 0.3261 Epoch: 3 Global Step: 15630 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:22:06,947-Speed 4988.96 samples/sec Loss 8.1773 LearningRate 0.3261 Epoch: 3 Global Step: 15640 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:22:15,097-Speed 5025.63 samples/sec Loss 8.0526 LearningRate 0.3260 Epoch: 3 Global Step: 15650 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:22:23,419-Speed 4923.09 samples/sec Loss 8.1412 LearningRate 0.3259 Epoch: 3 Global Step: 15660 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:22:31,613-Speed 4999.61 samples/sec Loss 8.2505 LearningRate 0.3258 Epoch: 3 Global Step: 15670 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:22:39,849-Speed 4973.67 samples/sec Loss 8.1731 LearningRate 0.3257 Epoch: 3 Global Step: 15680 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:22:48,167-Speed 4924.81 samples/sec Loss 8.0937 LearningRate 0.3256 Epoch: 3 Global Step: 15690 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:22:56,380-Speed 4987.62 samples/sec Loss 8.1756 LearningRate 0.3255 Epoch: 3 Global Step: 15700 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:23:04,606-Speed 4980.32 samples/sec Loss 8.2552 LearningRate 0.3254 Epoch: 3 Global Step: 15710 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:23:12,832-Speed 4979.90 samples/sec Loss 8.1808 LearningRate 0.3253 Epoch: 3 Global Step: 15720 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:23:21,036-Speed 4993.10 samples/sec Loss 8.0835 LearningRate 0.3252 Epoch: 3 Global Step: 15730 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:23:29,222-Speed 5004.80 samples/sec Loss 8.0880 LearningRate 0.3251 Epoch: 3 Global Step: 
15740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:23:37,443-Speed 4982.93 samples/sec Loss 8.1213 LearningRate 0.3250 Epoch: 3 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:23:45,659-Speed 4985.98 samples/sec Loss 8.1210 LearningRate 0.3249 Epoch: 3 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:23:53,919-Speed 4959.70 samples/sec Loss 8.0849 LearningRate 0.3248 Epoch: 3 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:24:02,210-Speed 4940.83 samples/sec Loss 8.1513 LearningRate 0.3247 Epoch: 3 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:24:10,514-Speed 4932.87 samples/sec Loss 8.1672 LearningRate 0.3246 Epoch: 3 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:24:18,743-Speed 4978.27 samples/sec Loss 8.1730 LearningRate 0.3245 Epoch: 3 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:24:27,055-Speed 4928.42 samples/sec Loss 8.1541 LearningRate 0.3244 Epoch: 3 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:24:35,297-Speed 4970.19 samples/sec Loss 8.0885 LearningRate 0.3243 Epoch: 3 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:24:43,506-Speed 4990.24 samples/sec Loss 8.2006 LearningRate 0.3242 Epoch: 3 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:24:51,655-Speed 5027.33 samples/sec Loss 8.1693 LearningRate 0.3241 Epoch: 3 Global Step: 15840 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:24:59,937-Speed 4946.57 samples/sec Loss 8.1306 LearningRate 0.3240 Epoch: 3 Global Step: 15850 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:25:08,322-Speed 4885.38 samples/sec Loss 8.1705 LearningRate 0.3239 Epoch: 3 Global Step: 15860 Fp16 Grad Scale: 262144 Required: 16 
hours Training: 2022-01-17 01:25:16,594-Speed 4952.19 samples/sec Loss 8.1704 LearningRate 0.3238 Epoch: 3 Global Step: 15870 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:25:24,762-Speed 5015.45 samples/sec Loss 8.1163 LearningRate 0.3237 Epoch: 3 Global Step: 15880 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:25:32,931-Speed 5014.71 samples/sec Loss 8.1316 LearningRate 0.3237 Epoch: 3 Global Step: 15890 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:25:41,164-Speed 4976.01 samples/sec Loss 8.1963 LearningRate 0.3236 Epoch: 3 Global Step: 15900 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:25:49,341-Speed 5009.33 samples/sec Loss 8.1308 LearningRate 0.3235 Epoch: 3 Global Step: 15910 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:25:57,597-Speed 4962.04 samples/sec Loss 8.1483 LearningRate 0.3234 Epoch: 3 Global Step: 15920 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:26:05,895-Speed 4936.86 samples/sec Loss 8.1742 LearningRate 0.3233 Epoch: 3 Global Step: 15930 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:26:14,119-Speed 4981.12 samples/sec Loss 8.1042 LearningRate 0.3232 Epoch: 3 Global Step: 15940 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:26:22,396-Speed 4949.79 samples/sec Loss 8.1935 LearningRate 0.3231 Epoch: 3 Global Step: 15950 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:26:30,897-Speed 4818.64 samples/sec Loss 8.1658 LearningRate 0.3230 Epoch: 3 Global Step: 15960 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:26:39,270-Speed 4892.30 samples/sec Loss 8.0963 LearningRate 0.3229 Epoch: 3 Global Step: 15970 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:26:47,591-Speed 4923.27 samples/sec Loss 8.1115 LearningRate 0.3228 Epoch: 3 Global Step: 15980 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 
01:26:55,880-Speed 4942.23 samples/sec Loss 8.0126 LearningRate 0.3227 Epoch: 3 Global Step: 15990 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:27:04,142-Speed 4958.72 samples/sec Loss 8.0582 LearningRate 0.3226 Epoch: 3 Global Step: 16000 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:27:12,337-Speed 4998.40 samples/sec Loss 8.0258 LearningRate 0.3225 Epoch: 3 Global Step: 16010 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:27:20,792-Speed 4845.17 samples/sec Loss 8.0655 LearningRate 0.3224 Epoch: 3 Global Step: 16020 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:27:28,996-Speed 4993.12 samples/sec Loss 8.0666 LearningRate 0.3223 Epoch: 3 Global Step: 16030 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:27:37,167-Speed 5013.81 samples/sec Loss 8.0306 LearningRate 0.3222 Epoch: 3 Global Step: 16040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:27:45,484-Speed 4925.44 samples/sec Loss 8.0004 LearningRate 0.3221 Epoch: 3 Global Step: 16050 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:27:53,689-Speed 4992.93 samples/sec Loss 8.0567 LearningRate 0.3220 Epoch: 3 Global Step: 16060 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:28:01,886-Speed 4997.59 samples/sec Loss 8.0940 LearningRate 0.3219 Epoch: 3 Global Step: 16070 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:28:10,170-Speed 4945.28 samples/sec Loss 8.0931 LearningRate 0.3218 Epoch: 3 Global Step: 16080 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:28:18,432-Speed 4958.39 samples/sec Loss 8.0489 LearningRate 0.3217 Epoch: 3 Global Step: 16090 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:28:26,687-Speed 4962.00 samples/sec Loss 8.0959 LearningRate 0.3216 Epoch: 3 Global Step: 16100 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:28:34,943-Speed 4962.45 samples/sec Loss 
8.1388 LearningRate 0.3215 Epoch: 3 Global Step: 16110 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:28:43,155-Speed 4988.40 samples/sec Loss 8.1297 LearningRate 0.3215 Epoch: 3 Global Step: 16120 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:28:51,475-Speed 4923.71 samples/sec Loss 8.1263 LearningRate 0.3214 Epoch: 3 Global Step: 16130 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:28:59,767-Speed 4940.17 samples/sec Loss 8.0125 LearningRate 0.3213 Epoch: 3 Global Step: 16140 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:29:08,578-Speed 4649.93 samples/sec Loss 8.0786 LearningRate 0.3212 Epoch: 3 Global Step: 16150 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:29:17,469-Speed 4607.00 samples/sec Loss 8.1135 LearningRate 0.3211 Epoch: 3 Global Step: 16160 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:29:26,378-Speed 4598.20 samples/sec Loss 8.1697 LearningRate 0.3210 Epoch: 3 Global Step: 16170 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:29:34,807-Speed 4860.68 samples/sec Loss 8.1152 LearningRate 0.3209 Epoch: 3 Global Step: 16180 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:29:43,104-Speed 4937.15 samples/sec Loss 8.1451 LearningRate 0.3208 Epoch: 3 Global Step: 16190 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:29:51,384-Speed 4947.47 samples/sec Loss 8.0878 LearningRate 0.3207 Epoch: 3 Global Step: 16200 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:29:59,681-Speed 4937.20 samples/sec Loss 8.0718 LearningRate 0.3206 Epoch: 3 Global Step: 16210 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:30:07,980-Speed 4936.49 samples/sec Loss 8.0623 LearningRate 0.3205 Epoch: 3 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:30:16,287-Speed 4930.82 samples/sec Loss 8.0341 LearningRate 0.3204 Epoch: 3 Global 
Step: 16230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:30:24,435-Speed 5028.05 samples/sec Loss 8.0730 LearningRate 0.3203 Epoch: 3 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:30:32,558-Speed 5043.06 samples/sec Loss 8.0194 LearningRate 0.3202 Epoch: 3 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:30:40,797-Speed 4972.45 samples/sec Loss 8.0327 LearningRate 0.3201 Epoch: 3 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:30:48,900-Speed 5055.31 samples/sec Loss 7.9732 LearningRate 0.3200 Epoch: 3 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:30:57,188-Speed 4942.47 samples/sec Loss 8.0447 LearningRate 0.3199 Epoch: 3 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:31:05,379-Speed 5001.71 samples/sec Loss 8.0035 LearningRate 0.3198 Epoch: 3 Global Step: 16290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:31:13,559-Speed 5007.90 samples/sec Loss 8.0302 LearningRate 0.3197 Epoch: 3 Global Step: 16300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:31:21,734-Speed 5011.18 samples/sec Loss 8.0163 LearningRate 0.3196 Epoch: 3 Global Step: 16310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:31:29,913-Speed 5008.70 samples/sec Loss 8.0626 LearningRate 0.3195 Epoch: 3 Global Step: 16320 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:31:38,160-Speed 4966.91 samples/sec Loss 8.0684 LearningRate 0.3194 Epoch: 3 Global Step: 16330 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:31:46,308-Speed 5027.98 samples/sec Loss 8.0348 LearningRate 0.3194 Epoch: 3 Global Step: 16340 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:31:54,458-Speed 5026.35 samples/sec Loss 8.0244 LearningRate 0.3193 Epoch: 3 Global Step: 16350 Fp16 Grad Scale: 262144 
Required: 16 hours Training: 2022-01-17 01:32:02,678-Speed 4983.31 samples/sec Loss 8.0623 LearningRate 0.3192 Epoch: 3 Global Step: 16360 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:32:11,110-Speed 4858.74 samples/sec Loss 8.0470 LearningRate 0.3191 Epoch: 3 Global Step: 16370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:32:19,425-Speed 4926.77 samples/sec Loss 7.9111 LearningRate 0.3190 Epoch: 3 Global Step: 16380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:32:27,510-Speed 5066.19 samples/sec Loss 7.9685 LearningRate 0.3189 Epoch: 3 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:32:35,850-Speed 4912.20 samples/sec Loss 8.0351 LearningRate 0.3188 Epoch: 3 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:32:44,034-Speed 5005.49 samples/sec Loss 8.0453 LearningRate 0.3187 Epoch: 3 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:32:52,315-Speed 4947.16 samples/sec Loss 8.0686 LearningRate 0.3186 Epoch: 3 Global Step: 16420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:33:00,532-Speed 4985.38 samples/sec Loss 8.0875 LearningRate 0.3185 Epoch: 3 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:33:08,659-Speed 5040.64 samples/sec Loss 7.9974 LearningRate 0.3184 Epoch: 3 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:33:16,779-Speed 5045.19 samples/sec Loss 7.9481 LearningRate 0.3183 Epoch: 3 Global Step: 16450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:33:24,906-Speed 5040.58 samples/sec Loss 8.0413 LearningRate 0.3182 Epoch: 3 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:33:33,025-Speed 5044.96 samples/sec Loss 8.0257 LearningRate 0.3181 Epoch: 3 Global Step: 16470 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 
01:33:41,364-Speed 4912.96 samples/sec Loss 8.0381 LearningRate 0.3180 Epoch: 3 Global Step: 16480 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:33:49,835-Speed 4835.74 samples/sec Loss 8.0028 LearningRate 0.3179 Epoch: 3 Global Step: 16490 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:33:58,040-Speed 4992.91 samples/sec Loss 7.9974 LearningRate 0.3178 Epoch: 3 Global Step: 16500 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:34:06,210-Speed 5014.37 samples/sec Loss 8.0039 LearningRate 0.3177 Epoch: 3 Global Step: 16510 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:34:14,754-Speed 4794.60 samples/sec Loss 8.0426 LearningRate 0.3176 Epoch: 3 Global Step: 16520 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:34:22,929-Speed 5011.23 samples/sec Loss 8.0293 LearningRate 0.3175 Epoch: 3 Global Step: 16530 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:34:31,068-Speed 5032.99 samples/sec Loss 7.9985 LearningRate 0.3175 Epoch: 3 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:34:39,230-Speed 5019.14 samples/sec Loss 8.0631 LearningRate 0.3174 Epoch: 3 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:34:47,579-Speed 4906.54 samples/sec Loss 7.9914 LearningRate 0.3173 Epoch: 3 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:34:55,754-Speed 5011.26 samples/sec Loss 7.9854 LearningRate 0.3172 Epoch: 3 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:35:03,895-Speed 5031.92 samples/sec Loss 7.9767 LearningRate 0.3171 Epoch: 3 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:35:12,016-Speed 5044.52 samples/sec Loss 7.9684 LearningRate 0.3170 Epoch: 3 Global Step: 16590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:35:20,170-Speed 5023.74 samples/sec Loss 
7.9654 LearningRate 0.3169 Epoch: 3 Global Step: 16600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:35:28,258-Speed 5065.57 samples/sec Loss 7.9864 LearningRate 0.3168 Epoch: 3 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:35:36,408-Speed 5025.75 samples/sec Loss 8.0667 LearningRate 0.3167 Epoch: 3 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:35:44,610-Speed 4995.12 samples/sec Loss 7.9676 LearningRate 0.3166 Epoch: 3 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:35:52,752-Speed 5031.10 samples/sec Loss 7.9530 LearningRate 0.3165 Epoch: 3 Global Step: 16640 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:36:01,048-Speed 4937.89 samples/sec Loss 7.9932 LearningRate 0.3164 Epoch: 3 Global Step: 16650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:36:10,193-Speed 4479.55 samples/sec Loss 8.0270 LearningRate 0.3163 Epoch: 3 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:36:19,011-Speed 4645.58 samples/sec Loss 7.9473 LearningRate 0.3162 Epoch: 3 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:36:28,024-Speed 4545.12 samples/sec Loss 7.9654 LearningRate 0.3161 Epoch: 3 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:37:03,769-Speed 1145.95 samples/sec Loss 7.8538 LearningRate 0.3160 Epoch: 4 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:37:12,168-Speed 4877.90 samples/sec Loss 7.4529 LearningRate 0.3159 Epoch: 4 Global Step: 16700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:37:20,406-Speed 4973.12 samples/sec Loss 7.5373 LearningRate 0.3158 Epoch: 4 Global Step: 16710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:37:28,714-Speed 4930.80 samples/sec Loss 7.6006 LearningRate 0.3157 Epoch: 4 Global 
Step: 16720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:37:36,946-Speed 4976.60 samples/sec Loss 7.4542 LearningRate 0.3157 Epoch: 4 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:37:45,188-Speed 4970.09 samples/sec Loss 7.5329 LearningRate 0.3156 Epoch: 4 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:37:53,879-Speed 4713.81 samples/sec Loss 7.5017 LearningRate 0.3155 Epoch: 4 Global Step: 16750 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:38:02,888-Speed 4546.86 samples/sec Loss 7.5027 LearningRate 0.3154 Epoch: 4 Global Step: 16760 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:38:11,857-Speed 4567.30 samples/sec Loss 7.5151 LearningRate 0.3153 Epoch: 4 Global Step: 16770 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:38:20,750-Speed 4606.97 samples/sec Loss 7.5062 LearningRate 0.3152 Epoch: 4 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:38:29,622-Speed 4617.50 samples/sec Loss 7.5863 LearningRate 0.3151 Epoch: 4 Global Step: 16790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:38:38,654-Speed 4535.63 samples/sec Loss 7.5949 LearningRate 0.3150 Epoch: 4 Global Step: 16800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:38:47,473-Speed 4644.71 samples/sec Loss 7.7037 LearningRate 0.3149 Epoch: 4 Global Step: 16810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:38:56,365-Speed 4607.06 samples/sec Loss 7.6608 LearningRate 0.3148 Epoch: 4 Global Step: 16820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:39:05,257-Speed 4607.25 samples/sec Loss 7.6339 LearningRate 0.3147 Epoch: 4 Global Step: 16830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:39:14,220-Speed 4570.22 samples/sec Loss 7.5976 LearningRate 0.3146 Epoch: 4 Global Step: 16840 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-01-17 01:39:23,424-Speed 4450.93 samples/sec Loss 7.5535 LearningRate 0.3145 Epoch: 4 Global Step: 16850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:39:32,239-Speed 4647.41 samples/sec Loss 7.6409 LearningRate 0.3144 Epoch: 4 Global Step: 16860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:39:41,162-Speed 4590.91 samples/sec Loss 7.6291 LearningRate 0.3143 Epoch: 4 Global Step: 16870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:39:50,078-Speed 4594.66 samples/sec Loss 7.6375 LearningRate 0.3142 Epoch: 4 Global Step: 16880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:39:59,058-Speed 4562.19 samples/sec Loss 7.5519 LearningRate 0.3141 Epoch: 4 Global Step: 16890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-17 01:40:07,703-Speed 4738.19 samples/sec Loss 7.6612 LearningRate 0.3140 Epoch: 4 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:40:15,894-Speed 5001.32 samples/sec Loss 7.6744 LearningRate 0.3140 Epoch: 4 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:40:24,059-Speed 5017.18 samples/sec Loss 7.6581 LearningRate 0.3139 Epoch: 4 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:40:32,305-Speed 4967.95 samples/sec Loss 7.6825 LearningRate 0.3138 Epoch: 4 Global Step: 16930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:40:40,524-Speed 4983.97 samples/sec Loss 7.6761 LearningRate 0.3137 Epoch: 4 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:40:48,788-Speed 4957.06 samples/sec Loss 7.7223 LearningRate 0.3136 Epoch: 4 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:40:57,011-Speed 4981.82 samples/sec Loss 7.7235 LearningRate 0.3135 Epoch: 4 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:41:05,350-Speed 
4912.45 samples/sec Loss 7.8151 LearningRate 0.3134 Epoch: 4 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:41:13,556-Speed 4992.63 samples/sec Loss 7.7792 LearningRate 0.3133 Epoch: 4 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:41:21,871-Speed 4926.86 samples/sec Loss 7.6992 LearningRate 0.3132 Epoch: 4 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:41:30,408-Speed 4798.32 samples/sec Loss 7.7670 LearningRate 0.3131 Epoch: 4 Global Step: 17000 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:41:38,614-Speed 4991.99 samples/sec Loss 7.7994 LearningRate 0.3130 Epoch: 4 Global Step: 17010 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:41:46,862-Speed 4967.17 samples/sec Loss 7.8263 LearningRate 0.3129 Epoch: 4 Global Step: 17020 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:41:55,207-Speed 4908.85 samples/sec Loss 7.8486 LearningRate 0.3128 Epoch: 4 Global Step: 17030 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:42:03,416-Speed 4990.72 samples/sec Loss 7.7774 LearningRate 0.3127 Epoch: 4 Global Step: 17040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:42:11,641-Speed 4980.07 samples/sec Loss 7.8139 LearningRate 0.3126 Epoch: 4 Global Step: 17050 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:42:19,894-Speed 4964.35 samples/sec Loss 7.8435 LearningRate 0.3125 Epoch: 4 Global Step: 17060 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:42:28,066-Speed 5012.43 samples/sec Loss 7.7740 LearningRate 0.3124 Epoch: 4 Global Step: 17070 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:42:36,297-Speed 4977.33 samples/sec Loss 7.7354 LearningRate 0.3123 Epoch: 4 Global Step: 17080 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:42:44,511-Speed 4987.13 samples/sec Loss 7.7927 
LearningRate 0.3123 Epoch: 4 Global Step: 17090 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:42:52,721-Speed 4989.83 samples/sec Loss 7.7998 LearningRate 0.3122 Epoch: 4 Global Step: 17100 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:43:00,937-Speed 4986.31 samples/sec Loss 7.8781 LearningRate 0.3121 Epoch: 4 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:43:09,194-Speed 4961.11 samples/sec Loss 7.8671 LearningRate 0.3120 Epoch: 4 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:43:17,498-Speed 4932.97 samples/sec Loss 7.8505 LearningRate 0.3119 Epoch: 4 Global Step: 17130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:43:25,856-Speed 4901.46 samples/sec Loss 7.6812 LearningRate 0.3118 Epoch: 4 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:43:34,094-Speed 4972.98 samples/sec Loss 7.8409 LearningRate 0.3117 Epoch: 4 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:43:41,980-Speed 5194.76 samples/sec Loss 7.8066 LearningRate 0.3116 Epoch: 4 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:43:50,220-Speed 4971.56 samples/sec Loss 7.9085 LearningRate 0.3115 Epoch: 4 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:43:58,476-Speed 4961.69 samples/sec Loss 7.7599 LearningRate 0.3114 Epoch: 4 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:44:06,965-Speed 4826.09 samples/sec Loss 7.8850 LearningRate 0.3113 Epoch: 4 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:44:15,442-Speed 4832.26 samples/sec Loss 7.7428 LearningRate 0.3112 Epoch: 4 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:44:23,623-Speed 5007.48 samples/sec Loss 7.8231 LearningRate 0.3111 Epoch: 4 Global Step: 
17210 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:44:31,919-Speed 4938.67 samples/sec Loss 7.7442 LearningRate 0.3110 Epoch: 4 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:44:40,241-Speed 4922.56 samples/sec Loss 7.8260 LearningRate 0.3109 Epoch: 4 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:44:48,509-Speed 4954.68 samples/sec Loss 7.8563 LearningRate 0.3108 Epoch: 4 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:44:56,816-Speed 4931.55 samples/sec Loss 7.9582 LearningRate 0.3108 Epoch: 4 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:45:05,083-Speed 4955.56 samples/sec Loss 7.8689 LearningRate 0.3107 Epoch: 4 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:45:13,324-Speed 4970.75 samples/sec Loss 7.8477 LearningRate 0.3106 Epoch: 4 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:45:21,533-Speed 4990.22 samples/sec Loss 7.7625 LearningRate 0.3105 Epoch: 4 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:45:29,717-Speed 5005.90 samples/sec Loss 7.8203 LearningRate 0.3104 Epoch: 4 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:45:37,954-Speed 4973.56 samples/sec Loss 7.7320 LearningRate 0.3103 Epoch: 4 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:45:46,161-Speed 4991.32 samples/sec Loss 7.7869 LearningRate 0.3102 Epoch: 4 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:45:54,392-Speed 4976.88 samples/sec Loss 7.9628 LearningRate 0.3101 Epoch: 4 Global Step: 17320 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:46:02,602-Speed 4990.04 samples/sec Loss 7.8901 LearningRate 0.3100 Epoch: 4 Global Step: 17330 Fp16 Grad Scale: 262144 Required: 16 
hours Training: 2022-01-17 01:46:10,985-Speed 4886.41 samples/sec Loss 7.9044 LearningRate 0.3099 Epoch: 4 Global Step: 17340 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:46:19,467-Speed 4829.67 samples/sec Loss 7.7846 LearningRate 0.3098 Epoch: 4 Global Step: 17350 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:46:27,676-Speed 4990.62 samples/sec Loss 7.8488 LearningRate 0.3097 Epoch: 4 Global Step: 17360 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:46:35,828-Speed 5025.24 samples/sec Loss 7.8607 LearningRate 0.3096 Epoch: 4 Global Step: 17370 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:46:44,085-Speed 4960.98 samples/sec Loss 7.8461 LearningRate 0.3095 Epoch: 4 Global Step: 17380 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:46:52,908-Speed 4643.14 samples/sec Loss 7.7736 LearningRate 0.3094 Epoch: 4 Global Step: 17390 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:47:01,776-Speed 4619.62 samples/sec Loss 7.8030 LearningRate 0.3093 Epoch: 4 Global Step: 17400 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:47:10,662-Speed 4610.14 samples/sec Loss 7.8319 LearningRate 0.3092 Epoch: 4 Global Step: 17410 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:47:19,514-Speed 4627.55 samples/sec Loss 7.8429 LearningRate 0.3092 Epoch: 4 Global Step: 17420 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:47:28,418-Speed 4600.98 samples/sec Loss 7.8579 LearningRate 0.3091 Epoch: 4 Global Step: 17430 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:47:37,140-Speed 4696.92 samples/sec Loss 7.8050 LearningRate 0.3090 Epoch: 4 Global Step: 17440 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:47:45,022-Speed 5197.63 samples/sec Loss 7.8971 LearningRate 0.3089 Epoch: 4 Global Step: 17450 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 
01:47:53,147-Speed 5042.05 samples/sec Loss 7.8330 LearningRate 0.3088 Epoch: 4 Global Step: 17460 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:48:01,322-Speed 5010.30 samples/sec Loss 7.8505 LearningRate 0.3087 Epoch: 4 Global Step: 17470 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:48:09,553-Speed 4977.29 samples/sec Loss 7.8099 LearningRate 0.3086 Epoch: 4 Global Step: 17480 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:48:17,801-Speed 4966.85 samples/sec Loss 7.7872 LearningRate 0.3085 Epoch: 4 Global Step: 17490 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:48:26,057-Speed 4961.58 samples/sec Loss 7.7694 LearningRate 0.3084 Epoch: 4 Global Step: 17500 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:48:34,317-Speed 4960.09 samples/sec Loss 7.8320 LearningRate 0.3083 Epoch: 4 Global Step: 17510 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:48:42,522-Speed 4992.81 samples/sec Loss 7.8679 LearningRate 0.3082 Epoch: 4 Global Step: 17520 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:48:50,768-Speed 4968.20 samples/sec Loss 7.7985 LearningRate 0.3081 Epoch: 4 Global Step: 17530 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:48:58,981-Speed 4988.04 samples/sec Loss 7.8831 LearningRate 0.3080 Epoch: 4 Global Step: 17540 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:49:07,406-Speed 4862.12 samples/sec Loss 7.8380 LearningRate 0.3079 Epoch: 4 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:49:15,603-Speed 4997.69 samples/sec Loss 7.7949 LearningRate 0.3078 Epoch: 4 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:49:23,787-Speed 5005.81 samples/sec Loss 7.7608 LearningRate 0.3078 Epoch: 4 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:49:31,896-Speed 5051.26 samples/sec Loss 
7.8273 LearningRate 0.3077 Epoch: 4 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:49:40,361-Speed 4839.42 samples/sec Loss 7.8569 LearningRate 0.3076 Epoch: 4 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:49:49,266-Speed 4600.68 samples/sec Loss 7.7335 LearningRate 0.3075 Epoch: 4 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:49:57,866-Speed 4762.99 samples/sec Loss 7.7494 LearningRate 0.3074 Epoch: 4 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:50:06,375-Speed 4814.74 samples/sec Loss 7.7849 LearningRate 0.3073 Epoch: 4 Global Step: 17620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:50:14,745-Speed 4894.15 samples/sec Loss 7.8019 LearningRate 0.3072 Epoch: 4 Global Step: 17630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:50:23,177-Speed 4858.37 samples/sec Loss 7.7783 LearningRate 0.3071 Epoch: 4 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:50:31,514-Speed 4913.98 samples/sec Loss 7.7360 LearningRate 0.3070 Epoch: 4 Global Step: 17650 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:50:39,943-Speed 4860.10 samples/sec Loss 7.8132 LearningRate 0.3069 Epoch: 4 Global Step: 17660 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:50:48,213-Speed 4953.52 samples/sec Loss 7.7909 LearningRate 0.3068 Epoch: 4 Global Step: 17670 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:50:56,499-Speed 4943.67 samples/sec Loss 7.7250 LearningRate 0.3067 Epoch: 4 Global Step: 17680 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:51:04,914-Speed 4868.56 samples/sec Loss 7.7954 LearningRate 0.3066 Epoch: 4 Global Step: 17690 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:51:13,198-Speed 4945.02 samples/sec Loss 7.8077 LearningRate 0.3065 Epoch: 4 Global 
Step: 17700 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:51:21,548-Speed 4906.43 samples/sec Loss 7.8985 LearningRate 0.3064 Epoch: 4 Global Step: 17710 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:51:29,930-Speed 4887.19 samples/sec Loss 7.7752 LearningRate 0.3064 Epoch: 4 Global Step: 17720 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:51:38,232-Speed 4934.43 samples/sec Loss 7.8097 LearningRate 0.3063 Epoch: 4 Global Step: 17730 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:51:46,605-Speed 4892.06 samples/sec Loss 7.8097 LearningRate 0.3062 Epoch: 4 Global Step: 17740 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:51:54,910-Speed 4932.60 samples/sec Loss 7.8215 LearningRate 0.3061 Epoch: 4 Global Step: 17750 Fp16 Grad Scale: 524288 Required: 16 hours Training: 2022-01-17 01:52:03,189-Speed 4948.36 samples/sec Loss 7.7709 LearningRate 0.3060 Epoch: 4 Global Step: 17760 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:52:11,494-Speed 4932.39 samples/sec Loss 7.7129 LearningRate 0.3059 Epoch: 4 Global Step: 17770 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:52:19,748-Speed 4963.11 samples/sec Loss 7.7235 LearningRate 0.3058 Epoch: 4 Global Step: 17780 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:52:28,358-Speed 4758.26 samples/sec Loss 7.7277 LearningRate 0.3057 Epoch: 4 Global Step: 17790 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:52:37,012-Speed 4733.30 samples/sec Loss 7.6866 LearningRate 0.3056 Epoch: 4 Global Step: 17800 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:52:45,609-Speed 4765.26 samples/sec Loss 7.7963 LearningRate 0.3055 Epoch: 4 Global Step: 17810 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:52:54,078-Speed 4837.47 samples/sec Loss 7.8555 LearningRate 0.3054 Epoch: 4 Global Step: 17820 Fp16 Grad Scale: 262144 
Required: 16 hours Training: 2022-01-17 01:53:02,265-Speed 5003.45 samples/sec Loss 7.6467 LearningRate 0.3053 Epoch: 4 Global Step: 17830 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:53:10,490-Speed 4980.44 samples/sec Loss 7.7584 LearningRate 0.3052 Epoch: 4 Global Step: 17840 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:53:18,676-Speed 5004.40 samples/sec Loss 7.7313 LearningRate 0.3051 Epoch: 4 Global Step: 17850 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:53:26,877-Speed 4995.12 samples/sec Loss 7.8208 LearningRate 0.3050 Epoch: 4 Global Step: 17860 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:53:35,072-Speed 4998.90 samples/sec Loss 7.6973 LearningRate 0.3050 Epoch: 4 Global Step: 17870 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:53:43,231-Speed 5020.70 samples/sec Loss 7.7469 LearningRate 0.3049 Epoch: 4 Global Step: 17880 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:53:51,552-Speed 4923.55 samples/sec Loss 7.7714 LearningRate 0.3048 Epoch: 4 Global Step: 17890 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:53:59,761-Speed 4990.44 samples/sec Loss 7.7255 LearningRate 0.3047 Epoch: 4 Global Step: 17900 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:54:07,973-Speed 4988.07 samples/sec Loss 7.7919 LearningRate 0.3046 Epoch: 4 Global Step: 17910 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:54:16,206-Speed 4975.93 samples/sec Loss 7.8285 LearningRate 0.3045 Epoch: 4 Global Step: 17920 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:54:24,581-Speed 4891.18 samples/sec Loss 7.7729 LearningRate 0.3044 Epoch: 4 Global Step: 17930 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:54:33,252-Speed 4724.19 samples/sec Loss 7.7501 LearningRate 0.3043 Epoch: 4 Global Step: 17940 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 
01:54:41,606-Speed 4903.77 samples/sec Loss 7.7302 LearningRate 0.3042 Epoch: 4 Global Step: 17950 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:54:49,814-Speed 4990.66 samples/sec Loss 7.7133 LearningRate 0.3041 Epoch: 4 Global Step: 17960 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:54:58,118-Speed 4933.57 samples/sec Loss 7.7586 LearningRate 0.3040 Epoch: 4 Global Step: 17970 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:55:06,354-Speed 4974.01 samples/sec Loss 7.7998 LearningRate 0.3039 Epoch: 4 Global Step: 17980 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:55:14,581-Speed 4979.24 samples/sec Loss 7.7874 LearningRate 0.3038 Epoch: 4 Global Step: 17990 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:55:22,825-Speed 4968.92 samples/sec Loss 7.8333 LearningRate 0.3037 Epoch: 4 Global Step: 18000 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:55:31,180-Speed 4903.47 samples/sec Loss 7.7931 LearningRate 0.3037 Epoch: 4 Global Step: 18010 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:55:39,419-Speed 4971.92 samples/sec Loss 7.7391 LearningRate 0.3036 Epoch: 4 Global Step: 18020 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:55:47,660-Speed 4970.70 samples/sec Loss 7.8628 LearningRate 0.3035 Epoch: 4 Global Step: 18030 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:55:55,849-Speed 5002.73 samples/sec Loss 7.8095 LearningRate 0.3034 Epoch: 4 Global Step: 18040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:56:04,108-Speed 4960.01 samples/sec Loss 7.7552 LearningRate 0.3033 Epoch: 4 Global Step: 18050 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:56:12,434-Speed 4920.11 samples/sec Loss 7.7057 LearningRate 0.3032 Epoch: 4 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:56:20,748-Speed 4927.39 samples/sec Loss 
7.7431 LearningRate 0.3031 Epoch: 4 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:56:28,979-Speed 4977.26 samples/sec Loss 7.6587 LearningRate 0.3030 Epoch: 4 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:56:37,228-Speed 4965.92 samples/sec Loss 7.7956 LearningRate 0.3029 Epoch: 4 Global Step: 18090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:56:45,856-Speed 4747.79 samples/sec Loss 7.7586 LearningRate 0.3028 Epoch: 4 Global Step: 18100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:56:54,135-Speed 4948.21 samples/sec Loss 7.7337 LearningRate 0.3027 Epoch: 4 Global Step: 18110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:57:02,322-Speed 5004.14 samples/sec Loss 7.7518 LearningRate 0.3026 Epoch: 4 Global Step: 18120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:57:10,592-Speed 4954.09 samples/sec Loss 7.6449 LearningRate 0.3025 Epoch: 4 Global Step: 18130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:57:18,805-Speed 4987.25 samples/sec Loss 7.7426 LearningRate 0.3024 Epoch: 4 Global Step: 18140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:57:27,021-Speed 4986.20 samples/sec Loss 7.7487 LearningRate 0.3024 Epoch: 4 Global Step: 18150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:57:35,262-Speed 4971.01 samples/sec Loss 7.7445 LearningRate 0.3023 Epoch: 4 Global Step: 18160 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:57:43,457-Speed 4998.84 samples/sec Loss 7.6685 LearningRate 0.3022 Epoch: 4 Global Step: 18170 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:57:51,649-Speed 5000.59 samples/sec Loss 7.7235 LearningRate 0.3021 Epoch: 4 Global Step: 18180 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:57:59,870-Speed 4982.95 samples/sec Loss 7.7326 LearningRate 0.3020 Epoch: 4 Global 
Step: 18190 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:58:08,178-Speed 4931.29 samples/sec Loss 7.5841 LearningRate 0.3019 Epoch: 4 Global Step: 18200 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 01:58:16,426-Speed 4966.51 samples/sec Loss 7.7177 LearningRate 0.3018 Epoch: 4 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:58:24,683-Speed 4961.15 samples/sec Loss 7.7034 LearningRate 0.3017 Epoch: 4 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:58:33,131-Speed 4849.21 samples/sec Loss 7.7551 LearningRate 0.3016 Epoch: 4 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:58:41,721-Speed 4768.82 samples/sec Loss 7.6612 LearningRate 0.3015 Epoch: 4 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:58:49,944-Speed 4981.58 samples/sec Loss 7.7478 LearningRate 0.3014 Epoch: 4 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:58:58,129-Speed 5005.94 samples/sec Loss 7.6348 LearningRate 0.3013 Epoch: 4 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:59:06,364-Speed 4974.35 samples/sec Loss 7.6709 LearningRate 0.3012 Epoch: 4 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:59:14,669-Speed 4932.49 samples/sec Loss 7.6822 LearningRate 0.3012 Epoch: 4 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:59:22,950-Speed 4947.05 samples/sec Loss 7.6396 LearningRate 0.3011 Epoch: 4 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:59:31,178-Speed 4978.78 samples/sec Loss 7.7341 LearningRate 0.3010 Epoch: 4 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:59:39,423-Speed 4968.56 samples/sec Loss 7.6910 LearningRate 0.3009 Epoch: 4 Global Step: 18310 Fp16 Grad Scale: 131072 
Required: 16 hours Training: 2022-01-17 01:59:47,680-Speed 4960.96 samples/sec Loss 7.5560 LearningRate 0.3008 Epoch: 4 Global Step: 18320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 01:59:55,945-Speed 4956.37 samples/sec Loss 7.7715 LearningRate 0.3007 Epoch: 4 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:00:04,196-Speed 4965.42 samples/sec Loss 7.7493 LearningRate 0.3006 Epoch: 4 Global Step: 18340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:00:12,396-Speed 4995.82 samples/sec Loss 7.6093 LearningRate 0.3005 Epoch: 4 Global Step: 18350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:00:20,680-Speed 4944.80 samples/sec Loss 7.6457 LearningRate 0.3004 Epoch: 4 Global Step: 18360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:00:28,940-Speed 4959.58 samples/sec Loss 7.7110 LearningRate 0.3003 Epoch: 4 Global Step: 18370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:00:37,447-Speed 4815.03 samples/sec Loss 7.7426 LearningRate 0.3002 Epoch: 4 Global Step: 18380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:00:45,909-Speed 4841.66 samples/sec Loss 7.6631 LearningRate 0.3001 Epoch: 4 Global Step: 18390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:00:54,347-Speed 4854.47 samples/sec Loss 7.6257 LearningRate 0.3000 Epoch: 4 Global Step: 18400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:01:02,709-Speed 4899.25 samples/sec Loss 7.7328 LearningRate 0.3000 Epoch: 4 Global Step: 18410 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 02:01:11,396-Speed 4715.86 samples/sec Loss 7.6985 LearningRate 0.2999 Epoch: 4 Global Step: 18420 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 02:01:19,886-Speed 4824.91 samples/sec Loss 7.6458 LearningRate 0.2998 Epoch: 4 Global Step: 18430 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-17 
02:01:28,737-Speed 4627.96 samples/sec Loss 7.6906 LearningRate 0.2997 Epoch: 4 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:01:37,216-Speed 4831.24 samples/sec Loss 7.6888 LearningRate 0.2996 Epoch: 4 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:01:45,848-Speed 4745.83 samples/sec Loss 7.6816 LearningRate 0.2995 Epoch: 4 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:01:54,111-Speed 4958.47 samples/sec Loss 7.6901 LearningRate 0.2994 Epoch: 4 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:02:02,397-Speed 4943.48 samples/sec Loss 7.6696 LearningRate 0.2993 Epoch: 4 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:02:10,671-Speed 4951.13 samples/sec Loss 7.6960 LearningRate 0.2992 Epoch: 4 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:02:18,930-Speed 4960.14 samples/sec Loss 7.5891 LearningRate 0.2991 Epoch: 4 Global Step: 18500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:02:27,174-Speed 4969.21 samples/sec Loss 7.6879 LearningRate 0.2990 Epoch: 4 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-17 02:02:35,552-Speed 4889.29 samples/sec Loss 7.6059 LearningRate 0.2989 Epoch: 4 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:02:43,891-Speed 4913.00 samples/sec Loss 7.6630 LearningRate 0.2988 Epoch: 4 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:02:52,165-Speed 4950.59 samples/sec Loss 7.6083 LearningRate 0.2988 Epoch: 4 Global Step: 18540 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:03:00,385-Speed 4983.58 samples/sec Loss 7.6633 LearningRate 0.2987 Epoch: 4 Global Step: 18550 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:03:08,649-Speed 4957.60 samples/sec Loss 
7.7112 LearningRate 0.2986 Epoch: 4 Global Step: 18560 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:03:16,908-Speed 4959.99 samples/sec Loss 7.5822 LearningRate 0.2985 Epoch: 4 Global Step: 18570 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:03:25,001-Speed 5062.02 samples/sec Loss 7.6129 LearningRate 0.2984 Epoch: 4 Global Step: 18580 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:03:33,259-Speed 4960.70 samples/sec Loss 7.6431 LearningRate 0.2983 Epoch: 4 Global Step: 18590 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:03:41,560-Speed 4935.34 samples/sec Loss 7.6731 LearningRate 0.2982 Epoch: 4 Global Step: 18600 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:03:49,878-Speed 4924.97 samples/sec Loss 7.6457 LearningRate 0.2981 Epoch: 4 Global Step: 18610 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:03:58,162-Speed 4944.60 samples/sec Loss 7.6876 LearningRate 0.2980 Epoch: 4 Global Step: 18620 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:04:06,385-Speed 4982.32 samples/sec Loss 7.6329 LearningRate 0.2979 Epoch: 4 Global Step: 18630 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:04:14,613-Speed 4978.91 samples/sec Loss 7.6315 LearningRate 0.2978 Epoch: 4 Global Step: 18640 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:04:22,836-Speed 4981.96 samples/sec Loss 7.6499 LearningRate 0.2977 Epoch: 4 Global Step: 18650 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:04:31,229-Speed 4880.25 samples/sec Loss 7.6204 LearningRate 0.2977 Epoch: 4 Global Step: 18660 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:04:39,655-Speed 4862.25 samples/sec Loss 7.6793 LearningRate 0.2976 Epoch: 4 Global Step: 18670 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:04:47,971-Speed 4925.88 samples/sec Loss 7.5799 LearningRate 0.2975 Epoch: 4 Global 
Step: 18680 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:04:56,364-Speed 4881.31 samples/sec Loss 7.7158 LearningRate 0.2974 Epoch: 4 Global Step: 18690 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:05:04,699-Speed 4914.36 samples/sec Loss 7.5619 LearningRate 0.2973 Epoch: 4 Global Step: 18700 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:05:12,966-Speed 4955.41 samples/sec Loss 7.5977 LearningRate 0.2972 Epoch: 4 Global Step: 18710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:05:21,200-Speed 4975.41 samples/sec Loss 7.6634 LearningRate 0.2971 Epoch: 4 Global Step: 18720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:05:29,520-Speed 4923.77 samples/sec Loss 7.6455 LearningRate 0.2970 Epoch: 4 Global Step: 18730 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:05:37,801-Speed 4946.65 samples/sec Loss 7.5928 LearningRate 0.2969 Epoch: 4 Global Step: 18740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:05:46,063-Speed 4958.14 samples/sec Loss 7.7111 LearningRate 0.2968 Epoch: 4 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:05:54,355-Speed 4940.45 samples/sec Loss 7.6352 LearningRate 0.2967 Epoch: 4 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:06:02,653-Speed 4936.61 samples/sec Loss 7.5545 LearningRate 0.2966 Epoch: 4 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:06:10,944-Speed 4941.58 samples/sec Loss 7.6305 LearningRate 0.2965 Epoch: 4 Global Step: 18780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:06:19,348-Speed 4874.24 samples/sec Loss 7.7114 LearningRate 0.2965 Epoch: 4 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:06:27,797-Speed 4848.50 samples/sec Loss 7.6369 LearningRate 0.2964 Epoch: 4 Global Step: 18800 Fp16 Grad Scale: 131072 
Required: 15 hours Training: 2022-01-17 02:06:36,378-Speed 4774.25 samples/sec Loss 7.6210 LearningRate 0.2963 Epoch: 4 Global Step: 18810 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:06:44,627-Speed 4965.88 samples/sec Loss 7.5550 LearningRate 0.2962 Epoch: 4 Global Step: 18820 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:06:53,106-Speed 4831.67 samples/sec Loss 7.6005 LearningRate 0.2961 Epoch: 4 Global Step: 18830 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:07:01,266-Speed 5020.57 samples/sec Loss 7.6427 LearningRate 0.2960 Epoch: 4 Global Step: 18840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:07:09,529-Speed 4957.19 samples/sec Loss 7.6125 LearningRate 0.2959 Epoch: 4 Global Step: 18850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:07:17,940-Speed 4870.90 samples/sec Loss 7.6458 LearningRate 0.2958 Epoch: 4 Global Step: 18860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:07:26,244-Speed 4933.43 samples/sec Loss 7.5951 LearningRate 0.2957 Epoch: 4 Global Step: 18870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:07:34,432-Speed 5002.63 samples/sec Loss 7.6303 LearningRate 0.2956 Epoch: 4 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:07:42,735-Speed 4933.50 samples/sec Loss 7.5825 LearningRate 0.2955 Epoch: 4 Global Step: 18890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:07:50,938-Speed 4994.47 samples/sec Loss 7.5686 LearningRate 0.2955 Epoch: 4 Global Step: 18900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:07:59,386-Speed 4848.87 samples/sec Loss 7.5438 LearningRate 0.2954 Epoch: 4 Global Step: 18910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:08:07,800-Speed 4868.90 samples/sec Loss 7.5613 LearningRate 0.2953 Epoch: 4 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 
02:08:16,004-Speed 4993.13 samples/sec Loss 7.5414 LearningRate 0.2952 Epoch: 4 Global Step: 18930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:08:24,201-Speed 4997.97 samples/sec Loss 7.5437 LearningRate 0.2951 Epoch: 4 Global Step: 18940 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:08:32,418-Speed 4984.84 samples/sec Loss 7.5589 LearningRate 0.2950 Epoch: 4 Global Step: 18950 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:08:40,817-Speed 4877.52 samples/sec Loss 7.6000 LearningRate 0.2949 Epoch: 4 Global Step: 18960 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:08:49,038-Speed 4983.30 samples/sec Loss 7.6265 LearningRate 0.2948 Epoch: 4 Global Step: 18970 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:08:57,241-Speed 4993.87 samples/sec Loss 7.6356 LearningRate 0.2947 Epoch: 4 Global Step: 18980 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:09:05,440-Speed 4996.23 samples/sec Loss 7.6051 LearningRate 0.2946 Epoch: 4 Global Step: 18990 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:09:13,625-Speed 5004.94 samples/sec Loss 7.5376 LearningRate 0.2945 Epoch: 4 Global Step: 19000 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:09:21,835-Speed 4989.94 samples/sec Loss 7.6072 LearningRate 0.2944 Epoch: 4 Global Step: 19010 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:09:30,106-Speed 4952.86 samples/sec Loss 7.6003 LearningRate 0.2944 Epoch: 4 Global Step: 19020 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:09:38,283-Speed 5009.89 samples/sec Loss 7.5938 LearningRate 0.2943 Epoch: 4 Global Step: 19030 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:09:46,423-Speed 5032.63 samples/sec Loss 7.5617 LearningRate 0.2942 Epoch: 4 Global Step: 19040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:09:54,701-Speed 4948.28 samples/sec Loss 
7.5453 LearningRate 0.2941 Epoch: 4 Global Step: 19050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:10:02,979-Speed 4948.96 samples/sec Loss 7.5506 LearningRate 0.2940 Epoch: 4 Global Step: 19060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:10:11,256-Speed 4949.55 samples/sec Loss 7.5688 LearningRate 0.2939 Epoch: 4 Global Step: 19070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:10:19,523-Speed 4955.20 samples/sec Loss 7.5099 LearningRate 0.2938 Epoch: 4 Global Step: 19080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:10:27,895-Speed 4893.05 samples/sec Loss 7.4557 LearningRate 0.2937 Epoch: 4 Global Step: 19090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:10:36,393-Speed 4821.00 samples/sec Loss 7.5460 LearningRate 0.2936 Epoch: 4 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:10:44,715-Speed 4922.04 samples/sec Loss 7.5407 LearningRate 0.2935 Epoch: 4 Global Step: 19110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:10:53,035-Speed 4924.21 samples/sec Loss 7.5743 LearningRate 0.2934 Epoch: 4 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:11:01,337-Speed 4934.20 samples/sec Loss 7.5907 LearningRate 0.2933 Epoch: 4 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:11:09,772-Speed 4856.67 samples/sec Loss 7.5273 LearningRate 0.2933 Epoch: 4 Global Step: 19140 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:11:18,052-Speed 4947.37 samples/sec Loss 7.5906 LearningRate 0.2932 Epoch: 4 Global Step: 19150 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:11:26,388-Speed 4914.35 samples/sec Loss 7.5430 LearningRate 0.2931 Epoch: 4 Global Step: 19160 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:11:34,657-Speed 4954.30 samples/sec Loss 7.5230 LearningRate 0.2930 Epoch: 4 Global 
Step: 19170 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:11:42,924-Speed 4955.21 samples/sec Loss 7.5215 LearningRate 0.2929 Epoch: 4 Global Step: 19180 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:11:51,189-Speed 4956.47 samples/sec Loss 7.6076 LearningRate 0.2928 Epoch: 4 Global Step: 19190 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:11:59,487-Speed 4936.69 samples/sec Loss 7.6048 LearningRate 0.2927 Epoch: 4 Global Step: 19200 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:12:07,711-Speed 4981.63 samples/sec Loss 7.4681 LearningRate 0.2926 Epoch: 4 Global Step: 19210 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:12:15,966-Speed 4962.53 samples/sec Loss 7.5700 LearningRate 0.2925 Epoch: 4 Global Step: 19220 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:12:24,131-Speed 5017.31 samples/sec Loss 7.5500 LearningRate 0.2924 Epoch: 4 Global Step: 19230 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:12:32,308-Speed 5009.73 samples/sec Loss 7.5683 LearningRate 0.2923 Epoch: 4 Global Step: 19240 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:12:40,516-Speed 4990.78 samples/sec Loss 7.5360 LearningRate 0.2923 Epoch: 4 Global Step: 19250 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:12:48,845-Speed 4918.09 samples/sec Loss 7.5720 LearningRate 0.2922 Epoch: 4 Global Step: 19260 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:12:57,341-Speed 4821.50 samples/sec Loss 7.5675 LearningRate 0.2921 Epoch: 4 Global Step: 19270 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:13:05,531-Speed 5001.85 samples/sec Loss 7.5132 LearningRate 0.2920 Epoch: 4 Global Step: 19280 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:13:13,726-Speed 4998.99 samples/sec Loss 7.4862 LearningRate 0.2919 Epoch: 4 Global Step: 19290 Fp16 Grad Scale: 262144 
Required: 15 hours Training: 2022-01-17 02:13:22,041-Speed 4926.95 samples/sec Loss 7.5812 LearningRate 0.2918 Epoch: 4 Global Step: 19300 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:13:30,390-Speed 4906.22 samples/sec Loss 7.5252 LearningRate 0.2917 Epoch: 4 Global Step: 19310 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:13:38,614-Speed 4981.46 samples/sec Loss 7.5441 LearningRate 0.2916 Epoch: 4 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:13:47,530-Speed 4594.83 samples/sec Loss 7.5538 LearningRate 0.2915 Epoch: 4 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:13:56,453-Speed 4590.84 samples/sec Loss 7.4534 LearningRate 0.2914 Epoch: 4 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:14:05,361-Speed 4598.21 samples/sec Loss 7.4455 LearningRate 0.2913 Epoch: 4 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:14:14,366-Speed 4549.43 samples/sec Loss 7.4664 LearningRate 0.2913 Epoch: 4 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:14:22,814-Speed 4849.41 samples/sec Loss 7.5354 LearningRate 0.2912 Epoch: 4 Global Step: 19370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:14:31,073-Speed 4959.57 samples/sec Loss 7.5420 LearningRate 0.2911 Epoch: 4 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:14:39,378-Speed 4932.73 samples/sec Loss 7.5927 LearningRate 0.2910 Epoch: 4 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:14:47,614-Speed 4974.25 samples/sec Loss 7.5293 LearningRate 0.2909 Epoch: 4 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:14:55,904-Speed 4941.54 samples/sec Loss 7.4924 LearningRate 0.2908 Epoch: 4 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 
02:15:04,230-Speed 4920.49 samples/sec Loss 7.4597 LearningRate 0.2907 Epoch: 4 Global Step: 19420 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:15:12,720-Speed 4825.36 samples/sec Loss 7.4059 LearningRate 0.2906 Epoch: 4 Global Step: 19430 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:15:21,008-Speed 4942.48 samples/sec Loss 7.5382 LearningRate 0.2905 Epoch: 4 Global Step: 19440 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:15:29,335-Speed 4919.38 samples/sec Loss 7.4565 LearningRate 0.2904 Epoch: 4 Global Step: 19450 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:15:37,624-Speed 4942.01 samples/sec Loss 7.4610 LearningRate 0.2903 Epoch: 4 Global Step: 19460 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:15:45,856-Speed 4977.03 samples/sec Loss 7.3820 LearningRate 0.2903 Epoch: 4 Global Step: 19470 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:15:54,064-Speed 4990.96 samples/sec Loss 7.4748 LearningRate 0.2902 Epoch: 4 Global Step: 19480 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:16:02,285-Speed 4983.09 samples/sec Loss 7.5123 LearningRate 0.2901 Epoch: 4 Global Step: 19490 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:16:10,534-Speed 4965.63 samples/sec Loss 7.4856 LearningRate 0.2900 Epoch: 4 Global Step: 19500 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:16:18,968-Speed 4857.60 samples/sec Loss 7.4905 LearningRate 0.2899 Epoch: 4 Global Step: 19510 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:16:27,186-Speed 4984.88 samples/sec Loss 7.4026 LearningRate 0.2898 Epoch: 4 Global Step: 19520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:16:35,919-Speed 4690.52 samples/sec Loss 7.6026 LearningRate 0.2897 Epoch: 4 Global Step: 19530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:16:44,218-Speed 4935.91 samples/sec Loss 
7.4967 LearningRate 0.2896 Epoch: 4 Global Step: 19540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:16:52,449-Speed 4977.40 samples/sec Loss 7.4945 LearningRate 0.2895 Epoch: 4 Global Step: 19550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:17:00,660-Speed 4988.58 samples/sec Loss 7.4523 LearningRate 0.2894 Epoch: 4 Global Step: 19560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:17:08,964-Speed 4933.41 samples/sec Loss 7.4793 LearningRate 0.2893 Epoch: 4 Global Step: 19570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:17:17,226-Speed 4958.15 samples/sec Loss 7.4898 LearningRate 0.2893 Epoch: 4 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:17:25,462-Speed 4973.65 samples/sec Loss 7.4913 LearningRate 0.2892 Epoch: 4 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:17:33,703-Speed 4971.34 samples/sec Loss 7.4132 LearningRate 0.2891 Epoch: 4 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:17:41,984-Speed 4947.16 samples/sec Loss 7.4631 LearningRate 0.2890 Epoch: 4 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:17:50,267-Speed 4945.80 samples/sec Loss 7.4253 LearningRate 0.2889 Epoch: 4 Global Step: 19620 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:17:58,549-Speed 4945.93 samples/sec Loss 7.5134 LearningRate 0.2888 Epoch: 4 Global Step: 19630 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:18:06,809-Speed 4959.11 samples/sec Loss 7.3966 LearningRate 0.2887 Epoch: 4 Global Step: 19640 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:18:15,156-Speed 4908.26 samples/sec Loss 7.4355 LearningRate 0.2886 Epoch: 4 Global Step: 19650 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:18:23,408-Speed 4964.79 samples/sec Loss 7.4939 LearningRate 0.2885 Epoch: 4 Global 
Step: 19660 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:18:31,649-Speed 4970.57 samples/sec Loss 7.4444 LearningRate 0.2884 Epoch: 4 Global Step: 19670 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:18:39,874-Speed 4980.94 samples/sec Loss 7.4231 LearningRate 0.2884 Epoch: 4 Global Step: 19680 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:18:48,112-Speed 4972.59 samples/sec Loss 7.4745 LearningRate 0.2883 Epoch: 4 Global Step: 19690 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:18:56,299-Speed 5003.40 samples/sec Loss 7.4207 LearningRate 0.2882 Epoch: 4 Global Step: 19700 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:19:04,588-Speed 4942.29 samples/sec Loss 7.3972 LearningRate 0.2881 Epoch: 4 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:19:12,813-Speed 4980.85 samples/sec Loss 7.4649 LearningRate 0.2880 Epoch: 4 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:19:21,115-Speed 4934.53 samples/sec Loss 7.4301 LearningRate 0.2879 Epoch: 4 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:19:29,336-Speed 4982.39 samples/sec Loss 7.4453 LearningRate 0.2878 Epoch: 4 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:19:37,618-Speed 4946.71 samples/sec Loss 7.4466 LearningRate 0.2877 Epoch: 4 Global Step: 19750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:19:45,851-Speed 4975.29 samples/sec Loss 7.4630 LearningRate 0.2876 Epoch: 4 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:19:54,029-Speed 5009.54 samples/sec Loss 7.4788 LearningRate 0.2875 Epoch: 4 Global Step: 19770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:20:02,266-Speed 4973.35 samples/sec Loss 7.4174 LearningRate 0.2874 Epoch: 4 Global Step: 19780 Fp16 Grad Scale: 131072 
Required: 15 hours Training: 2022-01-17 02:20:10,422-Speed 5022.49 samples/sec Loss 7.3965 LearningRate 0.2874 Epoch: 4 Global Step: 19790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:20:18,782-Speed 4900.87 samples/sec Loss 7.4444 LearningRate 0.2873 Epoch: 4 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:20:27,053-Speed 4953.39 samples/sec Loss 7.4264 LearningRate 0.2872 Epoch: 4 Global Step: 19810 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:20:35,250-Speed 4997.83 samples/sec Loss 7.3746 LearningRate 0.2871 Epoch: 4 Global Step: 19820 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:20:43,564-Speed 4927.30 samples/sec Loss 7.4272 LearningRate 0.2870 Epoch: 4 Global Step: 19830 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:20:51,831-Speed 4955.13 samples/sec Loss 7.3925 LearningRate 0.2869 Epoch: 4 Global Step: 19840 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:21:00,030-Speed 4996.60 samples/sec Loss 7.5141 LearningRate 0.2868 Epoch: 4 Global Step: 19850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:21:08,305-Speed 4950.36 samples/sec Loss 7.4721 LearningRate 0.2867 Epoch: 4 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:21:16,642-Speed 4913.92 samples/sec Loss 7.4459 LearningRate 0.2866 Epoch: 4 Global Step: 19870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:21:24,990-Speed 4907.15 samples/sec Loss 7.4586 LearningRate 0.2865 Epoch: 4 Global Step: 19880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:21:33,204-Speed 4987.06 samples/sec Loss 7.4104 LearningRate 0.2865 Epoch: 4 Global Step: 19890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:21:41,405-Speed 4995.13 samples/sec Loss 7.4728 LearningRate 0.2864 Epoch: 4 Global Step: 19900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 
02:21:49,713-Speed 4930.84 samples/sec Loss 7.4455 LearningRate 0.2863 Epoch: 4 Global Step: 19910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:21:58,089-Speed 4890.92 samples/sec Loss 7.4093 LearningRate 0.2862 Epoch: 4 Global Step: 19920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:22:06,271-Speed 5007.06 samples/sec Loss 7.4171 LearningRate 0.2861 Epoch: 4 Global Step: 19930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:22:14,471-Speed 4995.20 samples/sec Loss 7.4094 LearningRate 0.2860 Epoch: 4 Global Step: 19940 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:22:22,678-Speed 4992.46 samples/sec Loss 7.3233 LearningRate 0.2859 Epoch: 4 Global Step: 19950 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:22:30,981-Speed 4933.80 samples/sec Loss 7.4033 LearningRate 0.2858 Epoch: 4 Global Step: 19960 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:22:39,369-Speed 4883.69 samples/sec Loss 7.4267 LearningRate 0.2857 Epoch: 4 Global Step: 19970 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:22:47,580-Speed 4988.81 samples/sec Loss 7.4559 LearningRate 0.2856 Epoch: 4 Global Step: 19980 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:22:55,868-Speed 4942.93 samples/sec Loss 7.3682 LearningRate 0.2856 Epoch: 4 Global Step: 19990 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:23:04,076-Speed 4991.19 samples/sec Loss 7.4070 LearningRate 0.2855 Epoch: 4 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:23:50,987-[lfw][20000]XNorm: 23.461378 Training: 2022-01-17 02:23:50,987-[lfw][20000]Accuracy-Flip: 0.99633+-0.00340 Training: 2022-01-17 02:23:50,988-[lfw][20000]Accuracy-Highest: 0.99750 Training: 2022-01-17 02:24:46,953-[cfp_fp][20000]XNorm: 19.968383 Training: 2022-01-17 02:24:46,954-[cfp_fp][20000]Accuracy-Flip: 0.96814+-0.00658 Training: 2022-01-17 
02:24:46,955-[cfp_fp][20000]Accuracy-Highest: 0.96814 Training: 2022-01-17 02:25:34,555-[agedb_30][20000]XNorm: 22.560627 Training: 2022-01-17 02:25:34,555-[agedb_30][20000]Accuracy-Flip: 0.97333+-0.00803 Training: 2022-01-17 02:25:34,556-[agedb_30][20000]Accuracy-Highest: 0.97333 Training: 2022-01-17 02:25:42,735-Speed 258.17 samples/sec Loss 7.4078 LearningRate 0.2854 Epoch: 4 Global Step: 20010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:25:50,919-Speed 5005.50 samples/sec Loss 7.3821 LearningRate 0.2853 Epoch: 4 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:25:59,073-Speed 5024.10 samples/sec Loss 7.3438 LearningRate 0.2852 Epoch: 4 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:26:07,290-Speed 4985.76 samples/sec Loss 7.3215 LearningRate 0.2851 Epoch: 4 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:26:15,518-Speed 4977.99 samples/sec Loss 7.3724 LearningRate 0.2850 Epoch: 4 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:26:23,681-Speed 5018.85 samples/sec Loss 7.3842 LearningRate 0.2849 Epoch: 4 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:26:31,845-Speed 5017.95 samples/sec Loss 7.3306 LearningRate 0.2848 Epoch: 4 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:26:40,050-Speed 4992.94 samples/sec Loss 7.3991 LearningRate 0.2847 Epoch: 4 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:26:48,896-Speed 4631.05 samples/sec Loss 7.4086 LearningRate 0.2847 Epoch: 4 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:26:57,838-Speed 4581.21 samples/sec Loss 7.3913 LearningRate 0.2846 Epoch: 4 Global Step: 20100 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:27:06,746-Speed 4598.16 samples/sec Loss 7.4025 LearningRate 
0.2845 Epoch: 4 Global Step: 20110 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:27:15,603-Speed 4625.38 samples/sec Loss 7.3893 LearningRate 0.2844 Epoch: 4 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:27:23,858-Speed 4962.69 samples/sec Loss 7.3048 LearningRate 0.2843 Epoch: 4 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:27:32,163-Speed 4932.22 samples/sec Loss 7.3965 LearningRate 0.2842 Epoch: 4 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:27:40,455-Speed 4940.76 samples/sec Loss 7.4127 LearningRate 0.2841 Epoch: 4 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:27:48,819-Speed 4897.73 samples/sec Loss 7.2823 LearningRate 0.2840 Epoch: 4 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:27:57,085-Speed 4955.62 samples/sec Loss 7.3044 LearningRate 0.2839 Epoch: 4 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:28:05,349-Speed 4957.30 samples/sec Loss 7.3730 LearningRate 0.2838 Epoch: 4 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:28:13,559-Speed 4989.49 samples/sec Loss 7.3981 LearningRate 0.2838 Epoch: 4 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:28:21,719-Speed 5020.11 samples/sec Loss 7.3227 LearningRate 0.2837 Epoch: 4 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:28:29,895-Speed 5011.06 samples/sec Loss 7.3365 LearningRate 0.2836 Epoch: 4 Global Step: 20210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:28:38,147-Speed 4963.98 samples/sec Loss 7.3564 LearningRate 0.2835 Epoch: 4 Global Step: 20220 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:28:46,445-Speed 4936.58 samples/sec Loss 7.3798 LearningRate 0.2834 Epoch: 4 Global Step: 20230 Fp16 
Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:28:54,742-Speed 4938.07 samples/sec Loss 7.3111 LearningRate 0.2833 Epoch: 4 Global Step: 20240 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:29:02,987-Speed 4968.20 samples/sec Loss 7.4057 LearningRate 0.2832 Epoch: 4 Global Step: 20250 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:29:11,234-Speed 4967.24 samples/sec Loss 7.3489 LearningRate 0.2831 Epoch: 4 Global Step: 20260 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:29:19,490-Speed 4961.88 samples/sec Loss 7.3507 LearningRate 0.2830 Epoch: 4 Global Step: 20270 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:29:27,749-Speed 4960.19 samples/sec Loss 7.2732 LearningRate 0.2830 Epoch: 4 Global Step: 20280 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:29:36,096-Speed 4907.76 samples/sec Loss 7.3321 LearningRate 0.2829 Epoch: 4 Global Step: 20290 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:29:44,369-Speed 4952.20 samples/sec Loss 7.3882 LearningRate 0.2828 Epoch: 4 Global Step: 20300 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:29:52,583-Speed 4987.22 samples/sec Loss 7.3730 LearningRate 0.2827 Epoch: 4 Global Step: 20310 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:30:00,848-Speed 4956.35 samples/sec Loss 7.3625 LearningRate 0.2826 Epoch: 4 Global Step: 20320 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:30:09,116-Speed 4954.60 samples/sec Loss 7.3457 LearningRate 0.2825 Epoch: 4 Global Step: 20330 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:30:17,454-Speed 4913.78 samples/sec Loss 7.3085 LearningRate 0.2824 Epoch: 4 Global Step: 20340 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:30:25,726-Speed 4951.97 samples/sec Loss 7.3067 LearningRate 0.2823 Epoch: 4 Global Step: 20350 Fp16 Grad Scale: 262144 Required: 15 hours 
Training: 2022-01-17 02:30:34,374-Speed 4737.35 samples/sec Loss 7.3645 LearningRate 0.2822 Epoch: 4 Global Step: 20360 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:30:42,966-Speed 4767.56 samples/sec Loss 7.3285 LearningRate 0.2821 Epoch: 4 Global Step: 20370 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:30:51,467-Speed 4818.97 samples/sec Loss 7.3345 LearningRate 0.2821 Epoch: 4 Global Step: 20380 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:30:59,707-Speed 4971.28 samples/sec Loss 7.3178 LearningRate 0.2820 Epoch: 4 Global Step: 20390 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:31:07,940-Speed 4975.82 samples/sec Loss 7.3480 LearningRate 0.2819 Epoch: 4 Global Step: 20400 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:31:16,216-Speed 4949.85 samples/sec Loss 7.3320 LearningRate 0.2818 Epoch: 4 Global Step: 20410 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:31:24,439-Speed 4981.99 samples/sec Loss 7.3921 LearningRate 0.2817 Epoch: 4 Global Step: 20420 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:31:32,727-Speed 4942.74 samples/sec Loss 7.2595 LearningRate 0.2816 Epoch: 4 Global Step: 20430 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:31:40,954-Speed 4979.53 samples/sec Loss 7.3014 LearningRate 0.2815 Epoch: 4 Global Step: 20440 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:31:49,195-Speed 4970.55 samples/sec Loss 7.3095 LearningRate 0.2814 Epoch: 4 Global Step: 20450 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:31:57,441-Speed 4968.25 samples/sec Loss 7.2746 LearningRate 0.2813 Epoch: 4 Global Step: 20460 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:32:05,699-Speed 4960.77 samples/sec Loss 7.1988 LearningRate 0.2813 Epoch: 4 Global Step: 20470 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:32:14,009-Speed 
4929.90 samples/sec Loss 7.4275 LearningRate 0.2812 Epoch: 4 Global Step: 20480 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:32:22,305-Speed 4937.33 samples/sec Loss 7.2474 LearningRate 0.2811 Epoch: 4 Global Step: 20490 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:32:30,843-Speed 4798.44 samples/sec Loss 7.3391 LearningRate 0.2810 Epoch: 4 Global Step: 20500 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:32:39,265-Speed 4864.13 samples/sec Loss 7.3182 LearningRate 0.2809 Epoch: 4 Global Step: 20510 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:32:47,520-Speed 4962.32 samples/sec Loss 7.3724 LearningRate 0.2808 Epoch: 4 Global Step: 20520 Fp16 Grad Scale: 524288 Required: 15 hours Training: 2022-01-17 02:32:55,807-Speed 4943.06 samples/sec Loss 7.2606 LearningRate 0.2807 Epoch: 4 Global Step: 20530 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:33:04,394-Speed 4770.86 samples/sec Loss 7.3460 LearningRate 0.2806 Epoch: 4 Global Step: 20540 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:33:12,779-Speed 4885.73 samples/sec Loss 7.2863 LearningRate 0.2805 Epoch: 4 Global Step: 20550 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:33:21,019-Speed 4971.28 samples/sec Loss 7.3122 LearningRate 0.2804 Epoch: 4 Global Step: 20560 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:33:29,389-Speed 4893.88 samples/sec Loss 7.2636 LearningRate 0.2804 Epoch: 4 Global Step: 20570 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:33:37,723-Speed 4915.84 samples/sec Loss 7.2443 LearningRate 0.2803 Epoch: 4 Global Step: 20580 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:33:45,934-Speed 4988.95 samples/sec Loss 7.2730 LearningRate 0.2802 Epoch: 4 Global Step: 20590 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:33:54,174-Speed 4971.67 samples/sec Loss 7.2287 
LearningRate 0.2801 Epoch: 4 Global Step: 20600 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:34:02,404-Speed 4977.42 samples/sec Loss 7.3008 LearningRate 0.2800 Epoch: 4 Global Step: 20610 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:34:10,567-Speed 5018.51 samples/sec Loss 7.3063 LearningRate 0.2799 Epoch: 4 Global Step: 20620 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:34:18,739-Speed 5012.65 samples/sec Loss 7.3082 LearningRate 0.2798 Epoch: 4 Global Step: 20630 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:34:26,911-Speed 5012.86 samples/sec Loss 7.2420 LearningRate 0.2797 Epoch: 4 Global Step: 20640 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:34:35,133-Speed 4982.67 samples/sec Loss 7.2331 LearningRate 0.2796 Epoch: 4 Global Step: 20650 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:34:43,260-Speed 5040.83 samples/sec Loss 7.2640 LearningRate 0.2796 Epoch: 4 Global Step: 20660 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:34:51,480-Speed 4983.32 samples/sec Loss 7.2759 LearningRate 0.2795 Epoch: 4 Global Step: 20670 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:34:59,732-Speed 4964.04 samples/sec Loss 7.2934 LearningRate 0.2794 Epoch: 4 Global Step: 20680 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:35:07,962-Speed 4977.51 samples/sec Loss 7.3706 LearningRate 0.2793 Epoch: 4 Global Step: 20690 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:35:16,229-Speed 4955.15 samples/sec Loss 7.3070 LearningRate 0.2792 Epoch: 4 Global Step: 20700 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:35:24,442-Speed 4988.14 samples/sec Loss 7.3370 LearningRate 0.2791 Epoch: 4 Global Step: 20710 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:35:32,656-Speed 4987.14 samples/sec Loss 7.2966 LearningRate 0.2790 Epoch: 4 Global Step: 
20720 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:35:40,891-Speed 4974.81 samples/sec Loss 7.2710 LearningRate 0.2789 Epoch: 4 Global Step: 20730 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:35:49,082-Speed 5001.27 samples/sec Loss 7.2606 LearningRate 0.2788 Epoch: 4 Global Step: 20740 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:35:57,272-Speed 5001.62 samples/sec Loss 7.3127 LearningRate 0.2788 Epoch: 4 Global Step: 20750 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:36:05,502-Speed 4977.51 samples/sec Loss 7.2156 LearningRate 0.2787 Epoch: 4 Global Step: 20760 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:36:13,745-Speed 4970.20 samples/sec Loss 7.2560 LearningRate 0.2786 Epoch: 4 Global Step: 20770 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:36:22,018-Speed 4952.03 samples/sec Loss 7.2960 LearningRate 0.2785 Epoch: 4 Global Step: 20780 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:36:30,250-Speed 4975.98 samples/sec Loss 7.2612 LearningRate 0.2784 Epoch: 4 Global Step: 20790 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:36:38,498-Speed 4966.72 samples/sec Loss 7.2206 LearningRate 0.2783 Epoch: 4 Global Step: 20800 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:36:46,666-Speed 5015.39 samples/sec Loss 7.2300 LearningRate 0.2782 Epoch: 4 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:36:55,148-Speed 4829.76 samples/sec Loss 7.2085 LearningRate 0.2781 Epoch: 4 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:37:03,535-Speed 4884.72 samples/sec Loss 7.1655 LearningRate 0.2780 Epoch: 4 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:37:11,738-Speed 4993.56 samples/sec Loss 7.2018 LearningRate 0.2780 Epoch: 4 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-01-17 02:37:20,401-Speed 4728.92 samples/sec Loss 7.2123 LearningRate 0.2779 Epoch: 4 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:37:28,664-Speed 4958.21 samples/sec Loss 7.2132 LearningRate 0.2778 Epoch: 4 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:38:04,609-Speed 1139.69 samples/sec Loss 6.7016 LearningRate 0.2777 Epoch: 5 Global Step: 20870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:38:12,707-Speed 5059.54 samples/sec Loss 6.6526 LearningRate 0.2776 Epoch: 5 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:38:20,921-Speed 4986.62 samples/sec Loss 6.7687 LearningRate 0.2775 Epoch: 5 Global Step: 20890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:38:29,151-Speed 4978.15 samples/sec Loss 6.6351 LearningRate 0.2774 Epoch: 5 Global Step: 20900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:38:37,406-Speed 4962.57 samples/sec Loss 6.7623 LearningRate 0.2773 Epoch: 5 Global Step: 20910 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:38:45,879-Speed 4834.58 samples/sec Loss 6.7580 LearningRate 0.2772 Epoch: 5 Global Step: 20920 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:38:54,212-Speed 4916.27 samples/sec Loss 6.7090 LearningRate 0.2772 Epoch: 5 Global Step: 20930 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:39:02,387-Speed 5010.67 samples/sec Loss 6.8011 LearningRate 0.2771 Epoch: 5 Global Step: 20940 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:39:10,519-Speed 5037.73 samples/sec Loss 6.7541 LearningRate 0.2770 Epoch: 5 Global Step: 20950 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:39:18,661-Speed 5031.38 samples/sec Loss 6.7802 LearningRate 0.2769 Epoch: 5 Global Step: 20960 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 
02:39:26,846-Speed 5004.96 samples/sec Loss 6.9052 LearningRate 0.2768 Epoch: 5 Global Step: 20970 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:39:34,926-Speed 5070.56 samples/sec Loss 6.8444 LearningRate 0.2767 Epoch: 5 Global Step: 20980 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:39:43,056-Speed 5038.12 samples/sec Loss 6.8213 LearningRate 0.2766 Epoch: 5 Global Step: 20990 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:39:51,158-Speed 5056.28 samples/sec Loss 6.8640 LearningRate 0.2765 Epoch: 5 Global Step: 21000 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:39:59,278-Speed 5045.27 samples/sec Loss 6.8781 LearningRate 0.2764 Epoch: 5 Global Step: 21010 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:40:07,393-Speed 5048.15 samples/sec Loss 6.9712 LearningRate 0.2764 Epoch: 5 Global Step: 21020 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:40:15,554-Speed 5019.43 samples/sec Loss 6.9217 LearningRate 0.2763 Epoch: 5 Global Step: 21030 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:40:23,685-Speed 5038.16 samples/sec Loss 6.9220 LearningRate 0.2762 Epoch: 5 Global Step: 21040 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:40:32,003-Speed 4925.17 samples/sec Loss 6.9180 LearningRate 0.2761 Epoch: 5 Global Step: 21050 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:40:40,298-Speed 4938.71 samples/sec Loss 6.8899 LearningRate 0.2760 Epoch: 5 Global Step: 21060 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:40:48,692-Speed 4880.39 samples/sec Loss 6.9891 LearningRate 0.2759 Epoch: 5 Global Step: 21070 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:40:56,936-Speed 4968.83 samples/sec Loss 6.9927 LearningRate 0.2758 Epoch: 5 Global Step: 21080 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:41:05,180-Speed 4969.36 samples/sec Loss 
7.0299 LearningRate 0.2757 Epoch: 5 Global Step: 21090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:41:13,471-Speed 4941.06 samples/sec Loss 7.0206 LearningRate 0.2757 Epoch: 5 Global Step: 21100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:41:21,822-Speed 4905.11 samples/sec Loss 7.0438 LearningRate 0.2756 Epoch: 5 Global Step: 21110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:41:29,963-Speed 5032.03 samples/sec Loss 7.0092 LearningRate 0.2755 Epoch: 5 Global Step: 21120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:41:38,265-Speed 4934.56 samples/sec Loss 7.0247 LearningRate 0.2754 Epoch: 5 Global Step: 21130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:41:46,683-Speed 4866.34 samples/sec Loss 7.0567 LearningRate 0.2753 Epoch: 5 Global Step: 21140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:41:54,890-Speed 4991.61 samples/sec Loss 7.1170 LearningRate 0.2752 Epoch: 5 Global Step: 21150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:42:03,026-Speed 5034.96 samples/sec Loss 7.0418 LearningRate 0.2751 Epoch: 5 Global Step: 21160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:42:11,163-Speed 5034.66 samples/sec Loss 6.9822 LearningRate 0.2750 Epoch: 5 Global Step: 21170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:42:19,303-Speed 5032.84 samples/sec Loss 7.1379 LearningRate 0.2749 Epoch: 5 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:42:27,564-Speed 4958.46 samples/sec Loss 7.0480 LearningRate 0.2749 Epoch: 5 Global Step: 21190 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:42:36,102-Speed 4798.62 samples/sec Loss 7.0933 LearningRate 0.2748 Epoch: 5 Global Step: 21200 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:42:44,814-Speed 4701.69 samples/sec Loss 7.0848 LearningRate 0.2747 Epoch: 5 Global 
Step: 21210 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:42:53,665-Speed 4628.73 samples/sec Loss 7.0656 LearningRate 0.2746 Epoch: 5 Global Step: 21220 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:43:02,413-Speed 4682.83 samples/sec Loss 7.0504 LearningRate 0.2745 Epoch: 5 Global Step: 21230 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:43:11,167-Speed 4679.38 samples/sec Loss 7.1018 LearningRate 0.2744 Epoch: 5 Global Step: 21240 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:43:19,918-Speed 4681.93 samples/sec Loss 7.1417 LearningRate 0.2743 Epoch: 5 Global Step: 21250 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:43:28,712-Speed 4658.16 samples/sec Loss 7.1090 LearningRate 0.2742 Epoch: 5 Global Step: 21260 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:43:37,523-Speed 4649.04 samples/sec Loss 7.0501 LearningRate 0.2741 Epoch: 5 Global Step: 21270 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:43:46,295-Speed 4670.60 samples/sec Loss 7.1693 LearningRate 0.2741 Epoch: 5 Global Step: 21280 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:43:55,051-Speed 4678.54 samples/sec Loss 7.0722 LearningRate 0.2740 Epoch: 5 Global Step: 21290 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:44:03,793-Speed 4685.61 samples/sec Loss 7.1089 LearningRate 0.2739 Epoch: 5 Global Step: 21300 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:44:12,549-Speed 4678.81 samples/sec Loss 7.0388 LearningRate 0.2738 Epoch: 5 Global Step: 21310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:44:21,312-Speed 4674.75 samples/sec Loss 7.1207 LearningRate 0.2737 Epoch: 5 Global Step: 21320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:44:30,203-Speed 4607.48 samples/sec Loss 7.1451 LearningRate 0.2736 Epoch: 5 Global Step: 21330 Fp16 Grad Scale: 131072 
Required: 15 hours Training: 2022-01-17 02:44:38,634-Speed 4858.43 samples/sec Loss 7.1193 LearningRate 0.2735 Epoch: 5 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:44:46,866-Speed 4976.50 samples/sec Loss 7.0526 LearningRate 0.2734 Epoch: 5 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:44:54,960-Speed 5061.45 samples/sec Loss 7.0573 LearningRate 0.2734 Epoch: 5 Global Step: 21360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:45:03,238-Speed 4948.45 samples/sec Loss 7.0372 LearningRate 0.2733 Epoch: 5 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:45:11,468-Speed 4977.58 samples/sec Loss 7.0800 LearningRate 0.2732 Epoch: 5 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:45:19,606-Speed 5034.02 samples/sec Loss 7.1584 LearningRate 0.2731 Epoch: 5 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:45:27,775-Speed 5014.56 samples/sec Loss 7.1027 LearningRate 0.2730 Epoch: 5 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:45:35,991-Speed 4986.01 samples/sec Loss 7.0904 LearningRate 0.2729 Epoch: 5 Global Step: 21410 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:45:44,315-Speed 4921.70 samples/sec Loss 7.0890 LearningRate 0.2728 Epoch: 5 Global Step: 21420 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:45:52,532-Speed 4985.35 samples/sec Loss 7.1125 LearningRate 0.2727 Epoch: 5 Global Step: 21430 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:46:00,738-Speed 4992.04 samples/sec Loss 7.1035 LearningRate 0.2727 Epoch: 5 Global Step: 21440 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:46:08,920-Speed 5007.02 samples/sec Loss 7.0859 LearningRate 0.2726 Epoch: 5 Global Step: 21450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 
02:46:17,140-Speed 4983.98 samples/sec Loss 7.1308 LearningRate 0.2725 Epoch: 5 Global Step: 21460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:46:25,329-Speed 5002.74 samples/sec Loss 7.0453 LearningRate 0.2724 Epoch: 5 Global Step: 21470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:46:33,477-Speed 5027.72 samples/sec Loss 7.1002 LearningRate 0.2723 Epoch: 5 Global Step: 21480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:46:41,582-Speed 5054.90 samples/sec Loss 7.0704 LearningRate 0.2722 Epoch: 5 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:46:49,845-Speed 4957.81 samples/sec Loss 7.0576 LearningRate 0.2721 Epoch: 5 Global Step: 21500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:46:58,314-Speed 4836.95 samples/sec Loss 7.1455 LearningRate 0.2720 Epoch: 5 Global Step: 21510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:47:06,450-Speed 5035.18 samples/sec Loss 7.0708 LearningRate 0.2719 Epoch: 5 Global Step: 21520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:47:14,636-Speed 5004.08 samples/sec Loss 7.1014 LearningRate 0.2719 Epoch: 5 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:47:22,907-Speed 4953.43 samples/sec Loss 7.0082 LearningRate 0.2718 Epoch: 5 Global Step: 21540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:47:31,039-Speed 5037.42 samples/sec Loss 7.0664 LearningRate 0.2717 Epoch: 5 Global Step: 21550 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:47:39,202-Speed 5018.29 samples/sec Loss 7.0915 LearningRate 0.2716 Epoch: 5 Global Step: 21560 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:47:47,359-Speed 5022.27 samples/sec Loss 7.0071 LearningRate 0.2715 Epoch: 5 Global Step: 21570 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:47:55,520-Speed 5019.30 samples/sec Loss 
7.1479 LearningRate 0.2714 Epoch: 5 Global Step: 21580 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:48:03,738-Speed 4985.39 samples/sec Loss 7.1020 LearningRate 0.2713 Epoch: 5 Global Step: 21590 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:48:11,901-Speed 5018.29 samples/sec Loss 7.1060 LearningRate 0.2712 Epoch: 5 Global Step: 21600 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:48:20,079-Speed 5009.05 samples/sec Loss 7.1003 LearningRate 0.2712 Epoch: 5 Global Step: 21610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:48:28,203-Speed 5043.05 samples/sec Loss 7.1348 LearningRate 0.2711 Epoch: 5 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:48:36,353-Speed 5026.31 samples/sec Loss 7.0564 LearningRate 0.2710 Epoch: 5 Global Step: 21630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:48:44,513-Speed 5020.41 samples/sec Loss 7.0246 LearningRate 0.2709 Epoch: 5 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:48:52,787-Speed 4950.95 samples/sec Loss 7.2141 LearningRate 0.2708 Epoch: 5 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:49:00,906-Speed 5045.85 samples/sec Loss 7.0888 LearningRate 0.2707 Epoch: 5 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:49:09,070-Speed 5017.84 samples/sec Loss 7.1443 LearningRate 0.2706 Epoch: 5 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:49:17,201-Speed 5038.52 samples/sec Loss 7.0152 LearningRate 0.2705 Epoch: 5 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:49:25,333-Speed 5036.80 samples/sec Loss 7.0158 LearningRate 0.2705 Epoch: 5 Global Step: 21690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:49:33,569-Speed 4974.42 samples/sec Loss 7.1316 LearningRate 0.2704 Epoch: 5 Global 
Step: 21700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:49:41,762-Speed 4999.98 samples/sec Loss 7.0497 LearningRate 0.2703 Epoch: 5 Global Step: 21710 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:49:49,902-Speed 5032.40 samples/sec Loss 7.0161 LearningRate 0.2702 Epoch: 5 Global Step: 21720 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:49:58,043-Speed 5032.72 samples/sec Loss 7.0523 LearningRate 0.2701 Epoch: 5 Global Step: 21730 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:50:06,351-Speed 4930.63 samples/sec Loss 7.1500 LearningRate 0.2700 Epoch: 5 Global Step: 21740 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:50:14,869-Speed 4809.12 samples/sec Loss 7.1793 LearningRate 0.2699 Epoch: 5 Global Step: 21750 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:50:23,408-Speed 4797.56 samples/sec Loss 7.1150 LearningRate 0.2698 Epoch: 5 Global Step: 21760 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:50:31,907-Speed 4819.83 samples/sec Loss 7.0840 LearningRate 0.2698 Epoch: 5 Global Step: 21770 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:50:40,485-Speed 4775.66 samples/sec Loss 7.1081 LearningRate 0.2697 Epoch: 5 Global Step: 21780 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:50:49,024-Speed 4797.62 samples/sec Loss 7.1045 LearningRate 0.2696 Epoch: 5 Global Step: 21790 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:50:57,575-Speed 4790.69 samples/sec Loss 7.1173 LearningRate 0.2695 Epoch: 5 Global Step: 21800 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:51:05,741-Speed 5016.49 samples/sec Loss 7.0477 LearningRate 0.2694 Epoch: 5 Global Step: 21810 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:51:13,896-Speed 5023.21 samples/sec Loss 7.0630 LearningRate 0.2693 Epoch: 5 Global Step: 21820 Fp16 Grad Scale: 262144 
Required: 15 hours Training: 2022-01-17 02:51:22,063-Speed 5015.98 samples/sec Loss 6.9855 LearningRate 0.2692 Epoch: 5 Global Step: 21830 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:51:30,174-Speed 5050.94 samples/sec Loss 7.0717 LearningRate 0.2691 Epoch: 5 Global Step: 21840 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:51:38,284-Speed 5051.45 samples/sec Loss 7.0956 LearningRate 0.2691 Epoch: 5 Global Step: 21850 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:51:46,605-Speed 4923.20 samples/sec Loss 7.1621 LearningRate 0.2690 Epoch: 5 Global Step: 21860 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:51:54,840-Speed 4974.61 samples/sec Loss 7.0677 LearningRate 0.2689 Epoch: 5 Global Step: 21870 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:52:03,073-Speed 4975.13 samples/sec Loss 7.0790 LearningRate 0.2688 Epoch: 5 Global Step: 21880 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:52:11,186-Speed 5049.47 samples/sec Loss 7.0476 LearningRate 0.2687 Epoch: 5 Global Step: 21890 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:52:19,325-Speed 5033.47 samples/sec Loss 7.0671 LearningRate 0.2686 Epoch: 5 Global Step: 21900 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:52:27,453-Speed 5039.66 samples/sec Loss 7.0212 LearningRate 0.2685 Epoch: 5 Global Step: 21910 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:52:35,603-Speed 5026.39 samples/sec Loss 7.0686 LearningRate 0.2684 Epoch: 5 Global Step: 21920 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:52:43,874-Speed 4953.09 samples/sec Loss 7.0671 LearningRate 0.2684 Epoch: 5 Global Step: 21930 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:52:52,162-Speed 4942.54 samples/sec Loss 7.0665 LearningRate 0.2683 Epoch: 5 Global Step: 21940 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 
02:53:00,366-Speed 4993.40 samples/sec Loss 7.1075 LearningRate 0.2682 Epoch: 5 Global Step: 21950 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:53:08,537-Speed 5013.77 samples/sec Loss 7.0884 LearningRate 0.2681 Epoch: 5 Global Step: 21960 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:53:16,712-Speed 5011.64 samples/sec Loss 7.1411 LearningRate 0.2680 Epoch: 5 Global Step: 21970 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:53:24,908-Speed 4998.18 samples/sec Loss 7.0235 LearningRate 0.2679 Epoch: 5 Global Step: 21980 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:53:33,103-Speed 4998.79 samples/sec Loss 7.0283 LearningRate 0.2678 Epoch: 5 Global Step: 21990 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:53:41,260-Speed 5022.41 samples/sec Loss 7.0639 LearningRate 0.2677 Epoch: 5 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:53:49,418-Speed 5021.36 samples/sec Loss 7.0743 LearningRate 0.2677 Epoch: 5 Global Step: 22010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:53:57,552-Speed 5036.48 samples/sec Loss 7.1461 LearningRate 0.2676 Epoch: 5 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:54:05,853-Speed 4935.24 samples/sec Loss 7.1409 LearningRate 0.2675 Epoch: 5 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:54:14,208-Speed 4902.90 samples/sec Loss 7.0780 LearningRate 0.2674 Epoch: 5 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:54:22,343-Speed 5035.47 samples/sec Loss 7.0462 LearningRate 0.2673 Epoch: 5 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:54:30,749-Speed 4874.20 samples/sec Loss 7.0278 LearningRate 0.2672 Epoch: 5 Global Step: 22060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:54:39,057-Speed 4931.11 samples/sec Loss 
7.1111 LearningRate 0.2671 Epoch: 5 Global Step: 22070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:54:47,183-Speed 5041.45 samples/sec Loss 7.0610 LearningRate 0.2671 Epoch: 5 Global Step: 22080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:54:55,390-Speed 4992.07 samples/sec Loss 7.0028 LearningRate 0.2670 Epoch: 5 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:55:04,045-Speed 4733.22 samples/sec Loss 7.0291 LearningRate 0.2669 Epoch: 5 Global Step: 22100 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:55:12,486-Speed 4853.25 samples/sec Loss 7.0620 LearningRate 0.2668 Epoch: 5 Global Step: 22110 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:55:22,929-Speed 3922.78 samples/sec Loss 7.0350 LearningRate 0.2667 Epoch: 5 Global Step: 22120 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:55:31,077-Speed 5027.73 samples/sec Loss 7.0126 LearningRate 0.2666 Epoch: 5 Global Step: 22130 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:55:39,257-Speed 5008.05 samples/sec Loss 7.0497 LearningRate 0.2665 Epoch: 5 Global Step: 22140 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:55:49,878-Speed 3856.93 samples/sec Loss 7.0211 LearningRate 0.2664 Epoch: 5 Global Step: 22150 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:55:58,227-Speed 4906.92 samples/sec Loss 7.1186 LearningRate 0.2664 Epoch: 5 Global Step: 22160 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:56:06,300-Speed 5073.95 samples/sec Loss 7.0591 LearningRate 0.2663 Epoch: 5 Global Step: 22170 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:56:14,448-Speed 5034.41 samples/sec Loss 7.1007 LearningRate 0.2662 Epoch: 5 Global Step: 22180 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:56:22,520-Speed 5075.25 samples/sec Loss 7.0262 LearningRate 0.2661 Epoch: 5 Global 
Step: 22190 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:56:30,673-Speed 5024.46 samples/sec Loss 7.0177 LearningRate 0.2660 Epoch: 5 Global Step: 22200 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:56:38,761-Speed 5064.74 samples/sec Loss 6.9697 LearningRate 0.2659 Epoch: 5 Global Step: 22210 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:56:46,860-Speed 5058.52 samples/sec Loss 6.9998 LearningRate 0.2658 Epoch: 5 Global Step: 22220 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:56:54,924-Speed 5079.66 samples/sec Loss 7.0278 LearningRate 0.2657 Epoch: 5 Global Step: 22230 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:57:03,001-Speed 5072.21 samples/sec Loss 7.0637 LearningRate 0.2657 Epoch: 5 Global Step: 22240 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:57:11,142-Speed 5031.69 samples/sec Loss 7.0120 LearningRate 0.2656 Epoch: 5 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:57:19,321-Speed 5008.88 samples/sec Loss 7.0357 LearningRate 0.2655 Epoch: 5 Global Step: 22260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:57:28,094-Speed 4670.01 samples/sec Loss 7.0046 LearningRate 0.2654 Epoch: 5 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:57:36,828-Speed 4690.15 samples/sec Loss 7.0610 LearningRate 0.2653 Epoch: 5 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:57:45,475-Speed 4737.26 samples/sec Loss 7.0131 LearningRate 0.2652 Epoch: 5 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:57:53,925-Speed 4848.06 samples/sec Loss 7.0430 LearningRate 0.2651 Epoch: 5 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:58:02,301-Speed 4890.64 samples/sec Loss 7.0157 LearningRate 0.2651 Epoch: 5 Global Step: 22310 Fp16 Grad Scale: 131072 
Required: 15 hours Training: 2022-01-17 02:58:10,402-Speed 5056.64 samples/sec Loss 7.0615 LearningRate 0.2650 Epoch: 5 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:58:18,633-Speed 4977.20 samples/sec Loss 7.1041 LearningRate 0.2649 Epoch: 5 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:58:26,855-Speed 4982.55 samples/sec Loss 7.0110 LearningRate 0.2648 Epoch: 5 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:58:34,928-Speed 5074.44 samples/sec Loss 7.0310 LearningRate 0.2647 Epoch: 5 Global Step: 22350 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 02:58:43,185-Speed 4961.30 samples/sec Loss 7.0491 LearningRate 0.2646 Epoch: 5 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:58:51,342-Speed 5021.97 samples/sec Loss 6.9934 LearningRate 0.2645 Epoch: 5 Global Step: 22370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:58:59,468-Speed 5041.26 samples/sec Loss 7.0121 LearningRate 0.2644 Epoch: 5 Global Step: 22380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:59:07,631-Speed 5018.26 samples/sec Loss 7.0646 LearningRate 0.2644 Epoch: 5 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:59:15,724-Speed 5061.76 samples/sec Loss 6.9688 LearningRate 0.2643 Epoch: 5 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:59:23,830-Speed 5053.95 samples/sec Loss 6.9936 LearningRate 0.2642 Epoch: 5 Global Step: 22410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:59:31,916-Speed 5066.23 samples/sec Loss 6.9345 LearningRate 0.2641 Epoch: 5 Global Step: 22420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:59:40,018-Speed 5055.76 samples/sec Loss 7.0182 LearningRate 0.2640 Epoch: 5 Global Step: 22430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 
02:59:48,351-Speed 4916.40 samples/sec Loss 6.9507 LearningRate 0.2639 Epoch: 5 Global Step: 22440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 02:59:56,463-Speed 5050.01 samples/sec Loss 7.0530 LearningRate 0.2638 Epoch: 5 Global Step: 22450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:00:04,633-Speed 5013.99 samples/sec Loss 6.9800 LearningRate 0.2638 Epoch: 5 Global Step: 22460 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:00:12,796-Speed 5018.60 samples/sec Loss 7.0235 LearningRate 0.2637 Epoch: 5 Global Step: 22470 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:00:20,891-Speed 5060.31 samples/sec Loss 7.0669 LearningRate 0.2636 Epoch: 5 Global Step: 22480 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:00:28,983-Speed 5062.70 samples/sec Loss 7.0038 LearningRate 0.2635 Epoch: 5 Global Step: 22490 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:00:37,090-Speed 5053.40 samples/sec Loss 6.9402 LearningRate 0.2634 Epoch: 5 Global Step: 22500 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:00:45,211-Speed 5044.12 samples/sec Loss 6.9936 LearningRate 0.2633 Epoch: 5 Global Step: 22510 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:00:53,414-Speed 4993.89 samples/sec Loss 6.9575 LearningRate 0.2632 Epoch: 5 Global Step: 22520 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:01:01,628-Speed 4987.46 samples/sec Loss 6.9344 LearningRate 0.2632 Epoch: 5 Global Step: 22530 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:01:09,769-Speed 5031.64 samples/sec Loss 6.9943 LearningRate 0.2631 Epoch: 5 Global Step: 22540 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:01:17,939-Speed 5014.07 samples/sec Loss 6.9580 LearningRate 0.2630 Epoch: 5 Global Step: 22550 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:01:26,001-Speed 5081.19 samples/sec Loss 
7.0004 LearningRate 0.2629 Epoch: 5 Global Step: 22560 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:01:34,252-Speed 4965.17 samples/sec Loss 7.0553 LearningRate 0.2628 Epoch: 5 Global Step: 22570 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:01:42,387-Speed 5035.53 samples/sec Loss 7.0124 LearningRate 0.2627 Epoch: 5 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:01:50,624-Speed 4973.02 samples/sec Loss 6.9809 LearningRate 0.2626 Epoch: 5 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:01:58,813-Speed 5003.07 samples/sec Loss 7.0063 LearningRate 0.2625 Epoch: 5 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:02:06,896-Speed 5068.17 samples/sec Loss 7.0116 LearningRate 0.2625 Epoch: 5 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:02:15,025-Speed 5038.92 samples/sec Loss 6.9573 LearningRate 0.2624 Epoch: 5 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:02:23,213-Speed 5003.56 samples/sec Loss 6.9724 LearningRate 0.2623 Epoch: 5 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:02:31,373-Speed 5019.91 samples/sec Loss 6.9575 LearningRate 0.2622 Epoch: 5 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:02:39,544-Speed 5013.50 samples/sec Loss 6.8883 LearningRate 0.2621 Epoch: 5 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:02:47,657-Speed 5049.25 samples/sec Loss 6.9457 LearningRate 0.2620 Epoch: 5 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:02:55,773-Speed 5047.99 samples/sec Loss 6.9852 LearningRate 0.2619 Epoch: 5 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:03:03,910-Speed 5033.96 samples/sec Loss 6.9046 LearningRate 0.2619 Epoch: 5 Global 
Step: 22680 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:03:12,038-Speed 5040.62 samples/sec Loss 6.9125 LearningRate 0.2618 Epoch: 5 Global Step: 22690 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:03:20,196-Speed 5021.26 samples/sec Loss 6.9950 LearningRate 0.2617 Epoch: 5 Global Step: 22700 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:03:28,306-Speed 5051.23 samples/sec Loss 6.8613 LearningRate 0.2616 Epoch: 5 Global Step: 22710 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:03:36,438-Speed 5037.47 samples/sec Loss 6.9331 LearningRate 0.2615 Epoch: 5 Global Step: 22720 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:03:44,574-Speed 5035.18 samples/sec Loss 6.9086 LearningRate 0.2614 Epoch: 5 Global Step: 22730 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:03:52,669-Speed 5060.48 samples/sec Loss 6.9954 LearningRate 0.2613 Epoch: 5 Global Step: 22740 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:04:00,778-Speed 5051.83 samples/sec Loss 7.0240 LearningRate 0.2613 Epoch: 5 Global Step: 22750 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:04:09,045-Speed 4955.82 samples/sec Loss 6.9266 LearningRate 0.2612 Epoch: 5 Global Step: 22760 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:04:17,309-Speed 4957.02 samples/sec Loss 6.9281 LearningRate 0.2611 Epoch: 5 Global Step: 22770 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:04:25,473-Speed 5017.73 samples/sec Loss 6.9248 LearningRate 0.2610 Epoch: 5 Global Step: 22780 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:04:33,674-Speed 4994.70 samples/sec Loss 6.9333 LearningRate 0.2609 Epoch: 5 Global Step: 22790 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:04:41,865-Speed 5001.37 samples/sec Loss 6.8631 LearningRate 0.2608 Epoch: 5 Global Step: 22800 Fp16 Grad Scale: 262144 
Required: 15 hours Training: 2022-01-17 03:04:50,106-Speed 4971.48 samples/sec Loss 6.9334 LearningRate 0.2607 Epoch: 5 Global Step: 22810 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:04:58,186-Speed 5069.41 samples/sec Loss 6.9524 LearningRate 0.2607 Epoch: 5 Global Step: 22820 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:05:06,324-Speed 5033.84 samples/sec Loss 7.0506 LearningRate 0.2606 Epoch: 5 Global Step: 22830 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-17 03:05:14,509-Speed 5005.04 samples/sec Loss 6.9451 LearningRate 0.2605 Epoch: 5 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-17 03:05:22,710-Speed 4995.29 samples/sec Loss 6.9253 LearningRate 0.2604 Epoch: 5 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:05:30,806-Speed 5060.00 samples/sec Loss 6.9373 LearningRate 0.2603 Epoch: 5 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:05:38,927-Speed 5044.55 samples/sec Loss 7.0127 LearningRate 0.2602 Epoch: 5 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:05:47,002-Speed 5073.06 samples/sec Loss 7.0590 LearningRate 0.2601 Epoch: 5 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:05:55,158-Speed 5023.14 samples/sec Loss 6.9579 LearningRate 0.2600 Epoch: 5 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:06:03,317-Speed 5020.60 samples/sec Loss 7.0040 LearningRate 0.2600 Epoch: 5 Global Step: 22900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:06:11,501-Speed 5005.80 samples/sec Loss 6.9125 LearningRate 0.2599 Epoch: 5 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:06:19,721-Speed 4983.63 samples/sec Loss 6.9093 LearningRate 0.2598 Epoch: 5 Global Step: 22920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 
03:06:27,884-Speed 5017.88 samples/sec Loss 6.8888 LearningRate 0.2597 Epoch: 5 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:06:36,009-Speed 5042.30 samples/sec Loss 6.9030 LearningRate 0.2596 Epoch: 5 Global Step: 22940 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:06:44,193-Speed 5005.89 samples/sec Loss 6.8955 LearningRate 0.2595 Epoch: 5 Global Step: 22950 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:06:52,414-Speed 4982.92 samples/sec Loss 6.9292 LearningRate 0.2594 Epoch: 5 Global Step: 22960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:07:00,588-Speed 5011.29 samples/sec Loss 6.9162 LearningRate 0.2594 Epoch: 5 Global Step: 22970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:07:08,748-Speed 5021.04 samples/sec Loss 6.9342 LearningRate 0.2593 Epoch: 5 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:07:16,931-Speed 5005.98 samples/sec Loss 6.8944 LearningRate 0.2592 Epoch: 5 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:07:25,000-Speed 5076.60 samples/sec Loss 6.8944 LearningRate 0.2591 Epoch: 5 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:07:33,363-Speed 4898.48 samples/sec Loss 6.9945 LearningRate 0.2590 Epoch: 5 Global Step: 23010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:07:41,531-Speed 5015.17 samples/sec Loss 6.9494 LearningRate 0.2589 Epoch: 5 Global Step: 23020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:07:49,743-Speed 4988.67 samples/sec Loss 6.8848 LearningRate 0.2588 Epoch: 5 Global Step: 23030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:07:57,840-Speed 5059.63 samples/sec Loss 6.8755 LearningRate 0.2588 Epoch: 5 Global Step: 23040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:08:05,938-Speed 5058.59 samples/sec Loss 6.9270 
LearningRate 0.2587 Epoch: 5 Global Step: 23050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:08:14,032-Speed 5061.27 samples/sec Loss 6.9207 LearningRate 0.2586 Epoch: 5 Global Step: 23060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:08:22,160-Speed 5040.07 samples/sec Loss 6.8889 LearningRate 0.2585 Epoch: 5 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:08:30,287-Speed 5040.52 samples/sec Loss 6.8187 LearningRate 0.2584 Epoch: 5 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:08:38,433-Speed 5028.83 samples/sec Loss 6.8550 LearningRate 0.2583 Epoch: 5 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:08:46,650-Speed 4985.06 samples/sec Loss 6.9233 LearningRate 0.2582 Epoch: 5 Global Step: 23100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:08:54,767-Speed 5047.29 samples/sec Loss 6.9500 LearningRate 0.2582 Epoch: 5 Global Step: 23110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:09:02,905-Speed 5033.76 samples/sec Loss 6.8968 LearningRate 0.2581 Epoch: 5 Global Step: 23120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:09:11,255-Speed 4905.79 samples/sec Loss 6.8154 LearningRate 0.2580 Epoch: 5 Global Step: 23130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:09:19,362-Speed 5053.42 samples/sec Loss 6.8954 LearningRate 0.2579 Epoch: 5 Global Step: 23140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:09:27,413-Speed 5088.65 samples/sec Loss 6.9292 LearningRate 0.2578 Epoch: 5 Global Step: 23150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:09:35,708-Speed 4937.96 samples/sec Loss 6.8823 LearningRate 0.2577 Epoch: 5 Global Step: 23160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:09:43,831-Speed 5043.90 samples/sec Loss 6.8782 LearningRate 0.2576 Epoch: 5 Global Step: 
23170 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:09:51,918-Speed 5065.23 samples/sec Loss 6.8481 LearningRate 0.2576 Epoch: 5 Global Step: 23180 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:10:00,119-Speed 4994.93 samples/sec Loss 6.8596 LearningRate 0.2575 Epoch: 5 Global Step: 23190 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:10:08,209-Speed 5064.14 samples/sec Loss 6.8491 LearningRate 0.2574 Epoch: 5 Global Step: 23200 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:10:16,380-Speed 5013.07 samples/sec Loss 6.9112 LearningRate 0.2573 Epoch: 5 Global Step: 23210 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:10:24,704-Speed 4921.38 samples/sec Loss 6.8069 LearningRate 0.2572 Epoch: 5 Global Step: 23220 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:10:33,133-Speed 4860.29 samples/sec Loss 6.8530 LearningRate 0.2571 Epoch: 5 Global Step: 23230 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:10:41,345-Speed 4988.12 samples/sec Loss 6.8745 LearningRate 0.2571 Epoch: 5 Global Step: 23240 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:10:49,557-Speed 4988.46 samples/sec Loss 6.8171 LearningRate 0.2570 Epoch: 5 Global Step: 23250 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:10:57,860-Speed 4933.89 samples/sec Loss 6.8453 LearningRate 0.2569 Epoch: 5 Global Step: 23260 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:11:06,690-Speed 4639.73 samples/sec Loss 6.7633 LearningRate 0.2568 Epoch: 5 Global Step: 23270 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:11:15,464-Speed 4668.72 samples/sec Loss 6.9560 LearningRate 0.2567 Epoch: 5 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:11:24,181-Speed 4699.28 samples/sec Loss 6.8983 LearningRate 0.2566 Epoch: 5 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 14 
hours Training: 2022-01-17 03:11:32,634-Speed 4846.15 samples/sec Loss 6.9107 LearningRate 0.2565 Epoch: 5 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:11:40,813-Speed 5008.95 samples/sec Loss 6.8184 LearningRate 0.2565 Epoch: 5 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:11:49,046-Speed 4975.62 samples/sec Loss 6.8860 LearningRate 0.2564 Epoch: 5 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:11:57,294-Speed 4967.30 samples/sec Loss 6.8936 LearningRate 0.2563 Epoch: 5 Global Step: 23330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:12:05,736-Speed 4852.44 samples/sec Loss 6.8708 LearningRate 0.2562 Epoch: 5 Global Step: 23340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:12:13,897-Speed 5019.19 samples/sec Loss 6.9061 LearningRate 0.2561 Epoch: 5 Global Step: 23350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:12:22,142-Speed 4968.69 samples/sec Loss 6.8407 LearningRate 0.2560 Epoch: 5 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:12:30,311-Speed 5015.05 samples/sec Loss 6.8478 LearningRate 0.2559 Epoch: 5 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:12:38,510-Speed 4996.03 samples/sec Loss 6.8896 LearningRate 0.2559 Epoch: 5 Global Step: 23380 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:12:46,628-Speed 5045.99 samples/sec Loss 6.8179 LearningRate 0.2558 Epoch: 5 Global Step: 23390 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:12:54,813-Speed 5004.87 samples/sec Loss 6.8465 LearningRate 0.2557 Epoch: 5 Global Step: 23400 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:13:03,076-Speed 4958.20 samples/sec Loss 6.8698 LearningRate 0.2556 Epoch: 5 Global Step: 23410 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 
03:13:11,146-Speed 5076.18 samples/sec Loss 6.7750 LearningRate 0.2555 Epoch: 5 Global Step: 23420 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:13:19,246-Speed 5056.98 samples/sec Loss 6.8216 LearningRate 0.2554 Epoch: 5 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:13:27,309-Speed 5080.74 samples/sec Loss 6.8570 LearningRate 0.2553 Epoch: 5 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:13:35,418-Speed 5051.47 samples/sec Loss 6.8172 LearningRate 0.2553 Epoch: 5 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:13:43,624-Speed 4992.28 samples/sec Loss 6.8464 LearningRate 0.2552 Epoch: 5 Global Step: 23460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:13:51,835-Speed 4989.00 samples/sec Loss 6.9011 LearningRate 0.2551 Epoch: 5 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:13:59,962-Speed 5041.10 samples/sec Loss 6.9142 LearningRate 0.2550 Epoch: 5 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:14:08,074-Speed 5050.23 samples/sec Loss 6.8706 LearningRate 0.2549 Epoch: 5 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:14:16,139-Speed 5079.16 samples/sec Loss 6.8451 LearningRate 0.2548 Epoch: 5 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:14:24,221-Speed 5068.52 samples/sec Loss 6.8872 LearningRate 0.2548 Epoch: 5 Global Step: 23510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:14:32,330-Speed 5052.28 samples/sec Loss 6.7792 LearningRate 0.2547 Epoch: 5 Global Step: 23520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:14:40,572-Speed 4970.19 samples/sec Loss 6.8219 LearningRate 0.2546 Epoch: 5 Global Step: 23530 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:14:48,767-Speed 4999.09 samples/sec Loss 
6.7939 LearningRate 0.2545 Epoch: 5 Global Step: 23540 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:14:57,082-Speed 4926.87 samples/sec Loss 6.9168 LearningRate 0.2544 Epoch: 5 Global Step: 23550 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:15:05,490-Speed 4872.09 samples/sec Loss 6.8199 LearningRate 0.2543 Epoch: 5 Global Step: 23560 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:15:13,620-Speed 5039.04 samples/sec Loss 6.8235 LearningRate 0.2542 Epoch: 5 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:15:21,830-Speed 4989.55 samples/sec Loss 6.7863 LearningRate 0.2542 Epoch: 5 Global Step: 23580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:15:29,928-Speed 5058.89 samples/sec Loss 6.8490 LearningRate 0.2541 Epoch: 5 Global Step: 23590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:15:38,064-Speed 5034.85 samples/sec Loss 6.8547 LearningRate 0.2540 Epoch: 5 Global Step: 23600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:15:46,186-Speed 5043.99 samples/sec Loss 6.8411 LearningRate 0.2539 Epoch: 5 Global Step: 23610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:15:54,291-Speed 5055.06 samples/sec Loss 6.8503 LearningRate 0.2538 Epoch: 5 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:16:02,371-Speed 5069.98 samples/sec Loss 6.7527 LearningRate 0.2537 Epoch: 5 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:16:10,511-Speed 5032.47 samples/sec Loss 6.9085 LearningRate 0.2536 Epoch: 5 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:16:18,701-Speed 5001.45 samples/sec Loss 6.7831 LearningRate 0.2536 Epoch: 5 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:16:26,862-Speed 5020.20 samples/sec Loss 6.8662 LearningRate 0.2535 Epoch: 5 Global 
Step: 23660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:16:35,094-Speed 4976.30 samples/sec Loss 6.7670 LearningRate 0.2534 Epoch: 5 Global Step: 23670 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:16:43,276-Speed 5007.01 samples/sec Loss 6.7709 LearningRate 0.2533 Epoch: 5 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:16:51,432-Speed 5023.17 samples/sec Loss 6.7939 LearningRate 0.2532 Epoch: 5 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:16:59,670-Speed 4972.43 samples/sec Loss 6.7964 LearningRate 0.2531 Epoch: 5 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:17:08,078-Speed 4872.45 samples/sec Loss 6.7955 LearningRate 0.2531 Epoch: 5 Global Step: 23710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:17:16,339-Speed 4959.08 samples/sec Loss 6.7548 LearningRate 0.2530 Epoch: 5 Global Step: 23720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:17:24,499-Speed 5020.20 samples/sec Loss 6.7336 LearningRate 0.2529 Epoch: 5 Global Step: 23730 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:17:32,575-Speed 5072.19 samples/sec Loss 6.7522 LearningRate 0.2528 Epoch: 5 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:17:40,716-Speed 5031.89 samples/sec Loss 6.7477 LearningRate 0.2527 Epoch: 5 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:17:49,107-Speed 4882.51 samples/sec Loss 6.7475 LearningRate 0.2526 Epoch: 5 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:17:57,568-Speed 4841.93 samples/sec Loss 6.8717 LearningRate 0.2525 Epoch: 5 Global Step: 23770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:18:06,026-Speed 4842.74 samples/sec Loss 6.8166 LearningRate 0.2525 Epoch: 5 Global Step: 23780 Fp16 Grad Scale: 262144 
Required: 14 hours Training: 2022-01-17 03:18:14,355-Speed 4918.96 samples/sec Loss 6.6642 LearningRate 0.2524 Epoch: 5 Global Step: 23790 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:18:22,479-Speed 5042.74 samples/sec Loss 6.7459 LearningRate 0.2523 Epoch: 5 Global Step: 23800 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:18:30,643-Speed 5017.50 samples/sec Loss 6.7165 LearningRate 0.2522 Epoch: 5 Global Step: 23810 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:18:38,766-Speed 5043.13 samples/sec Loss 6.7484 LearningRate 0.2521 Epoch: 5 Global Step: 23820 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:18:46,858-Speed 5062.77 samples/sec Loss 6.8244 LearningRate 0.2520 Epoch: 5 Global Step: 23830 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:18:55,021-Speed 5018.50 samples/sec Loss 6.8113 LearningRate 0.2520 Epoch: 5 Global Step: 23840 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:19:03,101-Speed 5069.87 samples/sec Loss 6.7960 LearningRate 0.2519 Epoch: 5 Global Step: 23850 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:19:11,223-Speed 5043.00 samples/sec Loss 6.7254 LearningRate 0.2518 Epoch: 5 Global Step: 23860 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:19:19,330-Speed 5053.55 samples/sec Loss 6.7535 LearningRate 0.2517 Epoch: 5 Global Step: 23870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:19:27,398-Speed 5078.15 samples/sec Loss 6.7869 LearningRate 0.2516 Epoch: 5 Global Step: 23880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:19:35,837-Speed 4854.55 samples/sec Loss 6.7330 LearningRate 0.2515 Epoch: 5 Global Step: 23890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:19:44,042-Speed 4992.24 samples/sec Loss 6.7465 LearningRate 0.2514 Epoch: 5 Global Step: 23900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 
03:19:52,158-Speed 5047.37 samples/sec Loss 6.8573 LearningRate 0.2514 Epoch: 5 Global Step: 23910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:20:00,316-Speed 5021.94 samples/sec Loss 6.6908 LearningRate 0.2513 Epoch: 5 Global Step: 23920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:20:08,412-Speed 5059.67 samples/sec Loss 6.8153 LearningRate 0.2512 Epoch: 5 Global Step: 23930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:20:16,980-Speed 4781.72 samples/sec Loss 6.7542 LearningRate 0.2511 Epoch: 5 Global Step: 23940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:20:25,350-Speed 4894.52 samples/sec Loss 6.7559 LearningRate 0.2510 Epoch: 5 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:20:33,487-Speed 5034.19 samples/sec Loss 6.6951 LearningRate 0.2509 Epoch: 5 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:20:41,646-Speed 5021.15 samples/sec Loss 6.7808 LearningRate 0.2509 Epoch: 5 Global Step: 23970 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:20:49,781-Speed 5035.86 samples/sec Loss 6.7893 LearningRate 0.2508 Epoch: 5 Global Step: 23980 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:20:57,920-Speed 5032.78 samples/sec Loss 6.7409 LearningRate 0.2507 Epoch: 5 Global Step: 23990 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:21:06,134-Speed 4988.29 samples/sec Loss 6.8152 LearningRate 0.2506 Epoch: 5 Global Step: 24000 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:21:14,254-Speed 5045.20 samples/sec Loss 6.7694 LearningRate 0.2505 Epoch: 5 Global Step: 24010 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:21:22,412-Speed 5021.33 samples/sec Loss 6.8473 LearningRate 0.2504 Epoch: 5 Global Step: 24020 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:21:30,488-Speed 5072.72 samples/sec Loss 
6.6935 LearningRate 0.2503 Epoch: 5 Global Step: 24030 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:21:38,777-Speed 4942.17 samples/sec Loss 6.6775 LearningRate 0.2503 Epoch: 5 Global Step: 24040 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:21:47,164-Speed 4883.95 samples/sec Loss 6.7392 LearningRate 0.2502 Epoch: 5 Global Step: 24050 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:21:55,419-Speed 4962.63 samples/sec Loss 6.7544 LearningRate 0.2501 Epoch: 5 Global Step: 24060 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:22:03,788-Speed 4894.93 samples/sec Loss 6.7094 LearningRate 0.2500 Epoch: 5 Global Step: 24070 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:22:12,098-Speed 4929.87 samples/sec Loss 6.7340 LearningRate 0.2499 Epoch: 5 Global Step: 24080 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:22:20,347-Speed 4965.92 samples/sec Loss 6.7215 LearningRate 0.2498 Epoch: 5 Global Step: 24090 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:22:28,581-Speed 4975.18 samples/sec Loss 6.7323 LearningRate 0.2498 Epoch: 5 Global Step: 24100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:22:36,666-Speed 5067.08 samples/sec Loss 6.6814 LearningRate 0.2497 Epoch: 5 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:22:44,750-Speed 5067.27 samples/sec Loss 6.7162 LearningRate 0.2496 Epoch: 5 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:22:52,877-Speed 5040.53 samples/sec Loss 6.7371 LearningRate 0.2495 Epoch: 5 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:23:00,935-Speed 5084.38 samples/sec Loss 6.7860 LearningRate 0.2494 Epoch: 5 Global Step: 24140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:23:09,250-Speed 4926.13 samples/sec Loss 6.6641 LearningRate 0.2493 Epoch: 5 Global 
Step: 24150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:23:17,528-Speed 4949.30 samples/sec Loss 6.7369 LearningRate 0.2493 Epoch: 5 Global Step: 24160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:23:25,622-Speed 5061.03 samples/sec Loss 6.7289 LearningRate 0.2492 Epoch: 5 Global Step: 24170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:23:33,738-Speed 5047.51 samples/sec Loss 6.7196 LearningRate 0.2491 Epoch: 5 Global Step: 24180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:23:41,888-Speed 5026.73 samples/sec Loss 6.7335 LearningRate 0.2490 Epoch: 5 Global Step: 24190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:23:50,028-Speed 5032.24 samples/sec Loss 6.6960 LearningRate 0.2489 Epoch: 5 Global Step: 24200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:23:58,374-Speed 4908.22 samples/sec Loss 6.7602 LearningRate 0.2488 Epoch: 5 Global Step: 24210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:24:06,488-Speed 5049.29 samples/sec Loss 6.7627 LearningRate 0.2488 Epoch: 5 Global Step: 24220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:24:14,653-Speed 5017.18 samples/sec Loss 6.7495 LearningRate 0.2487 Epoch: 5 Global Step: 24230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:24:22,754-Speed 5056.65 samples/sec Loss 6.6298 LearningRate 0.2486 Epoch: 5 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:24:30,932-Speed 5009.29 samples/sec Loss 6.6996 LearningRate 0.2485 Epoch: 5 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:24:39,022-Speed 5064.30 samples/sec Loss 6.7382 LearningRate 0.2484 Epoch: 5 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:24:47,159-Speed 5034.45 samples/sec Loss 6.7095 LearningRate 0.2483 Epoch: 5 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-01-17 03:24:55,363-Speed 4993.70 samples/sec Loss 6.6950 LearningRate 0.2482 Epoch: 5 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:25:03,466-Speed 5055.20 samples/sec Loss 6.7573 LearningRate 0.2482 Epoch: 5 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:25:11,566-Speed 5058.22 samples/sec Loss 6.7406 LearningRate 0.2481 Epoch: 5 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:25:19,614-Speed 5089.87 samples/sec Loss 6.7448 LearningRate 0.2480 Epoch: 5 Global Step: 24310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:25:27,785-Speed 5013.39 samples/sec Loss 6.6815 LearningRate 0.2479 Epoch: 5 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:25:35,908-Speed 5043.12 samples/sec Loss 6.6588 LearningRate 0.2478 Epoch: 5 Global Step: 24330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:25:43,962-Speed 5086.37 samples/sec Loss 6.6465 LearningRate 0.2477 Epoch: 5 Global Step: 24340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:25:52,062-Speed 5057.00 samples/sec Loss 6.6129 LearningRate 0.2477 Epoch: 5 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:26:00,177-Speed 5048.44 samples/sec Loss 6.6560 LearningRate 0.2476 Epoch: 5 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:26:08,480-Speed 4933.70 samples/sec Loss 6.6953 LearningRate 0.2475 Epoch: 5 Global Step: 24370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:26:16,754-Speed 4951.75 samples/sec Loss 6.6506 LearningRate 0.2474 Epoch: 5 Global Step: 24380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:26:24,889-Speed 5034.98 samples/sec Loss 6.7408 LearningRate 0.2473 Epoch: 5 Global Step: 24390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:26:33,135-Speed 
4968.39 samples/sec Loss 6.6876 LearningRate 0.2472 Epoch: 5 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:26:41,309-Speed 5012.02 samples/sec Loss 6.7092 LearningRate 0.2472 Epoch: 5 Global Step: 24410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:26:49,458-Speed 5026.75 samples/sec Loss 6.6362 LearningRate 0.2471 Epoch: 5 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:26:57,613-Speed 5023.36 samples/sec Loss 6.6464 LearningRate 0.2470 Epoch: 5 Global Step: 24430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:27:05,710-Speed 5059.80 samples/sec Loss 6.6347 LearningRate 0.2469 Epoch: 5 Global Step: 24440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:27:13,832-Speed 5043.97 samples/sec Loss 6.7548 LearningRate 0.2468 Epoch: 5 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:27:22,277-Speed 4850.85 samples/sec Loss 6.7458 LearningRate 0.2467 Epoch: 5 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:27:30,708-Speed 4858.58 samples/sec Loss 6.6735 LearningRate 0.2467 Epoch: 5 Global Step: 24470 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:27:38,928-Speed 4983.78 samples/sec Loss 6.6299 LearningRate 0.2466 Epoch: 5 Global Step: 24480 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:27:47,093-Speed 5017.03 samples/sec Loss 6.6737 LearningRate 0.2465 Epoch: 5 Global Step: 24490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:27:55,177-Speed 5067.46 samples/sec Loss 6.6969 LearningRate 0.2464 Epoch: 5 Global Step: 24500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:28:03,684-Speed 4815.52 samples/sec Loss 6.6725 LearningRate 0.2463 Epoch: 5 Global Step: 24510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:28:11,940-Speed 4961.90 samples/sec Loss 6.6764 
LearningRate 0.2462 Epoch: 5 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:28:20,056-Speed 5047.34 samples/sec Loss 6.6269 LearningRate 0.2462 Epoch: 5 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:28:28,318-Speed 4958.36 samples/sec Loss 6.6700 LearningRate 0.2461 Epoch: 5 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:28:36,449-Speed 5038.31 samples/sec Loss 6.6627 LearningRate 0.2460 Epoch: 5 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:28:44,534-Speed 5067.15 samples/sec Loss 6.6845 LearningRate 0.2459 Epoch: 5 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:28:52,644-Speed 5051.16 samples/sec Loss 6.7454 LearningRate 0.2458 Epoch: 5 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:29:00,857-Speed 4987.61 samples/sec Loss 6.6894 LearningRate 0.2457 Epoch: 5 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:29:09,084-Speed 4979.46 samples/sec Loss 6.6613 LearningRate 0.2457 Epoch: 5 Global Step: 24590 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:29:17,263-Speed 5008.70 samples/sec Loss 6.5951 LearningRate 0.2456 Epoch: 5 Global Step: 24600 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:29:25,332-Speed 5076.78 samples/sec Loss 6.5818 LearningRate 0.2455 Epoch: 5 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:29:33,461-Speed 5039.44 samples/sec Loss 6.6496 LearningRate 0.2454 Epoch: 5 Global Step: 24620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:29:41,929-Speed 4837.61 samples/sec Loss 6.6801 LearningRate 0.2453 Epoch: 5 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:29:50,035-Speed 5053.63 samples/sec Loss 6.6601 LearningRate 0.2452 Epoch: 5 Global Step: 
24640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:29:58,251-Speed 4986.63 samples/sec Loss 6.6611 LearningRate 0.2452 Epoch: 5 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:30:06,412-Speed 5019.48 samples/sec Loss 6.5638 LearningRate 0.2451 Epoch: 5 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:30:14,643-Speed 4977.01 samples/sec Loss 6.6820 LearningRate 0.2450 Epoch: 5 Global Step: 24670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:30:22,745-Speed 5056.09 samples/sec Loss 6.5748 LearningRate 0.2449 Epoch: 5 Global Step: 24680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:30:30,866-Speed 5044.38 samples/sec Loss 6.6880 LearningRate 0.2448 Epoch: 5 Global Step: 24690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:30:38,966-Speed 5057.47 samples/sec Loss 6.6264 LearningRate 0.2447 Epoch: 5 Global Step: 24700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:30:47,091-Speed 5041.90 samples/sec Loss 6.6550 LearningRate 0.2447 Epoch: 5 Global Step: 24710 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:30:55,187-Speed 5060.21 samples/sec Loss 6.6627 LearningRate 0.2446 Epoch: 5 Global Step: 24720 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:31:03,397-Speed 4989.76 samples/sec Loss 6.6498 LearningRate 0.2445 Epoch: 5 Global Step: 24730 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:31:11,500-Speed 5055.62 samples/sec Loss 6.6127 LearningRate 0.2444 Epoch: 5 Global Step: 24740 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:31:19,643-Speed 5030.82 samples/sec Loss 6.6639 LearningRate 0.2443 Epoch: 5 Global Step: 24750 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:31:27,761-Speed 5046.16 samples/sec Loss 6.5662 LearningRate 0.2442 Epoch: 5 Global Step: 24760 Fp16 Grad Scale: 262144 Required: 14 
hours Training: 2022-01-17 03:31:35,861-Speed 5057.80 samples/sec Loss 6.6743 LearningRate 0.2442 Epoch: 5 Global Step: 24770 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:31:44,021-Speed 5020.19 samples/sec Loss 6.6181 LearningRate 0.2441 Epoch: 5 Global Step: 24780 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:31:52,145-Speed 5042.05 samples/sec Loss 6.6135 LearningRate 0.2440 Epoch: 5 Global Step: 24790 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:32:00,203-Speed 5084.04 samples/sec Loss 6.6118 LearningRate 0.2439 Epoch: 5 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:32:08,316-Speed 5049.36 samples/sec Loss 6.6020 LearningRate 0.2438 Epoch: 5 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:32:16,574-Speed 4960.40 samples/sec Loss 6.5847 LearningRate 0.2437 Epoch: 5 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:32:25,041-Speed 4838.58 samples/sec Loss 6.5955 LearningRate 0.2437 Epoch: 5 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:32:33,171-Speed 5039.64 samples/sec Loss 6.5564 LearningRate 0.2436 Epoch: 5 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:32:41,304-Speed 5036.33 samples/sec Loss 6.5797 LearningRate 0.2435 Epoch: 5 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:32:49,419-Speed 5048.46 samples/sec Loss 6.6438 LearningRate 0.2434 Epoch: 5 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:32:57,517-Speed 5059.15 samples/sec Loss 6.5966 LearningRate 0.2433 Epoch: 5 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:33:05,641-Speed 5042.29 samples/sec Loss 6.7119 LearningRate 0.2432 Epoch: 5 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 
03:33:13,787-Speed 5028.61 samples/sec Loss 6.6271 LearningRate 0.2432 Epoch: 5 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:33:21,882-Speed 5061.18 samples/sec Loss 6.5342 LearningRate 0.2431 Epoch: 5 Global Step: 24900 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:33:30,109-Speed 4979.34 samples/sec Loss 6.5872 LearningRate 0.2430 Epoch: 5 Global Step: 24910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:33:38,364-Speed 4962.56 samples/sec Loss 6.5839 LearningRate 0.2429 Epoch: 5 Global Step: 24920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:33:46,488-Speed 5042.34 samples/sec Loss 6.6147 LearningRate 0.2428 Epoch: 5 Global Step: 24930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:33:54,673-Speed 5005.21 samples/sec Loss 6.6127 LearningRate 0.2427 Epoch: 5 Global Step: 24940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:34:02,802-Speed 5039.55 samples/sec Loss 6.5534 LearningRate 0.2427 Epoch: 5 Global Step: 24950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:34:10,848-Speed 5091.46 samples/sec Loss 6.6305 LearningRate 0.2426 Epoch: 5 Global Step: 24960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:34:18,988-Speed 5032.16 samples/sec Loss 6.6703 LearningRate 0.2425 Epoch: 5 Global Step: 24970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:34:27,291-Speed 4934.37 samples/sec Loss 6.6363 LearningRate 0.2424 Epoch: 5 Global Step: 24980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:34:35,456-Speed 5017.07 samples/sec Loss 6.5383 LearningRate 0.2423 Epoch: 5 Global Step: 24990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:34:43,533-Speed 5071.82 samples/sec Loss 6.6299 LearningRate 0.2422 Epoch: 5 Global Step: 25000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:35:30,005-[lfw][25000]XNorm: 21.994490 
Training: 2022-01-17 03:35:30,006-[lfw][25000]Accuracy-Flip: 0.99633+-0.00340 Training: 2022-01-17 03:35:30,006-[lfw][25000]Accuracy-Highest: 0.99750 Training: 2022-01-17 03:36:24,081-[cfp_fp][25000]XNorm: 19.121898 Training: 2022-01-17 03:36:24,082-[cfp_fp][25000]Accuracy-Flip: 0.96514+-0.00998 Training: 2022-01-17 03:36:24,082-[cfp_fp][25000]Accuracy-Highest: 0.96814 Training: 2022-01-17 03:37:10,812-[agedb_30][25000]XNorm: 21.522018 Training: 2022-01-17 03:37:10,813-[agedb_30][25000]Accuracy-Flip: 0.96950+-0.00972 Training: 2022-01-17 03:37:10,813-[agedb_30][25000]Accuracy-Highest: 0.97333 Training: 2022-01-17 03:37:18,972-Speed 263.51 samples/sec Loss 6.6657 LearningRate 0.2422 Epoch: 5 Global Step: 25010 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:37:27,025-Speed 5087.21 samples/sec Loss 6.5878 LearningRate 0.2421 Epoch: 5 Global Step: 25020 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:37:35,522-Speed 4821.39 samples/sec Loss 6.6158 LearningRate 0.2420 Epoch: 5 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:38:26,980-Speed 796.01 samples/sec Loss 6.2015 LearningRate 0.2419 Epoch: 6 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:38:35,197-Speed 4985.72 samples/sec Loss 6.0304 LearningRate 0.2418 Epoch: 6 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:38:43,394-Speed 4997.63 samples/sec Loss 6.0522 LearningRate 0.2417 Epoch: 6 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:38:51,623-Speed 4978.01 samples/sec Loss 6.0608 LearningRate 0.2417 Epoch: 6 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:38:59,846-Speed 4982.03 samples/sec Loss 6.1147 LearningRate 0.2416 Epoch: 6 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:39:07,999-Speed 5024.16 samples/sec Loss 6.0271 LearningRate 0.2415 
Epoch: 6 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:39:16,224-Speed 4981.58 samples/sec Loss 6.0666 LearningRate 0.2414 Epoch: 6 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:39:24,542-Speed 4925.21 samples/sec Loss 6.1163 LearningRate 0.2413 Epoch: 6 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:39:32,938-Speed 4878.44 samples/sec Loss 6.1135 LearningRate 0.2412 Epoch: 6 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:39:41,456-Speed 4809.79 samples/sec Loss 6.1839 LearningRate 0.2412 Epoch: 6 Global Step: 25130 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:39:49,699-Speed 4969.52 samples/sec Loss 6.1453 LearningRate 0.2411 Epoch: 6 Global Step: 25140 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:39:57,843-Speed 5029.83 samples/sec Loss 6.2174 LearningRate 0.2410 Epoch: 6 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:40:05,979-Speed 5035.09 samples/sec Loss 6.2271 LearningRate 0.2409 Epoch: 6 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:40:14,051-Speed 5074.99 samples/sec Loss 6.2342 LearningRate 0.2408 Epoch: 6 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:40:22,159-Speed 5052.66 samples/sec Loss 6.2101 LearningRate 0.2408 Epoch: 6 Global Step: 25180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:40:30,508-Speed 4906.37 samples/sec Loss 6.2271 LearningRate 0.2407 Epoch: 6 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:40:38,909-Speed 4876.80 samples/sec Loss 6.2443 LearningRate 0.2406 Epoch: 6 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:40:46,987-Speed 5071.09 samples/sec Loss 6.2339 LearningRate 0.2405 Epoch: 6 Global Step: 25210 Fp16 Grad 
Scale: 131072 Required: 14 hours Training: 2022-01-17 03:40:55,037-Speed 5088.75 samples/sec Loss 6.2714 LearningRate 0.2404 Epoch: 6 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:41:03,077-Speed 5095.05 samples/sec Loss 6.2463 LearningRate 0.2403 Epoch: 6 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:41:11,172-Speed 5060.86 samples/sec Loss 6.3545 LearningRate 0.2403 Epoch: 6 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:41:19,341-Speed 5014.39 samples/sec Loss 6.2777 LearningRate 0.2402 Epoch: 6 Global Step: 25250 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:41:27,415-Speed 5073.69 samples/sec Loss 6.2614 LearningRate 0.2401 Epoch: 6 Global Step: 25260 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:41:35,645-Speed 4977.64 samples/sec Loss 6.3337 LearningRate 0.2400 Epoch: 6 Global Step: 25270 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:41:43,820-Speed 5011.33 samples/sec Loss 6.3611 LearningRate 0.2399 Epoch: 6 Global Step: 25280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:41:51,935-Speed 5047.58 samples/sec Loss 6.3092 LearningRate 0.2398 Epoch: 6 Global Step: 25290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:42:00,006-Speed 5076.09 samples/sec Loss 6.2721 LearningRate 0.2398 Epoch: 6 Global Step: 25300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:42:08,184-Speed 5009.27 samples/sec Loss 6.3834 LearningRate 0.2397 Epoch: 6 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:42:16,338-Speed 5023.75 samples/sec Loss 6.3499 LearningRate 0.2396 Epoch: 6 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:42:24,398-Speed 5082.93 samples/sec Loss 6.3111 LearningRate 0.2395 Epoch: 6 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 
2022-01-17 03:42:32,558-Speed 5019.84 samples/sec Loss 6.3611 LearningRate 0.2394 Epoch: 6 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:42:40,644-Speed 5066.38 samples/sec Loss 6.4942 LearningRate 0.2393 Epoch: 6 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:42:48,730-Speed 5066.28 samples/sec Loss 6.3460 LearningRate 0.2393 Epoch: 6 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:42:56,891-Speed 5019.36 samples/sec Loss 6.3569 LearningRate 0.2392 Epoch: 6 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:43:04,961-Speed 5076.87 samples/sec Loss 6.3479 LearningRate 0.2391 Epoch: 6 Global Step: 25380 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:43:13,043-Speed 5068.19 samples/sec Loss 6.3806 LearningRate 0.2390 Epoch: 6 Global Step: 25390 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:43:21,135-Speed 5062.72 samples/sec Loss 6.4362 LearningRate 0.2389 Epoch: 6 Global Step: 25400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:43:29,266-Speed 5038.43 samples/sec Loss 6.4051 LearningRate 0.2389 Epoch: 6 Global Step: 25410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:43:37,380-Speed 5048.84 samples/sec Loss 6.4056 LearningRate 0.2388 Epoch: 6 Global Step: 25420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:43:45,485-Speed 5054.47 samples/sec Loss 6.3945 LearningRate 0.2387 Epoch: 6 Global Step: 25430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:43:53,674-Speed 5002.69 samples/sec Loss 6.3628 LearningRate 0.2386 Epoch: 6 Global Step: 25440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:44:01,786-Speed 5049.89 samples/sec Loss 6.4647 LearningRate 0.2385 Epoch: 6 Global Step: 25450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:44:09,883-Speed 5059.15 
samples/sec Loss 6.3381 LearningRate 0.2384 Epoch: 6 Global Step: 25460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:44:17,972-Speed 5064.30 samples/sec Loss 6.4520 LearningRate 0.2384 Epoch: 6 Global Step: 25470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:44:26,106-Speed 5036.31 samples/sec Loss 6.4501 LearningRate 0.2383 Epoch: 6 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:44:34,229-Speed 5043.15 samples/sec Loss 6.4314 LearningRate 0.2382 Epoch: 6 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:44:42,317-Speed 5065.02 samples/sec Loss 6.4580 LearningRate 0.2381 Epoch: 6 Global Step: 25500 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:44:50,421-Speed 5055.09 samples/sec Loss 6.4700 LearningRate 0.2380 Epoch: 6 Global Step: 25510 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:44:58,505-Speed 5068.93 samples/sec Loss 6.4460 LearningRate 0.2379 Epoch: 6 Global Step: 25520 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:45:06,559-Speed 5086.41 samples/sec Loss 6.4207 LearningRate 0.2379 Epoch: 6 Global Step: 25530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:45:14,743-Speed 5005.49 samples/sec Loss 6.4963 LearningRate 0.2378 Epoch: 6 Global Step: 25540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:45:22,819-Speed 5072.30 samples/sec Loss 6.3909 LearningRate 0.2377 Epoch: 6 Global Step: 25550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:45:30,862-Speed 5093.42 samples/sec Loss 6.3598 LearningRate 0.2376 Epoch: 6 Global Step: 25560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:45:38,944-Speed 5068.39 samples/sec Loss 6.4200 LearningRate 0.2375 Epoch: 6 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:45:46,963-Speed 5108.93 samples/sec Loss 6.4263 LearningRate 0.2375 
Epoch: 6 Global Step: 25580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:45:55,005-Speed 5094.12 samples/sec Loss 6.4189 LearningRate 0.2374 Epoch: 6 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:46:03,122-Speed 5046.84 samples/sec Loss 6.4061 LearningRate 0.2373 Epoch: 6 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:46:11,160-Speed 5096.36 samples/sec Loss 6.4541 LearningRate 0.2372 Epoch: 6 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:46:19,299-Speed 5032.85 samples/sec Loss 6.4409 LearningRate 0.2371 Epoch: 6 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:46:27,464-Speed 5017.10 samples/sec Loss 6.4483 LearningRate 0.2370 Epoch: 6 Global Step: 25630 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:46:35,631-Speed 5016.11 samples/sec Loss 6.4280 LearningRate 0.2370 Epoch: 6 Global Step: 25640 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:46:43,889-Speed 4961.01 samples/sec Loss 6.4546 LearningRate 0.2369 Epoch: 6 Global Step: 25650 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:46:52,072-Speed 5006.08 samples/sec Loss 6.4262 LearningRate 0.2368 Epoch: 6 Global Step: 25660 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:47:00,114-Speed 5093.63 samples/sec Loss 6.4503 LearningRate 0.2367 Epoch: 6 Global Step: 25670 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:47:08,359-Speed 4968.53 samples/sec Loss 6.3816 LearningRate 0.2366 Epoch: 6 Global Step: 25680 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:47:16,397-Speed 5096.73 samples/sec Loss 6.3917 LearningRate 0.2366 Epoch: 6 Global Step: 25690 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:47:24,661-Speed 4957.02 samples/sec Loss 6.4173 LearningRate 0.2365 Epoch: 6 Global Step: 25700 Fp16 Grad 
Scale: 262144 Required: 14 hours Training: 2022-01-17 03:47:32,752-Speed 5062.99 samples/sec Loss 6.5085 LearningRate 0.2364 Epoch: 6 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:47:40,845-Speed 5061.69 samples/sec Loss 6.4535 LearningRate 0.2363 Epoch: 6 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:47:48,930-Speed 5066.90 samples/sec Loss 6.3640 LearningRate 0.2362 Epoch: 6 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:47:56,995-Speed 5079.59 samples/sec Loss 6.4581 LearningRate 0.2361 Epoch: 6 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:48:05,079-Speed 5067.00 samples/sec Loss 6.4414 LearningRate 0.2361 Epoch: 6 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:48:13,250-Speed 5014.11 samples/sec Loss 6.4562 LearningRate 0.2360 Epoch: 6 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:48:21,386-Speed 5034.96 samples/sec Loss 6.4382 LearningRate 0.2359 Epoch: 6 Global Step: 25770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:48:29,576-Speed 5002.07 samples/sec Loss 6.4369 LearningRate 0.2358 Epoch: 6 Global Step: 25780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:48:37,667-Speed 5062.78 samples/sec Loss 6.4602 LearningRate 0.2357 Epoch: 6 Global Step: 25790 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:48:45,794-Speed 5040.41 samples/sec Loss 6.4442 LearningRate 0.2357 Epoch: 6 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:48:53,878-Speed 5068.03 samples/sec Loss 6.4579 LearningRate 0.2356 Epoch: 6 Global Step: 25810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:49:01,954-Speed 5073.16 samples/sec Loss 6.4637 LearningRate 0.2355 Epoch: 6 Global Step: 25820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 
2022-01-17 03:49:10,063-Speed 5051.73 samples/sec Loss 6.3870 LearningRate 0.2354 Epoch: 6 Global Step: 25830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:49:18,172-Speed 5052.03 samples/sec Loss 6.4243 LearningRate 0.2353 Epoch: 6 Global Step: 25840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:49:26,377-Speed 4992.16 samples/sec Loss 6.4626 LearningRate 0.2352 Epoch: 6 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:49:34,476-Speed 5058.15 samples/sec Loss 6.4608 LearningRate 0.2352 Epoch: 6 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:49:42,628-Speed 5025.49 samples/sec Loss 6.3905 LearningRate 0.2351 Epoch: 6 Global Step: 25870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:49:50,994-Speed 4896.49 samples/sec Loss 6.3710 LearningRate 0.2350 Epoch: 6 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:49:59,416-Speed 4864.44 samples/sec Loss 6.4712 LearningRate 0.2349 Epoch: 6 Global Step: 25890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:50:07,558-Speed 5031.51 samples/sec Loss 6.3894 LearningRate 0.2348 Epoch: 6 Global Step: 25900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:50:15,807-Speed 4966.16 samples/sec Loss 6.5015 LearningRate 0.2348 Epoch: 6 Global Step: 25910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:50:24,053-Speed 4967.37 samples/sec Loss 6.4487 LearningRate 0.2347 Epoch: 6 Global Step: 25920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:50:32,254-Speed 4995.56 samples/sec Loss 6.4575 LearningRate 0.2346 Epoch: 6 Global Step: 25930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:50:40,374-Speed 5045.14 samples/sec Loss 6.5064 LearningRate 0.2345 Epoch: 6 Global Step: 25940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:50:48,497-Speed 5043.02 
samples/sec Loss 6.4747 LearningRate 0.2344 Epoch: 6 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:50:56,621-Speed 5042.71 samples/sec Loss 6.4111 LearningRate 0.2343 Epoch: 6 Global Step: 25960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:51:04,747-Speed 5041.05 samples/sec Loss 6.4428 LearningRate 0.2343 Epoch: 6 Global Step: 25970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:51:13,111-Speed 4898.30 samples/sec Loss 6.4005 LearningRate 0.2342 Epoch: 6 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:51:21,455-Speed 4909.16 samples/sec Loss 6.4304 LearningRate 0.2341 Epoch: 6 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:51:29,582-Speed 5040.33 samples/sec Loss 6.4695 LearningRate 0.2340 Epoch: 6 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:51:37,846-Speed 4957.57 samples/sec Loss 6.3671 LearningRate 0.2339 Epoch: 6 Global Step: 26010 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:51:45,948-Speed 5056.18 samples/sec Loss 6.3886 LearningRate 0.2339 Epoch: 6 Global Step: 26020 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:51:54,034-Speed 5066.52 samples/sec Loss 6.4710 LearningRate 0.2338 Epoch: 6 Global Step: 26030 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:52:02,099-Speed 5079.68 samples/sec Loss 6.3863 LearningRate 0.2337 Epoch: 6 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:52:10,216-Speed 5046.76 samples/sec Loss 6.4653 LearningRate 0.2336 Epoch: 6 Global Step: 26050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:52:18,330-Speed 5048.39 samples/sec Loss 6.4338 LearningRate 0.2335 Epoch: 6 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:52:26,489-Speed 5020.91 samples/sec Loss 6.4439 LearningRate 0.2335 
Epoch: 6 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:52:34,745-Speed 4962.02 samples/sec Loss 6.4191 LearningRate 0.2334 Epoch: 6 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:52:42,815-Speed 5075.98 samples/sec Loss 6.4628 LearningRate 0.2333 Epoch: 6 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:52:50,882-Speed 5078.26 samples/sec Loss 6.4466 LearningRate 0.2332 Epoch: 6 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:52:58,962-Speed 5069.77 samples/sec Loss 6.4408 LearningRate 0.2331 Epoch: 6 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:53:07,035-Speed 5074.40 samples/sec Loss 6.4101 LearningRate 0.2330 Epoch: 6 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:53:15,119-Speed 5068.35 samples/sec Loss 6.4464 LearningRate 0.2330 Epoch: 6 Global Step: 26130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:53:23,182-Speed 5080.22 samples/sec Loss 6.3724 LearningRate 0.2329 Epoch: 6 Global Step: 26140 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:53:31,499-Speed 4925.85 samples/sec Loss 6.4178 LearningRate 0.2328 Epoch: 6 Global Step: 26150 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:53:39,637-Speed 5033.78 samples/sec Loss 6.4445 LearningRate 0.2327 Epoch: 6 Global Step: 26160 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:53:47,674-Speed 5096.98 samples/sec Loss 6.4597 LearningRate 0.2326 Epoch: 6 Global Step: 26170 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:53:55,781-Speed 5053.12 samples/sec Loss 6.4515 LearningRate 0.2326 Epoch: 6 Global Step: 26180 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:54:03,871-Speed 5063.89 samples/sec Loss 6.4254 LearningRate 0.2325 Epoch: 6 Global Step: 26190 Fp16 Grad 
Scale: 262144 Required: 14 hours Training: 2022-01-17 03:54:11,910-Speed 5095.40 samples/sec Loss 6.3729 LearningRate 0.2324 Epoch: 6 Global Step: 26200 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:54:20,035-Speed 5042.63 samples/sec Loss 6.4081 LearningRate 0.2323 Epoch: 6 Global Step: 26210 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:54:28,063-Speed 5102.46 samples/sec Loss 6.4328 LearningRate 0.2322 Epoch: 6 Global Step: 26220 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:54:36,202-Speed 5033.38 samples/sec Loss 6.4215 LearningRate 0.2322 Epoch: 6 Global Step: 26230 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:54:44,370-Speed 5015.80 samples/sec Loss 6.3931 LearningRate 0.2321 Epoch: 6 Global Step: 26240 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:54:52,434-Speed 5080.06 samples/sec Loss 6.3791 LearningRate 0.2320 Epoch: 6 Global Step: 26250 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:55:00,515-Speed 5069.00 samples/sec Loss 6.4388 LearningRate 0.2319 Epoch: 6 Global Step: 26260 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:55:08,720-Speed 4993.19 samples/sec Loss 6.4182 LearningRate 0.2318 Epoch: 6 Global Step: 26270 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:55:16,827-Speed 5053.17 samples/sec Loss 6.4019 LearningRate 0.2317 Epoch: 6 Global Step: 26280 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:55:25,092-Speed 4956.71 samples/sec Loss 6.4242 LearningRate 0.2317 Epoch: 6 Global Step: 26290 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:55:33,252-Speed 5019.93 samples/sec Loss 6.4282 LearningRate 0.2316 Epoch: 6 Global Step: 26300 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:55:41,463-Speed 4989.14 samples/sec Loss 6.4366 LearningRate 0.2315 Epoch: 6 Global Step: 26310 Fp16 Grad Scale: 262144 Required: 14 hours Training: 
2022-01-17 03:55:49,626-Speed 5018.40 samples/sec Loss 6.4944 LearningRate 0.2314 Epoch: 6 Global Step: 26320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:55:57,850-Speed 4980.85 samples/sec Loss 6.6269 LearningRate 0.2313 Epoch: 6 Global Step: 26330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:56:06,163-Speed 4928.07 samples/sec Loss 6.5132 LearningRate 0.2313 Epoch: 6 Global Step: 26340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:56:14,344-Speed 5007.16 samples/sec Loss 6.4388 LearningRate 0.2312 Epoch: 6 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:56:22,515-Speed 5013.47 samples/sec Loss 6.4157 LearningRate 0.2311 Epoch: 6 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:56:30,535-Speed 5107.67 samples/sec Loss 6.4111 LearningRate 0.2310 Epoch: 6 Global Step: 26370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:56:38,647-Speed 5050.01 samples/sec Loss 6.4201 LearningRate 0.2309 Epoch: 6 Global Step: 26380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:56:46,773-Speed 5041.86 samples/sec Loss 6.3471 LearningRate 0.2309 Epoch: 6 Global Step: 26390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:56:54,944-Speed 5013.21 samples/sec Loss 6.3103 LearningRate 0.2308 Epoch: 6 Global Step: 26400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:57:03,153-Speed 4990.10 samples/sec Loss 6.3799 LearningRate 0.2307 Epoch: 6 Global Step: 26410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:57:11,357-Speed 4993.53 samples/sec Loss 6.3994 LearningRate 0.2306 Epoch: 6 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:57:19,628-Speed 4952.54 samples/sec Loss 6.3121 LearningRate 0.2305 Epoch: 6 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-17 03:57:27,771-Speed 5031.37 samples/sec 
Loss 6.3743 LearningRate 0.2304 Epoch: 6 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:57:35,858-Speed 5065.75 samples/sec Loss 6.4105 LearningRate 0.2304 Epoch: 6 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:57:44,090-Speed 4975.69 samples/sec Loss 6.3349 LearningRate 0.2303 Epoch: 6 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:57:52,373-Speed 4945.97 samples/sec Loss 6.3554 LearningRate 0.2302 Epoch: 6 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:58:00,498-Speed 5042.12 samples/sec Loss 6.3860 LearningRate 0.2301 Epoch: 6 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:58:08,677-Speed 5008.42 samples/sec Loss 6.4303 LearningRate 0.2300 Epoch: 6 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:58:16,784-Speed 5053.11 samples/sec Loss 6.3910 LearningRate 0.2300 Epoch: 6 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:58:24,998-Speed 4987.48 samples/sec Loss 6.3963 LearningRate 0.2299 Epoch: 6 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:58:33,107-Speed 5051.61 samples/sec Loss 6.3629 LearningRate 0.2298 Epoch: 6 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:58:41,197-Speed 5063.53 samples/sec Loss 6.3293 LearningRate 0.2297 Epoch: 6 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 03:58:49,376-Speed 5008.63 samples/sec Loss 6.3656 LearningRate 0.2296 Epoch: 6 Global Step: 26540 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:58:57,523-Speed 5028.49 samples/sec Loss 6.4008 LearningRate 0.2296 Epoch: 6 Global Step: 26550 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:59:05,686-Speed 5018.23 samples/sec Loss 6.3642 LearningRate 0.2295 Epoch: 6 
Global Step: 26560 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:59:13,877-Speed 5001.18 samples/sec Loss 6.4567 LearningRate 0.2294 Epoch: 6 Global Step: 26570 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:59:21,991-Speed 5048.88 samples/sec Loss 6.3928 LearningRate 0.2293 Epoch: 6 Global Step: 26580 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:59:30,113-Speed 5044.31 samples/sec Loss 6.4402 LearningRate 0.2292 Epoch: 6 Global Step: 26590 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:59:38,294-Speed 5006.99 samples/sec Loss 6.3510 LearningRate 0.2292 Epoch: 6 Global Step: 26600 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:59:46,379-Speed 5066.77 samples/sec Loss 6.3673 LearningRate 0.2291 Epoch: 6 Global Step: 26610 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 03:59:54,573-Speed 4999.83 samples/sec Loss 6.4080 LearningRate 0.2290 Epoch: 6 Global Step: 26620 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:00:02,820-Speed 4966.99 samples/sec Loss 6.4107 LearningRate 0.2289 Epoch: 6 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:00:10,962-Speed 5031.57 samples/sec Loss 6.3612 LearningRate 0.2288 Epoch: 6 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:00:19,117-Speed 5023.18 samples/sec Loss 6.3783 LearningRate 0.2288 Epoch: 6 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:00:27,296-Speed 5008.50 samples/sec Loss 6.4037 LearningRate 0.2287 Epoch: 6 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:00:35,524-Speed 4979.11 samples/sec Loss 6.3511 LearningRate 0.2286 Epoch: 6 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:00:43,704-Speed 5007.77 samples/sec Loss 6.3701 LearningRate 0.2285 Epoch: 6 Global Step: 26680 Fp16 Grad Scale: 131072 
Required: 14 hours Training: 2022-01-17 04:00:51,861-Speed 5021.99 samples/sec Loss 6.4030 LearningRate 0.2284 Epoch: 6 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:01:00,002-Speed 5031.97 samples/sec Loss 6.3935 LearningRate 0.2284 Epoch: 6 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:01:08,095-Speed 5061.88 samples/sec Loss 6.3461 LearningRate 0.2283 Epoch: 6 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:01:16,250-Speed 5023.66 samples/sec Loss 6.3098 LearningRate 0.2282 Epoch: 6 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:01:24,407-Speed 5021.94 samples/sec Loss 6.3556 LearningRate 0.2281 Epoch: 6 Global Step: 26730 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:01:32,735-Speed 4919.15 samples/sec Loss 6.3380 LearningRate 0.2280 Epoch: 6 Global Step: 26740 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:01:40,869-Speed 5036.42 samples/sec Loss 6.3129 LearningRate 0.2279 Epoch: 6 Global Step: 26750 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:01:48,970-Speed 5056.47 samples/sec Loss 6.3342 LearningRate 0.2279 Epoch: 6 Global Step: 26760 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:01:57,082-Speed 5050.01 samples/sec Loss 6.2873 LearningRate 0.2278 Epoch: 6 Global Step: 26770 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:02:05,264-Speed 5007.14 samples/sec Loss 6.3445 LearningRate 0.2277 Epoch: 6 Global Step: 26780 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:02:13,400-Speed 5034.75 samples/sec Loss 6.3074 LearningRate 0.2276 Epoch: 6 Global Step: 26790 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:02:21,519-Speed 5045.79 samples/sec Loss 6.3696 LearningRate 0.2275 Epoch: 6 Global Step: 26800 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 
04:02:29,636-Speed 5046.99 samples/sec Loss 6.3692 LearningRate 0.2275 Epoch: 6 Global Step: 26810 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:02:38,044-Speed 4872.12 samples/sec Loss 6.3387 LearningRate 0.2274 Epoch: 6 Global Step: 26820 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:02:46,455-Speed 4870.52 samples/sec Loss 6.4147 LearningRate 0.2273 Epoch: 6 Global Step: 26830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:02:54,918-Speed 4840.44 samples/sec Loss 6.3720 LearningRate 0.2272 Epoch: 6 Global Step: 26840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:03:03,055-Speed 5034.49 samples/sec Loss 6.3599 LearningRate 0.2271 Epoch: 6 Global Step: 26850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:03:11,239-Speed 5005.35 samples/sec Loss 6.3348 LearningRate 0.2271 Epoch: 6 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:03:19,346-Speed 5053.26 samples/sec Loss 6.3868 LearningRate 0.2270 Epoch: 6 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:03:27,489-Speed 5031.14 samples/sec Loss 6.3167 LearningRate 0.2269 Epoch: 6 Global Step: 26880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:03:35,672-Speed 5005.34 samples/sec Loss 6.3269 LearningRate 0.2268 Epoch: 6 Global Step: 26890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:03:43,792-Speed 5045.67 samples/sec Loss 6.3535 LearningRate 0.2267 Epoch: 6 Global Step: 26900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:03:51,934-Speed 5030.87 samples/sec Loss 6.2665 LearningRate 0.2267 Epoch: 6 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:03:59,986-Speed 5087.76 samples/sec Loss 6.3090 LearningRate 0.2266 Epoch: 6 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:04:08,054-Speed 5077.39 samples/sec Loss 
6.2920 LearningRate 0.2265 Epoch: 6 Global Step: 26930 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-17 04:04:16,081-Speed 5103.81 samples/sec Loss 6.2353 LearningRate 0.2264 Epoch: 6 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:04:24,251-Speed 5014.06 samples/sec Loss 6.3123 LearningRate 0.2263 Epoch: 6 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:04:32,407-Speed 5022.48 samples/sec Loss 6.2329 LearningRate 0.2263 Epoch: 6 Global Step: 26960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:04:40,555-Speed 5027.67 samples/sec Loss 6.3180 LearningRate 0.2262 Epoch: 6 Global Step: 26970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:04:48,732-Speed 5010.59 samples/sec Loss 6.3764 LearningRate 0.2261 Epoch: 6 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:04:56,926-Speed 4999.00 samples/sec Loss 6.2913 LearningRate 0.2260 Epoch: 6 Global Step: 26990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:05:05,022-Speed 5059.81 samples/sec Loss 6.2402 LearningRate 0.2259 Epoch: 6 Global Step: 27000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:05:13,126-Speed 5055.20 samples/sec Loss 6.2743 LearningRate 0.2259 Epoch: 6 Global Step: 27010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:05:21,289-Speed 5018.24 samples/sec Loss 6.3396 LearningRate 0.2258 Epoch: 6 Global Step: 27020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:05:29,509-Speed 4983.18 samples/sec Loss 6.3228 LearningRate 0.2257 Epoch: 6 Global Step: 27030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-17 04:05:37,664-Speed 5023.39 samples/sec Loss 6.3547 LearningRate 0.2256 Epoch: 6 Global Step: 27040 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:05:45,948-Speed 4945.09 samples/sec Loss 6.2888 LearningRate 0.2255 Epoch: 6 Global 
Step: 27050 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:05:54,283-Speed 4914.89 samples/sec Loss 6.3358 LearningRate 0.2255 Epoch: 6 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:06:02,484-Speed 4995.19 samples/sec Loss 6.2592 LearningRate 0.2254 Epoch: 6 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:06:10,714-Speed 4977.56 samples/sec Loss 6.3037 LearningRate 0.2253 Epoch: 6 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:06:18,891-Speed 5010.11 samples/sec Loss 6.3321 LearningRate 0.2252 Epoch: 6 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:06:27,054-Speed 5018.19 samples/sec Loss 6.3081 LearningRate 0.2251 Epoch: 6 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:06:35,132-Speed 5071.16 samples/sec Loss 6.3081 LearningRate 0.2251 Epoch: 6 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:06:43,299-Speed 5015.84 samples/sec Loss 6.2408 LearningRate 0.2250 Epoch: 6 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:06:51,501-Speed 4995.24 samples/sec Loss 6.3174 LearningRate 0.2249 Epoch: 6 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:06:59,681-Speed 5007.92 samples/sec Loss 6.2574 LearningRate 0.2248 Epoch: 6 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:07:07,873-Speed 5000.28 samples/sec Loss 6.2497 LearningRate 0.2247 Epoch: 6 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:07:16,031-Speed 5021.90 samples/sec Loss 6.2675 LearningRate 0.2247 Epoch: 6 Global Step: 27160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:07:24,092-Speed 5081.61 samples/sec Loss 6.2711 LearningRate 0.2246 Epoch: 6 Global Step: 27170 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-01-17 04:07:32,147-Speed 5085.77 samples/sec Loss 6.2339 LearningRate 0.2245 Epoch: 6 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:07:40,286-Speed 5033.58 samples/sec Loss 6.2623 LearningRate 0.2244 Epoch: 6 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:07:48,459-Speed 5012.33 samples/sec Loss 6.2919 LearningRate 0.2243 Epoch: 6 Global Step: 27200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:07:56,621-Speed 5019.56 samples/sec Loss 6.2142 LearningRate 0.2243 Epoch: 6 Global Step: 27210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:08:04,902-Speed 4946.53 samples/sec Loss 6.2532 LearningRate 0.2242 Epoch: 6 Global Step: 27220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:08:13,160-Speed 4960.61 samples/sec Loss 6.2923 LearningRate 0.2241 Epoch: 6 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:08:21,313-Speed 5024.69 samples/sec Loss 6.2271 LearningRate 0.2240 Epoch: 6 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:08:29,480-Speed 5015.82 samples/sec Loss 6.2708 LearningRate 0.2239 Epoch: 6 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:08:37,649-Speed 5014.83 samples/sec Loss 6.3323 LearningRate 0.2239 Epoch: 6 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:08:45,788-Speed 5033.17 samples/sec Loss 6.2553 LearningRate 0.2238 Epoch: 6 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:08:53,917-Speed 5039.48 samples/sec Loss 6.2192 LearningRate 0.2237 Epoch: 6 Global Step: 27280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:09:02,042-Speed 5041.79 samples/sec Loss 6.2366 LearningRate 0.2236 Epoch: 6 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 
04:09:10,341-Speed 4935.97 samples/sec Loss 6.2499 LearningRate 0.2235 Epoch: 6 Global Step: 27300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:09:18,495-Speed 5024.59 samples/sec Loss 6.3130 LearningRate 0.2235 Epoch: 6 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:09:26,681-Speed 5004.33 samples/sec Loss 6.2658 LearningRate 0.2234 Epoch: 6 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:09:34,799-Speed 5046.12 samples/sec Loss 6.2710 LearningRate 0.2233 Epoch: 6 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:09:43,132-Speed 4916.24 samples/sec Loss 6.2732 LearningRate 0.2232 Epoch: 6 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:09:51,673-Speed 4796.12 samples/sec Loss 6.2021 LearningRate 0.2232 Epoch: 6 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:10:00,407-Speed 4690.56 samples/sec Loss 6.2241 LearningRate 0.2231 Epoch: 6 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:10:09,189-Speed 4664.61 samples/sec Loss 6.2514 LearningRate 0.2230 Epoch: 6 Global Step: 27370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:10:17,234-Speed 5091.50 samples/sec Loss 6.2874 LearningRate 0.2229 Epoch: 6 Global Step: 27380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:10:25,431-Speed 4997.94 samples/sec Loss 6.2304 LearningRate 0.2228 Epoch: 6 Global Step: 27390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:10:33,633-Speed 4994.94 samples/sec Loss 6.2589 LearningRate 0.2228 Epoch: 6 Global Step: 27400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:10:41,902-Speed 4953.64 samples/sec Loss 6.2468 LearningRate 0.2227 Epoch: 6 Global Step: 27410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:10:50,676-Speed 4668.63 samples/sec Loss 
6.2532 LearningRate 0.2226 Epoch: 6 Global Step: 27420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:10:59,187-Speed 4813.60 samples/sec Loss 6.2454 LearningRate 0.2225 Epoch: 6 Global Step: 27430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:11:07,286-Speed 5057.71 samples/sec Loss 6.2711 LearningRate 0.2224 Epoch: 6 Global Step: 27440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:11:15,432-Speed 5029.16 samples/sec Loss 6.2297 LearningRate 0.2224 Epoch: 6 Global Step: 27450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:11:23,495-Speed 5080.71 samples/sec Loss 6.2270 LearningRate 0.2223 Epoch: 6 Global Step: 27460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:11:31,585-Speed 5063.87 samples/sec Loss 6.2142 LearningRate 0.2222 Epoch: 6 Global Step: 27470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:11:39,768-Speed 5005.94 samples/sec Loss 6.1765 LearningRate 0.2221 Epoch: 6 Global Step: 27480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:11:47,851-Speed 5067.97 samples/sec Loss 6.2296 LearningRate 0.2220 Epoch: 6 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:11:55,934-Speed 5068.35 samples/sec Loss 6.2894 LearningRate 0.2220 Epoch: 6 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:12:04,029-Speed 5061.02 samples/sec Loss 6.2723 LearningRate 0.2219 Epoch: 6 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:12:12,140-Speed 5050.46 samples/sec Loss 6.2442 LearningRate 0.2218 Epoch: 6 Global Step: 27520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:12:20,276-Speed 5035.43 samples/sec Loss 6.2524 LearningRate 0.2217 Epoch: 6 Global Step: 27530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:12:28,404-Speed 5040.09 samples/sec Loss 6.1383 LearningRate 0.2216 Epoch: 6 Global 
Step: 27540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:12:36,564-Speed 5019.84 samples/sec Loss 6.2462 LearningRate 0.2216 Epoch: 6 Global Step: 27550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:12:44,809-Speed 4968.44 samples/sec Loss 6.2284 LearningRate 0.2215 Epoch: 6 Global Step: 27560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:12:52,960-Speed 5025.79 samples/sec Loss 6.2125 LearningRate 0.2214 Epoch: 6 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:13:01,043-Speed 5068.19 samples/sec Loss 6.2211 LearningRate 0.2213 Epoch: 6 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:13:09,448-Speed 4874.47 samples/sec Loss 6.2448 LearningRate 0.2212 Epoch: 6 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:13:17,880-Speed 4857.66 samples/sec Loss 6.2848 LearningRate 0.2212 Epoch: 6 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:13:25,989-Speed 5052.45 samples/sec Loss 6.2910 LearningRate 0.2211 Epoch: 6 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:13:34,055-Speed 5078.02 samples/sec Loss 6.1798 LearningRate 0.2210 Epoch: 6 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:13:42,165-Speed 5051.75 samples/sec Loss 6.2191 LearningRate 0.2209 Epoch: 6 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:13:50,309-Speed 5030.29 samples/sec Loss 6.2503 LearningRate 0.2208 Epoch: 6 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:13:58,504-Speed 4998.34 samples/sec Loss 6.1756 LearningRate 0.2208 Epoch: 6 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:14:06,709-Speed 4992.88 samples/sec Loss 6.1893 LearningRate 0.2207 Epoch: 6 Global Step: 27660 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-01-17 04:14:14,804-Speed 5060.88 samples/sec Loss 6.1023 LearningRate 0.2206 Epoch: 6 Global Step: 27670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:14:22,965-Speed 5019.31 samples/sec Loss 6.0922 LearningRate 0.2205 Epoch: 6 Global Step: 27680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:14:31,072-Speed 5053.25 samples/sec Loss 6.2165 LearningRate 0.2205 Epoch: 6 Global Step: 27690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:14:39,264-Speed 5000.93 samples/sec Loss 6.1789 LearningRate 0.2204 Epoch: 6 Global Step: 27700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:14:47,691-Speed 4861.03 samples/sec Loss 6.1811 LearningRate 0.2203 Epoch: 6 Global Step: 27710 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:14:55,817-Speed 5041.40 samples/sec Loss 6.2378 LearningRate 0.2202 Epoch: 6 Global Step: 27720 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:15:03,947-Speed 5038.40 samples/sec Loss 6.2152 LearningRate 0.2201 Epoch: 6 Global Step: 27730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:15:12,109-Speed 5018.97 samples/sec Loss 6.1562 LearningRate 0.2201 Epoch: 6 Global Step: 27740 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:15:20,311-Speed 4994.86 samples/sec Loss 6.1932 LearningRate 0.2200 Epoch: 6 Global Step: 27750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:15:28,503-Speed 5000.37 samples/sec Loss 6.2277 LearningRate 0.2199 Epoch: 6 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:15:36,618-Speed 5048.59 samples/sec Loss 6.2145 LearningRate 0.2198 Epoch: 6 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:15:44,795-Speed 5009.38 samples/sec Loss 6.2214 LearningRate 0.2197 Epoch: 6 Global Step: 27780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 
04:15:52,902-Speed 5053.08 samples/sec Loss 6.1783 LearningRate 0.2197 Epoch: 6 Global Step: 27790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:16:01,092-Speed 5002.15 samples/sec Loss 6.1965 LearningRate 0.2196 Epoch: 6 Global Step: 27800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:16:09,265-Speed 5012.43 samples/sec Loss 6.1711 LearningRate 0.2195 Epoch: 6 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:16:17,386-Speed 5044.86 samples/sec Loss 6.1687 LearningRate 0.2194 Epoch: 6 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:16:25,509-Speed 5042.41 samples/sec Loss 6.2227 LearningRate 0.2193 Epoch: 6 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:16:33,791-Speed 4946.92 samples/sec Loss 6.2083 LearningRate 0.2193 Epoch: 6 Global Step: 27840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:16:41,886-Speed 5060.07 samples/sec Loss 6.1774 LearningRate 0.2192 Epoch: 6 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:16:50,035-Speed 5027.32 samples/sec Loss 6.1531 LearningRate 0.2191 Epoch: 6 Global Step: 27860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:16:58,116-Speed 5069.70 samples/sec Loss 6.1673 LearningRate 0.2190 Epoch: 6 Global Step: 27870 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:17:06,151-Speed 5097.65 samples/sec Loss 6.2164 LearningRate 0.2190 Epoch: 6 Global Step: 27880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:17:14,194-Speed 5093.47 samples/sec Loss 6.0932 LearningRate 0.2189 Epoch: 6 Global Step: 27890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:17:22,335-Speed 5032.30 samples/sec Loss 6.1613 LearningRate 0.2188 Epoch: 6 Global Step: 27900 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:17:30,397-Speed 5081.08 samples/sec Loss 
6.1583 LearningRate 0.2187 Epoch: 6 Global Step: 27910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:17:38,857-Speed 4842.66 samples/sec Loss 6.1114 LearningRate 0.2186 Epoch: 6 Global Step: 27920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:17:47,584-Speed 4693.82 samples/sec Loss 6.1608 LearningRate 0.2186 Epoch: 6 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:17:55,841-Speed 4961.11 samples/sec Loss 6.1398 LearningRate 0.2185 Epoch: 6 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:18:03,979-Speed 5034.00 samples/sec Loss 6.1771 LearningRate 0.2184 Epoch: 6 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:18:12,538-Speed 4785.84 samples/sec Loss 6.1082 LearningRate 0.2183 Epoch: 6 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:18:21,216-Speed 4721.10 samples/sec Loss 6.2476 LearningRate 0.2182 Epoch: 6 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:18:29,320-Speed 5055.09 samples/sec Loss 6.1851 LearningRate 0.2182 Epoch: 6 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:18:37,384-Speed 5079.82 samples/sec Loss 6.1895 LearningRate 0.2181 Epoch: 6 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:18:45,564-Speed 5008.24 samples/sec Loss 6.1555 LearningRate 0.2180 Epoch: 6 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:18:53,721-Speed 5022.55 samples/sec Loss 6.2338 LearningRate 0.2179 Epoch: 6 Global Step: 28010 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:19:01,991-Speed 4953.30 samples/sec Loss 6.1613 LearningRate 0.2179 Epoch: 6 Global Step: 28020 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:19:10,110-Speed 5045.48 samples/sec Loss 6.0640 LearningRate 0.2178 Epoch: 6 Global 
Step: 28030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:19:18,250-Speed 5032.32 samples/sec Loss 6.1622 LearningRate 0.2177 Epoch: 6 Global Step: 28040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:19:26,385-Speed 5107.40 samples/sec Loss 6.1486 LearningRate 0.2176 Epoch: 6 Global Step: 28050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:19:34,481-Speed 5059.98 samples/sec Loss 6.2066 LearningRate 0.2175 Epoch: 6 Global Step: 28060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:19:42,649-Speed 5015.31 samples/sec Loss 6.1366 LearningRate 0.2175 Epoch: 6 Global Step: 28070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:19:50,775-Speed 5041.33 samples/sec Loss 6.1321 LearningRate 0.2174 Epoch: 6 Global Step: 28080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:19:58,930-Speed 5023.66 samples/sec Loss 6.0965 LearningRate 0.2173 Epoch: 6 Global Step: 28090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:20:07,071-Speed 5032.03 samples/sec Loss 6.1667 LearningRate 0.2172 Epoch: 6 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:20:15,243-Speed 5012.91 samples/sec Loss 6.1090 LearningRate 0.2171 Epoch: 6 Global Step: 28110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:20:23,328-Speed 5066.54 samples/sec Loss 6.2429 LearningRate 0.2171 Epoch: 6 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:20:31,391-Speed 5080.98 samples/sec Loss 6.1131 LearningRate 0.2170 Epoch: 6 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:20:39,478-Speed 5065.76 samples/sec Loss 6.1317 LearningRate 0.2169 Epoch: 6 Global Step: 28140 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:20:47,624-Speed 5028.91 samples/sec Loss 6.0964 LearningRate 0.2168 Epoch: 6 Global Step: 28150 Fp16 Grad Scale: 262144 
Required: 13 hours Training: 2022-01-17 04:20:55,705-Speed 5069.24 samples/sec Loss 6.1549 LearningRate 0.2168 Epoch: 6 Global Step: 28160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:21:03,811-Speed 5054.01 samples/sec Loss 6.0954 LearningRate 0.2167 Epoch: 6 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:21:11,915-Speed 5055.02 samples/sec Loss 6.1627 LearningRate 0.2166 Epoch: 6 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:21:20,062-Speed 5028.63 samples/sec Loss 6.1157 LearningRate 0.2165 Epoch: 6 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:21:28,202-Speed 5032.38 samples/sec Loss 6.1686 LearningRate 0.2164 Epoch: 6 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:21:36,375-Speed 5012.62 samples/sec Loss 6.1339 LearningRate 0.2164 Epoch: 6 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:21:44,470-Speed 5060.11 samples/sec Loss 6.1023 LearningRate 0.2163 Epoch: 6 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:21:52,578-Speed 5052.15 samples/sec Loss 6.1608 LearningRate 0.2162 Epoch: 6 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:22:00,744-Speed 5016.91 samples/sec Loss 6.1643 LearningRate 0.2161 Epoch: 6 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:22:08,894-Speed 5026.49 samples/sec Loss 6.1024 LearningRate 0.2160 Epoch: 6 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:22:17,031-Speed 5034.61 samples/sec Loss 6.1348 LearningRate 0.2160 Epoch: 6 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:22:25,129-Speed 5058.83 samples/sec Loss 6.1046 LearningRate 0.2159 Epoch: 6 Global Step: 28270 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 
04:22:33,272-Speed 5030.65 samples/sec Loss 6.1749 LearningRate 0.2158 Epoch: 6 Global Step: 28280 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:22:41,371-Speed 5058.45 samples/sec Loss 6.1196 LearningRate 0.2157 Epoch: 6 Global Step: 28290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:22:49,479-Speed 5052.58 samples/sec Loss 6.1026 LearningRate 0.2157 Epoch: 6 Global Step: 28300 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:22:57,606-Speed 5040.41 samples/sec Loss 6.0868 LearningRate 0.2156 Epoch: 6 Global Step: 28310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:23:05,771-Speed 5017.43 samples/sec Loss 6.1364 LearningRate 0.2155 Epoch: 6 Global Step: 28320 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:23:13,876-Speed 5053.65 samples/sec Loss 6.0903 LearningRate 0.2154 Epoch: 6 Global Step: 28330 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:23:21,964-Speed 5065.27 samples/sec Loss 6.1336 LearningRate 0.2153 Epoch: 6 Global Step: 28340 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:23:30,026-Speed 5081.04 samples/sec Loss 6.0859 LearningRate 0.2153 Epoch: 6 Global Step: 28350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:23:38,132-Speed 5054.09 samples/sec Loss 6.1568 LearningRate 0.2152 Epoch: 6 Global Step: 28360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:23:46,249-Speed 5046.93 samples/sec Loss 6.1106 LearningRate 0.2151 Epoch: 6 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:23:54,409-Speed 5020.21 samples/sec Loss 6.0429 LearningRate 0.2150 Epoch: 6 Global Step: 28380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:24:02,560-Speed 5025.54 samples/sec Loss 6.0441 LearningRate 0.2150 Epoch: 6 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:24:10,758-Speed 4997.61 samples/sec Loss 
6.1050 LearningRate 0.2149 Epoch: 6 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:24:18,885-Speed 5040.23 samples/sec Loss 6.0481 LearningRate 0.2148 Epoch: 6 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:24:27,028-Speed 5030.87 samples/sec Loss 6.0600 LearningRate 0.2147 Epoch: 6 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:24:35,150-Speed 5044.02 samples/sec Loss 6.0829 LearningRate 0.2146 Epoch: 6 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:24:43,261-Speed 5050.51 samples/sec Loss 6.0772 LearningRate 0.2146 Epoch: 6 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:24:51,385-Speed 5042.09 samples/sec Loss 6.0982 LearningRate 0.2145 Epoch: 6 Global Step: 28450 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:24:59,631-Speed 4967.99 samples/sec Loss 6.1159 LearningRate 0.2144 Epoch: 6 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:25:07,735-Speed 5054.73 samples/sec Loss 6.0882 LearningRate 0.2143 Epoch: 6 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:25:15,905-Speed 5014.57 samples/sec Loss 6.1143 LearningRate 0.2142 Epoch: 6 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:25:24,505-Speed 4763.20 samples/sec Loss 5.9418 LearningRate 0.2142 Epoch: 6 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:25:32,728-Speed 4981.79 samples/sec Loss 6.0729 LearningRate 0.2141 Epoch: 6 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:25:40,892-Speed 5017.84 samples/sec Loss 6.1166 LearningRate 0.2140 Epoch: 6 Global Step: 28510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:25:48,994-Speed 5056.05 samples/sec Loss 6.1569 LearningRate 0.2139 Epoch: 6 Global 
Step: 28520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:25:57,384-Speed 4882.53 samples/sec Loss 6.0617 LearningRate 0.2139 Epoch: 6 Global Step: 28530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:26:05,468-Speed 5067.62 samples/sec Loss 6.0753 LearningRate 0.2138 Epoch: 6 Global Step: 28540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:26:13,898-Speed 4859.52 samples/sec Loss 6.0451 LearningRate 0.2137 Epoch: 6 Global Step: 28550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:26:22,108-Speed 4990.06 samples/sec Loss 6.0518 LearningRate 0.2136 Epoch: 6 Global Step: 28560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:26:30,174-Speed 5078.64 samples/sec Loss 6.0697 LearningRate 0.2135 Epoch: 6 Global Step: 28570 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:26:38,250-Speed 5072.54 samples/sec Loss 6.0896 LearningRate 0.2135 Epoch: 6 Global Step: 28580 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:26:46,326-Speed 5072.52 samples/sec Loss 6.0927 LearningRate 0.2134 Epoch: 6 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:26:54,719-Speed 4880.58 samples/sec Loss 6.0288 LearningRate 0.2133 Epoch: 6 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:27:03,576-Speed 4625.34 samples/sec Loss 6.0693 LearningRate 0.2132 Epoch: 6 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:27:12,481-Speed 4599.99 samples/sec Loss 6.0583 LearningRate 0.2132 Epoch: 6 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:27:21,361-Speed 4613.37 samples/sec Loss 6.0312 LearningRate 0.2131 Epoch: 6 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:27:29,472-Speed 5050.13 samples/sec Loss 5.9710 LearningRate 0.2130 Epoch: 6 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 13 
hours Training: 2022-01-17 04:27:37,614-Speed 5031.98 samples/sec Loss 6.0082 LearningRate 0.2129 Epoch: 6 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:27:45,695-Speed 5069.23 samples/sec Loss 6.1130 LearningRate 0.2128 Epoch: 6 Global Step: 28660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:27:53,777-Speed 5069.08 samples/sec Loss 5.9739 LearningRate 0.2128 Epoch: 6 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:28:01,856-Speed 5070.10 samples/sec Loss 6.0209 LearningRate 0.2127 Epoch: 6 Global Step: 28680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:28:09,926-Speed 5076.79 samples/sec Loss 6.0182 LearningRate 0.2126 Epoch: 6 Global Step: 28690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:28:18,037-Speed 5050.63 samples/sec Loss 6.1124 LearningRate 0.2125 Epoch: 6 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:28:26,163-Speed 5040.94 samples/sec Loss 6.0474 LearningRate 0.2125 Epoch: 6 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:28:34,335-Speed 5013.11 samples/sec Loss 6.0557 LearningRate 0.2124 Epoch: 6 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:28:42,452-Speed 5046.87 samples/sec Loss 6.0561 LearningRate 0.2123 Epoch: 6 Global Step: 28730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:28:50,625-Speed 5012.37 samples/sec Loss 6.0590 LearningRate 0.2122 Epoch: 6 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:28:58,808-Speed 5006.00 samples/sec Loss 6.0501 LearningRate 0.2121 Epoch: 6 Global Step: 28750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:29:06,978-Speed 5013.74 samples/sec Loss 6.0232 LearningRate 0.2121 Epoch: 6 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:29:15,092-Speed 
5049.18 samples/sec Loss 6.0077 LearningRate 0.2120 Epoch: 6 Global Step: 28770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:29:23,306-Speed 4986.74 samples/sec Loss 6.0920 LearningRate 0.2119 Epoch: 6 Global Step: 28780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:29:31,449-Speed 5030.86 samples/sec Loss 5.9940 LearningRate 0.2118 Epoch: 6 Global Step: 28790 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:29:39,643-Speed 4999.27 samples/sec Loss 6.1264 LearningRate 0.2118 Epoch: 6 Global Step: 28800 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:29:47,728-Speed 5067.59 samples/sec Loss 6.0989 LearningRate 0.2117 Epoch: 6 Global Step: 28810 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:29:55,845-Speed 5046.87 samples/sec Loss 6.0728 LearningRate 0.2116 Epoch: 6 Global Step: 28820 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:30:04,021-Speed 5009.58 samples/sec Loss 5.9924 LearningRate 0.2115 Epoch: 6 Global Step: 28830 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:30:12,324-Speed 4934.23 samples/sec Loss 5.9913 LearningRate 0.2115 Epoch: 6 Global Step: 28840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:30:20,428-Speed 5055.03 samples/sec Loss 5.9807 LearningRate 0.2114 Epoch: 6 Global Step: 28850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:30:28,574-Speed 5028.53 samples/sec Loss 6.0795 LearningRate 0.2113 Epoch: 6 Global Step: 28860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:30:36,755-Speed 5007.72 samples/sec Loss 5.9708 LearningRate 0.2112 Epoch: 6 Global Step: 28870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:30:44,940-Speed 5004.70 samples/sec Loss 5.9671 LearningRate 0.2111 Epoch: 6 Global Step: 28880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:30:53,065-Speed 5042.42 samples/sec Loss 5.9928 
LearningRate 0.2111 Epoch: 6 Global Step: 28890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:31:01,145-Speed 5070.06 samples/sec Loss 6.0295 LearningRate 0.2110 Epoch: 6 Global Step: 28900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:31:09,428-Speed 4945.28 samples/sec Loss 6.1129 LearningRate 0.2109 Epoch: 6 Global Step: 28910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:31:17,647-Speed 4984.55 samples/sec Loss 5.9851 LearningRate 0.2108 Epoch: 6 Global Step: 28920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:31:25,884-Speed 4973.42 samples/sec Loss 6.0105 LearningRate 0.2108 Epoch: 6 Global Step: 28930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:31:34,041-Speed 5021.57 samples/sec Loss 6.1004 LearningRate 0.2107 Epoch: 6 Global Step: 28940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:31:42,209-Speed 5015.85 samples/sec Loss 6.0293 LearningRate 0.2106 Epoch: 6 Global Step: 28950 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:31:50,379-Speed 5013.32 samples/sec Loss 5.9512 LearningRate 0.2105 Epoch: 6 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:31:58,556-Speed 5010.58 samples/sec Loss 5.9633 LearningRate 0.2104 Epoch: 6 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:32:06,725-Speed 5014.59 samples/sec Loss 6.0651 LearningRate 0.2104 Epoch: 6 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:32:14,849-Speed 5042.15 samples/sec Loss 5.9746 LearningRate 0.2103 Epoch: 6 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:32:23,009-Speed 5020.67 samples/sec Loss 6.0579 LearningRate 0.2102 Epoch: 6 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:32:31,179-Speed 5014.21 samples/sec Loss 6.0375 LearningRate 0.2101 Epoch: 6 Global Step: 
29010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:32:39,351-Speed 5012.73 samples/sec Loss 5.9993 LearningRate 0.2101 Epoch: 6 Global Step: 29020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:32:47,514-Speed 5018.56 samples/sec Loss 5.9965 LearningRate 0.2100 Epoch: 6 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:32:55,606-Speed 5062.33 samples/sec Loss 5.9936 LearningRate 0.2099 Epoch: 6 Global Step: 29040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:33:03,721-Speed 5048.62 samples/sec Loss 6.0383 LearningRate 0.2098 Epoch: 6 Global Step: 29050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:33:11,784-Speed 5080.35 samples/sec Loss 5.9829 LearningRate 0.2098 Epoch: 6 Global Step: 29060 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:33:19,875-Speed 5062.66 samples/sec Loss 5.9829 LearningRate 0.2097 Epoch: 6 Global Step: 29070 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:33:27,981-Speed 5053.59 samples/sec Loss 5.9410 LearningRate 0.2096 Epoch: 6 Global Step: 29080 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:33:36,122-Speed 5032.23 samples/sec Loss 5.9949 LearningRate 0.2095 Epoch: 6 Global Step: 29090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:33:44,222-Speed 5057.67 samples/sec Loss 6.0213 LearningRate 0.2094 Epoch: 6 Global Step: 29100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:33:52,328-Speed 5053.69 samples/sec Loss 5.9961 LearningRate 0.2094 Epoch: 6 Global Step: 29110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:34:00,396-Speed 5077.49 samples/sec Loss 5.9860 LearningRate 0.2093 Epoch: 6 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:34:08,591-Speed 4998.92 samples/sec Loss 5.9730 LearningRate 0.2092 Epoch: 6 Global Step: 29130 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-01-17 04:34:16,839-Speed 4966.38 samples/sec Loss 5.9562 LearningRate 0.2091 Epoch: 6 Global Step: 29140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:34:24,917-Speed 5070.95 samples/sec Loss 6.0621 LearningRate 0.2091 Epoch: 6 Global Step: 29150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:34:33,775-Speed 4625.01 samples/sec Loss 5.9849 LearningRate 0.2090 Epoch: 6 Global Step: 29160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:34:42,548-Speed 4670.09 samples/sec Loss 6.0124 LearningRate 0.2089 Epoch: 6 Global Step: 29170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:34:51,091-Speed 4794.77 samples/sec Loss 5.9146 LearningRate 0.2088 Epoch: 6 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:34:59,292-Speed 4995.44 samples/sec Loss 6.0325 LearningRate 0.2087 Epoch: 6 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:35:07,683-Speed 4882.03 samples/sec Loss 6.0267 LearningRate 0.2087 Epoch: 6 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:36:01,159-Speed 765.96 samples/sec Loss 5.6798 LearningRate 0.2086 Epoch: 7 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:36:09,367-Speed 4991.04 samples/sec Loss 5.5175 LearningRate 0.2085 Epoch: 7 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:36:17,671-Speed 4933.49 samples/sec Loss 5.4544 LearningRate 0.2084 Epoch: 7 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:36:25,949-Speed 4948.21 samples/sec Loss 5.4293 LearningRate 0.2084 Epoch: 7 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:36:34,598-Speed 4736.59 samples/sec Loss 5.4726 LearningRate 0.2083 Epoch: 7 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 
04:36:42,900-Speed 4934.55 samples/sec Loss 5.4844 LearningRate 0.2082 Epoch: 7 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:36:51,152-Speed 4964.71 samples/sec Loss 5.4880 LearningRate 0.2081 Epoch: 7 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:36:59,325-Speed 5011.66 samples/sec Loss 5.4915 LearningRate 0.2081 Epoch: 7 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:37:07,548-Speed 4981.74 samples/sec Loss 5.5048 LearningRate 0.2080 Epoch: 7 Global Step: 29290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:37:15,900-Speed 4905.18 samples/sec Loss 5.5082 LearningRate 0.2079 Epoch: 7 Global Step: 29300 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:37:24,160-Speed 4959.68 samples/sec Loss 5.5551 LearningRate 0.2078 Epoch: 7 Global Step: 29310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:37:32,638-Speed 4831.88 samples/sec Loss 5.5172 LearningRate 0.2078 Epoch: 7 Global Step: 29320 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:37:41,091-Speed 4846.22 samples/sec Loss 5.5734 LearningRate 0.2077 Epoch: 7 Global Step: 29330 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:37:49,339-Speed 4966.26 samples/sec Loss 5.6027 LearningRate 0.2076 Epoch: 7 Global Step: 29340 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:37:57,496-Speed 5022.61 samples/sec Loss 5.5967 LearningRate 0.2075 Epoch: 7 Global Step: 29350 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:38:05,654-Speed 5021.54 samples/sec Loss 5.5441 LearningRate 0.2074 Epoch: 7 Global Step: 29360 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:38:13,882-Speed 4978.66 samples/sec Loss 5.5194 LearningRate 0.2074 Epoch: 7 Global Step: 29370 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:38:22,228-Speed 4908.42 samples/sec Loss 
5.5788 LearningRate 0.2073 Epoch: 7 Global Step: 29380 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:38:30,605-Speed 4890.00 samples/sec Loss 5.5427 LearningRate 0.2072 Epoch: 7 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:38:39,053-Speed 4849.17 samples/sec Loss 5.6555 LearningRate 0.2071 Epoch: 7 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:38:47,278-Speed 4980.78 samples/sec Loss 5.6281 LearningRate 0.2071 Epoch: 7 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:38:55,729-Speed 4847.66 samples/sec Loss 5.6572 LearningRate 0.2070 Epoch: 7 Global Step: 29420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:39:04,033-Speed 4932.94 samples/sec Loss 5.5918 LearningRate 0.2069 Epoch: 7 Global Step: 29430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:39:12,154-Speed 5043.76 samples/sec Loss 5.6748 LearningRate 0.2068 Epoch: 7 Global Step: 29440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:39:20,353-Speed 4997.34 samples/sec Loss 5.6498 LearningRate 0.2068 Epoch: 7 Global Step: 29450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:39:28,624-Speed 4952.41 samples/sec Loss 5.7760 LearningRate 0.2067 Epoch: 7 Global Step: 29460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:39:36,834-Speed 4990.19 samples/sec Loss 5.6903 LearningRate 0.2066 Epoch: 7 Global Step: 29470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:39:44,962-Speed 5039.24 samples/sec Loss 5.7865 LearningRate 0.2065 Epoch: 7 Global Step: 29480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:39:53,189-Speed 4979.54 samples/sec Loss 5.6913 LearningRate 0.2064 Epoch: 7 Global Step: 29490 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:40:01,419-Speed 4978.12 samples/sec Loss 5.7201 LearningRate 0.2064 Epoch: 7 Global 
Step: 29500 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:40:09,606-Speed 5003.12 samples/sec Loss 5.7271 LearningRate 0.2063 Epoch: 7 Global Step: 29510 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:40:17,924-Speed 4924.87 samples/sec Loss 5.7699 LearningRate 0.2062 Epoch: 7 Global Step: 29520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:40:26,186-Speed 4958.69 samples/sec Loss 5.7302 LearningRate 0.2061 Epoch: 7 Global Step: 29530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:40:34,399-Speed 4987.76 samples/sec Loss 5.7290 LearningRate 0.2061 Epoch: 7 Global Step: 29540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:40:42,620-Speed 4983.34 samples/sec Loss 5.7471 LearningRate 0.2060 Epoch: 7 Global Step: 29550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:40:50,823-Speed 4994.00 samples/sec Loss 5.7157 LearningRate 0.2059 Epoch: 7 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:40:59,011-Speed 5002.40 samples/sec Loss 5.7977 LearningRate 0.2058 Epoch: 7 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:41:07,226-Speed 4987.38 samples/sec Loss 5.7955 LearningRate 0.2058 Epoch: 7 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:41:15,576-Speed 4905.73 samples/sec Loss 5.7922 LearningRate 0.2057 Epoch: 7 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:41:23,861-Speed 4944.81 samples/sec Loss 5.7461 LearningRate 0.2056 Epoch: 7 Global Step: 29600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:41:31,981-Speed 5044.56 samples/sec Loss 5.7501 LearningRate 0.2055 Epoch: 7 Global Step: 29610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:41:40,305-Speed 4921.40 samples/sec Loss 5.7141 LearningRate 0.2055 Epoch: 7 Global Step: 29620 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-01-17 04:41:48,527-Speed 4982.70 samples/sec Loss 5.7785 LearningRate 0.2054 Epoch: 7 Global Step: 29630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:41:56,813-Speed 4943.77 samples/sec Loss 5.8165 LearningRate 0.2053 Epoch: 7 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:42:05,164-Speed 4905.17 samples/sec Loss 5.8219 LearningRate 0.2052 Epoch: 7 Global Step: 29650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:42:13,529-Speed 4897.86 samples/sec Loss 5.7627 LearningRate 0.2051 Epoch: 7 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:42:21,861-Speed 4915.98 samples/sec Loss 5.7361 LearningRate 0.2051 Epoch: 7 Global Step: 29670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:42:30,235-Speed 4892.34 samples/sec Loss 5.7545 LearningRate 0.2050 Epoch: 7 Global Step: 29680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:42:38,692-Speed 4844.88 samples/sec Loss 5.8344 LearningRate 0.2049 Epoch: 7 Global Step: 29690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:42:47,136-Speed 4850.95 samples/sec Loss 5.8350 LearningRate 0.2048 Epoch: 7 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:42:55,545-Speed 4871.90 samples/sec Loss 5.7313 LearningRate 0.2048 Epoch: 7 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:43:03,902-Speed 4902.07 samples/sec Loss 5.8541 LearningRate 0.2047 Epoch: 7 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:43:12,054-Speed 5024.96 samples/sec Loss 5.7602 LearningRate 0.2046 Epoch: 7 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:43:20,219-Speed 5016.89 samples/sec Loss 5.8059 LearningRate 0.2045 Epoch: 7 Global Step: 29740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 
04:43:28,426-Speed 4991.50 samples/sec Loss 5.8436 LearningRate 0.2045 Epoch: 7 Global Step: 29750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:43:36,648-Speed 4982.47 samples/sec Loss 5.8017 LearningRate 0.2044 Epoch: 7 Global Step: 29760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:43:44,819-Speed 5013.89 samples/sec Loss 5.8986 LearningRate 0.2043 Epoch: 7 Global Step: 29770 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:43:53,032-Speed 4987.30 samples/sec Loss 5.8483 LearningRate 0.2042 Epoch: 7 Global Step: 29780 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:44:02,039-Speed 4548.25 samples/sec Loss 5.8260 LearningRate 0.2042 Epoch: 7 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:44:11,239-Speed 4453.03 samples/sec Loss 5.8538 LearningRate 0.2041 Epoch: 7 Global Step: 29800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:44:20,167-Speed 4588.03 samples/sec Loss 5.8520 LearningRate 0.2040 Epoch: 7 Global Step: 29810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:44:29,116-Speed 4577.89 samples/sec Loss 5.8340 LearningRate 0.2039 Epoch: 7 Global Step: 29820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:44:38,089-Speed 4565.15 samples/sec Loss 5.8402 LearningRate 0.2039 Epoch: 7 Global Step: 29830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:44:46,243-Speed 5024.10 samples/sec Loss 5.8610 LearningRate 0.2038 Epoch: 7 Global Step: 29840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:44:54,404-Speed 5019.63 samples/sec Loss 5.8572 LearningRate 0.2037 Epoch: 7 Global Step: 29850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:45:02,607-Speed 4993.75 samples/sec Loss 5.8806 LearningRate 0.2036 Epoch: 7 Global Step: 29860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:45:10,746-Speed 5033.32 samples/sec Loss 
5.8554 LearningRate 0.2035 Epoch: 7 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:45:19,003-Speed 4961.40 samples/sec Loss 5.8719 LearningRate 0.2035 Epoch: 7 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:45:27,251-Speed 4966.54 samples/sec Loss 5.8091 LearningRate 0.2034 Epoch: 7 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:45:35,378-Speed 5040.75 samples/sec Loss 5.7841 LearningRate 0.2033 Epoch: 7 Global Step: 29900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:45:43,502-Speed 5042.36 samples/sec Loss 5.7775 LearningRate 0.2032 Epoch: 7 Global Step: 29910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:45:51,737-Speed 4974.19 samples/sec Loss 5.8656 LearningRate 0.2032 Epoch: 7 Global Step: 29920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:45:59,956-Speed 4984.40 samples/sec Loss 5.8108 LearningRate 0.2031 Epoch: 7 Global Step: 29930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:46:08,130-Speed 5011.70 samples/sec Loss 5.8675 LearningRate 0.2030 Epoch: 7 Global Step: 29940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:46:16,307-Speed 5009.91 samples/sec Loss 5.8270 LearningRate 0.2029 Epoch: 7 Global Step: 29950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:46:24,453-Speed 5029.25 samples/sec Loss 5.8479 LearningRate 0.2029 Epoch: 7 Global Step: 29960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:46:32,816-Speed 4898.30 samples/sec Loss 5.8248 LearningRate 0.2028 Epoch: 7 Global Step: 29970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:46:40,917-Speed 5056.87 samples/sec Loss 5.8751 LearningRate 0.2027 Epoch: 7 Global Step: 29980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:46:49,100-Speed 5005.99 samples/sec Loss 5.8527 LearningRate 0.2026 Epoch: 7 Global 
Step: 29990 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:46:57,349-Speed 4965.71 samples/sec Loss 5.8719 LearningRate 0.2026 Epoch: 7 Global Step: 30000 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:47:43,930-[lfw][30000]XNorm: 21.963039 Training: 2022-01-17 04:47:43,930-[lfw][30000]Accuracy-Flip: 0.99817+-0.00263 Training: 2022-01-17 04:47:43,931-[lfw][30000]Accuracy-Highest: 0.99817 Training: 2022-01-17 04:48:38,594-[cfp_fp][30000]XNorm: 19.419536 Training: 2022-01-17 04:48:38,595-[cfp_fp][30000]Accuracy-Flip: 0.97343+-0.01015 Training: 2022-01-17 04:48:38,596-[cfp_fp][30000]Accuracy-Highest: 0.97343 Training: 2022-01-17 04:49:25,576-[agedb_30][30000]XNorm: 21.540817 Training: 2022-01-17 04:49:25,577-[agedb_30][30000]Accuracy-Flip: 0.97450+-0.00796 Training: 2022-01-17 04:49:25,577-[agedb_30][30000]Accuracy-Highest: 0.97450 Training: 2022-01-17 04:49:33,707-Speed 261.96 samples/sec Loss 5.8039 LearningRate 0.2025 Epoch: 7 Global Step: 30010 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:49:41,875-Speed 5015.66 samples/sec Loss 5.7642 LearningRate 0.2024 Epoch: 7 Global Step: 30020 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:49:49,980-Speed 5054.75 samples/sec Loss 5.8372 LearningRate 0.2023 Epoch: 7 Global Step: 30030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:49:58,109-Speed 5039.15 samples/sec Loss 5.8266 LearningRate 0.2023 Epoch: 7 Global Step: 30040 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:50:06,298-Speed 5002.33 samples/sec Loss 5.7987 LearningRate 0.2022 Epoch: 7 Global Step: 30050 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:50:14,511-Speed 4988.46 samples/sec Loss 5.7917 LearningRate 0.2021 Epoch: 7 Global Step: 30060 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:50:22,709-Speed 4996.62 samples/sec Loss 5.8387 LearningRate 0.2020 Epoch: 7 Global Step: 30070 Fp16 Grad Scale: 
262144 Required: 13 hours Training: 2022-01-17 04:50:30,792-Speed 5068.45 samples/sec Loss 5.8456 LearningRate 0.2020 Epoch: 7 Global Step: 30080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:50:38,866-Speed 5073.20 samples/sec Loss 5.8547 LearningRate 0.2019 Epoch: 7 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:50:47,091-Speed 4980.65 samples/sec Loss 5.8561 LearningRate 0.2018 Epoch: 7 Global Step: 30100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:50:55,213-Speed 5043.96 samples/sec Loss 5.8557 LearningRate 0.2017 Epoch: 7 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:51:03,344-Speed 5038.36 samples/sec Loss 5.8813 LearningRate 0.2017 Epoch: 7 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:51:11,564-Speed 4983.80 samples/sec Loss 5.8078 LearningRate 0.2016 Epoch: 7 Global Step: 30130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:51:19,679-Speed 5047.44 samples/sec Loss 5.8841 LearningRate 0.2015 Epoch: 7 Global Step: 30140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:51:27,846-Speed 5016.31 samples/sec Loss 5.8762 LearningRate 0.2014 Epoch: 7 Global Step: 30150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:51:36,292-Speed 4850.80 samples/sec Loss 5.8566 LearningRate 0.2014 Epoch: 7 Global Step: 30160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:51:44,608-Speed 4925.55 samples/sec Loss 5.8864 LearningRate 0.2013 Epoch: 7 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:51:52,812-Speed 4993.26 samples/sec Loss 5.7910 LearningRate 0.2012 Epoch: 7 Global Step: 30180 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:52:00,957-Speed 5029.95 samples/sec Loss 5.8433 LearningRate 0.2011 Epoch: 7 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-01-17 04:52:09,032-Speed 5072.75 samples/sec Loss 5.7840 LearningRate 0.2010 Epoch: 7 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:52:17,152-Speed 5045.03 samples/sec Loss 5.8185 LearningRate 0.2010 Epoch: 7 Global Step: 30210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:52:25,291-Speed 5033.29 samples/sec Loss 5.7802 LearningRate 0.2009 Epoch: 7 Global Step: 30220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:52:33,406-Speed 5048.33 samples/sec Loss 5.8172 LearningRate 0.2008 Epoch: 7 Global Step: 30230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:52:41,673-Speed 4955.15 samples/sec Loss 5.8171 LearningRate 0.2007 Epoch: 7 Global Step: 30240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:52:49,854-Speed 5007.18 samples/sec Loss 5.7463 LearningRate 0.2007 Epoch: 7 Global Step: 30250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:52:57,981-Speed 5040.61 samples/sec Loss 5.7728 LearningRate 0.2006 Epoch: 7 Global Step: 30260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:53:06,111-Speed 5039.01 samples/sec Loss 5.8010 LearningRate 0.2005 Epoch: 7 Global Step: 30270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:53:14,213-Speed 5055.80 samples/sec Loss 5.8118 LearningRate 0.2004 Epoch: 7 Global Step: 30280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:53:22,412-Speed 4996.13 samples/sec Loss 5.8370 LearningRate 0.2004 Epoch: 7 Global Step: 30290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:53:30,664-Speed 4964.45 samples/sec Loss 5.8321 LearningRate 0.2003 Epoch: 7 Global Step: 30300 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:53:38,844-Speed 5008.03 samples/sec Loss 5.8809 LearningRate 0.2002 Epoch: 7 Global Step: 30310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:53:46,946-Speed 5056.15 
samples/sec Loss 5.7746 LearningRate 0.2001 Epoch: 7 Global Step: 30320 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:53:55,040-Speed 5061.17 samples/sec Loss 5.8598 LearningRate 0.2001 Epoch: 7 Global Step: 30330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:54:03,229-Speed 5003.02 samples/sec Loss 5.8775 LearningRate 0.2000 Epoch: 7 Global Step: 30340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:54:11,309-Speed 5070.47 samples/sec Loss 5.8910 LearningRate 0.1999 Epoch: 7 Global Step: 30350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:54:19,408-Speed 5057.80 samples/sec Loss 5.7776 LearningRate 0.1998 Epoch: 7 Global Step: 30360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:54:27,506-Speed 5058.64 samples/sec Loss 5.7733 LearningRate 0.1998 Epoch: 7 Global Step: 30370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:54:35,650-Speed 5030.26 samples/sec Loss 5.8113 LearningRate 0.1997 Epoch: 7 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:54:43,812-Speed 5019.13 samples/sec Loss 5.8111 LearningRate 0.1996 Epoch: 7 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:54:51,879-Speed 5077.69 samples/sec Loss 5.8351 LearningRate 0.1995 Epoch: 7 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:54:59,988-Speed 5051.87 samples/sec Loss 5.7809 LearningRate 0.1995 Epoch: 7 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:55:08,129-Speed 5032.29 samples/sec Loss 5.8360 LearningRate 0.1994 Epoch: 7 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:55:16,387-Speed 4960.53 samples/sec Loss 5.7685 LearningRate 0.1993 Epoch: 7 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:55:24,493-Speed 5053.91 samples/sec Loss 5.8485 LearningRate 0.1992 
Epoch: 7 Global Step: 30440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:55:32,535-Speed 5094.10 samples/sec Loss 5.8663 LearningRate 0.1992 Epoch: 7 Global Step: 30450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:55:40,660-Speed 5041.52 samples/sec Loss 5.7579 LearningRate 0.1991 Epoch: 7 Global Step: 30460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:55:48,702-Speed 5094.60 samples/sec Loss 5.8453 LearningRate 0.1990 Epoch: 7 Global Step: 30470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:55:56,813-Speed 5050.20 samples/sec Loss 5.8448 LearningRate 0.1989 Epoch: 7 Global Step: 30480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:56:04,933-Speed 5044.94 samples/sec Loss 5.7972 LearningRate 0.1989 Epoch: 7 Global Step: 30490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:56:13,003-Speed 5076.30 samples/sec Loss 5.8157 LearningRate 0.1988 Epoch: 7 Global Step: 30500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:56:21,144-Speed 5031.61 samples/sec Loss 5.7998 LearningRate 0.1987 Epoch: 7 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:56:29,484-Speed 4912.11 samples/sec Loss 5.7943 LearningRate 0.1986 Epoch: 7 Global Step: 30520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:56:37,638-Speed 5023.77 samples/sec Loss 5.8011 LearningRate 0.1986 Epoch: 7 Global Step: 30530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:56:45,810-Speed 5013.06 samples/sec Loss 5.7870 LearningRate 0.1985 Epoch: 7 Global Step: 30540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:56:54,082-Speed 4951.95 samples/sec Loss 5.8181 LearningRate 0.1984 Epoch: 7 Global Step: 30550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:57:02,211-Speed 5039.91 samples/sec Loss 5.7849 LearningRate 0.1983 Epoch: 7 Global Step: 30560 Fp16 Grad 
Scale: 262144 Required: 13 hours Training: 2022-01-17 04:57:10,345-Speed 5036.56 samples/sec Loss 5.7449 LearningRate 0.1983 Epoch: 7 Global Step: 30570 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:57:18,410-Speed 5079.16 samples/sec Loss 5.7864 LearningRate 0.1982 Epoch: 7 Global Step: 30580 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:57:26,544-Speed 5036.16 samples/sec Loss 5.8260 LearningRate 0.1981 Epoch: 7 Global Step: 30590 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:57:34,642-Speed 5059.03 samples/sec Loss 5.7810 LearningRate 0.1980 Epoch: 7 Global Step: 30600 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:57:42,717-Speed 5072.93 samples/sec Loss 5.7741 LearningRate 0.1980 Epoch: 7 Global Step: 30610 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 04:57:50,820-Speed 5055.60 samples/sec Loss 5.8041 LearningRate 0.1979 Epoch: 7 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:57:58,877-Speed 5084.53 samples/sec Loss 5.7482 LearningRate 0.1978 Epoch: 7 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:58:07,009-Speed 5037.03 samples/sec Loss 5.8209 LearningRate 0.1977 Epoch: 7 Global Step: 30640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:58:15,128-Speed 5045.67 samples/sec Loss 5.8003 LearningRate 0.1977 Epoch: 7 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:58:23,186-Speed 5083.71 samples/sec Loss 5.7561 LearningRate 0.1976 Epoch: 7 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:58:31,280-Speed 5061.42 samples/sec Loss 5.7968 LearningRate 0.1975 Epoch: 7 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:58:39,375-Speed 5060.44 samples/sec Loss 5.7079 LearningRate 0.1974 Epoch: 7 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-01-17 04:58:47,496-Speed 5044.44 samples/sec Loss 5.6808 LearningRate 0.1974 Epoch: 7 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:58:55,573-Speed 5072.33 samples/sec Loss 5.7407 LearningRate 0.1973 Epoch: 7 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:59:03,718-Speed 5029.34 samples/sec Loss 5.7469 LearningRate 0.1972 Epoch: 7 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:59:11,899-Speed 5006.97 samples/sec Loss 5.7708 LearningRate 0.1971 Epoch: 7 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 04:59:20,142-Speed 4969.97 samples/sec Loss 5.7594 LearningRate 0.1971 Epoch: 7 Global Step: 30730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:59:28,454-Speed 4928.49 samples/sec Loss 5.7604 LearningRate 0.1970 Epoch: 7 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:59:36,950-Speed 4821.99 samples/sec Loss 5.7614 LearningRate 0.1969 Epoch: 7 Global Step: 30750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:59:45,023-Speed 5074.44 samples/sec Loss 5.7503 LearningRate 0.1968 Epoch: 7 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 04:59:53,165-Speed 5031.14 samples/sec Loss 5.7859 LearningRate 0.1968 Epoch: 7 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:00:01,386-Speed 4982.68 samples/sec Loss 5.7690 LearningRate 0.1967 Epoch: 7 Global Step: 30780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:00:09,513-Speed 5041.16 samples/sec Loss 5.7562 LearningRate 0.1966 Epoch: 7 Global Step: 30790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:00:17,673-Speed 5019.99 samples/sec Loss 5.7219 LearningRate 0.1965 Epoch: 7 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:00:25,821-Speed 5028.13 samples/sec 
Loss 5.7687 LearningRate 0.1965 Epoch: 7 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:00:34,020-Speed 4995.91 samples/sec Loss 5.7441 LearningRate 0.1964 Epoch: 7 Global Step: 30820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:00:42,210-Speed 5002.05 samples/sec Loss 5.7858 LearningRate 0.1963 Epoch: 7 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:00:50,350-Speed 5032.72 samples/sec Loss 5.7607 LearningRate 0.1962 Epoch: 7 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:00:58,432-Speed 5068.56 samples/sec Loss 5.7797 LearningRate 0.1962 Epoch: 7 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:01:06,683-Speed 4965.18 samples/sec Loss 5.7615 LearningRate 0.1961 Epoch: 7 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:01:14,858-Speed 5010.62 samples/sec Loss 5.7221 LearningRate 0.1960 Epoch: 7 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:01:23,311-Speed 4846.21 samples/sec Loss 5.7308 LearningRate 0.1959 Epoch: 7 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:01:31,898-Speed 4770.80 samples/sec Loss 5.7549 LearningRate 0.1959 Epoch: 7 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:01:40,464-Speed 4782.20 samples/sec Loss 5.8053 LearningRate 0.1958 Epoch: 7 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:01:48,658-Speed 4999.49 samples/sec Loss 5.7902 LearningRate 0.1957 Epoch: 7 Global Step: 30910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:01:56,836-Speed 5009.20 samples/sec Loss 5.7732 LearningRate 0.1956 Epoch: 7 Global Step: 30920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:02:05,050-Speed 4987.36 samples/sec Loss 5.7715 LearningRate 0.1956 Epoch: 7 Global 
Step: 30930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:02:13,199-Speed 5026.67 samples/sec Loss 5.7597 LearningRate 0.1955 Epoch: 7 Global Step: 30940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:02:21,316-Speed 5047.35 samples/sec Loss 5.7146 LearningRate 0.1954 Epoch: 7 Global Step: 30950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:02:29,390-Speed 5073.46 samples/sec Loss 5.7615 LearningRate 0.1954 Epoch: 7 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:02:37,498-Speed 5052.54 samples/sec Loss 5.7737 LearningRate 0.1953 Epoch: 7 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:02:45,648-Speed 5026.76 samples/sec Loss 5.7506 LearningRate 0.1952 Epoch: 7 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:02:53,833-Speed 5004.95 samples/sec Loss 5.7498 LearningRate 0.1951 Epoch: 7 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:03:02,044-Speed 4989.03 samples/sec Loss 5.7127 LearningRate 0.1951 Epoch: 7 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-17 05:03:10,190-Speed 5028.89 samples/sec Loss 5.7518 LearningRate 0.1950 Epoch: 7 Global Step: 31010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:03:18,292-Speed 5055.60 samples/sec Loss 5.7129 LearningRate 0.1949 Epoch: 7 Global Step: 31020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:03:26,427-Speed 5036.17 samples/sec Loss 5.7803 LearningRate 0.1948 Epoch: 7 Global Step: 31030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:03:34,556-Speed 5039.63 samples/sec Loss 5.7473 LearningRate 0.1948 Epoch: 7 Global Step: 31040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:03:42,778-Speed 4982.25 samples/sec Loss 5.6368 LearningRate 0.1947 Epoch: 7 Global Step: 31050 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-01-17 05:03:50,974-Speed 4998.32 samples/sec Loss 5.7701 LearningRate 0.1946 Epoch: 7 Global Step: 31060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:03:59,307-Speed 4916.06 samples/sec Loss 5.7217 LearningRate 0.1945 Epoch: 7 Global Step: 31070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:04:07,519-Speed 4988.51 samples/sec Loss 5.7984 LearningRate 0.1945 Epoch: 7 Global Step: 31080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:04:15,718-Speed 4996.02 samples/sec Loss 5.7330 LearningRate 0.1944 Epoch: 7 Global Step: 31090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:04:23,986-Speed 4954.54 samples/sec Loss 5.7632 LearningRate 0.1943 Epoch: 7 Global Step: 31100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:04:32,269-Speed 4945.76 samples/sec Loss 5.7804 LearningRate 0.1942 Epoch: 7 Global Step: 31110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 05:04:40,644-Speed 4891.78 samples/sec Loss 5.7078 LearningRate 0.1942 Epoch: 7 Global Step: 31120 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 05:04:48,878-Speed 4975.52 samples/sec Loss 5.6809 LearningRate 0.1941 Epoch: 7 Global Step: 31130 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 05:04:57,114-Speed 4974.00 samples/sec Loss 5.6885 LearningRate 0.1940 Epoch: 7 Global Step: 31140 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 05:05:05,275-Speed 5019.70 samples/sec Loss 5.7023 LearningRate 0.1939 Epoch: 7 Global Step: 31150 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 05:05:13,435-Speed 5020.67 samples/sec Loss 5.6866 LearningRate 0.1939 Epoch: 7 Global Step: 31160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 05:05:21,552-Speed 5046.31 samples/sec Loss 5.7099 LearningRate 0.1938 Epoch: 7 Global Step: 31170 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 
05:05:29,679-Speed 5041.08 samples/sec Loss 5.7285 LearningRate 0.1937 Epoch: 7 Global Step: 31180 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-17 05:05:37,787-Speed 5052.17 samples/sec Loss 5.7180 LearningRate 0.1936 Epoch: 7 Global Step: 31190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:05:45,914-Speed 5041.17 samples/sec Loss 5.7239 LearningRate 0.1936 Epoch: 7 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:05:54,101-Speed 5003.42 samples/sec Loss 5.7302 LearningRate 0.1935 Epoch: 7 Global Step: 31210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:06:02,378-Speed 4949.32 samples/sec Loss 5.6967 LearningRate 0.1934 Epoch: 7 Global Step: 31220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:06:10,715-Speed 4913.43 samples/sec Loss 5.6759 LearningRate 0.1933 Epoch: 7 Global Step: 31230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:06:19,049-Speed 4915.79 samples/sec Loss 5.6749 LearningRate 0.1933 Epoch: 7 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:06:27,222-Speed 5012.05 samples/sec Loss 5.6388 LearningRate 0.1932 Epoch: 7 Global Step: 31250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-17 05:06:35,437-Speed 4986.90 samples/sec Loss 5.8029 LearningRate 0.1931 Epoch: 7 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:06:43,682-Speed 4968.41 samples/sec Loss 5.6736 LearningRate 0.1930 Epoch: 7 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:06:51,932-Speed 4965.17 samples/sec Loss 5.7550 LearningRate 0.1930 Epoch: 7 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:07:00,038-Speed 5053.48 samples/sec Loss 5.6582 LearningRate 0.1929 Epoch: 7 Global Step: 31290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:07:08,264-Speed 4979.91 samples/sec Loss 
5.6525 LearningRate 0.1928 Epoch: 7 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:07:16,513-Speed 4966.74 samples/sec Loss 5.7099 LearningRate 0.1928 Epoch: 7 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:07:24,973-Speed 4842.11 samples/sec Loss 5.7181 LearningRate 0.1927 Epoch: 7 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:07:33,396-Speed 4863.49 samples/sec Loss 5.6624 LearningRate 0.1926 Epoch: 7 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:07:41,897-Speed 4819.13 samples/sec Loss 5.6897 LearningRate 0.1925 Epoch: 7 Global Step: 31340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:07:50,181-Speed 4944.66 samples/sec Loss 5.6862 LearningRate 0.1925 Epoch: 7 Global Step: 31350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:07:58,419-Speed 4972.85 samples/sec Loss 5.6619 LearningRate 0.1924 Epoch: 7 Global Step: 31360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:08:06,573-Speed 5023.94 samples/sec Loss 5.6569 LearningRate 0.1923 Epoch: 7 Global Step: 31370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:08:14,787-Speed 4987.18 samples/sec Loss 5.6781 LearningRate 0.1922 Epoch: 7 Global Step: 31380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:08:22,887-Speed 5058.00 samples/sec Loss 5.6881 LearningRate 0.1922 Epoch: 7 Global Step: 31390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:08:31,012-Speed 5041.83 samples/sec Loss 5.6740 LearningRate 0.1921 Epoch: 7 Global Step: 31400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:08:39,168-Speed 5022.31 samples/sec Loss 5.6867 LearningRate 0.1920 Epoch: 7 Global Step: 31410 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:08:47,435-Speed 4955.42 samples/sec Loss 5.7078 LearningRate 0.1919 Epoch: 7 Global 
Step: 31420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:08:55,568-Speed 5037.25 samples/sec Loss 5.6483 LearningRate 0.1919 Epoch: 7 Global Step: 31430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:09:03,814-Speed 4967.40 samples/sec Loss 5.6418 LearningRate 0.1918 Epoch: 7 Global Step: 31440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:09:11,960-Speed 5029.26 samples/sec Loss 5.6612 LearningRate 0.1917 Epoch: 7 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:09:20,136-Speed 5010.18 samples/sec Loss 5.6600 LearningRate 0.1916 Epoch: 7 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:09:28,373-Speed 4973.24 samples/sec Loss 5.7034 LearningRate 0.1916 Epoch: 7 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:09:36,611-Speed 4972.91 samples/sec Loss 5.7000 LearningRate 0.1915 Epoch: 7 Global Step: 31480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:09:45,133-Speed 4807.11 samples/sec Loss 5.7482 LearningRate 0.1914 Epoch: 7 Global Step: 31490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:09:53,301-Speed 5015.43 samples/sec Loss 5.6633 LearningRate 0.1914 Epoch: 7 Global Step: 31500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:10:01,455-Speed 5023.63 samples/sec Loss 5.7338 LearningRate 0.1913 Epoch: 7 Global Step: 31510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:10:09,776-Speed 4923.45 samples/sec Loss 5.6686 LearningRate 0.1912 Epoch: 7 Global Step: 31520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:10:18,005-Speed 4977.70 samples/sec Loss 5.6572 LearningRate 0.1911 Epoch: 7 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:10:26,151-Speed 5029.52 samples/sec Loss 5.6660 LearningRate 0.1911 Epoch: 7 Global Step: 31540 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-17 05:10:34,273-Speed 5043.51 samples/sec Loss 5.6545 LearningRate 0.1910 Epoch: 7 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:10:42,402-Speed 5039.02 samples/sec Loss 5.7119 LearningRate 0.1909 Epoch: 7 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:10:50,600-Speed 4997.30 samples/sec Loss 5.6801 LearningRate 0.1908 Epoch: 7 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:10:58,702-Speed 5056.06 samples/sec Loss 5.6687 LearningRate 0.1908 Epoch: 7 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:11:06,834-Speed 5037.66 samples/sec Loss 5.6503 LearningRate 0.1907 Epoch: 7 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:11:15,047-Speed 4987.48 samples/sec Loss 5.6370 LearningRate 0.1906 Epoch: 7 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:11:23,152-Speed 5054.74 samples/sec Loss 5.5774 LearningRate 0.1905 Epoch: 7 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:11:31,320-Speed 5015.04 samples/sec Loss 5.6754 LearningRate 0.1905 Epoch: 7 Global Step: 31620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:11:39,453-Speed 5037.09 samples/sec Loss 5.6150 LearningRate 0.1904 Epoch: 7 Global Step: 31630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:11:47,635-Speed 5006.69 samples/sec Loss 5.6771 LearningRate 0.1903 Epoch: 7 Global Step: 31640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:11:55,816-Speed 5007.25 samples/sec Loss 5.6676 LearningRate 0.1902 Epoch: 7 Global Step: 31650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:12:03,955-Speed 5033.69 samples/sec Loss 5.6438 LearningRate 0.1902 Epoch: 7 Global Step: 31660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 
05:12:12,063-Speed 5052.41 samples/sec Loss 5.6415 LearningRate 0.1901 Epoch: 7 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:12:20,198-Speed 5035.72 samples/sec Loss 5.6730 LearningRate 0.1900 Epoch: 7 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:12:28,334-Speed 5034.91 samples/sec Loss 5.7124 LearningRate 0.1900 Epoch: 7 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:12:36,475-Speed 5031.79 samples/sec Loss 5.6097 LearningRate 0.1899 Epoch: 7 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:12:44,604-Speed 5039.56 samples/sec Loss 5.6756 LearningRate 0.1898 Epoch: 7 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:12:52,777-Speed 5011.85 samples/sec Loss 5.6151 LearningRate 0.1897 Epoch: 7 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:13:00,982-Speed 4993.38 samples/sec Loss 5.6214 LearningRate 0.1897 Epoch: 7 Global Step: 31730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:13:09,028-Speed 5091.47 samples/sec Loss 5.5953 LearningRate 0.1896 Epoch: 7 Global Step: 31740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:13:17,161-Speed 5036.67 samples/sec Loss 5.5715 LearningRate 0.1895 Epoch: 7 Global Step: 31750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:13:25,324-Speed 5018.32 samples/sec Loss 5.6505 LearningRate 0.1894 Epoch: 7 Global Step: 31760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:13:33,503-Speed 5008.34 samples/sec Loss 5.6091 LearningRate 0.1894 Epoch: 7 Global Step: 31770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:13:41,763-Speed 4959.83 samples/sec Loss 5.6246 LearningRate 0.1893 Epoch: 7 Global Step: 31780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:13:50,046-Speed 4945.46 samples/sec Loss 
5.7008 LearningRate 0.1892 Epoch: 7 Global Step: 31790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:13:58,614-Speed 4781.40 samples/sec Loss 5.6248 LearningRate 0.1891 Epoch: 7 Global Step: 31800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:14:07,432-Speed 4645.65 samples/sec Loss 5.5876 LearningRate 0.1891 Epoch: 7 Global Step: 31810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:14:16,194-Speed 4674.78 samples/sec Loss 5.5428 LearningRate 0.1890 Epoch: 7 Global Step: 31820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:14:24,940-Speed 4683.74 samples/sec Loss 5.6262 LearningRate 0.1889 Epoch: 7 Global Step: 31830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:14:33,695-Speed 4679.53 samples/sec Loss 5.5938 LearningRate 0.1889 Epoch: 7 Global Step: 31840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:14:42,079-Speed 4886.03 samples/sec Loss 5.5766 LearningRate 0.1888 Epoch: 7 Global Step: 31850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:14:50,365-Speed 4944.11 samples/sec Loss 5.6654 LearningRate 0.1887 Epoch: 7 Global Step: 31860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:14:59,142-Speed 4667.50 samples/sec Loss 5.6794 LearningRate 0.1886 Epoch: 7 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:15:07,317-Speed 5010.80 samples/sec Loss 5.6517 LearningRate 0.1886 Epoch: 7 Global Step: 31880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:15:15,378-Speed 5081.74 samples/sec Loss 5.5996 LearningRate 0.1885 Epoch: 7 Global Step: 31890 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:15:23,511-Speed 5037.04 samples/sec Loss 5.5854 LearningRate 0.1884 Epoch: 7 Global Step: 31900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:15:31,680-Speed 5015.24 samples/sec Loss 5.6425 LearningRate 0.1883 Epoch: 7 Global 
Step: 31910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:15:40,046-Speed 4896.69 samples/sec Loss 5.5669 LearningRate 0.1883 Epoch: 7 Global Step: 31920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:15:48,161-Speed 5048.13 samples/sec Loss 5.5918 LearningRate 0.1882 Epoch: 7 Global Step: 31930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:15:56,268-Speed 5052.77 samples/sec Loss 5.5670 LearningRate 0.1881 Epoch: 7 Global Step: 31940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:16:04,358-Speed 5063.93 samples/sec Loss 5.5862 LearningRate 0.1880 Epoch: 7 Global Step: 31950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:16:12,479-Speed 5044.48 samples/sec Loss 5.6142 LearningRate 0.1880 Epoch: 7 Global Step: 31960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:16:20,557-Speed 5070.89 samples/sec Loss 5.5993 LearningRate 0.1879 Epoch: 7 Global Step: 31970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:16:28,754-Speed 4997.47 samples/sec Loss 5.5957 LearningRate 0.1878 Epoch: 7 Global Step: 31980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:16:37,305-Speed 4790.84 samples/sec Loss 5.5719 LearningRate 0.1878 Epoch: 7 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:16:45,609-Speed 4933.25 samples/sec Loss 5.5937 LearningRate 0.1877 Epoch: 7 Global Step: 32000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:16:54,151-Speed 4795.49 samples/sec Loss 5.5475 LearningRate 0.1876 Epoch: 7 Global Step: 32010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:17:02,904-Speed 4680.20 samples/sec Loss 5.5746 LearningRate 0.1875 Epoch: 7 Global Step: 32020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:17:11,042-Speed 5034.17 samples/sec Loss 5.5661 LearningRate 0.1875 Epoch: 7 Global Step: 32030 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-17 05:17:19,228-Speed 5004.25 samples/sec Loss 5.5868 LearningRate 0.1874 Epoch: 7 Global Step: 32040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:17:27,372-Speed 5029.78 samples/sec Loss 5.5562 LearningRate 0.1873 Epoch: 7 Global Step: 32050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:17:35,481-Speed 5051.57 samples/sec Loss 5.6180 LearningRate 0.1872 Epoch: 7 Global Step: 32060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:17:43,719-Speed 4973.11 samples/sec Loss 5.5847 LearningRate 0.1872 Epoch: 7 Global Step: 32070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:17:51,974-Speed 4962.69 samples/sec Loss 5.5904 LearningRate 0.1871 Epoch: 7 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:18:00,202-Speed 4978.25 samples/sec Loss 5.6811 LearningRate 0.1870 Epoch: 7 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:18:08,327-Speed 5041.88 samples/sec Loss 5.6246 LearningRate 0.1870 Epoch: 7 Global Step: 32100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:18:16,708-Speed 4888.32 samples/sec Loss 5.6290 LearningRate 0.1869 Epoch: 7 Global Step: 32110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:18:24,996-Speed 4942.32 samples/sec Loss 5.5930 LearningRate 0.1868 Epoch: 7 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:18:33,186-Speed 5001.94 samples/sec Loss 5.5179 LearningRate 0.1867 Epoch: 7 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:18:41,428-Speed 4970.03 samples/sec Loss 5.5793 LearningRate 0.1867 Epoch: 7 Global Step: 32140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:18:49,829-Speed 4876.52 samples/sec Loss 5.5570 LearningRate 0.1866 Epoch: 7 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 
05:18:58,357-Speed 4803.44 samples/sec Loss 5.5770 LearningRate 0.1865 Epoch: 7 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:19:06,697-Speed 4911.84 samples/sec Loss 5.6120 LearningRate 0.1864 Epoch: 7 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:19:14,817-Speed 5045.26 samples/sec Loss 5.6393 LearningRate 0.1864 Epoch: 7 Global Step: 32180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:19:23,056-Speed 4972.31 samples/sec Loss 5.5282 LearningRate 0.1863 Epoch: 7 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:19:31,164-Speed 5052.49 samples/sec Loss 5.5804 LearningRate 0.1862 Epoch: 7 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:19:39,838-Speed 4722.46 samples/sec Loss 5.5393 LearningRate 0.1862 Epoch: 7 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:19:48,808-Speed 4567.11 samples/sec Loss 5.6631 LearningRate 0.1861 Epoch: 7 Global Step: 32220 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:19:57,595-Speed 4662.02 samples/sec Loss 5.6066 LearningRate 0.1860 Epoch: 7 Global Step: 32230 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:20:05,774-Speed 5008.91 samples/sec Loss 5.5473 LearningRate 0.1859 Epoch: 7 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:20:13,919-Speed 5029.78 samples/sec Loss 5.5871 LearningRate 0.1859 Epoch: 7 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:20:22,222-Speed 4934.02 samples/sec Loss 5.5423 LearningRate 0.1858 Epoch: 7 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:20:30,430-Speed 4990.62 samples/sec Loss 5.5066 LearningRate 0.1857 Epoch: 7 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:20:38,561-Speed 5038.32 samples/sec Loss 
5.5635 LearningRate 0.1856 Epoch: 7 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:20:46,993-Speed 4858.31 samples/sec Loss 5.5930 LearningRate 0.1856 Epoch: 7 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:20:55,149-Speed 5022.87 samples/sec Loss 5.5636 LearningRate 0.1855 Epoch: 7 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:21:03,308-Speed 5021.01 samples/sec Loss 5.6103 LearningRate 0.1854 Epoch: 7 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:21:11,520-Speed 4987.88 samples/sec Loss 5.5344 LearningRate 0.1854 Epoch: 7 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:21:20,001-Speed 4830.50 samples/sec Loss 5.5070 LearningRate 0.1853 Epoch: 7 Global Step: 32330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:21:28,124-Speed 5043.06 samples/sec Loss 5.5548 LearningRate 0.1852 Epoch: 7 Global Step: 32340 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:21:36,317-Speed 5000.33 samples/sec Loss 5.5231 LearningRate 0.1851 Epoch: 7 Global Step: 32350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:21:44,560-Speed 4969.56 samples/sec Loss 5.5764 LearningRate 0.1851 Epoch: 7 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:21:52,715-Speed 5023.33 samples/sec Loss 5.5435 LearningRate 0.1850 Epoch: 7 Global Step: 32370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:22:00,869-Speed 5024.08 samples/sec Loss 5.5257 LearningRate 0.1849 Epoch: 7 Global Step: 32380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:22:08,985-Speed 5047.31 samples/sec Loss 5.5229 LearningRate 0.1848 Epoch: 7 Global Step: 32390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:22:17,185-Speed 4995.82 samples/sec Loss 5.4737 LearningRate 0.1848 Epoch: 7 Global 
Step: 32400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:22:25,382-Speed 4997.67 samples/sec Loss 5.4887 LearningRate 0.1847 Epoch: 7 Global Step: 32410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:22:33,516-Speed 5036.60 samples/sec Loss 5.5121 LearningRate 0.1846 Epoch: 7 Global Step: 32420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:22:41,791-Speed 4950.23 samples/sec Loss 5.5390 LearningRate 0.1846 Epoch: 7 Global Step: 32430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:22:50,049-Speed 4960.72 samples/sec Loss 5.5382 LearningRate 0.1845 Epoch: 7 Global Step: 32440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:22:58,195-Speed 5029.22 samples/sec Loss 5.5348 LearningRate 0.1844 Epoch: 7 Global Step: 32450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:23:06,310-Speed 5047.79 samples/sec Loss 5.5506 LearningRate 0.1843 Epoch: 7 Global Step: 32460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:23:14,441-Speed 5038.42 samples/sec Loss 5.5221 LearningRate 0.1843 Epoch: 7 Global Step: 32470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:23:22,611-Speed 5014.20 samples/sec Loss 5.5911 LearningRate 0.1842 Epoch: 7 Global Step: 32480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:23:30,711-Speed 5057.37 samples/sec Loss 5.4690 LearningRate 0.1841 Epoch: 7 Global Step: 32490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:23:38,803-Speed 5062.45 samples/sec Loss 5.4790 LearningRate 0.1841 Epoch: 7 Global Step: 32500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:23:46,955-Speed 5025.40 samples/sec Loss 5.4785 LearningRate 0.1840 Epoch: 7 Global Step: 32510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:23:55,098-Speed 5030.30 samples/sec Loss 5.5013 LearningRate 0.1839 Epoch: 7 Global Step: 32520 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-17 05:24:03,285-Speed 5003.50 samples/sec Loss 5.4824 LearningRate 0.1838 Epoch: 7 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:24:11,505-Speed 4983.77 samples/sec Loss 5.4436 LearningRate 0.1838 Epoch: 7 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:24:19,830-Speed 4920.55 samples/sec Loss 5.4582 LearningRate 0.1837 Epoch: 7 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:24:28,245-Speed 4868.34 samples/sec Loss 5.4805 LearningRate 0.1836 Epoch: 7 Global Step: 32560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:24:36,530-Speed 4944.43 samples/sec Loss 5.4447 LearningRate 0.1835 Epoch: 7 Global Step: 32570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:24:44,653-Speed 5043.71 samples/sec Loss 5.4984 LearningRate 0.1835 Epoch: 7 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:24:52,801-Speed 5027.52 samples/sec Loss 5.5125 LearningRate 0.1834 Epoch: 7 Global Step: 32590 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:25:01,627-Speed 4641.22 samples/sec Loss 5.4556 LearningRate 0.1833 Epoch: 7 Global Step: 32600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:25:10,085-Speed 4843.18 samples/sec Loss 5.5095 LearningRate 0.1833 Epoch: 7 Global Step: 32610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:25:18,246-Speed 5019.87 samples/sec Loss 5.5357 LearningRate 0.1832 Epoch: 7 Global Step: 32620 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:25:26,396-Speed 5026.33 samples/sec Loss 5.5174 LearningRate 0.1831 Epoch: 7 Global Step: 32630 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:25:34,613-Speed 4984.96 samples/sec Loss 5.5850 LearningRate 0.1830 Epoch: 7 Global Step: 32640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 
05:25:42,869-Speed 4962.11 samples/sec Loss 5.5089 LearningRate 0.1830 Epoch: 7 Global Step: 32650 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:25:51,020-Speed 5025.56 samples/sec Loss 5.4848 LearningRate 0.1829 Epoch: 7 Global Step: 32660 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:25:59,405-Speed 4885.26 samples/sec Loss 5.5501 LearningRate 0.1828 Epoch: 7 Global Step: 32670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:26:07,916-Speed 4813.54 samples/sec Loss 5.4505 LearningRate 0.1828 Epoch: 7 Global Step: 32680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:26:16,079-Speed 5018.48 samples/sec Loss 5.5448 LearningRate 0.1827 Epoch: 7 Global Step: 32690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:26:24,252-Speed 5012.38 samples/sec Loss 5.4728 LearningRate 0.1826 Epoch: 7 Global Step: 32700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:26:32,422-Speed 5013.91 samples/sec Loss 5.4598 LearningRate 0.1825 Epoch: 7 Global Step: 32710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:26:40,501-Speed 5070.99 samples/sec Loss 5.4599 LearningRate 0.1825 Epoch: 7 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:26:48,720-Speed 4984.33 samples/sec Loss 5.4240 LearningRate 0.1824 Epoch: 7 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:26:56,858-Speed 5033.61 samples/sec Loss 5.5152 LearningRate 0.1823 Epoch: 7 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:27:05,025-Speed 5015.43 samples/sec Loss 5.4660 LearningRate 0.1823 Epoch: 7 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:27:13,231-Speed 4992.15 samples/sec Loss 5.4698 LearningRate 0.1822 Epoch: 7 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:27:21,335-Speed 5055.12 samples/sec Loss 
5.4660 LearningRate 0.1821 Epoch: 7 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:27:29,433-Speed 5058.59 samples/sec Loss 5.4704 LearningRate 0.1820 Epoch: 7 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:27:37,533-Speed 5057.29 samples/sec Loss 5.4808 LearningRate 0.1820 Epoch: 7 Global Step: 32790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:27:45,620-Speed 5065.56 samples/sec Loss 5.4647 LearningRate 0.1819 Epoch: 7 Global Step: 32800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:27:53,841-Speed 4983.21 samples/sec Loss 5.4750 LearningRate 0.1818 Epoch: 7 Global Step: 32810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:28:02,035-Speed 4999.32 samples/sec Loss 5.4811 LearningRate 0.1817 Epoch: 7 Global Step: 32820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:28:10,206-Speed 5013.35 samples/sec Loss 5.4677 LearningRate 0.1817 Epoch: 7 Global Step: 32830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:28:18,320-Speed 5049.08 samples/sec Loss 5.4471 LearningRate 0.1816 Epoch: 7 Global Step: 32840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:28:26,394-Speed 5074.33 samples/sec Loss 5.4420 LearningRate 0.1815 Epoch: 7 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:28:34,620-Speed 4979.57 samples/sec Loss 5.4555 LearningRate 0.1815 Epoch: 7 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:28:42,889-Speed 4954.21 samples/sec Loss 5.4749 LearningRate 0.1814 Epoch: 7 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:28:51,077-Speed 5003.15 samples/sec Loss 5.4725 LearningRate 0.1813 Epoch: 7 Global Step: 32880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:28:59,234-Speed 5022.52 samples/sec Loss 5.5108 LearningRate 0.1812 Epoch: 7 Global 
Step: 32890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:29:07,475-Speed 4970.74 samples/sec Loss 5.4327 LearningRate 0.1812 Epoch: 7 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:29:15,645-Speed 5013.91 samples/sec Loss 5.4384 LearningRate 0.1811 Epoch: 7 Global Step: 32910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:29:23,772-Speed 5040.68 samples/sec Loss 5.4979 LearningRate 0.1810 Epoch: 7 Global Step: 32920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:29:32,048-Speed 4949.69 samples/sec Loss 5.4124 LearningRate 0.1810 Epoch: 7 Global Step: 32930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:29:40,198-Speed 5026.67 samples/sec Loss 5.4452 LearningRate 0.1809 Epoch: 7 Global Step: 32940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:29:48,342-Speed 5030.70 samples/sec Loss 5.4649 LearningRate 0.1808 Epoch: 7 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:29:56,691-Speed 4906.60 samples/sec Loss 5.4632 LearningRate 0.1807 Epoch: 7 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:30:04,868-Speed 5009.28 samples/sec Loss 5.4188 LearningRate 0.1807 Epoch: 7 Global Step: 32970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:30:12,943-Speed 5073.46 samples/sec Loss 5.4555 LearningRate 0.1806 Epoch: 7 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:30:21,138-Speed 4998.61 samples/sec Loss 5.4522 LearningRate 0.1805 Epoch: 7 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:30:29,420-Speed 4946.37 samples/sec Loss 5.4090 LearningRate 0.1805 Epoch: 7 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:30:37,717-Speed 4937.14 samples/sec Loss 5.4300 LearningRate 0.1804 Epoch: 7 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-01-17 05:30:46,608-Speed 4607.90 samples/sec Loss 5.4995 LearningRate 0.1803 Epoch: 7 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:30:55,369-Speed 4675.58 samples/sec Loss 5.4640 LearningRate 0.1802 Epoch: 7 Global Step: 33030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:31:04,105-Speed 4689.53 samples/sec Loss 5.4482 LearningRate 0.1802 Epoch: 7 Global Step: 33040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:31:12,947-Speed 4633.11 samples/sec Loss 5.3882 LearningRate 0.1801 Epoch: 7 Global Step: 33050 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:31:21,515-Speed 4781.23 samples/sec Loss 5.4455 LearningRate 0.1800 Epoch: 7 Global Step: 33060 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:31:29,656-Speed 5032.05 samples/sec Loss 5.4252 LearningRate 0.1800 Epoch: 7 Global Step: 33070 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:31:37,959-Speed 4933.60 samples/sec Loss 5.4420 LearningRate 0.1799 Epoch: 7 Global Step: 33080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:31:46,093-Speed 5036.06 samples/sec Loss 5.4344 LearningRate 0.1798 Epoch: 7 Global Step: 33090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:31:54,307-Speed 4987.42 samples/sec Loss 5.3561 LearningRate 0.1797 Epoch: 7 Global Step: 33100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:32:02,623-Speed 4926.13 samples/sec Loss 5.4130 LearningRate 0.1797 Epoch: 7 Global Step: 33110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:32:10,827-Speed 4993.62 samples/sec Loss 5.4068 LearningRate 0.1796 Epoch: 7 Global Step: 33120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:32:19,007-Speed 5008.18 samples/sec Loss 5.3766 LearningRate 0.1795 Epoch: 7 Global Step: 33130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 
05:32:27,144-Speed 5034.29 samples/sec Loss 5.4976 LearningRate 0.1795 Epoch: 7 Global Step: 33140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:32:35,326-Speed 5006.27 samples/sec Loss 5.4244 LearningRate 0.1794 Epoch: 7 Global Step: 33150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:32:43,459-Speed 5037.45 samples/sec Loss 5.4083 LearningRate 0.1793 Epoch: 7 Global Step: 33160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:32:51,558-Speed 5057.40 samples/sec Loss 5.4546 LearningRate 0.1792 Epoch: 7 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:32:59,795-Speed 4973.52 samples/sec Loss 5.4633 LearningRate 0.1792 Epoch: 7 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:33:08,097-Speed 4934.38 samples/sec Loss 5.4486 LearningRate 0.1791 Epoch: 7 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:33:16,271-Speed 5011.63 samples/sec Loss 5.4179 LearningRate 0.1790 Epoch: 7 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:33:24,479-Speed 4991.05 samples/sec Loss 5.4505 LearningRate 0.1790 Epoch: 7 Global Step: 33210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:33:32,654-Speed 5011.17 samples/sec Loss 5.3959 LearningRate 0.1789 Epoch: 7 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:33:40,901-Speed 4967.71 samples/sec Loss 5.4476 LearningRate 0.1788 Epoch: 7 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:33:49,085-Speed 5005.18 samples/sec Loss 5.4234 LearningRate 0.1787 Epoch: 7 Global Step: 33240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:33:57,223-Speed 5033.71 samples/sec Loss 5.4259 LearningRate 0.1787 Epoch: 7 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:34:05,383-Speed 5020.37 samples/sec Loss 
5.4005 LearningRate 0.1786 Epoch: 7 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:34:13,637-Speed 4963.40 samples/sec Loss 5.4090 LearningRate 0.1785 Epoch: 7 Global Step: 33270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:34:21,773-Speed 5034.77 samples/sec Loss 5.4918 LearningRate 0.1785 Epoch: 7 Global Step: 33280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:34:29,902-Speed 5039.61 samples/sec Loss 5.3694 LearningRate 0.1784 Epoch: 7 Global Step: 33290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:34:37,989-Speed 5065.31 samples/sec Loss 5.3710 LearningRate 0.1783 Epoch: 7 Global Step: 33300 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:34:46,236-Speed 4967.17 samples/sec Loss 5.4340 LearningRate 0.1782 Epoch: 7 Global Step: 33310 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:34:54,384-Speed 5027.90 samples/sec Loss 5.3589 LearningRate 0.1782 Epoch: 7 Global Step: 33320 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:35:02,488-Speed 5054.69 samples/sec Loss 5.3541 LearningRate 0.1781 Epoch: 7 Global Step: 33330 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:35:10,661-Speed 5012.93 samples/sec Loss 5.3431 LearningRate 0.1780 Epoch: 7 Global Step: 33340 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:35:18,953-Speed 4939.88 samples/sec Loss 5.3888 LearningRate 0.1780 Epoch: 7 Global Step: 33350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:35:27,149-Speed 4998.01 samples/sec Loss 5.3356 LearningRate 0.1779 Epoch: 7 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:35:35,494-Speed 4909.02 samples/sec Loss 5.3694 LearningRate 0.1778 Epoch: 7 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:36:10,258-Speed 1178.28 samples/sec Loss 5.2025 LearningRate 0.1777 Epoch: 8 Global 
Step: 33380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:36:19,093-Speed 4636.94 samples/sec Loss 4.8649 LearningRate 0.1777 Epoch: 8 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:36:27,786-Speed 4712.23 samples/sec Loss 4.9094 LearningRate 0.1776 Epoch: 8 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:36:36,097-Speed 4928.71 samples/sec Loss 4.9140 LearningRate 0.1775 Epoch: 8 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:36:44,245-Speed 5027.63 samples/sec Loss 4.8314 LearningRate 0.1775 Epoch: 8 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:36:52,458-Speed 4987.90 samples/sec Loss 4.8705 LearningRate 0.1774 Epoch: 8 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:37:00,622-Speed 5017.45 samples/sec Loss 4.8295 LearningRate 0.1773 Epoch: 8 Global Step: 33440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:37:08,897-Speed 4951.23 samples/sec Loss 4.8718 LearningRate 0.1773 Epoch: 8 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:37:17,163-Speed 4955.72 samples/sec Loss 4.8712 LearningRate 0.1772 Epoch: 8 Global Step: 33460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:37:25,469-Speed 4932.22 samples/sec Loss 4.9074 LearningRate 0.1771 Epoch: 8 Global Step: 33470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:37:33,782-Speed 4927.91 samples/sec Loss 4.9148 LearningRate 0.1770 Epoch: 8 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:37:42,025-Speed 4969.80 samples/sec Loss 4.8957 LearningRate 0.1770 Epoch: 8 Global Step: 33490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:37:50,219-Speed 4999.26 samples/sec Loss 4.9566 LearningRate 0.1769 Epoch: 8 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 
12 hours Training: 2022-01-17 05:37:58,481-Speed 4957.84 samples/sec Loss 4.9050 LearningRate 0.1768 Epoch: 8 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:38:06,763-Speed 4946.61 samples/sec Loss 4.9346 LearningRate 0.1768 Epoch: 8 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:38:15,096-Speed 4916.05 samples/sec Loss 4.9959 LearningRate 0.1767 Epoch: 8 Global Step: 33530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:38:23,379-Speed 4945.86 samples/sec Loss 5.0022 LearningRate 0.1766 Epoch: 8 Global Step: 33540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:38:31,625-Speed 4967.77 samples/sec Loss 4.9756 LearningRate 0.1765 Epoch: 8 Global Step: 33550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:38:39,942-Speed 4925.64 samples/sec Loss 4.9755 LearningRate 0.1765 Epoch: 8 Global Step: 33560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:38:48,308-Speed 4896.61 samples/sec Loss 5.0914 LearningRate 0.1764 Epoch: 8 Global Step: 33570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:38:56,541-Speed 4975.63 samples/sec Loss 5.0358 LearningRate 0.1763 Epoch: 8 Global Step: 33580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:39:04,906-Speed 4897.12 samples/sec Loss 4.9598 LearningRate 0.1763 Epoch: 8 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:39:13,114-Speed 4990.84 samples/sec Loss 5.0564 LearningRate 0.1762 Epoch: 8 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:39:21,449-Speed 4914.70 samples/sec Loss 5.0405 LearningRate 0.1761 Epoch: 8 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:39:29,786-Speed 4914.21 samples/sec Loss 5.1277 LearningRate 0.1760 Epoch: 8 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:39:38,090-Speed 
4933.32 samples/sec Loss 5.0738 LearningRate 0.1760 Epoch: 8 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:39:46,933-Speed 4632.18 samples/sec Loss 5.1258 LearningRate 0.1759 Epoch: 8 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:39:55,822-Speed 4608.52 samples/sec Loss 5.0421 LearningRate 0.1758 Epoch: 8 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:40:04,432-Speed 4757.60 samples/sec Loss 5.1007 LearningRate 0.1758 Epoch: 8 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:40:12,735-Speed 4934.09 samples/sec Loss 5.1050 LearningRate 0.1757 Epoch: 8 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:40:20,974-Speed 4971.94 samples/sec Loss 5.0990 LearningRate 0.1756 Epoch: 8 Global Step: 33680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:40:29,201-Speed 4979.22 samples/sec Loss 5.1184 LearningRate 0.1756 Epoch: 8 Global Step: 33690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:40:37,696-Speed 4822.44 samples/sec Loss 5.1138 LearningRate 0.1755 Epoch: 8 Global Step: 33700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:40:46,177-Speed 4830.30 samples/sec Loss 5.1025 LearningRate 0.1754 Epoch: 8 Global Step: 33710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:40:54,983-Speed 4652.07 samples/sec Loss 5.0831 LearningRate 0.1753 Epoch: 8 Global Step: 33720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:41:03,568-Speed 4771.91 samples/sec Loss 5.0043 LearningRate 0.1753 Epoch: 8 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:41:11,787-Speed 4984.00 samples/sec Loss 5.1160 LearningRate 0.1752 Epoch: 8 Global Step: 33740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:41:20,184-Speed 4878.74 samples/sec Loss 5.1385 
LearningRate 0.1751 Epoch: 8 Global Step: 33750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:41:28,379-Speed 4998.58 samples/sec Loss 5.1064 LearningRate 0.1751 Epoch: 8 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:41:36,583-Speed 4993.03 samples/sec Loss 5.1542 LearningRate 0.1750 Epoch: 8 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:41:44,764-Speed 5007.47 samples/sec Loss 5.1957 LearningRate 0.1749 Epoch: 8 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:41:53,001-Speed 4973.83 samples/sec Loss 5.1743 LearningRate 0.1748 Epoch: 8 Global Step: 33790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:42:01,365-Speed 4897.75 samples/sec Loss 5.1652 LearningRate 0.1748 Epoch: 8 Global Step: 33800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:42:09,648-Speed 4945.55 samples/sec Loss 5.1789 LearningRate 0.1747 Epoch: 8 Global Step: 33810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:42:17,860-Speed 4988.59 samples/sec Loss 5.2337 LearningRate 0.1746 Epoch: 8 Global Step: 33820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:42:26,025-Speed 5016.84 samples/sec Loss 5.2359 LearningRate 0.1746 Epoch: 8 Global Step: 33830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:42:34,243-Speed 4984.82 samples/sec Loss 5.2413 LearningRate 0.1745 Epoch: 8 Global Step: 33840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:42:42,585-Speed 4911.15 samples/sec Loss 5.2069 LearningRate 0.1744 Epoch: 8 Global Step: 33850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:42:50,785-Speed 4995.81 samples/sec Loss 5.2061 LearningRate 0.1744 Epoch: 8 Global Step: 33860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:42:58,989-Speed 4992.90 samples/sec Loss 5.2293 LearningRate 0.1743 Epoch: 8 Global Step: 33870 Fp16 
Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:43:07,090-Speed 5056.94 samples/sec Loss 5.1840 LearningRate 0.1742 Epoch: 8 Global Step: 33880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:43:15,276-Speed 5004.37 samples/sec Loss 5.2355 LearningRate 0.1741 Epoch: 8 Global Step: 33890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:43:23,487-Speed 4989.63 samples/sec Loss 5.2187 LearningRate 0.1741 Epoch: 8 Global Step: 33900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:43:31,707-Speed 4983.46 samples/sec Loss 5.2867 LearningRate 0.1740 Epoch: 8 Global Step: 33910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:43:40,117-Speed 4870.23 samples/sec Loss 5.2478 LearningRate 0.1739 Epoch: 8 Global Step: 33920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:43:48,376-Speed 4961.42 samples/sec Loss 5.2228 LearningRate 0.1739 Epoch: 8 Global Step: 33930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:43:56,564-Speed 5002.99 samples/sec Loss 5.1727 LearningRate 0.1738 Epoch: 8 Global Step: 33940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:44:04,788-Speed 4980.79 samples/sec Loss 5.2570 LearningRate 0.1737 Epoch: 8 Global Step: 33950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:44:13,083-Speed 4938.75 samples/sec Loss 5.1941 LearningRate 0.1737 Epoch: 8 Global Step: 33960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:44:21,271-Speed 5002.84 samples/sec Loss 5.2750 LearningRate 0.1736 Epoch: 8 Global Step: 33970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-17 05:44:29,459-Speed 5003.49 samples/sec Loss 5.2790 LearningRate 0.1735 Epoch: 8 Global Step: 33980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:44:37,636-Speed 5009.84 samples/sec Loss 5.2186 LearningRate 0.1734 Epoch: 8 Global Step: 33990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-01-17 05:44:45,793-Speed 5021.99 samples/sec Loss 5.2701 LearningRate 0.1734 Epoch: 8 Global Step: 34000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:44:53,984-Speed 5001.17 samples/sec Loss 5.2140 LearningRate 0.1733 Epoch: 8 Global Step: 34010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:45:02,175-Speed 5001.41 samples/sec Loss 5.2510 LearningRate 0.1732 Epoch: 8 Global Step: 34020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:45:10,343-Speed 5015.47 samples/sec Loss 5.2620 LearningRate 0.1732 Epoch: 8 Global Step: 34030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:45:18,522-Speed 5008.46 samples/sec Loss 5.1867 LearningRate 0.1731 Epoch: 8 Global Step: 34040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:45:26,698-Speed 5010.19 samples/sec Loss 5.1673 LearningRate 0.1730 Epoch: 8 Global Step: 34050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:45:34,921-Speed 4982.09 samples/sec Loss 5.2643 LearningRate 0.1730 Epoch: 8 Global Step: 34060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:45:43,076-Speed 5023.08 samples/sec Loss 5.2213 LearningRate 0.1729 Epoch: 8 Global Step: 34070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:45:51,330-Speed 4963.00 samples/sec Loss 5.1908 LearningRate 0.1728 Epoch: 8 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:45:59,891-Speed 4785.53 samples/sec Loss 5.2040 LearningRate 0.1727 Epoch: 8 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:46:08,123-Speed 4976.78 samples/sec Loss 5.2666 LearningRate 0.1727 Epoch: 8 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:46:16,304-Speed 5007.39 samples/sec Loss 5.2065 LearningRate 0.1726 Epoch: 8 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:46:24,587-Speed 4946.34 samples/sec 
Loss 5.2455 LearningRate 0.1725 Epoch: 8 Global Step: 34120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:46:32,903-Speed 4926.39 samples/sec Loss 5.2088 LearningRate 0.1725 Epoch: 8 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:46:41,205-Speed 4933.95 samples/sec Loss 5.2337 LearningRate 0.1724 Epoch: 8 Global Step: 34140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:46:49,444-Speed 4972.30 samples/sec Loss 5.2515 LearningRate 0.1723 Epoch: 8 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:46:57,656-Speed 4988.86 samples/sec Loss 5.2402 LearningRate 0.1723 Epoch: 8 Global Step: 34160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:47:05,865-Speed 4990.29 samples/sec Loss 5.2892 LearningRate 0.1722 Epoch: 8 Global Step: 34170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:47:14,080-Speed 4986.77 samples/sec Loss 5.2188 LearningRate 0.1721 Epoch: 8 Global Step: 34180 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:47:22,257-Speed 5009.32 samples/sec Loss 5.1763 LearningRate 0.1720 Epoch: 8 Global Step: 34190 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:47:30,298-Speed 5094.90 samples/sec Loss 5.2459 LearningRate 0.1720 Epoch: 8 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:47:38,505-Speed 4991.57 samples/sec Loss 5.2355 LearningRate 0.1719 Epoch: 8 Global Step: 34210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:47:46,720-Speed 4986.40 samples/sec Loss 5.2429 LearningRate 0.1718 Epoch: 8 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:47:54,968-Speed 4966.96 samples/sec Loss 5.2898 LearningRate 0.1718 Epoch: 8 Global Step: 34230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:48:03,155-Speed 5003.72 samples/sec Loss 5.2579 LearningRate 0.1717 Epoch: 8 
Global Step: 34240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:48:11,346-Speed 5001.14 samples/sec Loss 5.3357 LearningRate 0.1716 Epoch: 8 Global Step: 34250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:48:19,660-Speed 4927.78 samples/sec Loss 5.2354 LearningRate 0.1716 Epoch: 8 Global Step: 34260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:48:27,898-Speed 4972.53 samples/sec Loss 5.2509 LearningRate 0.1715 Epoch: 8 Global Step: 34270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:48:36,225-Speed 4919.97 samples/sec Loss 5.2678 LearningRate 0.1714 Epoch: 8 Global Step: 34280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:48:44,714-Speed 4825.63 samples/sec Loss 5.3234 LearningRate 0.1713 Epoch: 8 Global Step: 34290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:48:52,969-Speed 4962.43 samples/sec Loss 5.2507 LearningRate 0.1713 Epoch: 8 Global Step: 34300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:49:01,229-Speed 4959.68 samples/sec Loss 5.2400 LearningRate 0.1712 Epoch: 8 Global Step: 34310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:49:09,442-Speed 4988.16 samples/sec Loss 5.2829 LearningRate 0.1711 Epoch: 8 Global Step: 34320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:49:17,609-Speed 5015.61 samples/sec Loss 5.3220 LearningRate 0.1711 Epoch: 8 Global Step: 34330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:49:25,801-Speed 5000.73 samples/sec Loss 5.2503 LearningRate 0.1710 Epoch: 8 Global Step: 34340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:49:33,964-Speed 5018.61 samples/sec Loss 5.2921 LearningRate 0.1709 Epoch: 8 Global Step: 34350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:49:42,182-Speed 4984.43 samples/sec Loss 5.2202 LearningRate 0.1709 Epoch: 8 Global Step: 34360 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-17 05:49:50,406-Speed 4981.36 samples/sec Loss 5.2942 LearningRate 0.1708 Epoch: 8 Global Step: 34370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:49:58,665-Speed 4960.05 samples/sec Loss 5.2596 LearningRate 0.1707 Epoch: 8 Global Step: 34380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:50:06,807-Speed 5031.93 samples/sec Loss 5.2666 LearningRate 0.1706 Epoch: 8 Global Step: 34390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:50:14,984-Speed 5009.49 samples/sec Loss 5.2863 LearningRate 0.1706 Epoch: 8 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:50:23,186-Speed 4994.71 samples/sec Loss 5.1780 LearningRate 0.1705 Epoch: 8 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:50:31,453-Speed 4955.41 samples/sec Loss 5.1980 LearningRate 0.1704 Epoch: 8 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:50:39,686-Speed 4975.63 samples/sec Loss 5.2220 LearningRate 0.1704 Epoch: 8 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:50:47,894-Speed 4991.52 samples/sec Loss 5.2141 LearningRate 0.1703 Epoch: 8 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:50:56,099-Speed 4992.59 samples/sec Loss 5.2066 LearningRate 0.1702 Epoch: 8 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:51:04,426-Speed 4920.03 samples/sec Loss 5.2445 LearningRate 0.1702 Epoch: 8 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:51:12,746-Speed 4923.53 samples/sec Loss 5.2789 LearningRate 0.1701 Epoch: 8 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:51:20,981-Speed 4974.18 samples/sec Loss 5.2433 LearningRate 0.1700 Epoch: 8 Global Step: 34480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 
05:51:29,245-Speed 4957.38 samples/sec Loss 5.2581 LearningRate 0.1700 Epoch: 8 Global Step: 34490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:51:37,437-Speed 5000.16 samples/sec Loss 5.2633 LearningRate 0.1699 Epoch: 8 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:51:45,667-Speed 4978.04 samples/sec Loss 5.2798 LearningRate 0.1698 Epoch: 8 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:51:53,767-Speed 5056.91 samples/sec Loss 5.1856 LearningRate 0.1697 Epoch: 8 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:52:01,859-Speed 5062.59 samples/sec Loss 5.1781 LearningRate 0.1697 Epoch: 8 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:52:10,014-Speed 5023.40 samples/sec Loss 5.2177 LearningRate 0.1696 Epoch: 8 Global Step: 34540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:52:18,305-Speed 4940.86 samples/sec Loss 5.2991 LearningRate 0.1695 Epoch: 8 Global Step: 34550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:52:26,603-Speed 4937.39 samples/sec Loss 5.2486 LearningRate 0.1695 Epoch: 8 Global Step: 34560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:52:34,740-Speed 5034.09 samples/sec Loss 5.2161 LearningRate 0.1694 Epoch: 8 Global Step: 34570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:52:43,003-Speed 4958.20 samples/sec Loss 5.3086 LearningRate 0.1693 Epoch: 8 Global Step: 34580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:52:51,201-Speed 4996.60 samples/sec Loss 5.1860 LearningRate 0.1693 Epoch: 8 Global Step: 34590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:52:59,423-Speed 4982.47 samples/sec Loss 5.2429 LearningRate 0.1692 Epoch: 8 Global Step: 34600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:53:07,630-Speed 4991.47 samples/sec Loss 5.2232 
LearningRate 0.1691 Epoch: 8 Global Step: 34610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:53:15,807-Speed 5009.67 samples/sec Loss 5.2034 LearningRate 0.1691 Epoch: 8 Global Step: 34620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 05:53:24,353-Speed 4793.49 samples/sec Loss 5.2563 LearningRate 0.1690 Epoch: 8 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:53:32,694-Speed 4911.60 samples/sec Loss 5.2025 LearningRate 0.1689 Epoch: 8 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:53:40,853-Speed 5021.23 samples/sec Loss 5.1911 LearningRate 0.1688 Epoch: 8 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:53:49,308-Speed 4845.11 samples/sec Loss 5.2137 LearningRate 0.1688 Epoch: 8 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:53:57,501-Speed 4999.64 samples/sec Loss 5.2575 LearningRate 0.1687 Epoch: 8 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:54:05,785-Speed 4945.15 samples/sec Loss 5.2554 LearningRate 0.1686 Epoch: 8 Global Step: 34680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:54:13,945-Speed 5020.50 samples/sec Loss 5.1721 LearningRate 0.1686 Epoch: 8 Global Step: 34690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:54:22,083-Speed 5034.02 samples/sec Loss 5.2264 LearningRate 0.1685 Epoch: 8 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:54:30,249-Speed 5016.18 samples/sec Loss 5.2295 LearningRate 0.1684 Epoch: 8 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:54:38,459-Speed 4990.13 samples/sec Loss 5.2486 LearningRate 0.1684 Epoch: 8 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:54:46,616-Speed 5022.06 samples/sec Loss 5.1429 LearningRate 0.1683 Epoch: 8 Global Step: 
34730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:54:54,878-Speed 4958.19 samples/sec Loss 5.1760 LearningRate 0.1682 Epoch: 8 Global Step: 34740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:55:03,042-Speed 5017.57 samples/sec Loss 5.1668 LearningRate 0.1682 Epoch: 8 Global Step: 34750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:55:11,264-Speed 4982.19 samples/sec Loss 5.1826 LearningRate 0.1681 Epoch: 8 Global Step: 34760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 05:55:19,423-Speed 5021.15 samples/sec Loss 5.1550 LearningRate 0.1680 Epoch: 8 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:55:27,613-Speed 5002.01 samples/sec Loss 5.1954 LearningRate 0.1679 Epoch: 8 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:55:35,934-Speed 4923.31 samples/sec Loss 5.1579 LearningRate 0.1679 Epoch: 8 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:55:44,833-Speed 4603.20 samples/sec Loss 5.1866 LearningRate 0.1678 Epoch: 8 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:55:53,898-Speed 4519.19 samples/sec Loss 5.2050 LearningRate 0.1677 Epoch: 8 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:56:02,325-Speed 4860.63 samples/sec Loss 5.2304 LearningRate 0.1677 Epoch: 8 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:56:10,553-Speed 4979.39 samples/sec Loss 5.1822 LearningRate 0.1676 Epoch: 8 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:56:19,030-Speed 4832.53 samples/sec Loss 5.2189 LearningRate 0.1675 Epoch: 8 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:56:27,305-Speed 4950.28 samples/sec Loss 5.1873 LearningRate 0.1675 Epoch: 8 Global Step: 34850 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-01-17 05:56:35,626-Speed 4923.17 samples/sec Loss 5.1834 LearningRate 0.1674 Epoch: 8 Global Step: 34860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:56:43,894-Speed 4954.50 samples/sec Loss 5.2215 LearningRate 0.1673 Epoch: 8 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:56:52,294-Speed 4877.21 samples/sec Loss 5.1636 LearningRate 0.1673 Epoch: 8 Global Step: 34880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:57:00,594-Speed 4935.45 samples/sec Loss 5.2135 LearningRate 0.1672 Epoch: 8 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:57:08,997-Speed 4875.11 samples/sec Loss 5.1715 LearningRate 0.1671 Epoch: 8 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:57:17,294-Speed 4937.00 samples/sec Loss 5.1962 LearningRate 0.1671 Epoch: 8 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:57:25,443-Speed 5027.68 samples/sec Loss 5.2466 LearningRate 0.1670 Epoch: 8 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:57:33,703-Speed 4959.09 samples/sec Loss 5.1792 LearningRate 0.1669 Epoch: 8 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:57:42,297-Speed 4766.88 samples/sec Loss 5.1993 LearningRate 0.1668 Epoch: 8 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:57:50,504-Speed 4991.58 samples/sec Loss 5.1849 LearningRate 0.1668 Epoch: 8 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:57:58,853-Speed 4906.25 samples/sec Loss 5.2179 LearningRate 0.1667 Epoch: 8 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:58:07,164-Speed 4929.18 samples/sec Loss 5.1704 LearningRate 0.1666 Epoch: 8 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 
05:58:15,291-Speed 5040.24 samples/sec Loss 5.1674 LearningRate 0.1666 Epoch: 8 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:58:23,672-Speed 4888.34 samples/sec Loss 5.1835 LearningRate 0.1665 Epoch: 8 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:58:31,818-Speed 5028.84 samples/sec Loss 5.2502 LearningRate 0.1664 Epoch: 8 Global Step: 35000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 05:59:18,780-[lfw][35000]XNorm: 22.709657 Training: 2022-01-17 05:59:18,780-[lfw][35000]Accuracy-Flip: 0.99750+-0.00335 Training: 2022-01-17 05:59:18,781-[lfw][35000]Accuracy-Highest: 0.99817 Training: 2022-01-17 06:00:13,358-[cfp_fp][35000]XNorm: 20.002396 Training: 2022-01-17 06:00:13,359-[cfp_fp][35000]Accuracy-Flip: 0.97657+-0.00718 Training: 2022-01-17 06:00:13,359-[cfp_fp][35000]Accuracy-Highest: 0.97657 Training: 2022-01-17 06:01:00,621-[agedb_30][35000]XNorm: 21.917932 Training: 2022-01-17 06:01:00,622-[agedb_30][35000]Accuracy-Flip: 0.97200+-0.00865 Training: 2022-01-17 06:01:00,622-[agedb_30][35000]Accuracy-Highest: 0.97450 Training: 2022-01-17 06:01:08,789-Speed 260.94 samples/sec Loss 5.1691 LearningRate 0.1664 Epoch: 8 Global Step: 35010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:01:17,151-Speed 4899.31 samples/sec Loss 5.1924 LearningRate 0.1663 Epoch: 8 Global Step: 35020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:01:25,399-Speed 4966.30 samples/sec Loss 5.1600 LearningRate 0.1662 Epoch: 8 Global Step: 35030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:01:33,507-Speed 5052.50 samples/sec Loss 5.2488 LearningRate 0.1662 Epoch: 8 Global Step: 35040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:01:41,591-Speed 5067.14 samples/sec Loss 5.2182 LearningRate 0.1661 Epoch: 8 Global Step: 35050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:01:49,733-Speed 5032.19 
samples/sec Loss 5.1940 LearningRate 0.1660 Epoch: 8 Global Step: 35060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:01:58,029-Speed 4938.26 samples/sec Loss 5.2238 LearningRate 0.1660 Epoch: 8 Global Step: 35070 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 06:02:06,812-Speed 4663.93 samples/sec Loss 5.2154 LearningRate 0.1659 Epoch: 8 Global Step: 35080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 06:02:15,348-Speed 4799.49 samples/sec Loss 5.2060 LearningRate 0.1658 Epoch: 8 Global Step: 35090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 06:02:23,465-Speed 5046.47 samples/sec Loss 5.1877 LearningRate 0.1657 Epoch: 8 Global Step: 35100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 06:02:31,613-Speed 5027.47 samples/sec Loss 5.2447 LearningRate 0.1657 Epoch: 8 Global Step: 35110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:02:39,742-Speed 5039.49 samples/sec Loss 5.2442 LearningRate 0.1656 Epoch: 8 Global Step: 35120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:02:47,938-Speed 4997.87 samples/sec Loss 5.1922 LearningRate 0.1655 Epoch: 8 Global Step: 35130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:02:56,102-Speed 5017.89 samples/sec Loss 5.1523 LearningRate 0.1655 Epoch: 8 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:03:04,295-Speed 4999.95 samples/sec Loss 5.1449 LearningRate 0.1654 Epoch: 8 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:03:12,529-Speed 4975.61 samples/sec Loss 5.1773 LearningRate 0.1653 Epoch: 8 Global Step: 35160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:03:20,653-Speed 5042.23 samples/sec Loss 5.1476 LearningRate 0.1653 Epoch: 8 Global Step: 35170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:03:28,830-Speed 5009.63 samples/sec Loss 5.1291 LearningRate 0.1652 Epoch: 
8 Global Step: 35180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:03:37,022-Speed 5000.65 samples/sec Loss 5.1517 LearningRate 0.1651 Epoch: 8 Global Step: 35190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:03:45,290-Speed 4955.04 samples/sec Loss 5.1648 LearningRate 0.1651 Epoch: 8 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:03:53,531-Speed 4971.20 samples/sec Loss 5.1970 LearningRate 0.1650 Epoch: 8 Global Step: 35210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:04:01,728-Speed 4997.36 samples/sec Loss 5.2205 LearningRate 0.1649 Epoch: 8 Global Step: 35220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:04:09,948-Speed 4982.89 samples/sec Loss 5.1324 LearningRate 0.1649 Epoch: 8 Global Step: 35230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:04:18,123-Speed 5011.27 samples/sec Loss 5.1365 LearningRate 0.1648 Epoch: 8 Global Step: 35240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:04:26,294-Speed 5013.86 samples/sec Loss 5.1491 LearningRate 0.1647 Epoch: 8 Global Step: 35250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:04:34,492-Speed 4996.75 samples/sec Loss 5.1358 LearningRate 0.1646 Epoch: 8 Global Step: 35260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:04:42,655-Speed 5018.48 samples/sec Loss 5.1436 LearningRate 0.1646 Epoch: 8 Global Step: 35270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:04:50,814-Speed 5021.41 samples/sec Loss 5.1714 LearningRate 0.1645 Epoch: 8 Global Step: 35280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:04:59,010-Speed 4998.37 samples/sec Loss 5.1846 LearningRate 0.1644 Epoch: 8 Global Step: 35290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:05:07,170-Speed 5019.72 samples/sec Loss 5.1438 LearningRate 0.1644 Epoch: 8 Global Step: 35300 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-17 06:05:15,340-Speed 5014.27 samples/sec Loss 5.1378 LearningRate 0.1643 Epoch: 8 Global Step: 35310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:05:23,537-Speed 4997.71 samples/sec Loss 5.1987 LearningRate 0.1642 Epoch: 8 Global Step: 35320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:05:31,735-Speed 4997.13 samples/sec Loss 5.1611 LearningRate 0.1642 Epoch: 8 Global Step: 35330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:05:40,078-Speed 4909.74 samples/sec Loss 5.1017 LearningRate 0.1641 Epoch: 8 Global Step: 35340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:05:48,267-Speed 5002.58 samples/sec Loss 5.0809 LearningRate 0.1640 Epoch: 8 Global Step: 35350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:05:56,518-Speed 4964.78 samples/sec Loss 5.0831 LearningRate 0.1640 Epoch: 8 Global Step: 35360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:06:04,759-Speed 4971.07 samples/sec Loss 5.1563 LearningRate 0.1639 Epoch: 8 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:06:12,996-Speed 4973.56 samples/sec Loss 5.1472 LearningRate 0.1638 Epoch: 8 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:06:21,229-Speed 4975.47 samples/sec Loss 5.2078 LearningRate 0.1638 Epoch: 8 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:06:29,405-Speed 5010.44 samples/sec Loss 5.1701 LearningRate 0.1637 Epoch: 8 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:06:37,517-Speed 5050.42 samples/sec Loss 5.1298 LearningRate 0.1636 Epoch: 8 Global Step: 35410 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-17 06:06:45,683-Speed 5016.43 samples/sec Loss 5.1225 LearningRate 0.1636 Epoch: 8 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 
06:06:53,930-Speed 4967.32 samples/sec Loss 5.0923 LearningRate 0.1635 Epoch: 8 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:07:02,237-Speed 4930.99 samples/sec Loss 5.1578 LearningRate 0.1634 Epoch: 8 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-17 06:07:10,444-Speed 4992.20 samples/sec Loss 5.1598 LearningRate 0.1634 Epoch: 8 Global Step: 35450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:07:18,590-Speed 5028.20 samples/sec Loss 5.1144 LearningRate 0.1633 Epoch: 8 Global Step: 35460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-17 06:07:26,752-Speed 5019.51 samples/sec Loss 5.1371 LearningRate 0.1632 Epoch: 8 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:07:35,013-Speed 4958.70 samples/sec Loss 5.1419 LearningRate 0.1631 Epoch: 8 Global Step: 35480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:07:43,201-Speed 5003.40 samples/sec Loss 5.0550 LearningRate 0.1631 Epoch: 8 Global Step: 35490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:07:51,387-Speed 5004.22 samples/sec Loss 5.1354 LearningRate 0.1630 Epoch: 8 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:07:59,562-Speed 5010.54 samples/sec Loss 5.1550 LearningRate 0.1629 Epoch: 8 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:08:07,861-Speed 4936.66 samples/sec Loss 5.2100 LearningRate 0.1629 Epoch: 8 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:08:16,064-Speed 4993.97 samples/sec Loss 5.1484 LearningRate 0.1628 Epoch: 8 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:08:24,248-Speed 5005.59 samples/sec Loss 5.1310 LearningRate 0.1627 Epoch: 8 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:08:32,490-Speed 4970.37 samples/sec Loss 5.0950 
LearningRate 0.1627 Epoch: 8 Global Step: 35550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:08:40,725-Speed 4974.31 samples/sec Loss 5.1359 LearningRate 0.1626 Epoch: 8 Global Step: 35560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:08:48,967-Speed 4970.07 samples/sec Loss 5.1659 LearningRate 0.1625 Epoch: 8 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:08:57,225-Speed 4960.99 samples/sec Loss 5.1519 LearningRate 0.1625 Epoch: 8 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:09:05,407-Speed 5006.49 samples/sec Loss 5.0884 LearningRate 0.1624 Epoch: 8 Global Step: 35590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:09:13,624-Speed 4985.46 samples/sec Loss 5.0984 LearningRate 0.1623 Epoch: 8 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:09:21,770-Speed 5028.88 samples/sec Loss 5.1366 LearningRate 0.1623 Epoch: 8 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:09:30,002-Speed 4975.98 samples/sec Loss 5.1121 LearningRate 0.1622 Epoch: 8 Global Step: 35620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:09:38,233-Speed 4977.45 samples/sec Loss 5.0101 LearningRate 0.1621 Epoch: 8 Global Step: 35630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:09:46,559-Speed 4919.84 samples/sec Loss 5.0716 LearningRate 0.1621 Epoch: 8 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:09:54,813-Speed 4962.86 samples/sec Loss 5.0581 LearningRate 0.1620 Epoch: 8 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:10:03,154-Speed 4911.70 samples/sec Loss 5.1159 LearningRate 0.1619 Epoch: 8 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:10:11,499-Speed 4908.74 samples/sec Loss 5.1315 LearningRate 0.1619 Epoch: 8 Global Step: 
35670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:10:19,911-Speed 4870.01 samples/sec Loss 5.0631 LearningRate 0.1618 Epoch: 8 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:10:28,175-Speed 4956.86 samples/sec Loss 5.1338 LearningRate 0.1617 Epoch: 8 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:10:36,739-Speed 4783.67 samples/sec Loss 5.1051 LearningRate 0.1617 Epoch: 8 Global Step: 35700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:10:45,224-Speed 4828.11 samples/sec Loss 5.0930 LearningRate 0.1616 Epoch: 8 Global Step: 35710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:10:53,441-Speed 4985.20 samples/sec Loss 5.0749 LearningRate 0.1615 Epoch: 8 Global Step: 35720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:11:01,617-Speed 5010.35 samples/sec Loss 5.1151 LearningRate 0.1615 Epoch: 8 Global Step: 35730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:11:09,943-Speed 4920.18 samples/sec Loss 5.0952 LearningRate 0.1614 Epoch: 8 Global Step: 35740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:11:18,412-Speed 4837.14 samples/sec Loss 5.0644 LearningRate 0.1613 Epoch: 8 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:11:26,876-Speed 4839.91 samples/sec Loss 5.0302 LearningRate 0.1612 Epoch: 8 Global Step: 35760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:11:35,077-Speed 4995.38 samples/sec Loss 5.0445 LearningRate 0.1612 Epoch: 8 Global Step: 35770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:11:43,383-Speed 4931.61 samples/sec Loss 5.0643 LearningRate 0.1611 Epoch: 8 Global Step: 35780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:11:51,597-Speed 4987.84 samples/sec Loss 5.0593 LearningRate 0.1610 Epoch: 8 Global Step: 35790 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-17 06:11:59,744-Speed 5027.95 samples/sec Loss 5.1574 LearningRate 0.1610 Epoch: 8 Global Step: 35800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:12:07,932-Speed 5003.40 samples/sec Loss 5.0995 LearningRate 0.1609 Epoch: 8 Global Step: 35810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:12:16,101-Speed 5014.87 samples/sec Loss 5.1023 LearningRate 0.1608 Epoch: 8 Global Step: 35820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:12:24,294-Speed 5000.03 samples/sec Loss 5.0964 LearningRate 0.1608 Epoch: 8 Global Step: 35830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:12:32,512-Speed 4984.57 samples/sec Loss 5.0554 LearningRate 0.1607 Epoch: 8 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:12:40,796-Speed 4945.21 samples/sec Loss 5.0384 LearningRate 0.1606 Epoch: 8 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:12:49,179-Speed 4887.03 samples/sec Loss 5.1056 LearningRate 0.1606 Epoch: 8 Global Step: 35860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:12:57,657-Speed 4831.56 samples/sec Loss 5.1051 LearningRate 0.1605 Epoch: 8 Global Step: 35870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:13:05,841-Speed 5005.53 samples/sec Loss 5.0875 LearningRate 0.1604 Epoch: 8 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:13:14,087-Speed 4967.74 samples/sec Loss 5.0590 LearningRate 0.1604 Epoch: 8 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:13:22,695-Speed 4759.18 samples/sec Loss 5.0877 LearningRate 0.1603 Epoch: 8 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:13:30,999-Speed 4933.17 samples/sec Loss 5.0765 LearningRate 0.1602 Epoch: 8 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:13:39,280-Speed 4946.93 
samples/sec Loss 5.1031 LearningRate 0.1602 Epoch: 8 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:13:47,554-Speed 4951.23 samples/sec Loss 5.1286 LearningRate 0.1601 Epoch: 8 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:13:56,067-Speed 4812.00 samples/sec Loss 5.0631 LearningRate 0.1600 Epoch: 8 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:14:04,524-Speed 4843.89 samples/sec Loss 5.0331 LearningRate 0.1600 Epoch: 8 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:14:12,716-Speed 5000.47 samples/sec Loss 5.0950 LearningRate 0.1599 Epoch: 8 Global Step: 35960 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:14:20,931-Speed 4986.64 samples/sec Loss 5.0374 LearningRate 0.1598 Epoch: 8 Global Step: 35970 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:14:29,129-Speed 4997.17 samples/sec Loss 5.0723 LearningRate 0.1598 Epoch: 8 Global Step: 35980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:14:37,335-Speed 4992.17 samples/sec Loss 5.0625 LearningRate 0.1597 Epoch: 8 Global Step: 35990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:14:45,523-Speed 5003.10 samples/sec Loss 5.0660 LearningRate 0.1596 Epoch: 8 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:14:53,749-Speed 4980.05 samples/sec Loss 5.0197 LearningRate 0.1596 Epoch: 8 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:15:02,041-Speed 4940.70 samples/sec Loss 5.0024 LearningRate 0.1595 Epoch: 8 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:15:10,412-Speed 4893.80 samples/sec Loss 5.0762 LearningRate 0.1594 Epoch: 8 Global Step: 36030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:15:18,728-Speed 4925.85 samples/sec Loss 5.1202 LearningRate 0.1594 
Epoch: 8 Global Step: 36040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:15:27,118-Speed 4883.02 samples/sec Loss 5.0193 LearningRate 0.1593 Epoch: 8 Global Step: 36050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:15:35,409-Speed 4940.57 samples/sec Loss 5.0318 LearningRate 0.1592 Epoch: 8 Global Step: 36060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:15:43,672-Speed 4957.75 samples/sec Loss 5.0083 LearningRate 0.1592 Epoch: 8 Global Step: 36070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:15:52,144-Speed 4835.77 samples/sec Loss 5.0875 LearningRate 0.1591 Epoch: 8 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:16:00,419-Speed 4950.13 samples/sec Loss 5.0718 LearningRate 0.1590 Epoch: 8 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:16:08,686-Speed 4955.11 samples/sec Loss 5.0552 LearningRate 0.1590 Epoch: 8 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:16:16,954-Speed 4954.92 samples/sec Loss 5.0049 LearningRate 0.1589 Epoch: 8 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:16:25,196-Speed 4969.93 samples/sec Loss 5.0337 LearningRate 0.1588 Epoch: 8 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:16:33,454-Speed 4961.13 samples/sec Loss 5.0462 LearningRate 0.1588 Epoch: 8 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:16:41,664-Speed 4989.50 samples/sec Loss 5.0635 LearningRate 0.1587 Epoch: 8 Global Step: 36140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:16:49,885-Speed 4982.92 samples/sec Loss 5.0247 LearningRate 0.1586 Epoch: 8 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:16:58,133-Speed 4966.72 samples/sec Loss 5.0278 LearningRate 0.1586 Epoch: 8 Global Step: 36160 Fp16 Grad Scale: 
131072 Required: 11 hours Training: 2022-01-17 06:17:06,901-Speed 4672.24 samples/sec Loss 5.0067 LearningRate 0.1585 Epoch: 8 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:17:15,686-Speed 4662.74 samples/sec Loss 4.9763 LearningRate 0.1584 Epoch: 8 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:17:24,506-Speed 4644.83 samples/sec Loss 5.0367 LearningRate 0.1584 Epoch: 8 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:17:33,383-Speed 4614.82 samples/sec Loss 5.0168 LearningRate 0.1583 Epoch: 8 Global Step: 36200 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:17:42,067-Speed 4717.21 samples/sec Loss 5.0707 LearningRate 0.1582 Epoch: 8 Global Step: 36210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:17:50,400-Speed 4916.39 samples/sec Loss 5.0548 LearningRate 0.1582 Epoch: 8 Global Step: 36220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:17:58,875-Speed 4833.21 samples/sec Loss 5.0136 LearningRate 0.1581 Epoch: 8 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:18:06,996-Speed 5044.74 samples/sec Loss 5.0426 LearningRate 0.1580 Epoch: 8 Global Step: 36240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:18:15,244-Speed 4966.63 samples/sec Loss 5.0001 LearningRate 0.1580 Epoch: 8 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:18:23,468-Speed 4981.74 samples/sec Loss 5.0387 LearningRate 0.1579 Epoch: 8 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:18:32,166-Speed 4710.06 samples/sec Loss 5.0171 LearningRate 0.1578 Epoch: 8 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:18:41,044-Speed 4613.73 samples/sec Loss 5.1121 LearningRate 0.1578 Epoch: 8 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-17 06:18:49,423-Speed 4889.19 samples/sec Loss 4.9947 LearningRate 0.1577 Epoch: 8 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:18:57,554-Speed 5038.20 samples/sec Loss 5.0637 LearningRate 0.1576 Epoch: 8 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:19:05,812-Speed 4960.88 samples/sec Loss 5.0597 LearningRate 0.1576 Epoch: 8 Global Step: 36310 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:19:14,075-Speed 4957.78 samples/sec Loss 5.0188 LearningRate 0.1575 Epoch: 8 Global Step: 36320 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:19:22,384-Speed 4930.12 samples/sec Loss 5.0470 LearningRate 0.1574 Epoch: 8 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:19:30,951-Speed 4782.17 samples/sec Loss 4.9790 LearningRate 0.1574 Epoch: 8 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:19:39,148-Speed 4997.57 samples/sec Loss 4.9590 LearningRate 0.1573 Epoch: 8 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:19:47,366-Speed 4984.52 samples/sec Loss 4.9878 LearningRate 0.1572 Epoch: 8 Global Step: 36360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:19:55,575-Speed 4990.68 samples/sec Loss 4.9961 LearningRate 0.1572 Epoch: 8 Global Step: 36370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:20:03,745-Speed 5014.60 samples/sec Loss 5.0366 LearningRate 0.1571 Epoch: 8 Global Step: 36380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:20:11,846-Speed 5056.34 samples/sec Loss 5.0369 LearningRate 0.1570 Epoch: 8 Global Step: 36390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:20:20,084-Speed 4973.08 samples/sec Loss 4.9437 LearningRate 0.1569 Epoch: 8 Global Step: 36400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:20:28,303-Speed 4983.85 
samples/sec Loss 5.0116 LearningRate 0.1569 Epoch: 8 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:20:36,493-Speed 5002.21 samples/sec Loss 5.0107 LearningRate 0.1568 Epoch: 8 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:20:44,804-Speed 4928.67 samples/sec Loss 4.9342 LearningRate 0.1567 Epoch: 8 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:20:53,213-Speed 4871.57 samples/sec Loss 4.9199 LearningRate 0.1567 Epoch: 8 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:21:01,455-Speed 4970.59 samples/sec Loss 5.0135 LearningRate 0.1566 Epoch: 8 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:21:09,754-Speed 4936.02 samples/sec Loss 4.9477 LearningRate 0.1565 Epoch: 8 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:21:18,069-Speed 4927.11 samples/sec Loss 5.0167 LearningRate 0.1565 Epoch: 8 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:21:26,251-Speed 5006.08 samples/sec Loss 4.9577 LearningRate 0.1564 Epoch: 8 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:21:34,475-Speed 4981.28 samples/sec Loss 5.0082 LearningRate 0.1563 Epoch: 8 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:21:42,685-Speed 4990.10 samples/sec Loss 5.0223 LearningRate 0.1563 Epoch: 8 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:21:50,878-Speed 4999.54 samples/sec Loss 5.0027 LearningRate 0.1562 Epoch: 8 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:21:59,268-Speed 4882.77 samples/sec Loss 4.9942 LearningRate 0.1562 Epoch: 8 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:22:08,076-Speed 4651.15 samples/sec Loss 4.9777 LearningRate 0.1561 
Epoch: 8 Global Step: 36530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:22:16,264-Speed 5002.54 samples/sec Loss 5.0253 LearningRate 0.1560 Epoch: 8 Global Step: 36540 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:22:24,499-Speed 4974.84 samples/sec Loss 5.0186 LearningRate 0.1560 Epoch: 8 Global Step: 36550 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:22:32,799-Speed 4935.41 samples/sec Loss 5.0367 LearningRate 0.1559 Epoch: 8 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:22:40,949-Speed 5026.34 samples/sec Loss 5.0429 LearningRate 0.1558 Epoch: 8 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:22:49,207-Speed 4960.50 samples/sec Loss 4.9700 LearningRate 0.1558 Epoch: 8 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:22:57,390-Speed 5006.50 samples/sec Loss 4.9314 LearningRate 0.1557 Epoch: 8 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:23:05,719-Speed 4918.26 samples/sec Loss 5.0285 LearningRate 0.1556 Epoch: 8 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:23:13,840-Speed 5044.41 samples/sec Loss 4.9955 LearningRate 0.1556 Epoch: 8 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:23:22,096-Speed 4962.12 samples/sec Loss 4.9398 LearningRate 0.1555 Epoch: 8 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:23:30,209-Speed 5049.24 samples/sec Loss 5.0125 LearningRate 0.1554 Epoch: 8 Global Step: 36630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:23:38,322-Speed 5049.50 samples/sec Loss 4.9881 LearningRate 0.1554 Epoch: 8 Global Step: 36640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:23:46,464-Speed 5031.01 samples/sec Loss 4.9264 LearningRate 0.1553 Epoch: 8 Global Step: 36650 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-01-17 06:23:54,561-Speed 5059.67 samples/sec Loss 5.0227 LearningRate 0.1552 Epoch: 8 Global Step: 36660 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:24:02,771-Speed 4989.62 samples/sec Loss 4.9452 LearningRate 0.1552 Epoch: 8 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:24:11,029-Speed 4960.89 samples/sec Loss 4.9765 LearningRate 0.1551 Epoch: 8 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:24:19,309-Speed 4946.95 samples/sec Loss 4.8908 LearningRate 0.1550 Epoch: 8 Global Step: 36690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:24:27,548-Speed 4972.10 samples/sec Loss 4.9297 LearningRate 0.1550 Epoch: 8 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:24:35,742-Speed 4999.85 samples/sec Loss 4.9640 LearningRate 0.1549 Epoch: 8 Global Step: 36710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:24:43,893-Speed 5025.45 samples/sec Loss 4.9605 LearningRate 0.1548 Epoch: 8 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:24:52,101-Speed 4991.15 samples/sec Loss 4.9859 LearningRate 0.1548 Epoch: 8 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:25:00,587-Speed 4827.48 samples/sec Loss 4.9513 LearningRate 0.1547 Epoch: 8 Global Step: 36740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:25:09,173-Speed 4771.16 samples/sec Loss 4.9404 LearningRate 0.1546 Epoch: 8 Global Step: 36750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:25:17,557-Speed 4886.18 samples/sec Loss 4.9699 LearningRate 0.1546 Epoch: 8 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:25:25,955-Speed 4878.08 samples/sec Loss 4.9578 LearningRate 0.1545 Epoch: 8 Global Step: 36770 Fp16 Grad Scale: 262144 Required: 11 hours Training: 
2022-01-17 06:25:34,545-Speed 4768.75 samples/sec Loss 4.9656 LearningRate 0.1544 Epoch: 8 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:25:43,350-Speed 4652.59 samples/sec Loss 4.9535 LearningRate 0.1544 Epoch: 8 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:25:52,201-Speed 4628.12 samples/sec Loss 4.9474 LearningRate 0.1543 Epoch: 8 Global Step: 36800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:26:00,924-Speed 4696.18 samples/sec Loss 4.9567 LearningRate 0.1542 Epoch: 8 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:26:09,202-Speed 4948.87 samples/sec Loss 4.9944 LearningRate 0.1542 Epoch: 8 Global Step: 36820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:26:17,480-Speed 4948.94 samples/sec Loss 4.9514 LearningRate 0.1541 Epoch: 8 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:26:25,774-Speed 4939.02 samples/sec Loss 4.9322 LearningRate 0.1540 Epoch: 8 Global Step: 36840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:26:34,163-Speed 4883.28 samples/sec Loss 4.9652 LearningRate 0.1540 Epoch: 8 Global Step: 36850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:26:42,566-Speed 4875.13 samples/sec Loss 4.9448 LearningRate 0.1539 Epoch: 8 Global Step: 36860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:26:50,951-Speed 4885.73 samples/sec Loss 4.9399 LearningRate 0.1538 Epoch: 8 Global Step: 36870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:26:59,381-Speed 4859.07 samples/sec Loss 4.9036 LearningRate 0.1538 Epoch: 8 Global Step: 36880 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:27:07,751-Speed 4894.53 samples/sec Loss 4.9459 LearningRate 0.1537 Epoch: 8 Global Step: 36890 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:27:16,120-Speed 4894.77 
samples/sec Loss 4.9557 LearningRate 0.1536 Epoch: 8 Global Step: 36900 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:27:24,500-Speed 4888.46 samples/sec Loss 4.9422 LearningRate 0.1536 Epoch: 8 Global Step: 36910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:27:32,917-Speed 4866.88 samples/sec Loss 4.9066 LearningRate 0.1535 Epoch: 8 Global Step: 36920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:27:41,354-Speed 4855.81 samples/sec Loss 4.9107 LearningRate 0.1534 Epoch: 8 Global Step: 36930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:27:50,188-Speed 4636.97 samples/sec Loss 4.9466 LearningRate 0.1534 Epoch: 8 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:27:58,543-Speed 4903.18 samples/sec Loss 4.9060 LearningRate 0.1533 Epoch: 8 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:28:06,809-Speed 4955.71 samples/sec Loss 4.9437 LearningRate 0.1532 Epoch: 8 Global Step: 36960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:28:15,547-Speed 4688.03 samples/sec Loss 4.9102 LearningRate 0.1532 Epoch: 8 Global Step: 36970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:28:24,081-Speed 4800.29 samples/sec Loss 4.9058 LearningRate 0.1531 Epoch: 8 Global Step: 36980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:28:32,449-Speed 4895.61 samples/sec Loss 4.8995 LearningRate 0.1530 Epoch: 8 Global Step: 36990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:28:40,853-Speed 4874.68 samples/sec Loss 4.9111 LearningRate 0.1530 Epoch: 8 Global Step: 37000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:28:49,352-Speed 4819.78 samples/sec Loss 4.9009 LearningRate 0.1529 Epoch: 8 Global Step: 37010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:28:57,739-Speed 4884.48 samples/sec Loss 4.9175 LearningRate 0.1528 
Epoch: 8 Global Step: 37020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:29:06,143-Speed 4873.80 samples/sec Loss 4.9112 LearningRate 0.1528 Epoch: 8 Global Step: 37030 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:29:14,666-Speed 4806.92 samples/sec Loss 4.9031 LearningRate 0.1527 Epoch: 8 Global Step: 37040 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:29:23,352-Speed 4716.06 samples/sec Loss 4.9096 LearningRate 0.1526 Epoch: 8 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:29:32,025-Speed 4723.02 samples/sec Loss 4.9325 LearningRate 0.1526 Epoch: 8 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:29:40,678-Speed 4734.58 samples/sec Loss 4.8947 LearningRate 0.1525 Epoch: 8 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:29:48,885-Speed 4991.16 samples/sec Loss 4.9173 LearningRate 0.1524 Epoch: 8 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:29:57,121-Speed 4974.56 samples/sec Loss 4.8687 LearningRate 0.1524 Epoch: 8 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:30:05,475-Speed 4903.06 samples/sec Loss 4.8767 LearningRate 0.1523 Epoch: 8 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:30:13,725-Speed 4965.94 samples/sec Loss 4.9071 LearningRate 0.1522 Epoch: 8 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:30:22,023-Speed 4936.54 samples/sec Loss 4.8984 LearningRate 0.1522 Epoch: 8 Global Step: 37120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:30:30,326-Speed 4933.55 samples/sec Loss 4.9458 LearningRate 0.1521 Epoch: 8 Global Step: 37130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:30:38,763-Speed 4855.50 samples/sec Loss 4.9313 LearningRate 0.1521 Epoch: 8 Global Step: 37140 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-01-17 06:30:47,069-Speed 4931.94 samples/sec Loss 4.9047 LearningRate 0.1520 Epoch: 8 Global Step: 37150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:30:55,316-Speed 4967.67 samples/sec Loss 4.9455 LearningRate 0.1519 Epoch: 8 Global Step: 37160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:31:03,595-Speed 4948.25 samples/sec Loss 4.9186 LearningRate 0.1519 Epoch: 8 Global Step: 37170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:31:11,815-Speed 4983.38 samples/sec Loss 4.8237 LearningRate 0.1518 Epoch: 8 Global Step: 37180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:31:20,191-Speed 4890.37 samples/sec Loss 4.8948 LearningRate 0.1517 Epoch: 8 Global Step: 37190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:31:28,448-Speed 4961.77 samples/sec Loss 4.8548 LearningRate 0.1517 Epoch: 8 Global Step: 37200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:31:36,682-Speed 4975.17 samples/sec Loss 4.9458 LearningRate 0.1516 Epoch: 8 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:31:44,908-Speed 4979.63 samples/sec Loss 4.9159 LearningRate 0.1515 Epoch: 8 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:31:53,283-Speed 4892.02 samples/sec Loss 4.8922 LearningRate 0.1515 Epoch: 8 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:32:01,528-Speed 4968.28 samples/sec Loss 4.8853 LearningRate 0.1514 Epoch: 8 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:32:09,846-Speed 4924.88 samples/sec Loss 4.8765 LearningRate 0.1513 Epoch: 8 Global Step: 37250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:32:18,126-Speed 4947.26 samples/sec Loss 4.9111 LearningRate 0.1513 Epoch: 8 Global Step: 37260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-01-17 06:32:26,464-Speed 4913.48 samples/sec Loss 4.8999 LearningRate 0.1512 Epoch: 8 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:32:34,722-Speed 4960.64 samples/sec Loss 4.9148 LearningRate 0.1511 Epoch: 8 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:32:42,986-Speed 4957.18 samples/sec Loss 4.8828 LearningRate 0.1511 Epoch: 8 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:32:51,258-Speed 4951.72 samples/sec Loss 4.8538 LearningRate 0.1510 Epoch: 8 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:32:59,714-Speed 4845.01 samples/sec Loss 4.8487 LearningRate 0.1509 Epoch: 8 Global Step: 37310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:33:08,154-Speed 4853.78 samples/sec Loss 4.9129 LearningRate 0.1509 Epoch: 8 Global Step: 37320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:33:16,480-Speed 4919.72 samples/sec Loss 4.9079 LearningRate 0.1508 Epoch: 8 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:33:24,792-Speed 4928.61 samples/sec Loss 4.8790 LearningRate 0.1507 Epoch: 8 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:33:33,039-Speed 4966.99 samples/sec Loss 4.8142 LearningRate 0.1507 Epoch: 8 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:33:41,424-Speed 4885.78 samples/sec Loss 4.8940 LearningRate 0.1506 Epoch: 8 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:33:49,786-Speed 4898.61 samples/sec Loss 4.8303 LearningRate 0.1505 Epoch: 8 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:33:58,115-Speed 4918.54 samples/sec Loss 4.8626 LearningRate 0.1505 Epoch: 8 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:34:06,397-Speed 4946.54 samples/sec 
Loss 4.7875 LearningRate 0.1504 Epoch: 8 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:34:14,753-Speed 4902.84 samples/sec Loss 4.8233 LearningRate 0.1503 Epoch: 8 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:34:23,028-Speed 4950.29 samples/sec Loss 4.8980 LearningRate 0.1503 Epoch: 8 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:34:31,273-Speed 4968.81 samples/sec Loss 4.8699 LearningRate 0.1502 Epoch: 8 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:34:39,459-Speed 5004.08 samples/sec Loss 4.8658 LearningRate 0.1502 Epoch: 8 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:34:47,683-Speed 4981.46 samples/sec Loss 4.9002 LearningRate 0.1501 Epoch: 8 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:34:55,911-Speed 4978.63 samples/sec Loss 4.8658 LearningRate 0.1500 Epoch: 8 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:35:04,113-Speed 4994.69 samples/sec Loss 4.8898 LearningRate 0.1500 Epoch: 8 Global Step: 37460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:35:12,905-Speed 4659.64 samples/sec Loss 4.8454 LearningRate 0.1499 Epoch: 8 Global Step: 37470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:35:21,812-Speed 4598.77 samples/sec Loss 4.8786 LearningRate 0.1498 Epoch: 8 Global Step: 37480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:35:30,738-Speed 4589.60 samples/sec Loss 4.8460 LearningRate 0.1498 Epoch: 8 Global Step: 37490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:35:39,694-Speed 4573.60 samples/sec Loss 4.8062 LearningRate 0.1497 Epoch: 8 Global Step: 37500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:35:48,594-Speed 4603.43 samples/sec Loss 4.8457 LearningRate 0.1496 Epoch: 8 Global 
Step: 37510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:35:56,919-Speed 4920.53 samples/sec Loss 4.8916 LearningRate 0.1496 Epoch: 8 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:36:05,097-Speed 5008.93 samples/sec Loss 4.8200 LearningRate 0.1495 Epoch: 8 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:36:13,505-Speed 4872.49 samples/sec Loss 4.8657 LearningRate 0.1494 Epoch: 8 Global Step: 37540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:36:51,274-Speed 1084.84 samples/sec Loss 4.7510 LearningRate 0.1494 Epoch: 9 Global Step: 37550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:36:59,400-Speed 5041.61 samples/sec Loss 4.3177 LearningRate 0.1493 Epoch: 9 Global Step: 37560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:37:07,738-Speed 4913.25 samples/sec Loss 4.3404 LearningRate 0.1492 Epoch: 9 Global Step: 37570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:37:15,969-Speed 4976.38 samples/sec Loss 4.2899 LearningRate 0.1492 Epoch: 9 Global Step: 37580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:37:24,268-Speed 4936.46 samples/sec Loss 4.2951 LearningRate 0.1491 Epoch: 9 Global Step: 37590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:37:32,519-Speed 4964.75 samples/sec Loss 4.2929 LearningRate 0.1490 Epoch: 9 Global Step: 37600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:37:40,759-Speed 4971.52 samples/sec Loss 4.3747 LearningRate 0.1490 Epoch: 9 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:37:48,941-Speed 5007.30 samples/sec Loss 4.2695 LearningRate 0.1489 Epoch: 9 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:37:57,238-Speed 4937.25 samples/sec Loss 4.3591 LearningRate 0.1488 Epoch: 9 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-01-17 06:38:05,440-Speed 4994.70 samples/sec Loss 4.3469 LearningRate 0.1488 Epoch: 9 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:38:13,575-Speed 5035.30 samples/sec Loss 4.3716 LearningRate 0.1487 Epoch: 9 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:38:21,819-Speed 4969.28 samples/sec Loss 4.4127 LearningRate 0.1487 Epoch: 9 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:38:29,980-Speed 5019.82 samples/sec Loss 4.3423 LearningRate 0.1486 Epoch: 9 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:38:38,266-Speed 4944.23 samples/sec Loss 4.4310 LearningRate 0.1485 Epoch: 9 Global Step: 37680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:38:46,415-Speed 5026.79 samples/sec Loss 4.4239 LearningRate 0.1485 Epoch: 9 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:38:54,561-Speed 5028.89 samples/sec Loss 4.4661 LearningRate 0.1484 Epoch: 9 Global Step: 37700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:39:02,688-Speed 5040.23 samples/sec Loss 4.4605 LearningRate 0.1483 Epoch: 9 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:39:10,736-Speed 5090.43 samples/sec Loss 4.4700 LearningRate 0.1483 Epoch: 9 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:39:18,834-Speed 5058.60 samples/sec Loss 4.4289 LearningRate 0.1482 Epoch: 9 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:39:27,014-Speed 5008.21 samples/sec Loss 4.4617 LearningRate 0.1481 Epoch: 9 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:39:35,177-Speed 5018.19 samples/sec Loss 4.4597 LearningRate 0.1481 Epoch: 9 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:39:43,766-Speed 
4769.87 samples/sec Loss 4.4575 LearningRate 0.1480 Epoch: 9 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:39:53,494-Speed 4210.87 samples/sec Loss 4.4898 LearningRate 0.1479 Epoch: 9 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:40:01,616-Speed 5044.24 samples/sec Loss 4.4680 LearningRate 0.1479 Epoch: 9 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:40:10,745-Speed 4487.55 samples/sec Loss 4.5055 LearningRate 0.1478 Epoch: 9 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:40:20,593-Speed 4159.47 samples/sec Loss 4.5068 LearningRate 0.1477 Epoch: 9 Global Step: 37800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:40:28,812-Speed 4984.47 samples/sec Loss 4.4778 LearningRate 0.1477 Epoch: 9 Global Step: 37810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:40:37,091-Speed 4948.08 samples/sec Loss 4.5079 LearningRate 0.1476 Epoch: 9 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:40:45,188-Speed 5059.08 samples/sec Loss 4.5039 LearningRate 0.1476 Epoch: 9 Global Step: 37830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:40:53,412-Speed 4981.49 samples/sec Loss 4.5033 LearningRate 0.1475 Epoch: 9 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:41:01,579-Speed 5015.51 samples/sec Loss 4.4798 LearningRate 0.1474 Epoch: 9 Global Step: 37850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:41:09,706-Speed 5041.05 samples/sec Loss 4.5244 LearningRate 0.1474 Epoch: 9 Global Step: 37860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:41:18,068-Speed 4899.44 samples/sec Loss 4.5349 LearningRate 0.1473 Epoch: 9 Global Step: 37870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:41:26,111-Speed 5093.34 samples/sec Loss 4.5193 
LearningRate 0.1472 Epoch: 9 Global Step: 37880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:41:34,209-Speed 5058.57 samples/sec Loss 4.5491 LearningRate 0.1472 Epoch: 9 Global Step: 37890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:41:42,396-Speed 5003.44 samples/sec Loss 4.5050 LearningRate 0.1471 Epoch: 9 Global Step: 37900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:41:50,583-Speed 5004.04 samples/sec Loss 4.5908 LearningRate 0.1470 Epoch: 9 Global Step: 37910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:41:58,611-Speed 5102.89 samples/sec Loss 4.5353 LearningRate 0.1470 Epoch: 9 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:42:06,694-Speed 5067.85 samples/sec Loss 4.5780 LearningRate 0.1469 Epoch: 9 Global Step: 37930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:42:14,837-Speed 5030.99 samples/sec Loss 4.5750 LearningRate 0.1468 Epoch: 9 Global Step: 37940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:42:22,962-Speed 5041.65 samples/sec Loss 4.6352 LearningRate 0.1468 Epoch: 9 Global Step: 37950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:42:31,519-Speed 4787.12 samples/sec Loss 4.6658 LearningRate 0.1467 Epoch: 9 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:42:40,010-Speed 4825.25 samples/sec Loss 4.5644 LearningRate 0.1466 Epoch: 9 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:42:48,287-Speed 4949.37 samples/sec Loss 4.6037 LearningRate 0.1466 Epoch: 9 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:42:56,492-Speed 4992.86 samples/sec Loss 4.6519 LearningRate 0.1465 Epoch: 9 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:43:04,584-Speed 5062.44 samples/sec Loss 4.5615 LearningRate 0.1465 Epoch: 9 Global Step: 38000 
Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:43:12,697-Speed 5049.41 samples/sec Loss 4.5692 LearningRate 0.1464 Epoch: 9 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:43:21,008-Speed 4929.02 samples/sec Loss 4.6577 LearningRate 0.1463 Epoch: 9 Global Step: 38020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:43:29,508-Speed 4819.53 samples/sec Loss 4.6271 LearningRate 0.1463 Epoch: 9 Global Step: 38030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:43:37,689-Speed 5007.90 samples/sec Loss 4.6835 LearningRate 0.1462 Epoch: 9 Global Step: 38040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:43:45,792-Speed 5055.01 samples/sec Loss 4.6455 LearningRate 0.1461 Epoch: 9 Global Step: 38050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:43:54,089-Speed 4937.43 samples/sec Loss 4.6835 LearningRate 0.1461 Epoch: 9 Global Step: 38060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:44:02,283-Speed 4999.95 samples/sec Loss 4.6404 LearningRate 0.1460 Epoch: 9 Global Step: 38070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:44:10,464-Speed 5007.39 samples/sec Loss 4.5957 LearningRate 0.1459 Epoch: 9 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:44:18,655-Speed 5000.66 samples/sec Loss 4.6327 LearningRate 0.1459 Epoch: 9 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:44:26,958-Speed 4934.13 samples/sec Loss 4.6391 LearningRate 0.1458 Epoch: 9 Global Step: 38100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:44:35,916-Speed 4573.19 samples/sec Loss 4.6405 LearningRate 0.1457 Epoch: 9 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:44:45,249-Speed 4388.88 samples/sec Loss 4.6174 LearningRate 0.1457 Epoch: 9 Global Step: 38120 Fp16 Grad Scale: 262144 Required: 11 hours 
Training: 2022-01-17 06:44:53,513-Speed 4957.09 samples/sec Loss 4.5964 LearningRate 0.1456 Epoch: 9 Global Step: 38130 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:45:01,806-Speed 4939.93 samples/sec Loss 4.6501 LearningRate 0.1456 Epoch: 9 Global Step: 38140 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:45:10,504-Speed 4709.57 samples/sec Loss 4.6679 LearningRate 0.1455 Epoch: 9 Global Step: 38150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:45:18,789-Speed 4944.78 samples/sec Loss 4.6422 LearningRate 0.1454 Epoch: 9 Global Step: 38160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:45:27,015-Speed 4980.11 samples/sec Loss 4.6362 LearningRate 0.1454 Epoch: 9 Global Step: 38170 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:45:35,140-Speed 5041.90 samples/sec Loss 4.6727 LearningRate 0.1453 Epoch: 9 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:45:43,242-Speed 5056.01 samples/sec Loss 4.7338 LearningRate 0.1452 Epoch: 9 Global Step: 38190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:45:51,299-Speed 5084.19 samples/sec Loss 4.6399 LearningRate 0.1452 Epoch: 9 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:45:59,354-Speed 5085.82 samples/sec Loss 4.6406 LearningRate 0.1451 Epoch: 9 Global Step: 38210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:46:07,426-Speed 5075.21 samples/sec Loss 4.6608 LearningRate 0.1450 Epoch: 9 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:46:15,493-Speed 5077.53 samples/sec Loss 4.6686 LearningRate 0.1450 Epoch: 9 Global Step: 38230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:46:23,694-Speed 4995.05 samples/sec Loss 4.6641 LearningRate 0.1449 Epoch: 9 Global Step: 38240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:46:31,866-Speed 5012.83 
samples/sec Loss 4.7000 LearningRate 0.1448 Epoch: 9 Global Step: 38250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:46:40,553-Speed 4715.79 samples/sec Loss 4.6863 LearningRate 0.1448 Epoch: 9 Global Step: 38260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:46:49,488-Speed 4584.87 samples/sec Loss 4.6864 LearningRate 0.1447 Epoch: 9 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:46:58,240-Speed 4680.46 samples/sec Loss 4.6998 LearningRate 0.1447 Epoch: 9 Global Step: 38280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:47:07,106-Speed 4620.38 samples/sec Loss 4.6857 LearningRate 0.1446 Epoch: 9 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:47:15,974-Speed 4619.94 samples/sec Loss 4.7243 LearningRate 0.1445 Epoch: 9 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-17 06:47:24,764-Speed 4659.98 samples/sec Loss 4.7387 LearningRate 0.1445 Epoch: 9 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:47:33,555-Speed 4660.11 samples/sec Loss 4.6871 LearningRate 0.1444 Epoch: 9 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:47:42,487-Speed 4586.00 samples/sec Loss 4.6853 LearningRate 0.1443 Epoch: 9 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:47:51,338-Speed 4628.91 samples/sec Loss 4.6688 LearningRate 0.1443 Epoch: 9 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:47:59,789-Speed 4846.80 samples/sec Loss 4.6419 LearningRate 0.1442 Epoch: 9 Global Step: 38350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:48:07,910-Speed 5044.68 samples/sec Loss 4.7174 LearningRate 0.1441 Epoch: 9 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:48:16,040-Speed 5038.39 samples/sec Loss 4.7176 LearningRate 0.1441 
Epoch: 9 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:48:24,262-Speed 4983.01 samples/sec Loss 4.6985 LearningRate 0.1440 Epoch: 9 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:48:32,401-Speed 5033.25 samples/sec Loss 4.7027 LearningRate 0.1440 Epoch: 9 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:48:40,595-Speed 4999.92 samples/sec Loss 4.7093 LearningRate 0.1439 Epoch: 9 Global Step: 38400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:48:48,803-Speed 4990.75 samples/sec Loss 4.7262 LearningRate 0.1438 Epoch: 9 Global Step: 38410 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:48:57,003-Speed 4995.47 samples/sec Loss 4.7191 LearningRate 0.1438 Epoch: 9 Global Step: 38420 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:49:05,155-Speed 5025.37 samples/sec Loss 4.6994 LearningRate 0.1437 Epoch: 9 Global Step: 38430 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:49:13,338-Speed 5006.69 samples/sec Loss 4.6488 LearningRate 0.1436 Epoch: 9 Global Step: 38440 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:49:21,490-Speed 5025.12 samples/sec Loss 4.7096 LearningRate 0.1436 Epoch: 9 Global Step: 38450 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:49:29,687-Speed 4997.11 samples/sec Loss 4.6878 LearningRate 0.1435 Epoch: 9 Global Step: 38460 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:49:37,868-Speed 5007.75 samples/sec Loss 4.7679 LearningRate 0.1434 Epoch: 9 Global Step: 38470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:49:46,027-Speed 5020.62 samples/sec Loss 4.7011 LearningRate 0.1434 Epoch: 9 Global Step: 38480 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:49:54,336-Speed 4930.54 samples/sec Loss 4.6304 LearningRate 0.1433 Epoch: 9 Global Step: 38490 Fp16 Grad 
Scale: 262144 Required: 11 hours Training: 2022-01-17 06:50:02,481-Speed 5029.17 samples/sec Loss 4.6806 LearningRate 0.1432 Epoch: 9 Global Step: 38500 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:50:10,596-Speed 5055.37 samples/sec Loss 4.7606 LearningRate 0.1432 Epoch: 9 Global Step: 38510 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:50:18,739-Speed 5030.79 samples/sec Loss 4.6664 LearningRate 0.1431 Epoch: 9 Global Step: 38520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:50:26,889-Speed 5026.61 samples/sec Loss 4.6814 LearningRate 0.1431 Epoch: 9 Global Step: 38530 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:50:35,014-Speed 5041.84 samples/sec Loss 4.6823 LearningRate 0.1430 Epoch: 9 Global Step: 38540 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:50:43,108-Speed 5061.24 samples/sec Loss 4.6659 LearningRate 0.1429 Epoch: 9 Global Step: 38550 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:50:51,199-Speed 5062.80 samples/sec Loss 4.7240 LearningRate 0.1429 Epoch: 9 Global Step: 38560 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:50:59,311-Speed 5050.12 samples/sec Loss 4.6869 LearningRate 0.1428 Epoch: 9 Global Step: 38570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:51:07,384-Speed 5074.33 samples/sec Loss 4.6574 LearningRate 0.1427 Epoch: 9 Global Step: 38580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:51:15,534-Speed 5026.06 samples/sec Loss 4.6461 LearningRate 0.1427 Epoch: 9 Global Step: 38590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:51:23,678-Speed 5030.80 samples/sec Loss 4.6756 LearningRate 0.1426 Epoch: 9 Global Step: 38600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:51:31,759-Speed 5069.27 samples/sec Loss 4.6608 LearningRate 0.1425 Epoch: 9 Global Step: 38610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-17 06:51:39,937-Speed 5009.13 samples/sec Loss 4.6780 LearningRate 0.1425 Epoch: 9 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:51:47,988-Speed 5088.43 samples/sec Loss 4.6685 LearningRate 0.1424 Epoch: 9 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:51:56,066-Speed 5071.20 samples/sec Loss 4.7370 LearningRate 0.1424 Epoch: 9 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:52:04,220-Speed 5023.71 samples/sec Loss 4.6596 LearningRate 0.1423 Epoch: 9 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:52:12,490-Speed 4953.44 samples/sec Loss 4.7174 LearningRate 0.1422 Epoch: 9 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:52:20,777-Speed 4943.61 samples/sec Loss 4.6831 LearningRate 0.1422 Epoch: 9 Global Step: 38670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:52:28,866-Speed 5064.20 samples/sec Loss 4.6585 LearningRate 0.1421 Epoch: 9 Global Step: 38680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:52:36,993-Speed 5040.49 samples/sec Loss 4.6603 LearningRate 0.1420 Epoch: 9 Global Step: 38690 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:52:45,123-Speed 5038.99 samples/sec Loss 4.6996 LearningRate 0.1420 Epoch: 9 Global Step: 38700 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:52:53,208-Speed 5066.99 samples/sec Loss 4.6888 LearningRate 0.1419 Epoch: 9 Global Step: 38710 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:53:01,315-Speed 5053.36 samples/sec Loss 4.7088 LearningRate 0.1419 Epoch: 9 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:53:09,474-Speed 5020.53 samples/sec Loss 4.6755 LearningRate 0.1418 Epoch: 9 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:53:17,653-Speed 5008.47 
samples/sec Loss 4.6610 LearningRate 0.1417 Epoch: 9 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:53:25,732-Speed 5070.83 samples/sec Loss 4.7374 LearningRate 0.1417 Epoch: 9 Global Step: 38750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:53:33,773-Speed 5094.23 samples/sec Loss 4.7114 LearningRate 0.1416 Epoch: 9 Global Step: 38760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:53:41,946-Speed 5012.63 samples/sec Loss 4.6617 LearningRate 0.1415 Epoch: 9 Global Step: 38770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:53:50,238-Speed 4940.57 samples/sec Loss 4.7244 LearningRate 0.1415 Epoch: 9 Global Step: 38780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:53:58,702-Speed 4839.45 samples/sec Loss 4.6682 LearningRate 0.1414 Epoch: 9 Global Step: 38790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:54:06,812-Speed 5051.49 samples/sec Loss 4.6931 LearningRate 0.1413 Epoch: 9 Global Step: 38800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:54:14,951-Speed 5033.46 samples/sec Loss 4.6759 LearningRate 0.1413 Epoch: 9 Global Step: 38810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:54:23,125-Speed 5011.23 samples/sec Loss 4.6591 LearningRate 0.1412 Epoch: 9 Global Step: 38820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:54:31,243-Speed 5046.13 samples/sec Loss 4.6608 LearningRate 0.1412 Epoch: 9 Global Step: 38830 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:54:39,301-Speed 5084.24 samples/sec Loss 4.7064 LearningRate 0.1411 Epoch: 9 Global Step: 38840 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:54:47,447-Speed 5028.56 samples/sec Loss 4.6912 LearningRate 0.1410 Epoch: 9 Global Step: 38850 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:54:55,528-Speed 5069.66 samples/sec Loss 4.6819 LearningRate 0.1410 
Epoch: 9 Global Step: 38860 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:55:03,717-Speed 5002.89 samples/sec Loss 4.7157 LearningRate 0.1409 Epoch: 9 Global Step: 38870 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:55:11,942-Speed 4980.68 samples/sec Loss 4.6226 LearningRate 0.1408 Epoch: 9 Global Step: 38880 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:55:20,014-Speed 5074.88 samples/sec Loss 4.6832 LearningRate 0.1408 Epoch: 9 Global Step: 38890 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:55:28,160-Speed 5028.30 samples/sec Loss 4.6773 LearningRate 0.1407 Epoch: 9 Global Step: 38900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:55:36,232-Speed 5075.35 samples/sec Loss 4.6987 LearningRate 0.1406 Epoch: 9 Global Step: 38910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:55:44,474-Speed 4970.56 samples/sec Loss 4.6779 LearningRate 0.1406 Epoch: 9 Global Step: 38920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:55:52,690-Speed 4985.93 samples/sec Loss 4.6522 LearningRate 0.1405 Epoch: 9 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:56:00,824-Speed 5036.42 samples/sec Loss 4.6863 LearningRate 0.1405 Epoch: 9 Global Step: 38940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:56:09,042-Speed 4984.47 samples/sec Loss 4.7502 LearningRate 0.1404 Epoch: 9 Global Step: 38950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:56:17,172-Speed 5038.73 samples/sec Loss 4.6900 LearningRate 0.1403 Epoch: 9 Global Step: 38960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:56:25,291-Speed 5046.00 samples/sec Loss 4.6465 LearningRate 0.1403 Epoch: 9 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:56:33,449-Speed 5021.73 samples/sec Loss 4.6306 LearningRate 0.1402 Epoch: 9 Global Step: 38980 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-01-17 06:56:41,702-Speed 4963.60 samples/sec Loss 4.6539 LearningRate 0.1401 Epoch: 9 Global Step: 38990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:56:49,993-Speed 4940.87 samples/sec Loss 4.6687 LearningRate 0.1401 Epoch: 9 Global Step: 39000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:56:58,269-Speed 4949.80 samples/sec Loss 4.6415 LearningRate 0.1400 Epoch: 9 Global Step: 39010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:57:06,381-Speed 5050.06 samples/sec Loss 4.6584 LearningRate 0.1400 Epoch: 9 Global Step: 39020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:57:14,763-Speed 4887.22 samples/sec Loss 4.6552 LearningRate 0.1399 Epoch: 9 Global Step: 39030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:57:23,053-Speed 4941.64 samples/sec Loss 4.6272 LearningRate 0.1398 Epoch: 9 Global Step: 39040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:57:31,840-Speed 4661.84 samples/sec Loss 4.6773 LearningRate 0.1398 Epoch: 9 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:57:40,610-Speed 4671.43 samples/sec Loss 4.6531 LearningRate 0.1397 Epoch: 9 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:57:49,269-Speed 4730.56 samples/sec Loss 4.6744 LearningRate 0.1396 Epoch: 9 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:57:57,418-Speed 5026.90 samples/sec Loss 4.6367 LearningRate 0.1396 Epoch: 9 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:58:05,507-Speed 5064.69 samples/sec Loss 4.6347 LearningRate 0.1395 Epoch: 9 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 06:58:13,717-Speed 4989.56 samples/sec Loss 4.6683 LearningRate 0.1394 Epoch: 9 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-17 06:58:21,875-Speed 5021.41 samples/sec Loss 4.5930 LearningRate 0.1394 Epoch: 9 Global Step: 39110 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:58:29,960-Speed 5066.99 samples/sec Loss 4.6288 LearningRate 0.1393 Epoch: 9 Global Step: 39120 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:58:38,042-Speed 5068.34 samples/sec Loss 4.6376 LearningRate 0.1393 Epoch: 9 Global Step: 39130 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:58:46,108-Speed 5079.22 samples/sec Loss 4.6512 LearningRate 0.1392 Epoch: 9 Global Step: 39140 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:58:54,214-Speed 5053.57 samples/sec Loss 4.6352 LearningRate 0.1391 Epoch: 9 Global Step: 39150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:59:02,466-Speed 4964.14 samples/sec Loss 4.6419 LearningRate 0.1391 Epoch: 9 Global Step: 39160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:59:10,688-Speed 4982.40 samples/sec Loss 4.6402 LearningRate 0.1390 Epoch: 9 Global Step: 39170 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:59:18,871-Speed 5006.92 samples/sec Loss 4.6601 LearningRate 0.1389 Epoch: 9 Global Step: 39180 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:59:27,092-Speed 4982.54 samples/sec Loss 4.6918 LearningRate 0.1389 Epoch: 9 Global Step: 39190 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:59:35,171-Speed 5070.57 samples/sec Loss 4.6540 LearningRate 0.1388 Epoch: 9 Global Step: 39200 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:59:43,507-Speed 4914.64 samples/sec Loss 4.5917 LearningRate 0.1388 Epoch: 9 Global Step: 39210 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:59:51,598-Speed 5063.51 samples/sec Loss 4.6840 LearningRate 0.1387 Epoch: 9 Global Step: 39220 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 06:59:59,829-Speed 4976.99 
samples/sec Loss 4.6106 LearningRate 0.1386 Epoch: 9 Global Step: 39230 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 07:00:07,912-Speed 5068.04 samples/sec Loss 4.6009 LearningRate 0.1386 Epoch: 9 Global Step: 39240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:00:15,977-Speed 5079.99 samples/sec Loss 4.6348 LearningRate 0.1385 Epoch: 9 Global Step: 39250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:00:24,118-Speed 5031.86 samples/sec Loss 4.5993 LearningRate 0.1384 Epoch: 9 Global Step: 39260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:00:32,181-Speed 5080.44 samples/sec Loss 4.6782 LearningRate 0.1384 Epoch: 9 Global Step: 39270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:00:40,282-Speed 5056.60 samples/sec Loss 4.6841 LearningRate 0.1383 Epoch: 9 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:00:48,435-Speed 5024.70 samples/sec Loss 4.6744 LearningRate 0.1383 Epoch: 9 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:00:56,619-Speed 5005.70 samples/sec Loss 4.5886 LearningRate 0.1382 Epoch: 9 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:01:04,834-Speed 4986.77 samples/sec Loss 4.6293 LearningRate 0.1381 Epoch: 9 Global Step: 39310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:01:13,107-Speed 4951.66 samples/sec Loss 4.6695 LearningRate 0.1381 Epoch: 9 Global Step: 39320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:01:21,313-Speed 4992.04 samples/sec Loss 4.6208 LearningRate 0.1380 Epoch: 9 Global Step: 39330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:01:29,534-Speed 4983.46 samples/sec Loss 4.6297 LearningRate 0.1379 Epoch: 9 Global Step: 39340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:01:37,788-Speed 4962.89 samples/sec Loss 4.6279 LearningRate 0.1379 
Epoch: 9 Global Step: 39350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:01:45,896-Speed 5052.44 samples/sec Loss 4.5956 LearningRate 0.1378 Epoch: 9 Global Step: 39360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:01:54,046-Speed 5026.40 samples/sec Loss 4.6025 LearningRate 0.1378 Epoch: 9 Global Step: 39370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:02:02,130-Speed 5067.53 samples/sec Loss 4.5946 LearningRate 0.1377 Epoch: 9 Global Step: 39380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:02:10,331-Speed 4995.04 samples/sec Loss 4.6631 LearningRate 0.1376 Epoch: 9 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:02:18,451-Speed 5044.71 samples/sec Loss 4.6367 LearningRate 0.1376 Epoch: 9 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:02:26,645-Speed 4999.60 samples/sec Loss 4.5737 LearningRate 0.1375 Epoch: 9 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:02:35,029-Speed 4886.03 samples/sec Loss 4.6139 LearningRate 0.1374 Epoch: 9 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:02:43,267-Speed 4972.94 samples/sec Loss 4.6629 LearningRate 0.1374 Epoch: 9 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-17 07:02:51,497-Speed 4977.40 samples/sec Loss 4.6308 LearningRate 0.1373 Epoch: 9 Global Step: 39440 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 07:02:59,654-Speed 5022.13 samples/sec Loss 4.6418 LearningRate 0.1373 Epoch: 9 Global Step: 39450 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 07:03:07,736-Speed 5068.71 samples/sec Loss 4.6601 LearningRate 0.1372 Epoch: 9 Global Step: 39460 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 07:03:15,924-Speed 5003.53 samples/sec Loss 4.6051 LearningRate 0.1371 Epoch: 9 Global Step: 39470 Fp16 Grad 
Scale: 262144 Required: 11 hours Training: 2022-01-17 07:03:24,124-Speed 4995.35 samples/sec Loss 4.6195 LearningRate 0.1371 Epoch: 9 Global Step: 39480 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 07:03:32,548-Speed 4862.88 samples/sec Loss 4.5686 LearningRate 0.1370 Epoch: 9 Global Step: 39490 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 07:03:40,857-Speed 4930.11 samples/sec Loss 4.5698 LearningRate 0.1369 Epoch: 9 Global Step: 39500 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-17 07:03:48,956-Speed 5058.42 samples/sec Loss 4.6190 LearningRate 0.1369 Epoch: 9 Global Step: 39510 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:03:57,093-Speed 5034.30 samples/sec Loss 4.5696 LearningRate 0.1368 Epoch: 9 Global Step: 39520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:04:05,215-Speed 5043.84 samples/sec Loss 4.5985 LearningRate 0.1368 Epoch: 9 Global Step: 39530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:04:13,297-Speed 5068.50 samples/sec Loss 4.6744 LearningRate 0.1367 Epoch: 9 Global Step: 39540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:04:21,477-Speed 5008.19 samples/sec Loss 4.5996 LearningRate 0.1366 Epoch: 9 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:04:29,560-Speed 5067.79 samples/sec Loss 4.6014 LearningRate 0.1366 Epoch: 9 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:04:37,684-Speed 5042.55 samples/sec Loss 4.5775 LearningRate 0.1365 Epoch: 9 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:04:45,819-Speed 5035.61 samples/sec Loss 4.6008 LearningRate 0.1364 Epoch: 9 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:04:53,971-Speed 5025.45 samples/sec Loss 4.5223 LearningRate 0.1364 Epoch: 9 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-17 07:05:02,274-Speed 4933.99 samples/sec Loss 4.6054 LearningRate 0.1363 Epoch: 9 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:05:10,427-Speed 5024.80 samples/sec Loss 4.6596 LearningRate 0.1363 Epoch: 9 Global Step: 39610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:05:18,502-Speed 5072.72 samples/sec Loss 4.6203 LearningRate 0.1362 Epoch: 9 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:05:26,728-Speed 4979.71 samples/sec Loss 4.5997 LearningRate 0.1361 Epoch: 9 Global Step: 39630 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:05:34,888-Speed 5020.35 samples/sec Loss 4.5426 LearningRate 0.1361 Epoch: 9 Global Step: 39640 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:05:43,242-Speed 4903.86 samples/sec Loss 4.5544 LearningRate 0.1360 Epoch: 9 Global Step: 39650 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:05:51,317-Speed 5073.05 samples/sec Loss 4.5906 LearningRate 0.1359 Epoch: 9 Global Step: 39660 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:05:59,431-Speed 5048.54 samples/sec Loss 4.6103 LearningRate 0.1359 Epoch: 9 Global Step: 39670 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:06:07,543-Speed 5050.38 samples/sec Loss 4.6089 LearningRate 0.1358 Epoch: 9 Global Step: 39680 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:06:15,603-Speed 5082.25 samples/sec Loss 4.6142 LearningRate 0.1358 Epoch: 9 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:06:23,729-Speed 5041.38 samples/sec Loss 4.5960 LearningRate 0.1357 Epoch: 9 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:06:31,815-Speed 5066.28 samples/sec Loss 4.5879 LearningRate 0.1356 Epoch: 9 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:06:39,916-Speed 5057.08 
samples/sec Loss 4.5984 LearningRate 0.1356 Epoch: 9 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:06:48,081-Speed 5017.06 samples/sec Loss 4.5667 LearningRate 0.1355 Epoch: 9 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:06:56,208-Speed 5040.80 samples/sec Loss 4.5719 LearningRate 0.1355 Epoch: 9 Global Step: 39740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:07:04,381-Speed 5011.91 samples/sec Loss 4.5716 LearningRate 0.1354 Epoch: 9 Global Step: 39750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:07:12,453-Speed 5075.03 samples/sec Loss 4.5812 LearningRate 0.1353 Epoch: 9 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:07:20,509-Speed 5085.40 samples/sec Loss 4.5492 LearningRate 0.1353 Epoch: 9 Global Step: 39770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:07:28,730-Speed 4982.73 samples/sec Loss 4.5800 LearningRate 0.1352 Epoch: 9 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:07:36,794-Speed 5080.46 samples/sec Loss 4.5216 LearningRate 0.1351 Epoch: 9 Global Step: 39790 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:07:44,931-Speed 5033.95 samples/sec Loss 4.5516 LearningRate 0.1351 Epoch: 9 Global Step: 39800 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:07:53,078-Speed 5028.69 samples/sec Loss 4.5807 LearningRate 0.1350 Epoch: 9 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:08:01,324-Speed 4967.32 samples/sec Loss 4.6179 LearningRate 0.1350 Epoch: 9 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:08:09,510-Speed 5004.55 samples/sec Loss 4.6004 LearningRate 0.1349 Epoch: 9 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:08:17,664-Speed 5024.44 samples/sec Loss 4.5797 LearningRate 0.1348 
Epoch: 9 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:08:25,920-Speed 4961.67 samples/sec Loss 4.5310 LearningRate 0.1348 Epoch: 9 Global Step: 39850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:08:34,620-Speed 4708.52 samples/sec Loss 4.5608 LearningRate 0.1347 Epoch: 9 Global Step: 39860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:08:43,451-Speed 4638.79 samples/sec Loss 4.5675 LearningRate 0.1346 Epoch: 9 Global Step: 39870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:08:52,267-Speed 4646.78 samples/sec Loss 4.5705 LearningRate 0.1346 Epoch: 9 Global Step: 39880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:09:01,105-Speed 4634.84 samples/sec Loss 4.5896 LearningRate 0.1345 Epoch: 9 Global Step: 39890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:09:09,918-Speed 4648.43 samples/sec Loss 4.6179 LearningRate 0.1345 Epoch: 9 Global Step: 39900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:09:18,698-Speed 4665.31 samples/sec Loss 4.5518 LearningRate 0.1344 Epoch: 9 Global Step: 39910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:09:27,487-Speed 4660.90 samples/sec Loss 4.5655 LearningRate 0.1343 Epoch: 9 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:09:36,281-Speed 4658.45 samples/sec Loss 4.5281 LearningRate 0.1343 Epoch: 9 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:09:45,109-Speed 4640.41 samples/sec Loss 4.5556 LearningRate 0.1342 Epoch: 9 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:09:53,648-Speed 4797.33 samples/sec Loss 4.5170 LearningRate 0.1342 Epoch: 9 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:10:01,755-Speed 5053.26 samples/sec Loss 4.5685 LearningRate 0.1341 Epoch: 9 Global Step: 39960 Fp16 Grad 
Scale: 131072 Required: 10 hours Training: 2022-01-17 07:10:09,880-Speed 5041.46 samples/sec Loss 4.5792 LearningRate 0.1340 Epoch: 9 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:10:18,000-Speed 5045.43 samples/sec Loss 4.5215 LearningRate 0.1340 Epoch: 9 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:10:26,223-Speed 4981.97 samples/sec Loss 4.4980 LearningRate 0.1339 Epoch: 9 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:10:34,364-Speed 5031.72 samples/sec Loss 4.5588 LearningRate 0.1338 Epoch: 9 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:11:21,069-[lfw][40000]XNorm: 21.658923 Training: 2022-01-17 07:11:21,070-[lfw][40000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-01-17 07:11:21,070-[lfw][40000]Accuracy-Highest: 0.99817 Training: 2022-01-17 07:12:15,190-[cfp_fp][40000]XNorm: 18.975174 Training: 2022-01-17 07:12:15,191-[cfp_fp][40000]Accuracy-Flip: 0.97886+-0.00522 Training: 2022-01-17 07:12:15,192-[cfp_fp][40000]Accuracy-Highest: 0.97886 Training: 2022-01-17 07:13:01,805-[agedb_30][40000]XNorm: 20.950217 Training: 2022-01-17 07:13:01,806-[agedb_30][40000]Accuracy-Flip: 0.97783+-0.00597 Training: 2022-01-17 07:13:01,807-[agedb_30][40000]Accuracy-Highest: 0.97783 Training: 2022-01-17 07:13:09,892-Speed 263.36 samples/sec Loss 4.5392 LearningRate 0.1338 Epoch: 9 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:13:17,970-Speed 5071.19 samples/sec Loss 4.5708 LearningRate 0.1337 Epoch: 9 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:13:25,989-Speed 5108.36 samples/sec Loss 4.5448 LearningRate 0.1337 Epoch: 9 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:13:34,211-Speed 4982.16 samples/sec Loss 4.5376 LearningRate 0.1336 Epoch: 9 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-01-17 07:13:42,508-Speed 4937.34 samples/sec Loss 4.5267 LearningRate 0.1335 Epoch: 9 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:13:50,779-Speed 4953.29 samples/sec Loss 4.5523 LearningRate 0.1335 Epoch: 9 Global Step: 40060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:13:58,997-Speed 4984.87 samples/sec Loss 4.4982 LearningRate 0.1334 Epoch: 9 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:14:07,189-Speed 5000.55 samples/sec Loss 4.5277 LearningRate 0.1334 Epoch: 9 Global Step: 40080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:14:15,325-Speed 5034.91 samples/sec Loss 4.5327 LearningRate 0.1333 Epoch: 9 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:14:23,612-Speed 4943.37 samples/sec Loss 4.5744 LearningRate 0.1332 Epoch: 9 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:14:31,806-Speed 4999.66 samples/sec Loss 4.5019 LearningRate 0.1332 Epoch: 9 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:14:40,080-Speed 4950.62 samples/sec Loss 4.5121 LearningRate 0.1331 Epoch: 9 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:14:48,402-Speed 4922.93 samples/sec Loss 4.5552 LearningRate 0.1330 Epoch: 9 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:14:56,527-Speed 5041.99 samples/sec Loss 4.5347 LearningRate 0.1330 Epoch: 9 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:15:04,731-Speed 4993.04 samples/sec Loss 4.5486 LearningRate 0.1329 Epoch: 9 Global Step: 40150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:15:12,903-Speed 5012.85 samples/sec Loss 4.5600 LearningRate 0.1329 Epoch: 9 Global Step: 40160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:15:21,020-Speed 5046.79 
samples/sec Loss 4.5209 LearningRate 0.1328 Epoch: 9 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:15:29,111-Speed 5063.43 samples/sec Loss 4.5542 LearningRate 0.1327 Epoch: 9 Global Step: 40180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:15:37,155-Speed 5092.15 samples/sec Loss 4.5588 LearningRate 0.1327 Epoch: 9 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:15:45,342-Speed 5004.28 samples/sec Loss 4.4868 LearningRate 0.1326 Epoch: 9 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:15:53,485-Speed 5030.46 samples/sec Loss 4.4965 LearningRate 0.1326 Epoch: 9 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:16:01,551-Speed 5079.08 samples/sec Loss 4.5656 LearningRate 0.1325 Epoch: 9 Global Step: 40220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:16:09,703-Speed 5025.11 samples/sec Loss 4.4950 LearningRate 0.1324 Epoch: 9 Global Step: 40230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:16:17,854-Speed 5025.48 samples/sec Loss 4.4850 LearningRate 0.1324 Epoch: 9 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:16:26,019-Speed 5017.62 samples/sec Loss 4.4882 LearningRate 0.1323 Epoch: 9 Global Step: 40250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:16:34,158-Speed 5033.07 samples/sec Loss 4.5176 LearningRate 0.1322 Epoch: 9 Global Step: 40260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:16:42,291-Speed 5037.00 samples/sec Loss 4.4952 LearningRate 0.1322 Epoch: 9 Global Step: 40270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:16:50,365-Speed 5073.54 samples/sec Loss 4.5138 LearningRate 0.1321 Epoch: 9 Global Step: 40280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:16:58,560-Speed 4999.32 samples/sec Loss 4.5314 LearningRate 0.1321 
Epoch: 9 Global Step: 40290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:17:06,682-Speed 5043.75 samples/sec Loss 4.4996 LearningRate 0.1320 Epoch: 9 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:17:14,851-Speed 5014.32 samples/sec Loss 4.4751 LearningRate 0.1319 Epoch: 9 Global Step: 40310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:17:23,252-Speed 4876.96 samples/sec Loss 4.5564 LearningRate 0.1319 Epoch: 9 Global Step: 40320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:17:31,309-Speed 5084.33 samples/sec Loss 4.4976 LearningRate 0.1318 Epoch: 9 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:17:39,584-Speed 4950.27 samples/sec Loss 4.4784 LearningRate 0.1318 Epoch: 9 Global Step: 40340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:17:47,730-Speed 5029.01 samples/sec Loss 4.4784 LearningRate 0.1317 Epoch: 9 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:17:56,485-Speed 4678.99 samples/sec Loss 4.4963 LearningRate 0.1316 Epoch: 9 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:18:05,198-Speed 4702.03 samples/sec Loss 4.4734 LearningRate 0.1316 Epoch: 9 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:18:13,637-Speed 4853.73 samples/sec Loss 4.4907 LearningRate 0.1315 Epoch: 9 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:18:21,788-Speed 5026.02 samples/sec Loss 4.5312 LearningRate 0.1315 Epoch: 9 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:18:29,967-Speed 5008.70 samples/sec Loss 4.4915 LearningRate 0.1314 Epoch: 9 Global Step: 40400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:18:38,055-Speed 5064.80 samples/sec Loss 4.5081 LearningRate 0.1313 Epoch: 9 Global Step: 40410 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-01-17 07:18:46,371-Speed 4926.57 samples/sec Loss 4.4703 LearningRate 0.1313 Epoch: 9 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:18:55,140-Speed 4671.39 samples/sec Loss 4.4622 LearningRate 0.1312 Epoch: 9 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:19:03,852-Speed 4702.09 samples/sec Loss 4.4539 LearningRate 0.1311 Epoch: 9 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:19:12,207-Speed 4903.22 samples/sec Loss 4.4768 LearningRate 0.1311 Epoch: 9 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:19:20,310-Speed 5055.27 samples/sec Loss 4.5011 LearningRate 0.1310 Epoch: 9 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:19:28,647-Speed 4913.65 samples/sec Loss 4.4809 LearningRate 0.1310 Epoch: 9 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:19:36,691-Speed 5093.13 samples/sec Loss 4.4722 LearningRate 0.1309 Epoch: 9 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:19:44,965-Speed 4950.75 samples/sec Loss 4.5156 LearningRate 0.1308 Epoch: 9 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:19:53,087-Speed 5043.69 samples/sec Loss 4.5122 LearningRate 0.1308 Epoch: 9 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:20:01,212-Speed 5042.02 samples/sec Loss 4.4501 LearningRate 0.1307 Epoch: 9 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:20:09,356-Speed 5030.45 samples/sec Loss 4.5083 LearningRate 0.1307 Epoch: 9 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:20:17,470-Speed 5048.73 samples/sec Loss 4.4516 LearningRate 0.1306 Epoch: 9 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 
07:20:25,730-Speed 4959.55 samples/sec Loss 4.4272 LearningRate 0.1305 Epoch: 9 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:20:34,143-Speed 4868.96 samples/sec Loss 4.4575 LearningRate 0.1305 Epoch: 9 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:20:42,309-Speed 5016.72 samples/sec Loss 4.4890 LearningRate 0.1304 Epoch: 9 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:20:50,414-Speed 5054.61 samples/sec Loss 4.4669 LearningRate 0.1304 Epoch: 9 Global Step: 40570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:20:58,537-Speed 5042.89 samples/sec Loss 4.4711 LearningRate 0.1303 Epoch: 9 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:21:06,615-Speed 5070.93 samples/sec Loss 4.4595 LearningRate 0.1302 Epoch: 9 Global Step: 40590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:21:14,738-Speed 5043.55 samples/sec Loss 4.4807 LearningRate 0.1302 Epoch: 9 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:21:22,878-Speed 5032.14 samples/sec Loss 4.4762 LearningRate 0.1301 Epoch: 9 Global Step: 40610 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:21:30,966-Speed 5064.90 samples/sec Loss 4.4519 LearningRate 0.1301 Epoch: 9 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:21:39,049-Speed 5068.63 samples/sec Loss 4.4902 LearningRate 0.1300 Epoch: 9 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:21:47,225-Speed 5010.01 samples/sec Loss 4.4748 LearningRate 0.1299 Epoch: 9 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:21:55,369-Speed 5030.30 samples/sec Loss 4.4207 LearningRate 0.1299 Epoch: 9 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:22:03,475-Speed 5053.83 samples/sec Loss 
4.4897 LearningRate 0.1298 Epoch: 9 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:22:11,667-Speed 5001.11 samples/sec Loss 4.5127 LearningRate 0.1297 Epoch: 9 Global Step: 40670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:22:19,858-Speed 5000.97 samples/sec Loss 4.5324 LearningRate 0.1297 Epoch: 9 Global Step: 40680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:22:28,004-Speed 5028.77 samples/sec Loss 4.4227 LearningRate 0.1296 Epoch: 9 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:22:36,212-Speed 4990.81 samples/sec Loss 4.4707 LearningRate 0.1296 Epoch: 9 Global Step: 40700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:22:44,426-Speed 4987.60 samples/sec Loss 4.4633 LearningRate 0.1295 Epoch: 9 Global Step: 40710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:22:52,518-Speed 5062.35 samples/sec Loss 4.4635 LearningRate 0.1294 Epoch: 9 Global Step: 40720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:23:00,664-Speed 5028.91 samples/sec Loss 4.4843 LearningRate 0.1294 Epoch: 9 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:23:08,833-Speed 5014.91 samples/sec Loss 4.4838 LearningRate 0.1293 Epoch: 9 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:23:16,966-Speed 5036.85 samples/sec Loss 4.4855 LearningRate 0.1293 Epoch: 9 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:23:25,079-Speed 5049.41 samples/sec Loss 4.4224 LearningRate 0.1292 Epoch: 9 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:23:33,167-Speed 5065.05 samples/sec Loss 4.3991 LearningRate 0.1291 Epoch: 9 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:23:41,255-Speed 5064.75 samples/sec Loss 4.4769 LearningRate 0.1291 Epoch: 9 Global 
Step: 40780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:23:49,443-Speed 5002.87 samples/sec Loss 4.4281 LearningRate 0.1290 Epoch: 9 Global Step: 40790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:23:57,494-Speed 5088.70 samples/sec Loss 4.5233 LearningRate 0.1290 Epoch: 9 Global Step: 40800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:24:05,641-Speed 5028.24 samples/sec Loss 4.3883 LearningRate 0.1289 Epoch: 9 Global Step: 40810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:24:13,777-Speed 5035.01 samples/sec Loss 4.4510 LearningRate 0.1288 Epoch: 9 Global Step: 40820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:24:21,865-Speed 5065.11 samples/sec Loss 4.4862 LearningRate 0.1288 Epoch: 9 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:24:29,999-Speed 5036.11 samples/sec Loss 4.4097 LearningRate 0.1287 Epoch: 9 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:24:38,089-Speed 5063.82 samples/sec Loss 4.3973 LearningRate 0.1287 Epoch: 9 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:24:46,279-Speed 5001.63 samples/sec Loss 4.4469 LearningRate 0.1286 Epoch: 9 Global Step: 40860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:24:54,408-Speed 5039.66 samples/sec Loss 4.4568 LearningRate 0.1285 Epoch: 9 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:25:02,498-Speed 5063.38 samples/sec Loss 4.4059 LearningRate 0.1285 Epoch: 9 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:25:10,615-Speed 5047.39 samples/sec Loss 4.3902 LearningRate 0.1284 Epoch: 9 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:25:18,710-Speed 5060.63 samples/sec Loss 4.4538 LearningRate 0.1284 Epoch: 9 Global Step: 40900 Fp16 Grad Scale: 131072 
Required: 10 hours Training: 2022-01-17 07:25:26,806-Speed 5059.68 samples/sec Loss 4.3696 LearningRate 0.1283 Epoch: 9 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:25:34,877-Speed 5075.26 samples/sec Loss 4.4358 LearningRate 0.1282 Epoch: 9 Global Step: 40920 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:25:43,002-Speed 5042.23 samples/sec Loss 4.3874 LearningRate 0.1282 Epoch: 9 Global Step: 40930 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:25:51,161-Speed 5020.93 samples/sec Loss 4.4096 LearningRate 0.1281 Epoch: 9 Global Step: 40940 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:25:59,391-Speed 4977.58 samples/sec Loss 4.4723 LearningRate 0.1281 Epoch: 9 Global Step: 40950 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:26:07,511-Speed 5045.15 samples/sec Loss 4.3982 LearningRate 0.1280 Epoch: 9 Global Step: 40960 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:26:15,658-Speed 5028.54 samples/sec Loss 4.4057 LearningRate 0.1279 Epoch: 9 Global Step: 40970 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:26:23,740-Speed 5068.32 samples/sec Loss 4.4357 LearningRate 0.1279 Epoch: 9 Global Step: 40980 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:26:31,803-Speed 5081.00 samples/sec Loss 4.4523 LearningRate 0.1278 Epoch: 9 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:26:40,138-Speed 4915.00 samples/sec Loss 4.4747 LearningRate 0.1278 Epoch: 9 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:26:48,243-Speed 5054.08 samples/sec Loss 4.4348 LearningRate 0.1277 Epoch: 9 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:26:56,360-Speed 5046.71 samples/sec Loss 4.3945 LearningRate 0.1276 Epoch: 9 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 
07:27:04,528-Speed 5015.89 samples/sec Loss 4.3562 LearningRate 0.1276 Epoch: 9 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:27:12,617-Speed 5063.90 samples/sec Loss 4.4288 LearningRate 0.1275 Epoch: 9 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:27:20,771-Speed 5024.05 samples/sec Loss 4.4296 LearningRate 0.1275 Epoch: 9 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:27:28,954-Speed 5006.15 samples/sec Loss 4.3839 LearningRate 0.1274 Epoch: 9 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:27:37,186-Speed 4976.13 samples/sec Loss 4.3610 LearningRate 0.1273 Epoch: 9 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:27:45,473-Speed 4943.67 samples/sec Loss 4.4420 LearningRate 0.1273 Epoch: 9 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:27:53,588-Speed 5048.05 samples/sec Loss 4.4346 LearningRate 0.1272 Epoch: 9 Global Step: 41090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:28:01,918-Speed 4917.95 samples/sec Loss 4.4491 LearningRate 0.1272 Epoch: 9 Global Step: 41100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:28:10,074-Speed 5022.60 samples/sec Loss 4.4207 LearningRate 0.1271 Epoch: 9 Global Step: 41110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:28:18,188-Speed 5048.79 samples/sec Loss 4.4008 LearningRate 0.1270 Epoch: 9 Global Step: 41120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:28:26,304-Speed 5047.63 samples/sec Loss 4.3674 LearningRate 0.1270 Epoch: 9 Global Step: 41130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:28:34,434-Speed 5038.85 samples/sec Loss 4.4105 LearningRate 0.1269 Epoch: 9 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:28:42,645-Speed 4989.20 samples/sec Loss 
4.3603 LearningRate 0.1269 Epoch: 9 Global Step: 41150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:28:50,751-Speed 5053.27 samples/sec Loss 4.3993 LearningRate 0.1268 Epoch: 9 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:28:58,891-Speed 5033.04 samples/sec Loss 4.3464 LearningRate 0.1267 Epoch: 9 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:29:06,968-Speed 5071.70 samples/sec Loss 4.3714 LearningRate 0.1267 Epoch: 9 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:29:15,084-Speed 5047.95 samples/sec Loss 4.3860 LearningRate 0.1266 Epoch: 9 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:29:23,194-Speed 5050.82 samples/sec Loss 4.3766 LearningRate 0.1266 Epoch: 9 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:29:31,250-Speed 5084.78 samples/sec Loss 4.4134 LearningRate 0.1265 Epoch: 9 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:29:39,401-Speed 5026.23 samples/sec Loss 4.3624 LearningRate 0.1264 Epoch: 9 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:29:47,425-Speed 5105.58 samples/sec Loss 4.3697 LearningRate 0.1264 Epoch: 9 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:29:55,497-Speed 5075.15 samples/sec Loss 4.3518 LearningRate 0.1263 Epoch: 9 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:30:03,623-Speed 5041.23 samples/sec Loss 4.3861 LearningRate 0.1263 Epoch: 9 Global Step: 41250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:30:11,762-Speed 5033.01 samples/sec Loss 4.3967 LearningRate 0.1262 Epoch: 9 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:30:19,877-Speed 5047.77 samples/sec Loss 4.4189 LearningRate 0.1261 Epoch: 9 Global Step: 
41270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:30:27,917-Speed 5095.78 samples/sec Loss 4.4103 LearningRate 0.1261 Epoch: 9 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:30:36,076-Speed 5020.80 samples/sec Loss 4.3889 LearningRate 0.1260 Epoch: 9 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:30:44,294-Speed 4984.90 samples/sec Loss 4.4016 LearningRate 0.1260 Epoch: 9 Global Step: 41300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:30:52,465-Speed 5013.27 samples/sec Loss 4.3993 LearningRate 0.1259 Epoch: 9 Global Step: 41310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:31:00,605-Speed 5032.66 samples/sec Loss 4.3727 LearningRate 0.1258 Epoch: 9 Global Step: 41320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:31:08,806-Speed 4995.36 samples/sec Loss 4.3356 LearningRate 0.1258 Epoch: 9 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:31:17,025-Speed 4984.36 samples/sec Loss 4.4071 LearningRate 0.1257 Epoch: 9 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:31:25,199-Speed 5011.48 samples/sec Loss 4.3783 LearningRate 0.1257 Epoch: 9 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:31:33,434-Speed 4974.75 samples/sec Loss 4.3137 LearningRate 0.1256 Epoch: 9 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:31:41,620-Speed 5004.16 samples/sec Loss 4.3434 LearningRate 0.1255 Epoch: 9 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:31:49,915-Speed 4938.53 samples/sec Loss 4.3332 LearningRate 0.1255 Epoch: 9 Global Step: 41380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:31:58,353-Speed 4854.74 samples/sec Loss 4.3356 LearningRate 0.1254 Epoch: 9 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-01-17 07:32:07,023-Speed 4724.80 samples/sec Loss 4.3549 LearningRate 0.1254 Epoch: 9 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:32:15,243-Speed 4983.79 samples/sec Loss 4.3460 LearningRate 0.1253 Epoch: 9 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:32:23,422-Speed 5008.66 samples/sec Loss 4.3056 LearningRate 0.1252 Epoch: 9 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:32:31,524-Speed 5056.20 samples/sec Loss 4.3031 LearningRate 0.1252 Epoch: 9 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:32:39,650-Speed 5040.78 samples/sec Loss 4.3613 LearningRate 0.1251 Epoch: 9 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:32:47,859-Speed 4990.40 samples/sec Loss 4.4164 LearningRate 0.1251 Epoch: 9 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:32:56,003-Speed 5030.53 samples/sec Loss 4.3576 LearningRate 0.1250 Epoch: 9 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:33:04,182-Speed 5008.86 samples/sec Loss 4.3321 LearningRate 0.1249 Epoch: 9 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:33:12,275-Speed 5061.73 samples/sec Loss 4.3694 LearningRate 0.1249 Epoch: 9 Global Step: 41480 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:33:20,392-Speed 5047.11 samples/sec Loss 4.3212 LearningRate 0.1248 Epoch: 9 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:33:28,547-Speed 5023.12 samples/sec Loss 4.3431 LearningRate 0.1248 Epoch: 9 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:33:36,726-Speed 5009.03 samples/sec Loss 4.2970 LearningRate 0.1247 Epoch: 9 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:33:44,885-Speed 
5021.12 samples/sec Loss 4.3588 LearningRate 0.1246 Epoch: 9 Global Step: 41520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:33:52,977-Speed 5062.39 samples/sec Loss 4.3194 LearningRate 0.1246 Epoch: 9 Global Step: 41530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:34:01,143-Speed 5016.40 samples/sec Loss 4.3571 LearningRate 0.1245 Epoch: 9 Global Step: 41540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:34:09,239-Speed 5060.19 samples/sec Loss 4.3296 LearningRate 0.1245 Epoch: 9 Global Step: 41550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:34:17,585-Speed 4908.16 samples/sec Loss 4.3437 LearningRate 0.1244 Epoch: 9 Global Step: 41560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:34:25,736-Speed 5026.23 samples/sec Loss 4.3114 LearningRate 0.1243 Epoch: 9 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:34:33,946-Speed 4989.83 samples/sec Loss 4.3327 LearningRate 0.1243 Epoch: 9 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:34:42,260-Speed 4926.90 samples/sec Loss 4.2888 LearningRate 0.1242 Epoch: 9 Global Step: 41590 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:34:50,355-Speed 5060.83 samples/sec Loss 4.2867 LearningRate 0.1242 Epoch: 9 Global Step: 41600 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:34:58,460-Speed 5053.79 samples/sec Loss 4.3519 LearningRate 0.1241 Epoch: 9 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:35:06,570-Speed 5051.49 samples/sec Loss 4.3151 LearningRate 0.1240 Epoch: 9 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:35:14,800-Speed 4977.82 samples/sec Loss 4.2859 LearningRate 0.1240 Epoch: 9 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:35:22,923-Speed 5042.89 samples/sec Loss 4.3094 
LearningRate 0.1239 Epoch: 9 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:35:31,106-Speed 5006.41 samples/sec Loss 4.2426 LearningRate 0.1239 Epoch: 9 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:35:39,222-Speed 5047.11 samples/sec Loss 4.3400 LearningRate 0.1238 Epoch: 9 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:35:47,315-Speed 5061.94 samples/sec Loss 4.3249 LearningRate 0.1238 Epoch: 9 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:35:55,563-Speed 4966.97 samples/sec Loss 4.3188 LearningRate 0.1237 Epoch: 9 Global Step: 41680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:36:03,877-Speed 4927.35 samples/sec Loss 4.3710 LearningRate 0.1236 Epoch: 9 Global Step: 41690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:36:12,317-Speed 4853.58 samples/sec Loss 4.2440 LearningRate 0.1236 Epoch: 9 Global Step: 41700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:36:21,341-Speed 4539.61 samples/sec Loss 4.3512 LearningRate 0.1235 Epoch: 9 Global Step: 41710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:36:29,856-Speed 4811.16 samples/sec Loss 4.2617 LearningRate 0.1235 Epoch: 9 Global Step: 41720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:37:05,306-Speed 1155.70 samples/sec Loss 3.7664 LearningRate 0.1234 Epoch: 10 Global Step: 41730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:37:13,369-Speed 5081.40 samples/sec Loss 3.7550 LearningRate 0.1233 Epoch: 10 Global Step: 41740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:37:21,417-Speed 5090.45 samples/sec Loss 3.7663 LearningRate 0.1233 Epoch: 10 Global Step: 41750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:37:29,468-Speed 5087.99 samples/sec Loss 3.7445 LearningRate 0.1232 Epoch: 10 Global Step: 
41760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:37:37,603-Speed 5035.73 samples/sec Loss 3.7775 LearningRate 0.1232 Epoch: 10 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:37:45,671-Speed 5077.60 samples/sec Loss 3.7874 LearningRate 0.1231 Epoch: 10 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:37:53,771-Speed 5057.58 samples/sec Loss 3.8100 LearningRate 0.1230 Epoch: 10 Global Step: 41790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:38:01,950-Speed 5007.94 samples/sec Loss 3.7909 LearningRate 0.1230 Epoch: 10 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:38:10,046-Speed 5060.28 samples/sec Loss 3.8097 LearningRate 0.1229 Epoch: 10 Global Step: 41810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:38:18,116-Speed 5076.34 samples/sec Loss 3.8152 LearningRate 0.1229 Epoch: 10 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:38:26,503-Speed 4884.52 samples/sec Loss 3.8484 LearningRate 0.1228 Epoch: 10 Global Step: 41830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:38:34,705-Speed 4994.24 samples/sec Loss 3.8283 LearningRate 0.1227 Epoch: 10 Global Step: 41840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:38:42,791-Speed 5066.64 samples/sec Loss 3.7993 LearningRate 0.1227 Epoch: 10 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:38:50,881-Speed 5063.86 samples/sec Loss 3.8569 LearningRate 0.1226 Epoch: 10 Global Step: 41860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:38:58,960-Speed 5070.78 samples/sec Loss 3.8281 LearningRate 0.1226 Epoch: 10 Global Step: 41870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:39:07,003-Speed 5093.09 samples/sec Loss 3.8978 LearningRate 0.1225 Epoch: 10 Global Step: 41880 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-01-17 07:39:15,172-Speed 5015.20 samples/sec Loss 3.8602 LearningRate 0.1225 Epoch: 10 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:39:23,527-Speed 4902.71 samples/sec Loss 3.8903 LearningRate 0.1224 Epoch: 10 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:39:31,885-Speed 4901.18 samples/sec Loss 3.9355 LearningRate 0.1223 Epoch: 10 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:39:40,066-Speed 5007.62 samples/sec Loss 3.9509 LearningRate 0.1223 Epoch: 10 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:39:48,191-Speed 5041.39 samples/sec Loss 3.8735 LearningRate 0.1222 Epoch: 10 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:39:56,491-Speed 4935.93 samples/sec Loss 3.9364 LearningRate 0.1222 Epoch: 10 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:40:04,627-Speed 5035.15 samples/sec Loss 3.9397 LearningRate 0.1221 Epoch: 10 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:40:13,091-Speed 4839.96 samples/sec Loss 3.9414 LearningRate 0.1220 Epoch: 10 Global Step: 41960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:40:21,510-Speed 4865.52 samples/sec Loss 3.9563 LearningRate 0.1220 Epoch: 10 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:40:29,640-Speed 5038.63 samples/sec Loss 3.9649 LearningRate 0.1219 Epoch: 10 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:40:37,723-Speed 5068.41 samples/sec Loss 3.9224 LearningRate 0.1219 Epoch: 10 Global Step: 41990 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:40:45,787-Speed 5079.86 samples/sec Loss 3.9595 LearningRate 0.1218 Epoch: 10 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 
07:40:53,856-Speed 5076.60 samples/sec Loss 3.9852 LearningRate 0.1217 Epoch: 10 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:41:01,952-Speed 5059.85 samples/sec Loss 3.9139 LearningRate 0.1217 Epoch: 10 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:41:10,042-Speed 5064.16 samples/sec Loss 3.9496 LearningRate 0.1216 Epoch: 10 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:41:18,121-Speed 5070.36 samples/sec Loss 3.9965 LearningRate 0.1216 Epoch: 10 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:41:26,217-Speed 5060.00 samples/sec Loss 4.0257 LearningRate 0.1215 Epoch: 10 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:41:34,338-Speed 5044.04 samples/sec Loss 4.0148 LearningRate 0.1215 Epoch: 10 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:41:42,444-Speed 5054.41 samples/sec Loss 3.9963 LearningRate 0.1214 Epoch: 10 Global Step: 42070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:41:50,667-Speed 4981.63 samples/sec Loss 4.0054 LearningRate 0.1213 Epoch: 10 Global Step: 42080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:41:58,826-Speed 5020.88 samples/sec Loss 4.0143 LearningRate 0.1213 Epoch: 10 Global Step: 42090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:42:07,089-Speed 4957.59 samples/sec Loss 3.9658 LearningRate 0.1212 Epoch: 10 Global Step: 42100 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:42:15,874-Speed 4663.21 samples/sec Loss 4.0214 LearningRate 0.1212 Epoch: 10 Global Step: 42110 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:42:24,586-Speed 4702.06 samples/sec Loss 4.0960 LearningRate 0.1211 Epoch: 10 Global Step: 42120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:42:33,400-Speed 4648.11 
samples/sec Loss 4.0814 LearningRate 0.1210 Epoch: 10 Global Step: 42130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:42:41,598-Speed 4996.86 samples/sec Loss 3.9980 LearningRate 0.1210 Epoch: 10 Global Step: 42140 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:42:49,703-Speed 5054.42 samples/sec Loss 4.0253 LearningRate 0.1209 Epoch: 10 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:42:57,965-Speed 4958.37 samples/sec Loss 4.0411 LearningRate 0.1209 Epoch: 10 Global Step: 42160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:43:06,193-Speed 4978.38 samples/sec Loss 4.0820 LearningRate 0.1208 Epoch: 10 Global Step: 42170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:43:14,360-Speed 5015.52 samples/sec Loss 4.0277 LearningRate 0.1207 Epoch: 10 Global Step: 42180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:43:22,469-Speed 5051.99 samples/sec Loss 4.0955 LearningRate 0.1207 Epoch: 10 Global Step: 42190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:43:30,585-Speed 5047.74 samples/sec Loss 4.0543 LearningRate 0.1206 Epoch: 10 Global Step: 42200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:43:38,743-Speed 5021.46 samples/sec Loss 4.0318 LearningRate 0.1206 Epoch: 10 Global Step: 42210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:43:46,948-Speed 4993.16 samples/sec Loss 4.0587 LearningRate 0.1205 Epoch: 10 Global Step: 42220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:43:55,201-Speed 4963.24 samples/sec Loss 4.1235 LearningRate 0.1205 Epoch: 10 Global Step: 42230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:44:03,404-Speed 4994.16 samples/sec Loss 4.0904 LearningRate 0.1204 Epoch: 10 Global Step: 42240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:44:11,529-Speed 5041.93 samples/sec Loss 4.0663 LearningRate 
0.1203 Epoch: 10 Global Step: 42250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:44:19,667-Speed 5033.99 samples/sec Loss 4.0794 LearningRate 0.1203 Epoch: 10 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:44:27,766-Speed 5058.64 samples/sec Loss 4.1015 LearningRate 0.1202 Epoch: 10 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:44:35,917-Speed 5025.25 samples/sec Loss 4.1514 LearningRate 0.1202 Epoch: 10 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:44:44,062-Speed 5029.38 samples/sec Loss 4.0944 LearningRate 0.1201 Epoch: 10 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:44:52,196-Speed 5036.46 samples/sec Loss 4.0932 LearningRate 0.1200 Epoch: 10 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:45:00,300-Speed 5055.29 samples/sec Loss 4.1526 LearningRate 0.1200 Epoch: 10 Global Step: 42310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:45:08,472-Speed 5012.61 samples/sec Loss 4.0838 LearningRate 0.1199 Epoch: 10 Global Step: 42320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:45:16,580-Speed 5052.93 samples/sec Loss 4.1441 LearningRate 0.1199 Epoch: 10 Global Step: 42330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:45:24,725-Speed 5029.89 samples/sec Loss 4.1310 LearningRate 0.1198 Epoch: 10 Global Step: 42340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:45:32,905-Speed 5007.84 samples/sec Loss 4.0697 LearningRate 0.1198 Epoch: 10 Global Step: 42350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:45:40,985-Speed 5070.05 samples/sec Loss 4.0968 LearningRate 0.1197 Epoch: 10 Global Step: 42360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:45:49,158-Speed 5012.04 samples/sec Loss 4.0979 LearningRate 0.1196 Epoch: 10 Global Step: 42370 
Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:45:57,231-Speed 5074.70 samples/sec Loss 4.1442 LearningRate 0.1196 Epoch: 10 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:46:05,394-Speed 5018.51 samples/sec Loss 4.0778 LearningRate 0.1195 Epoch: 10 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:46:13,544-Speed 5026.36 samples/sec Loss 4.1618 LearningRate 0.1195 Epoch: 10 Global Step: 42400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:46:21,612-Speed 5077.45 samples/sec Loss 4.1289 LearningRate 0.1194 Epoch: 10 Global Step: 42410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:46:29,720-Speed 5052.91 samples/sec Loss 4.1410 LearningRate 0.1193 Epoch: 10 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:46:37,932-Speed 4987.93 samples/sec Loss 4.1046 LearningRate 0.1193 Epoch: 10 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:46:46,033-Speed 5056.95 samples/sec Loss 4.0617 LearningRate 0.1192 Epoch: 10 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:46:54,129-Speed 5060.00 samples/sec Loss 4.1280 LearningRate 0.1192 Epoch: 10 Global Step: 42450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:47:02,261-Speed 5037.64 samples/sec Loss 4.1430 LearningRate 0.1191 Epoch: 10 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:47:10,351-Speed 5063.77 samples/sec Loss 4.1786 LearningRate 0.1191 Epoch: 10 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:47:18,418-Speed 5077.68 samples/sec Loss 4.1635 LearningRate 0.1190 Epoch: 10 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:47:26,715-Speed 4937.39 samples/sec Loss 4.1077 LearningRate 0.1189 Epoch: 10 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 
10 hours Training: 2022-01-17 07:47:34,904-Speed 5003.16 samples/sec Loss 4.1325 LearningRate 0.1189 Epoch: 10 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:47:42,992-Speed 5064.35 samples/sec Loss 4.1823 LearningRate 0.1188 Epoch: 10 Global Step: 42510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:47:51,063-Speed 5075.80 samples/sec Loss 4.1582 LearningRate 0.1188 Epoch: 10 Global Step: 42520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:47:59,164-Speed 5057.01 samples/sec Loss 4.1067 LearningRate 0.1187 Epoch: 10 Global Step: 42530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:48:07,362-Speed 4997.37 samples/sec Loss 4.1143 LearningRate 0.1187 Epoch: 10 Global Step: 42540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:48:15,468-Speed 5053.63 samples/sec Loss 4.0933 LearningRate 0.1186 Epoch: 10 Global Step: 42550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:48:23,553-Speed 5066.73 samples/sec Loss 4.1056 LearningRate 0.1185 Epoch: 10 Global Step: 42560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:48:31,683-Speed 5038.80 samples/sec Loss 4.1307 LearningRate 0.1185 Epoch: 10 Global Step: 42570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:48:40,062-Speed 4888.70 samples/sec Loss 4.1560 LearningRate 0.1184 Epoch: 10 Global Step: 42580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:48:48,369-Speed 4931.62 samples/sec Loss 4.1666 LearningRate 0.1184 Epoch: 10 Global Step: 42590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:48:56,554-Speed 5004.68 samples/sec Loss 4.1761 LearningRate 0.1183 Epoch: 10 Global Step: 42600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:49:04,718-Speed 5017.89 samples/sec Loss 4.1052 LearningRate 0.1182 Epoch: 10 Global Step: 42610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 
07:49:13,108-Speed 4883.02 samples/sec Loss 4.1447 LearningRate 0.1182 Epoch: 10 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:49:21,561-Speed 4845.64 samples/sec Loss 4.1308 LearningRate 0.1181 Epoch: 10 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:49:29,680-Speed 5045.82 samples/sec Loss 4.1359 LearningRate 0.1181 Epoch: 10 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:49:37,877-Speed 4997.89 samples/sec Loss 4.1887 LearningRate 0.1180 Epoch: 10 Global Step: 42650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:49:45,998-Speed 5044.55 samples/sec Loss 4.1853 LearningRate 0.1180 Epoch: 10 Global Step: 42660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:49:54,076-Speed 5071.00 samples/sec Loss 4.2252 LearningRate 0.1179 Epoch: 10 Global Step: 42670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:50:02,196-Speed 5045.20 samples/sec Loss 4.1675 LearningRate 0.1178 Epoch: 10 Global Step: 42680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:50:10,334-Speed 5034.07 samples/sec Loss 4.1168 LearningRate 0.1178 Epoch: 10 Global Step: 42690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:50:18,513-Speed 5008.55 samples/sec Loss 4.1086 LearningRate 0.1177 Epoch: 10 Global Step: 42700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:50:26,775-Speed 4958.16 samples/sec Loss 4.1242 LearningRate 0.1177 Epoch: 10 Global Step: 42710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:50:34,866-Speed 5062.93 samples/sec Loss 4.1603 LearningRate 0.1176 Epoch: 10 Global Step: 42720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:50:42,923-Speed 5084.77 samples/sec Loss 4.1815 LearningRate 0.1176 Epoch: 10 Global Step: 42730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:50:50,993-Speed 5076.35 samples/sec 
Loss 4.1441 LearningRate 0.1175 Epoch: 10 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:50:59,190-Speed 4997.43 samples/sec Loss 4.1618 LearningRate 0.1174 Epoch: 10 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:51:07,362-Speed 5012.71 samples/sec Loss 4.1495 LearningRate 0.1174 Epoch: 10 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:51:15,614-Speed 4964.97 samples/sec Loss 4.1451 LearningRate 0.1173 Epoch: 10 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:51:23,662-Speed 5089.61 samples/sec Loss 4.1207 LearningRate 0.1173 Epoch: 10 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:51:31,756-Speed 5060.71 samples/sec Loss 4.1617 LearningRate 0.1172 Epoch: 10 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:51:39,947-Speed 5001.51 samples/sec Loss 4.1949 LearningRate 0.1171 Epoch: 10 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:51:48,118-Speed 5013.63 samples/sec Loss 4.1489 LearningRate 0.1171 Epoch: 10 Global Step: 42810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:51:56,250-Speed 5037.55 samples/sec Loss 4.1663 LearningRate 0.1170 Epoch: 10 Global Step: 42820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:52:04,952-Speed 4707.87 samples/sec Loss 4.1429 LearningRate 0.1170 Epoch: 10 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:52:13,774-Speed 4643.09 samples/sec Loss 4.1414 LearningRate 0.1169 Epoch: 10 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:52:22,358-Speed 4772.58 samples/sec Loss 4.1132 LearningRate 0.1169 Epoch: 10 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:52:30,513-Speed 5022.91 samples/sec Loss 4.1292 LearningRate 0.1168 
Epoch: 10 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:52:38,739-Speed 4979.86 samples/sec Loss 4.1303 LearningRate 0.1167 Epoch: 10 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:52:47,145-Speed 4873.54 samples/sec Loss 4.1099 LearningRate 0.1167 Epoch: 10 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:52:55,529-Speed 4885.77 samples/sec Loss 4.1731 LearningRate 0.1166 Epoch: 10 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:53:03,608-Speed 5070.96 samples/sec Loss 4.0858 LearningRate 0.1166 Epoch: 10 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 07:53:11,772-Speed 5017.89 samples/sec Loss 4.0782 LearningRate 0.1165 Epoch: 10 Global Step: 42910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:53:19,940-Speed 5015.38 samples/sec Loss 4.1690 LearningRate 0.1165 Epoch: 10 Global Step: 42920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:53:28,246-Speed 4932.10 samples/sec Loss 4.1533 LearningRate 0.1164 Epoch: 10 Global Step: 42930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:53:36,666-Speed 4865.08 samples/sec Loss 4.0962 LearningRate 0.1163 Epoch: 10 Global Step: 42940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:53:45,084-Speed 4866.80 samples/sec Loss 4.1405 LearningRate 0.1163 Epoch: 10 Global Step: 42950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:53:53,363-Speed 4947.89 samples/sec Loss 4.1538 LearningRate 0.1162 Epoch: 10 Global Step: 42960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:54:01,540-Speed 5009.35 samples/sec Loss 4.1468 LearningRate 0.1162 Epoch: 10 Global Step: 42970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:54:09,679-Speed 5033.64 samples/sec Loss 4.1974 LearningRate 0.1161 Epoch: 10 Global Step: 42980 Fp16 
Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:54:17,851-Speed 5013.13 samples/sec Loss 4.0913 LearningRate 0.1161 Epoch: 10 Global Step: 42990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:54:25,982-Speed 5037.54 samples/sec Loss 4.1670 LearningRate 0.1160 Epoch: 10 Global Step: 43000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:54:34,207-Speed 4981.32 samples/sec Loss 4.1495 LearningRate 0.1159 Epoch: 10 Global Step: 43010 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:54:42,483-Speed 4949.60 samples/sec Loss 4.1446 LearningRate 0.1159 Epoch: 10 Global Step: 43020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:54:50,568-Speed 5067.08 samples/sec Loss 4.1481 LearningRate 0.1158 Epoch: 10 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:54:58,722-Speed 5024.08 samples/sec Loss 4.1044 LearningRate 0.1158 Epoch: 10 Global Step: 43040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:55:06,834-Speed 5049.40 samples/sec Loss 4.1427 LearningRate 0.1157 Epoch: 10 Global Step: 43050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:55:14,971-Speed 5035.22 samples/sec Loss 4.1931 LearningRate 0.1157 Epoch: 10 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:55:23,096-Speed 5041.88 samples/sec Loss 4.1172 LearningRate 0.1156 Epoch: 10 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:55:31,222-Speed 5041.48 samples/sec Loss 4.0932 LearningRate 0.1155 Epoch: 10 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:55:39,349-Speed 5040.47 samples/sec Loss 4.1213 LearningRate 0.1155 Epoch: 10 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:55:47,708-Speed 4900.90 samples/sec Loss 4.1360 LearningRate 0.1154 Epoch: 10 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-01-17 07:55:55,793-Speed 5066.83 samples/sec Loss 4.1266 LearningRate 0.1154 Epoch: 10 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:56:03,872-Speed 5070.27 samples/sec Loss 4.1023 LearningRate 0.1153 Epoch: 10 Global Step: 43120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:56:11,977-Speed 5054.87 samples/sec Loss 4.1314 LearningRate 0.1153 Epoch: 10 Global Step: 43130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:56:20,078-Speed 5056.89 samples/sec Loss 4.1054 LearningRate 0.1152 Epoch: 10 Global Step: 43140 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:56:28,203-Speed 5041.50 samples/sec Loss 4.1587 LearningRate 0.1151 Epoch: 10 Global Step: 43150 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:56:36,334-Speed 5038.73 samples/sec Loss 4.1730 LearningRate 0.1151 Epoch: 10 Global Step: 43160 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:56:44,462-Speed 5039.82 samples/sec Loss 4.1203 LearningRate 0.1150 Epoch: 10 Global Step: 43170 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 07:56:52,546-Speed 5067.41 samples/sec Loss 4.1763 LearningRate 0.1150 Epoch: 10 Global Step: 43180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:57:00,677-Speed 5037.95 samples/sec Loss 4.1414 LearningRate 0.1149 Epoch: 10 Global Step: 43190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:57:08,746-Speed 5077.47 samples/sec Loss 4.1578 LearningRate 0.1149 Epoch: 10 Global Step: 43200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:57:16,934-Speed 5002.65 samples/sec Loss 4.1197 LearningRate 0.1148 Epoch: 10 Global Step: 43210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:57:25,144-Speed 4989.75 samples/sec Loss 4.1173 LearningRate 0.1147 Epoch: 10 Global Step: 43220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 
07:57:33,294-Speed 5026.43 samples/sec Loss 4.1252 LearningRate 0.1147 Epoch: 10 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:57:41,496-Speed 4994.90 samples/sec Loss 4.1089 LearningRate 0.1146 Epoch: 10 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:57:49,665-Speed 5014.39 samples/sec Loss 4.1162 LearningRate 0.1146 Epoch: 10 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:57:57,863-Speed 4997.03 samples/sec Loss 4.0720 LearningRate 0.1145 Epoch: 10 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:58:06,069-Speed 4992.07 samples/sec Loss 4.1448 LearningRate 0.1145 Epoch: 10 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:58:14,294-Speed 4980.44 samples/sec Loss 4.1434 LearningRate 0.1144 Epoch: 10 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:58:22,437-Speed 5031.25 samples/sec Loss 4.1358 LearningRate 0.1143 Epoch: 10 Global Step: 43290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:58:30,516-Speed 5070.76 samples/sec Loss 4.1282 LearningRate 0.1143 Epoch: 10 Global Step: 43300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:58:38,568-Speed 5087.37 samples/sec Loss 4.1190 LearningRate 0.1142 Epoch: 10 Global Step: 43310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:58:46,699-Speed 5038.03 samples/sec Loss 4.1303 LearningRate 0.1142 Epoch: 10 Global Step: 43320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:58:54,753-Speed 5085.96 samples/sec Loss 4.0901 LearningRate 0.1141 Epoch: 10 Global Step: 43330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:59:02,878-Speed 5042.32 samples/sec Loss 4.1241 LearningRate 0.1141 Epoch: 10 Global Step: 43340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:59:10,979-Speed 5056.69 
samples/sec Loss 4.1261 LearningRate 0.1140 Epoch: 10 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:59:19,059-Speed 5069.89 samples/sec Loss 4.1418 LearningRate 0.1139 Epoch: 10 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:59:27,165-Speed 5053.97 samples/sec Loss 4.1174 LearningRate 0.1139 Epoch: 10 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:59:35,329-Speed 5017.38 samples/sec Loss 4.1154 LearningRate 0.1138 Epoch: 10 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:59:43,441-Speed 5050.42 samples/sec Loss 4.1281 LearningRate 0.1138 Epoch: 10 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:59:51,528-Speed 5065.78 samples/sec Loss 4.0706 LearningRate 0.1137 Epoch: 10 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 07:59:59,624-Speed 5060.11 samples/sec Loss 4.0429 LearningRate 0.1137 Epoch: 10 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:00:07,708-Speed 5067.28 samples/sec Loss 4.1068 LearningRate 0.1136 Epoch: 10 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:00:15,883-Speed 5010.96 samples/sec Loss 4.1010 LearningRate 0.1135 Epoch: 10 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:00:23,964-Speed 5069.18 samples/sec Loss 4.1641 LearningRate 0.1135 Epoch: 10 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:00:32,094-Speed 5039.18 samples/sec Loss 4.1547 LearningRate 0.1134 Epoch: 10 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:00:40,678-Speed 4772.41 samples/sec Loss 4.1186 LearningRate 0.1134 Epoch: 10 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:00:49,421-Speed 4685.17 samples/sec Loss 4.1202 
LearningRate 0.1133 Epoch: 10 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:00:57,570-Speed 5027.99 samples/sec Loss 4.1466 LearningRate 0.1133 Epoch: 10 Global Step: 43480 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 08:01:05,838-Speed 4954.99 samples/sec Loss 4.1248 LearningRate 0.1132 Epoch: 10 Global Step: 43490 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 08:01:14,103-Speed 4956.14 samples/sec Loss 4.0946 LearningRate 0.1131 Epoch: 10 Global Step: 43500 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-17 08:01:22,237-Speed 5036.54 samples/sec Loss 4.0975 LearningRate 0.1131 Epoch: 10 Global Step: 43510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:01:30,539-Speed 4934.42 samples/sec Loss 4.0611 LearningRate 0.1130 Epoch: 10 Global Step: 43520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:01:38,674-Speed 5036.07 samples/sec Loss 4.1415 LearningRate 0.1130 Epoch: 10 Global Step: 43530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:01:46,894-Speed 4983.10 samples/sec Loss 4.1307 LearningRate 0.1129 Epoch: 10 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:01:55,092-Speed 4997.12 samples/sec Loss 4.0782 LearningRate 0.1129 Epoch: 10 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:02:03,235-Speed 5030.92 samples/sec Loss 4.1360 LearningRate 0.1128 Epoch: 10 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:02:11,418-Speed 5006.03 samples/sec Loss 4.0790 LearningRate 0.1128 Epoch: 10 Global Step: 43570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:02:19,515-Speed 5059.31 samples/sec Loss 4.0901 LearningRate 0.1127 Epoch: 10 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:02:27,756-Speed 4970.83 samples/sec Loss 4.0792 LearningRate 0.1126 Epoch: 10 Global 
Step: 43590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:02:36,132-Speed 4891.01 samples/sec Loss 4.1463 LearningRate 0.1126 Epoch: 10 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:02:44,238-Speed 5053.39 samples/sec Loss 4.0790 LearningRate 0.1125 Epoch: 10 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:02:52,337-Speed 5057.71 samples/sec Loss 4.1203 LearningRate 0.1125 Epoch: 10 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-17 08:03:00,419-Speed 5069.35 samples/sec Loss 4.1180 LearningRate 0.1124 Epoch: 10 Global Step: 43630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:03:08,566-Speed 5028.42 samples/sec Loss 4.1001 LearningRate 0.1124 Epoch: 10 Global Step: 43640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:03:16,811-Speed 4968.01 samples/sec Loss 4.0626 LearningRate 0.1123 Epoch: 10 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:03:25,070-Speed 4959.91 samples/sec Loss 4.0884 LearningRate 0.1122 Epoch: 10 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:03:33,235-Speed 5017.78 samples/sec Loss 4.0582 LearningRate 0.1122 Epoch: 10 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:03:41,427-Speed 5000.25 samples/sec Loss 4.1158 LearningRate 0.1121 Epoch: 10 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:03:49,555-Speed 5040.09 samples/sec Loss 4.0752 LearningRate 0.1121 Epoch: 10 Global Step: 43690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-17 08:03:57,665-Speed 5051.74 samples/sec Loss 4.0602 LearningRate 0.1120 Epoch: 10 Global Step: 43700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:04:06,072-Speed 4872.20 samples/sec Loss 4.1153 LearningRate 0.1120 Epoch: 10 Global Step: 43710 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-01-17 08:04:14,455-Speed 4887.17 samples/sec Loss 4.0846 LearningRate 0.1119 Epoch: 10 Global Step: 43720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:04:22,926-Speed 4836.07 samples/sec Loss 4.0194 LearningRate 0.1118 Epoch: 10 Global Step: 43730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:04:31,107-Speed 5007.68 samples/sec Loss 4.0524 LearningRate 0.1118 Epoch: 10 Global Step: 43740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:04:39,290-Speed 5006.46 samples/sec Loss 4.0641 LearningRate 0.1117 Epoch: 10 Global Step: 43750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:04:47,423-Speed 5036.63 samples/sec Loss 4.0395 LearningRate 0.1117 Epoch: 10 Global Step: 43760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:04:55,507-Speed 5067.73 samples/sec Loss 4.1042 LearningRate 0.1116 Epoch: 10 Global Step: 43770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:05:03,592-Speed 5066.39 samples/sec Loss 4.0686 LearningRate 0.1116 Epoch: 10 Global Step: 43780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:05:11,684-Speed 5062.49 samples/sec Loss 4.1163 LearningRate 0.1115 Epoch: 10 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:05:19,973-Speed 4942.32 samples/sec Loss 4.0500 LearningRate 0.1115 Epoch: 10 Global Step: 43800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:05:28,374-Speed 4876.08 samples/sec Loss 4.0530 LearningRate 0.1114 Epoch: 10 Global Step: 43810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:05:36,654-Speed 4947.60 samples/sec Loss 4.0143 LearningRate 0.1113 Epoch: 10 Global Step: 43820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:05:44,945-Speed 4941.30 samples/sec Loss 4.0261 LearningRate 0.1113 Epoch: 10 Global Step: 43830 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 
08:05:53,052-Speed 5052.95 samples/sec Loss 4.0216 LearningRate 0.1112 Epoch: 10 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:06:01,163-Speed 5050.72 samples/sec Loss 4.0867 LearningRate 0.1112 Epoch: 10 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:06:09,258-Speed 5060.42 samples/sec Loss 4.0466 LearningRate 0.1111 Epoch: 10 Global Step: 43860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:06:17,377-Speed 5046.02 samples/sec Loss 4.0984 LearningRate 0.1111 Epoch: 10 Global Step: 43870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:06:25,459-Speed 5068.24 samples/sec Loss 4.0031 LearningRate 0.1110 Epoch: 10 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:06:33,551-Speed 5062.25 samples/sec Loss 4.1053 LearningRate 0.1109 Epoch: 10 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:06:41,696-Speed 5029.88 samples/sec Loss 4.0446 LearningRate 0.1109 Epoch: 10 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:06:49,793-Speed 5058.94 samples/sec Loss 4.0501 LearningRate 0.1108 Epoch: 10 Global Step: 43910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:06:57,867-Speed 5074.12 samples/sec Loss 4.0693 LearningRate 0.1108 Epoch: 10 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:07:05,956-Speed 5064.57 samples/sec Loss 4.0659 LearningRate 0.1107 Epoch: 10 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:07:14,111-Speed 5023.16 samples/sec Loss 4.0779 LearningRate 0.1107 Epoch: 10 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:07:22,212-Speed 5057.14 samples/sec Loss 4.0416 LearningRate 0.1106 Epoch: 10 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:07:30,339-Speed 5040.18 samples/sec Loss 
4.0698 LearningRate 0.1106 Epoch: 10 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:07:38,508-Speed 5014.97 samples/sec Loss 4.0716 LearningRate 0.1105 Epoch: 10 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:07:46,654-Speed 5028.52 samples/sec Loss 4.0537 LearningRate 0.1104 Epoch: 10 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:07:54,958-Speed 4933.36 samples/sec Loss 4.0291 LearningRate 0.1104 Epoch: 10 Global Step: 43990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:08:03,038-Speed 5069.91 samples/sec Loss 4.0909 LearningRate 0.1103 Epoch: 10 Global Step: 44000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:08:11,189-Speed 5025.91 samples/sec Loss 4.0561 LearningRate 0.1103 Epoch: 10 Global Step: 44010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:08:19,274-Speed 5066.96 samples/sec Loss 4.0977 LearningRate 0.1102 Epoch: 10 Global Step: 44020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:08:27,391-Speed 5046.95 samples/sec Loss 4.0176 LearningRate 0.1102 Epoch: 10 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:08:35,535-Speed 5030.17 samples/sec Loss 4.0538 LearningRate 0.1101 Epoch: 10 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:08:43,687-Speed 5024.99 samples/sec Loss 4.0920 LearningRate 0.1101 Epoch: 10 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:08:51,829-Speed 5031.45 samples/sec Loss 4.0713 LearningRate 0.1100 Epoch: 10 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:09:00,080-Speed 4965.26 samples/sec Loss 4.0542 LearningRate 0.1099 Epoch: 10 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:09:08,325-Speed 4968.03 samples/sec Loss 4.0245 LearningRate 0.1099 Epoch: 10 Global Step: 
44080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:09:16,425-Speed 5057.16 samples/sec Loss 4.0397 LearningRate 0.1098 Epoch: 10 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:09:24,619-Speed 4999.97 samples/sec Loss 4.0347 LearningRate 0.1098 Epoch: 10 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:09:32,805-Speed 5004.36 samples/sec Loss 4.0465 LearningRate 0.1097 Epoch: 10 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:09:40,940-Speed 5035.67 samples/sec Loss 4.0406 LearningRate 0.1097 Epoch: 10 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:09:49,012-Speed 5075.06 samples/sec Loss 4.0477 LearningRate 0.1096 Epoch: 10 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:09:57,165-Speed 5024.33 samples/sec Loss 4.0180 LearningRate 0.1095 Epoch: 10 Global Step: 44140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:10:05,287-Speed 5043.47 samples/sec Loss 4.0130 LearningRate 0.1095 Epoch: 10 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:10:13,390-Speed 5055.99 samples/sec Loss 4.0168 LearningRate 0.1094 Epoch: 10 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:10:21,585-Speed 4998.55 samples/sec Loss 3.9670 LearningRate 0.1094 Epoch: 10 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:10:29,702-Speed 5047.39 samples/sec Loss 4.0251 LearningRate 0.1093 Epoch: 10 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:10:37,839-Speed 5033.93 samples/sec Loss 4.0575 LearningRate 0.1093 Epoch: 10 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:10:46,067-Speed 4978.51 samples/sec Loss 4.0057 LearningRate 0.1092 Epoch: 10 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-17 08:10:54,753-Speed 4716.33 samples/sec Loss 4.0339 LearningRate 0.1092 Epoch: 10 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:11:02,836-Speed 5068.24 samples/sec Loss 4.0204 LearningRate 0.1091 Epoch: 10 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:11:10,942-Speed 5053.69 samples/sec Loss 4.0000 LearningRate 0.1090 Epoch: 10 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:11:19,113-Speed 5014.17 samples/sec Loss 4.0250 LearningRate 0.1090 Epoch: 10 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:11:27,423-Speed 4929.50 samples/sec Loss 4.0660 LearningRate 0.1089 Epoch: 10 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:11:35,497-Speed 5073.62 samples/sec Loss 4.0069 LearningRate 0.1089 Epoch: 10 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:11:43,568-Speed 5075.72 samples/sec Loss 4.0606 LearningRate 0.1088 Epoch: 10 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:11:51,943-Speed 4891.74 samples/sec Loss 4.0296 LearningRate 0.1088 Epoch: 10 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:12:00,200-Speed 4961.05 samples/sec Loss 4.0280 LearningRate 0.1087 Epoch: 10 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:12:08,282-Speed 5068.40 samples/sec Loss 4.0166 LearningRate 0.1087 Epoch: 10 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:12:16,470-Speed 5002.80 samples/sec Loss 3.9848 LearningRate 0.1086 Epoch: 10 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:12:25,072-Speed 4762.56 samples/sec Loss 3.9990 LearningRate 0.1085 Epoch: 10 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:12:33,198-Speed 
5041.30 samples/sec Loss 4.0405 LearningRate 0.1085 Epoch: 10 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:12:41,301-Speed 5055.31 samples/sec Loss 4.0599 LearningRate 0.1084 Epoch: 10 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:12:49,397-Speed 5059.99 samples/sec Loss 4.0394 LearningRate 0.1084 Epoch: 10 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:12:57,682-Speed 4944.47 samples/sec Loss 4.0684 LearningRate 0.1083 Epoch: 10 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:13:05,982-Speed 4936.42 samples/sec Loss 4.0087 LearningRate 0.1083 Epoch: 10 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:13:14,097-Speed 5047.55 samples/sec Loss 4.0593 LearningRate 0.1082 Epoch: 10 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:13:22,218-Speed 5044.32 samples/sec Loss 4.0274 LearningRate 0.1082 Epoch: 10 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:13:30,493-Speed 4950.78 samples/sec Loss 4.0544 LearningRate 0.1081 Epoch: 10 Global Step: 44400 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 08:13:38,806-Speed 4927.65 samples/sec Loss 4.0087 LearningRate 0.1080 Epoch: 10 Global Step: 44410 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 08:13:46,920-Speed 5049.37 samples/sec Loss 4.0285 LearningRate 0.1080 Epoch: 10 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:13:54,988-Speed 5076.79 samples/sec Loss 3.9981 LearningRate 0.1079 Epoch: 10 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:14:03,230-Speed 4970.47 samples/sec Loss 4.0106 LearningRate 0.1079 Epoch: 10 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:14:11,336-Speed 5054.32 samples/sec Loss 4.0437 LearningRate 
0.1078 Epoch: 10 Global Step: 44450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:14:19,490-Speed 5023.68 samples/sec Loss 3.9861 LearningRate 0.1078 Epoch: 10 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:14:27,673-Speed 5006.39 samples/sec Loss 4.0842 LearningRate 0.1077 Epoch: 10 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:14:35,727-Speed 5086.19 samples/sec Loss 3.9992 LearningRate 0.1077 Epoch: 10 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:14:43,876-Speed 5027.22 samples/sec Loss 3.9314 LearningRate 0.1076 Epoch: 10 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:14:52,056-Speed 5007.76 samples/sec Loss 4.0110 LearningRate 0.1076 Epoch: 10 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:15:00,203-Speed 5027.90 samples/sec Loss 3.9674 LearningRate 0.1075 Epoch: 10 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:15:08,288-Speed 5066.69 samples/sec Loss 4.0349 LearningRate 0.1074 Epoch: 10 Global Step: 44520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:15:16,552-Speed 4957.63 samples/sec Loss 3.9530 LearningRate 0.1074 Epoch: 10 Global Step: 44530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:15:24,643-Speed 5062.96 samples/sec Loss 3.9590 LearningRate 0.1073 Epoch: 10 Global Step: 44540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:15:32,757-Speed 5048.63 samples/sec Loss 3.9395 LearningRate 0.1073 Epoch: 10 Global Step: 44550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:15:40,852-Speed 5060.96 samples/sec Loss 3.9846 LearningRate 0.1072 Epoch: 10 Global Step: 44560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:15:48,946-Speed 5060.60 samples/sec Loss 4.0009 LearningRate 0.1072 Epoch: 10 Global Step: 44570 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-01-17 08:15:57,071-Speed 5043.07 samples/sec Loss 3.9861 LearningRate 0.1071 Epoch: 10 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:16:05,165-Speed 5061.22 samples/sec Loss 3.9629 LearningRate 0.1071 Epoch: 10 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:16:13,291-Speed 5041.32 samples/sec Loss 4.0150 LearningRate 0.1070 Epoch: 10 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:16:21,483-Speed 5000.57 samples/sec Loss 3.9595 LearningRate 0.1069 Epoch: 10 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:16:29,783-Speed 4935.75 samples/sec Loss 3.9867 LearningRate 0.1069 Epoch: 10 Global Step: 44620 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 08:16:37,946-Speed 5017.68 samples/sec Loss 3.9470 LearningRate 0.1068 Epoch: 10 Global Step: 44630 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 08:16:46,123-Speed 5010.53 samples/sec Loss 3.9711 LearningRate 0.1068 Epoch: 10 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:16:54,272-Speed 5027.07 samples/sec Loss 4.0112 LearningRate 0.1067 Epoch: 10 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:17:02,379-Speed 5052.61 samples/sec Loss 3.9999 LearningRate 0.1067 Epoch: 10 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:17:10,560-Speed 5007.59 samples/sec Loss 3.9322 LearningRate 0.1066 Epoch: 10 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:17:18,663-Speed 5055.62 samples/sec Loss 3.9529 LearningRate 0.1066 Epoch: 10 Global Step: 44680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:17:26,775-Speed 5049.74 samples/sec Loss 3.9513 LearningRate 0.1065 Epoch: 10 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-01-17 08:17:34,900-Speed 5042.31 samples/sec Loss 3.9243 LearningRate 0.1064 Epoch: 10 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:17:43,119-Speed 4984.15 samples/sec Loss 3.9664 LearningRate 0.1064 Epoch: 10 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:17:51,233-Speed 5048.53 samples/sec Loss 3.9563 LearningRate 0.1063 Epoch: 10 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:17:59,396-Speed 5018.57 samples/sec Loss 3.9864 LearningRate 0.1063 Epoch: 10 Global Step: 44730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:18:07,707-Speed 4928.63 samples/sec Loss 4.0036 LearningRate 0.1062 Epoch: 10 Global Step: 44740 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 08:18:15,810-Speed 5055.73 samples/sec Loss 3.9836 LearningRate 0.1062 Epoch: 10 Global Step: 44750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:18:23,883-Speed 5074.25 samples/sec Loss 3.8914 LearningRate 0.1061 Epoch: 10 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:18:32,048-Speed 5017.59 samples/sec Loss 3.9468 LearningRate 0.1061 Epoch: 10 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:18:40,156-Speed 5052.48 samples/sec Loss 3.9920 LearningRate 0.1060 Epoch: 10 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:18:48,254-Speed 5058.57 samples/sec Loss 3.9463 LearningRate 0.1060 Epoch: 10 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:18:56,463-Speed 4990.10 samples/sec Loss 3.9780 LearningRate 0.1059 Epoch: 10 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:19:04,553-Speed 5063.50 samples/sec Loss 3.9224 LearningRate 0.1058 Epoch: 10 Global Step: 44810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:19:12,631-Speed 5071.85 samples/sec 
Loss 3.9541 LearningRate 0.1058 Epoch: 10 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:19:20,718-Speed 5065.71 samples/sec Loss 3.9356 LearningRate 0.1057 Epoch: 10 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:19:28,811-Speed 5061.40 samples/sec Loss 3.9233 LearningRate 0.1057 Epoch: 10 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:19:36,916-Speed 5054.22 samples/sec Loss 3.9842 LearningRate 0.1056 Epoch: 10 Global Step: 44850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:19:44,992-Speed 5072.70 samples/sec Loss 3.9321 LearningRate 0.1056 Epoch: 10 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:19:53,130-Speed 5033.78 samples/sec Loss 3.9079 LearningRate 0.1055 Epoch: 10 Global Step: 44870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:20:01,249-Speed 5046.08 samples/sec Loss 4.0005 LearningRate 0.1055 Epoch: 10 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:20:09,441-Speed 4999.91 samples/sec Loss 3.9015 LearningRate 0.1054 Epoch: 10 Global Step: 44890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:20:17,652-Speed 4989.52 samples/sec Loss 3.9618 LearningRate 0.1054 Epoch: 10 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:20:25,807-Speed 5023.30 samples/sec Loss 3.9966 LearningRate 0.1053 Epoch: 10 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:20:34,046-Speed 4972.16 samples/sec Loss 3.9459 LearningRate 0.1052 Epoch: 10 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:20:42,205-Speed 5021.28 samples/sec Loss 3.9646 LearningRate 0.1052 Epoch: 10 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:20:50,268-Speed 5080.35 samples/sec Loss 3.9569 LearningRate 0.1051 Epoch: 10 Global 
Step: 44940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:20:58,414-Speed 5028.98 samples/sec Loss 3.9631 LearningRate 0.1051 Epoch: 10 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:21:06,540-Speed 5041.15 samples/sec Loss 3.9087 LearningRate 0.1050 Epoch: 10 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:21:14,761-Speed 4982.94 samples/sec Loss 3.9233 LearningRate 0.1050 Epoch: 10 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:21:22,911-Speed 5026.30 samples/sec Loss 3.9464 LearningRate 0.1049 Epoch: 10 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:21:31,025-Speed 5048.81 samples/sec Loss 3.9674 LearningRate 0.1049 Epoch: 10 Global Step: 44990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:21:39,241-Speed 4985.71 samples/sec Loss 3.9796 LearningRate 0.1048 Epoch: 10 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:22:26,639-[lfw][45000]XNorm: 21.809927 Training: 2022-01-17 08:22:26,640-[lfw][45000]Accuracy-Flip: 0.99717+-0.00299 Training: 2022-01-17 08:22:26,641-[lfw][45000]Accuracy-Highest: 0.99817 Training: 2022-01-17 08:23:21,412-[cfp_fp][45000]XNorm: 19.531434 Training: 2022-01-17 08:23:21,413-[cfp_fp][45000]Accuracy-Flip: 0.98486+-0.00674 Training: 2022-01-17 08:23:21,413-[cfp_fp][45000]Accuracy-Highest: 0.98486 Training: 2022-01-17 08:24:08,526-[agedb_30][45000]XNorm: 21.123562 Training: 2022-01-17 08:24:08,527-[agedb_30][45000]Accuracy-Flip: 0.97717+-0.00715 Training: 2022-01-17 08:24:08,527-[agedb_30][45000]Accuracy-Highest: 0.97783 Training: 2022-01-17 08:24:16,614-Speed 260.28 samples/sec Loss 3.9548 LearningRate 0.1048 Epoch: 10 Global Step: 45010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:24:24,705-Speed 5062.79 samples/sec Loss 3.9128 LearningRate 0.1047 Epoch: 10 Global Step: 45020 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-01-17 08:24:32,839-Speed 5036.60 samples/sec Loss 3.9410 LearningRate 0.1046 Epoch: 10 Global Step: 45030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:24:40,988-Speed 5027.07 samples/sec Loss 3.9470 LearningRate 0.1046 Epoch: 10 Global Step: 45040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:24:49,046-Speed 5083.45 samples/sec Loss 3.9313 LearningRate 0.1045 Epoch: 10 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:24:57,284-Speed 4973.13 samples/sec Loss 3.9109 LearningRate 0.1045 Epoch: 10 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:25:05,570-Speed 4943.65 samples/sec Loss 3.9182 LearningRate 0.1044 Epoch: 10 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:25:14,124-Speed 4789.02 samples/sec Loss 3.9013 LearningRate 0.1044 Epoch: 10 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:25:22,676-Speed 4790.33 samples/sec Loss 3.9020 LearningRate 0.1043 Epoch: 10 Global Step: 45090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:25:31,028-Speed 4904.63 samples/sec Loss 3.8944 LearningRate 0.1043 Epoch: 10 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:25:39,469-Speed 4853.13 samples/sec Loss 3.9157 LearningRate 0.1042 Epoch: 10 Global Step: 45110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:25:47,703-Speed 4974.95 samples/sec Loss 3.9803 LearningRate 0.1042 Epoch: 10 Global Step: 45120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:25:56,197-Speed 4823.13 samples/sec Loss 3.8877 LearningRate 0.1041 Epoch: 10 Global Step: 45130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:26:04,504-Speed 4931.26 samples/sec Loss 3.9666 LearningRate 0.1040 Epoch: 10 Global Step: 45140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 
08:26:12,532-Speed 5102.80 samples/sec Loss 3.8676 LearningRate 0.1040 Epoch: 10 Global Step: 45150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:26:20,598-Speed 5078.60 samples/sec Loss 3.9299 LearningRate 0.1039 Epoch: 10 Global Step: 45160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:26:28,851-Speed 4963.68 samples/sec Loss 3.9049 LearningRate 0.1039 Epoch: 10 Global Step: 45170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:26:36,932-Speed 5069.40 samples/sec Loss 3.9166 LearningRate 0.1038 Epoch: 10 Global Step: 45180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:26:45,037-Speed 5054.68 samples/sec Loss 3.8854 LearningRate 0.1038 Epoch: 10 Global Step: 45190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:26:53,653-Speed 4754.88 samples/sec Loss 3.9161 LearningRate 0.1037 Epoch: 10 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:27:02,538-Speed 4610.67 samples/sec Loss 3.8788 LearningRate 0.1037 Epoch: 10 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:27:10,909-Speed 4893.08 samples/sec Loss 3.9122 LearningRate 0.1036 Epoch: 10 Global Step: 45220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:27:19,094-Speed 5005.32 samples/sec Loss 3.9625 LearningRate 0.1036 Epoch: 10 Global Step: 45230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:27:27,148-Speed 5086.17 samples/sec Loss 3.9405 LearningRate 0.1035 Epoch: 10 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:27:35,244-Speed 5060.11 samples/sec Loss 3.9078 LearningRate 0.1034 Epoch: 10 Global Step: 45250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:27:43,317-Speed 5074.44 samples/sec Loss 3.9427 LearningRate 0.1034 Epoch: 10 Global Step: 45260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:27:51,577-Speed 4959.47 samples/sec Loss 3.8732 
LearningRate 0.1033 Epoch: 10 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:27:59,625-Speed 5090.69 samples/sec Loss 3.8835 LearningRate 0.1033 Epoch: 10 Global Step: 45280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:28:07,638-Speed 5112.53 samples/sec Loss 3.8500 LearningRate 0.1032 Epoch: 10 Global Step: 45290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:28:15,732-Speed 5060.88 samples/sec Loss 3.9528 LearningRate 0.1032 Epoch: 10 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:28:23,872-Speed 5032.87 samples/sec Loss 3.8777 LearningRate 0.1031 Epoch: 10 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:28:31,978-Speed 5053.41 samples/sec Loss 3.8678 LearningRate 0.1031 Epoch: 10 Global Step: 45320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:28:40,077-Speed 5058.41 samples/sec Loss 3.8712 LearningRate 0.1030 Epoch: 10 Global Step: 45330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:28:48,098-Speed 5107.33 samples/sec Loss 3.9201 LearningRate 0.1030 Epoch: 10 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:28:56,154-Speed 5084.87 samples/sec Loss 3.8640 LearningRate 0.1029 Epoch: 10 Global Step: 45350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:29:04,294-Speed 5032.93 samples/sec Loss 3.8625 LearningRate 0.1029 Epoch: 10 Global Step: 45360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:29:12,342-Speed 5089.74 samples/sec Loss 3.8904 LearningRate 0.1028 Epoch: 10 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:29:20,479-Speed 5034.46 samples/sec Loss 3.8365 LearningRate 0.1027 Epoch: 10 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:29:28,586-Speed 5053.01 samples/sec Loss 3.8842 LearningRate 0.1027 Epoch: 10 Global Step: 45390 
Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:29:36,780-Speed 4999.69 samples/sec Loss 3.8479 LearningRate 0.1026 Epoch: 10 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:29:44,979-Speed 4996.16 samples/sec Loss 3.8412 LearningRate 0.1026 Epoch: 10 Global Step: 45410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:29:53,120-Speed 5032.18 samples/sec Loss 3.8726 LearningRate 0.1025 Epoch: 10 Global Step: 45420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:30:01,237-Speed 5046.71 samples/sec Loss 3.8590 LearningRate 0.1025 Epoch: 10 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:30:09,520-Speed 4945.59 samples/sec Loss 3.8717 LearningRate 0.1024 Epoch: 10 Global Step: 45440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:30:17,593-Speed 5074.49 samples/sec Loss 3.8600 LearningRate 0.1024 Epoch: 10 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:30:25,876-Speed 4945.54 samples/sec Loss 3.8595 LearningRate 0.1023 Epoch: 10 Global Step: 45460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:30:33,973-Speed 5060.06 samples/sec Loss 3.8364 LearningRate 0.1023 Epoch: 10 Global Step: 45470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:30:42,074-Speed 5057.34 samples/sec Loss 3.8359 LearningRate 0.1022 Epoch: 10 Global Step: 45480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:30:50,111-Speed 5096.29 samples/sec Loss 3.8292 LearningRate 0.1022 Epoch: 10 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:30:58,227-Speed 5047.70 samples/sec Loss 3.8376 LearningRate 0.1021 Epoch: 10 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:31:06,350-Speed 5043.68 samples/sec Loss 3.8610 LearningRate 0.1020 Epoch: 10 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-17 08:31:14,647-Speed 4937.40 samples/sec Loss 3.8658 LearningRate 0.1020 Epoch: 10 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:31:22,822-Speed 5010.93 samples/sec Loss 3.8528 LearningRate 0.1019 Epoch: 10 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:31:30,947-Speed 5041.44 samples/sec Loss 3.8465 LearningRate 0.1019 Epoch: 10 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:31:39,137-Speed 5002.04 samples/sec Loss 3.8515 LearningRate 0.1018 Epoch: 10 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:31:47,384-Speed 4967.16 samples/sec Loss 3.8966 LearningRate 0.1018 Epoch: 10 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:31:55,522-Speed 5033.70 samples/sec Loss 3.8506 LearningRate 0.1017 Epoch: 10 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:32:03,739-Speed 4985.35 samples/sec Loss 3.8979 LearningRate 0.1017 Epoch: 10 Global Step: 45580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:32:12,170-Speed 4859.41 samples/sec Loss 3.8732 LearningRate 0.1016 Epoch: 10 Global Step: 45590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:32:20,460-Speed 4941.42 samples/sec Loss 3.8431 LearningRate 0.1016 Epoch: 10 Global Step: 45600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:32:28,649-Speed 5002.26 samples/sec Loss 3.9016 LearningRate 0.1015 Epoch: 10 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:32:36,882-Speed 4975.85 samples/sec Loss 3.7791 LearningRate 0.1015 Epoch: 10 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:32:45,024-Speed 5031.38 samples/sec Loss 3.8207 LearningRate 0.1014 Epoch: 10 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:32:53,153-Speed 5039.50 
samples/sec Loss 3.8273 LearningRate 0.1013 Epoch: 10 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:33:01,237-Speed 5067.58 samples/sec Loss 3.8774 LearningRate 0.1013 Epoch: 10 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:33:09,330-Speed 5061.45 samples/sec Loss 3.8320 LearningRate 0.1012 Epoch: 10 Global Step: 45660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:33:17,456-Speed 5041.14 samples/sec Loss 3.8389 LearningRate 0.1012 Epoch: 10 Global Step: 45670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:33:25,561-Speed 5054.71 samples/sec Loss 3.8173 LearningRate 0.1011 Epoch: 10 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:33:33,697-Speed 5035.14 samples/sec Loss 3.8028 LearningRate 0.1011 Epoch: 10 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:33:41,915-Speed 4985.03 samples/sec Loss 3.7795 LearningRate 0.1010 Epoch: 10 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:33:50,095-Speed 5008.12 samples/sec Loss 3.8008 LearningRate 0.1010 Epoch: 10 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:33:58,239-Speed 5029.63 samples/sec Loss 3.8608 LearningRate 0.1009 Epoch: 10 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:34:06,481-Speed 4970.07 samples/sec Loss 3.8172 LearningRate 0.1009 Epoch: 10 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:34:14,670-Speed 5002.89 samples/sec Loss 3.8344 LearningRate 0.1008 Epoch: 10 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:34:22,820-Speed 5026.50 samples/sec Loss 3.8099 LearningRate 0.1008 Epoch: 10 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:34:30,958-Speed 5033.49 samples/sec Loss 3.8155 LearningRate 0.1007 Epoch: 10 
Global Step: 45760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:34:39,154-Speed 4998.84 samples/sec Loss 3.7954 LearningRate 0.1007 Epoch: 10 Global Step: 45770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:34:47,258-Speed 5054.51 samples/sec Loss 3.8239 LearningRate 0.1006 Epoch: 10 Global Step: 45780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:34:55,392-Speed 5036.54 samples/sec Loss 3.8179 LearningRate 0.1005 Epoch: 10 Global Step: 45790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:35:03,628-Speed 4974.08 samples/sec Loss 3.8025 LearningRate 0.1005 Epoch: 10 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:35:11,764-Speed 5034.87 samples/sec Loss 3.8668 LearningRate 0.1004 Epoch: 10 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:35:19,883-Speed 5045.64 samples/sec Loss 3.8125 LearningRate 0.1004 Epoch: 10 Global Step: 45820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:35:28,064-Speed 5007.75 samples/sec Loss 3.8565 LearningRate 0.1003 Epoch: 10 Global Step: 45830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:35:36,131-Speed 5078.05 samples/sec Loss 3.7923 LearningRate 0.1003 Epoch: 10 Global Step: 45840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:35:44,284-Speed 5024.33 samples/sec Loss 3.7991 LearningRate 0.1002 Epoch: 10 Global Step: 45850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:35:52,511-Speed 4979.53 samples/sec Loss 3.8294 LearningRate 0.1002 Epoch: 10 Global Step: 45860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:36:00,699-Speed 5002.78 samples/sec Loss 3.8134 LearningRate 0.1001 Epoch: 10 Global Step: 45870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:36:08,998-Speed 4936.67 samples/sec Loss 3.8077 LearningRate 0.1001 Epoch: 10 Global Step: 45880 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-01-17 08:36:17,429-Speed 4860.95 samples/sec Loss 3.7823 LearningRate 0.1000 Epoch: 10 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:37:07,870-Speed 812.07 samples/sec Loss 3.3742 LearningRate 0.1000 Epoch: 11 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:37:16,243-Speed 4892.78 samples/sec Loss 3.2451 LearningRate 0.0999 Epoch: 11 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:37:24,335-Speed 5062.36 samples/sec Loss 3.2712 LearningRate 0.0999 Epoch: 11 Global Step: 45920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:37:32,606-Speed 4953.27 samples/sec Loss 3.2719 LearningRate 0.0998 Epoch: 11 Global Step: 45930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:37:41,383-Speed 4667.30 samples/sec Loss 3.2882 LearningRate 0.0997 Epoch: 11 Global Step: 45940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:37:50,116-Speed 4690.98 samples/sec Loss 3.2389 LearningRate 0.0997 Epoch: 11 Global Step: 45950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:37:59,112-Speed 4553.47 samples/sec Loss 3.2591 LearningRate 0.0996 Epoch: 11 Global Step: 45960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:38:07,969-Speed 4625.08 samples/sec Loss 3.3212 LearningRate 0.0996 Epoch: 11 Global Step: 45970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:38:16,492-Speed 4806.28 samples/sec Loss 3.3062 LearningRate 0.0995 Epoch: 11 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:38:24,797-Speed 4932.74 samples/sec Loss 3.2871 LearningRate 0.0995 Epoch: 11 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:38:33,035-Speed 4972.86 samples/sec Loss 3.3134 LearningRate 0.0994 Epoch: 11 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 
08:38:41,229-Speed 4999.26 samples/sec Loss 3.3826 LearningRate 0.0994 Epoch: 11 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:38:49,435-Speed 4992.70 samples/sec Loss 3.3512 LearningRate 0.0993 Epoch: 11 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:38:57,542-Speed 5053.32 samples/sec Loss 3.3675 LearningRate 0.0993 Epoch: 11 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:39:05,651-Speed 5051.83 samples/sec Loss 3.3527 LearningRate 0.0992 Epoch: 11 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:39:13,761-Speed 5050.97 samples/sec Loss 3.3931 LearningRate 0.0992 Epoch: 11 Global Step: 46050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:39:21,799-Speed 5096.56 samples/sec Loss 3.3297 LearningRate 0.0991 Epoch: 11 Global Step: 46060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:39:29,876-Speed 5072.05 samples/sec Loss 3.3483 LearningRate 0.0991 Epoch: 11 Global Step: 46070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:39:37,939-Speed 5080.77 samples/sec Loss 3.3764 LearningRate 0.0990 Epoch: 11 Global Step: 46080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:39:46,032-Speed 5061.54 samples/sec Loss 3.3973 LearningRate 0.0989 Epoch: 11 Global Step: 46090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:39:54,150-Speed 5046.64 samples/sec Loss 3.3743 LearningRate 0.0989 Epoch: 11 Global Step: 46100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:40:02,334-Speed 5005.74 samples/sec Loss 3.4469 LearningRate 0.0988 Epoch: 11 Global Step: 46110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:40:10,497-Speed 5018.24 samples/sec Loss 3.4121 LearningRate 0.0988 Epoch: 11 Global Step: 46120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:40:18,819-Speed 4922.39 samples/sec Loss 3.3824 
LearningRate 0.0987 Epoch: 11 Global Step: 46130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:40:27,108-Speed 4942.18 samples/sec Loss 3.4666 LearningRate 0.0987 Epoch: 11 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:40:35,402-Speed 4939.03 samples/sec Loss 3.4233 LearningRate 0.0986 Epoch: 11 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:40:43,557-Speed 5023.14 samples/sec Loss 3.3894 LearningRate 0.0986 Epoch: 11 Global Step: 46160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:40:51,749-Speed 5000.63 samples/sec Loss 3.3970 LearningRate 0.0985 Epoch: 11 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:40:59,955-Speed 4992.23 samples/sec Loss 3.4623 LearningRate 0.0985 Epoch: 11 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:41:08,113-Speed 5021.79 samples/sec Loss 3.3567 LearningRate 0.0984 Epoch: 11 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:41:16,300-Speed 5003.87 samples/sec Loss 3.4191 LearningRate 0.0984 Epoch: 11 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:41:24,411-Speed 5050.39 samples/sec Loss 3.4622 LearningRate 0.0983 Epoch: 11 Global Step: 46210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:41:32,577-Speed 5016.74 samples/sec Loss 3.4434 LearningRate 0.0983 Epoch: 11 Global Step: 46220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:41:40,772-Speed 4998.91 samples/sec Loss 3.4564 LearningRate 0.0982 Epoch: 11 Global Step: 46230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:41:48,957-Speed 5004.56 samples/sec Loss 3.4707 LearningRate 0.0982 Epoch: 11 Global Step: 46240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:41:57,164-Speed 4991.38 samples/sec Loss 3.4951 LearningRate 0.0981 Epoch: 11 Global Step: 46250 
Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:42:05,515-Speed 4905.55 samples/sec Loss 3.4821 LearningRate 0.0980 Epoch: 11 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:42:13,823-Speed 4930.92 samples/sec Loss 3.4882 LearningRate 0.0980 Epoch: 11 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:42:22,268-Speed 4850.65 samples/sec Loss 3.4665 LearningRate 0.0979 Epoch: 11 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:42:30,448-Speed 5008.10 samples/sec Loss 3.5155 LearningRate 0.0979 Epoch: 11 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:42:38,579-Speed 5038.24 samples/sec Loss 3.5083 LearningRate 0.0978 Epoch: 11 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:42:46,643-Speed 5079.93 samples/sec Loss 3.4785 LearningRate 0.0978 Epoch: 11 Global Step: 46310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:42:54,881-Speed 4972.46 samples/sec Loss 3.5161 LearningRate 0.0977 Epoch: 11 Global Step: 46320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:43:03,442-Speed 4784.97 samples/sec Loss 3.5096 LearningRate 0.0977 Epoch: 11 Global Step: 46330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:43:11,561-Speed 5045.68 samples/sec Loss 3.4871 LearningRate 0.0976 Epoch: 11 Global Step: 46340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:43:19,811-Speed 4965.85 samples/sec Loss 3.4678 LearningRate 0.0976 Epoch: 11 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:43:27,961-Speed 5026.14 samples/sec Loss 3.4886 LearningRate 0.0975 Epoch: 11 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:43:36,250-Speed 4942.09 samples/sec Loss 3.4959 LearningRate 0.0975 Epoch: 11 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-17 08:43:44,486-Speed 4974.16 samples/sec Loss 3.5530 LearningRate 0.0974 Epoch: 11 Global Step: 46380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:43:52,676-Speed 5001.89 samples/sec Loss 3.5331 LearningRate 0.0974 Epoch: 11 Global Step: 46390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:44:00,874-Speed 4997.23 samples/sec Loss 3.5569 LearningRate 0.0973 Epoch: 11 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:44:08,951-Speed 5071.91 samples/sec Loss 3.5691 LearningRate 0.0973 Epoch: 11 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:44:17,051-Speed 5057.58 samples/sec Loss 3.5343 LearningRate 0.0972 Epoch: 11 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:44:25,190-Speed 5032.48 samples/sec Loss 3.5619 LearningRate 0.0972 Epoch: 11 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:44:33,473-Speed 4945.84 samples/sec Loss 3.5041 LearningRate 0.0971 Epoch: 11 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:44:41,721-Speed 4966.74 samples/sec Loss 3.5542 LearningRate 0.0970 Epoch: 11 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:44:49,793-Speed 5075.08 samples/sec Loss 3.5563 LearningRate 0.0970 Epoch: 11 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:44:57,856-Speed 5080.47 samples/sec Loss 3.5874 LearningRate 0.0969 Epoch: 11 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:45:05,917-Speed 5082.16 samples/sec Loss 3.5869 LearningRate 0.0969 Epoch: 11 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:45:13,987-Speed 5075.90 samples/sec Loss 3.5741 LearningRate 0.0968 Epoch: 11 Global Step: 46490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:45:22,069-Speed 5068.98 
samples/sec Loss 3.5870 LearningRate 0.0968 Epoch: 11 Global Step: 46500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:45:30,336-Speed 4954.85 samples/sec Loss 3.5979 LearningRate 0.0967 Epoch: 11 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:45:38,438-Speed 5056.96 samples/sec Loss 3.5475 LearningRate 0.0967 Epoch: 11 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:45:46,495-Speed 5084.03 samples/sec Loss 3.5744 LearningRate 0.0966 Epoch: 11 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:45:54,772-Speed 4949.02 samples/sec Loss 3.5728 LearningRate 0.0966 Epoch: 11 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:46:03,192-Speed 4865.41 samples/sec Loss 3.5677 LearningRate 0.0965 Epoch: 11 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:46:11,305-Speed 5049.31 samples/sec Loss 3.6131 LearningRate 0.0965 Epoch: 11 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:46:19,681-Speed 4890.82 samples/sec Loss 3.6087 LearningRate 0.0964 Epoch: 11 Global Step: 46570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:46:27,772-Speed 5063.48 samples/sec Loss 3.5659 LearningRate 0.0964 Epoch: 11 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:46:36,053-Speed 4946.19 samples/sec Loss 3.6103 LearningRate 0.0963 Epoch: 11 Global Step: 46590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:46:44,282-Speed 4978.90 samples/sec Loss 3.6216 LearningRate 0.0963 Epoch: 11 Global Step: 46600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:46:52,480-Speed 4996.63 samples/sec Loss 3.5925 LearningRate 0.0962 Epoch: 11 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:47:00,585-Speed 5054.83 samples/sec Loss 3.6256 LearningRate 0.0962 Epoch: 11 
Global Step: 46620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:47:08,693-Speed 5052.06 samples/sec Loss 3.5560 LearningRate 0.0961 Epoch: 11 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:47:16,768-Speed 5073.09 samples/sec Loss 3.5600 LearningRate 0.0961 Epoch: 11 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:47:24,846-Speed 5071.46 samples/sec Loss 3.5947 LearningRate 0.0960 Epoch: 11 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:47:32,925-Speed 5070.44 samples/sec Loss 3.5948 LearningRate 0.0960 Epoch: 11 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:47:41,127-Speed 4994.65 samples/sec Loss 3.6417 LearningRate 0.0959 Epoch: 11 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:47:49,206-Speed 5070.88 samples/sec Loss 3.6636 LearningRate 0.0958 Epoch: 11 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:47:57,330-Speed 5042.30 samples/sec Loss 3.6546 LearningRate 0.0958 Epoch: 11 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:48:05,580-Speed 4965.12 samples/sec Loss 3.6121 LearningRate 0.0957 Epoch: 11 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:48:13,741-Speed 5019.71 samples/sec Loss 3.5879 LearningRate 0.0957 Epoch: 11 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:48:21,904-Speed 5018.86 samples/sec Loss 3.5900 LearningRate 0.0956 Epoch: 11 Global Step: 46720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:48:30,048-Speed 5030.15 samples/sec Loss 3.6232 LearningRate 0.0956 Epoch: 11 Global Step: 46730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:48:38,184-Speed 5035.14 samples/sec Loss 3.5950 LearningRate 0.0955 Epoch: 11 Global Step: 46740 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-01-17 08:48:46,346-Speed 5018.32 samples/sec Loss 3.6350 LearningRate 0.0955 Epoch: 11 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:48:54,486-Speed 5032.80 samples/sec Loss 3.6222 LearningRate 0.0954 Epoch: 11 Global Step: 46760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:49:02,586-Speed 5057.70 samples/sec Loss 3.5845 LearningRate 0.0954 Epoch: 11 Global Step: 46770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:49:10,768-Speed 5006.59 samples/sec Loss 3.6078 LearningRate 0.0953 Epoch: 11 Global Step: 46780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:49:18,876-Speed 5052.50 samples/sec Loss 3.6337 LearningRate 0.0953 Epoch: 11 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:49:26,967-Speed 5063.63 samples/sec Loss 3.5882 LearningRate 0.0952 Epoch: 11 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:49:35,075-Speed 5052.11 samples/sec Loss 3.6116 LearningRate 0.0952 Epoch: 11 Global Step: 46810 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 08:49:43,215-Speed 5032.73 samples/sec Loss 3.6154 LearningRate 0.0951 Epoch: 11 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:49:51,362-Speed 5027.99 samples/sec Loss 3.6262 LearningRate 0.0951 Epoch: 11 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:49:59,504-Speed 5031.64 samples/sec Loss 3.6286 LearningRate 0.0950 Epoch: 11 Global Step: 46840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:50:07,643-Speed 5032.81 samples/sec Loss 3.6378 LearningRate 0.0950 Epoch: 11 Global Step: 46850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:50:15,706-Speed 5080.68 samples/sec Loss 3.5520 LearningRate 0.0949 Epoch: 11 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 
08:50:23,757-Speed 5088.77 samples/sec Loss 3.5987 LearningRate 0.0949 Epoch: 11 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:50:31,847-Speed 5063.30 samples/sec Loss 3.6636 LearningRate 0.0948 Epoch: 11 Global Step: 46880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:50:39,940-Speed 5062.06 samples/sec Loss 3.6566 LearningRate 0.0948 Epoch: 11 Global Step: 46890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:50:48,092-Speed 5025.19 samples/sec Loss 3.5997 LearningRate 0.0947 Epoch: 11 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:50:56,220-Speed 5040.39 samples/sec Loss 3.5785 LearningRate 0.0947 Epoch: 11 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:51:04,492-Speed 4952.08 samples/sec Loss 3.5595 LearningRate 0.0946 Epoch: 11 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:51:12,623-Speed 5038.26 samples/sec Loss 3.6210 LearningRate 0.0945 Epoch: 11 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:51:20,770-Speed 5028.18 samples/sec Loss 3.6156 LearningRate 0.0945 Epoch: 11 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:51:28,830-Speed 5082.31 samples/sec Loss 3.6096 LearningRate 0.0944 Epoch: 11 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:51:37,120-Speed 4941.99 samples/sec Loss 3.6056 LearningRate 0.0944 Epoch: 11 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:51:45,341-Speed 4983.14 samples/sec Loss 3.6501 LearningRate 0.0943 Epoch: 11 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:51:53,391-Speed 5088.48 samples/sec Loss 3.6375 LearningRate 0.0943 Epoch: 11 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:52:01,417-Speed 5104.28 samples/sec Loss 3.6140 
LearningRate 0.0942 Epoch: 11 Global Step: 46990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:52:09,529-Speed 5049.75 samples/sec Loss 3.6266 LearningRate 0.0942 Epoch: 11 Global Step: 47000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:52:17,605-Speed 5072.62 samples/sec Loss 3.6111 LearningRate 0.0941 Epoch: 11 Global Step: 47010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:52:25,761-Speed 5022.67 samples/sec Loss 3.6480 LearningRate 0.0941 Epoch: 11 Global Step: 47020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:52:33,846-Speed 5067.23 samples/sec Loss 3.6454 LearningRate 0.0940 Epoch: 11 Global Step: 47030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:52:41,954-Speed 5052.06 samples/sec Loss 3.6492 LearningRate 0.0940 Epoch: 11 Global Step: 47040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:52:50,044-Speed 5063.89 samples/sec Loss 3.6229 LearningRate 0.0939 Epoch: 11 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:52:58,103-Speed 5082.98 samples/sec Loss 3.6494 LearningRate 0.0939 Epoch: 11 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:53:06,151-Speed 5090.13 samples/sec Loss 3.6817 LearningRate 0.0938 Epoch: 11 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:53:14,233-Speed 5068.86 samples/sec Loss 3.6382 LearningRate 0.0938 Epoch: 11 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:53:22,248-Speed 5111.00 samples/sec Loss 3.6223 LearningRate 0.0937 Epoch: 11 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:53:30,418-Speed 5014.18 samples/sec Loss 3.5974 LearningRate 0.0937 Epoch: 11 Global Step: 47100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:53:38,640-Speed 4982.47 samples/sec Loss 3.5728 LearningRate 0.0936 Epoch: 11 Global Step: 47110 
Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:53:46,732-Speed 5062.19 samples/sec Loss 3.6094 LearningRate 0.0936 Epoch: 11 Global Step: 47120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:53:54,944-Speed 4988.69 samples/sec Loss 3.6204 LearningRate 0.0935 Epoch: 11 Global Step: 47130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:54:03,133-Speed 5002.87 samples/sec Loss 3.6043 LearningRate 0.0935 Epoch: 11 Global Step: 47140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:54:11,272-Speed 5033.09 samples/sec Loss 3.6442 LearningRate 0.0934 Epoch: 11 Global Step: 47150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:54:19,377-Speed 5054.33 samples/sec Loss 3.5986 LearningRate 0.0934 Epoch: 11 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:54:27,514-Speed 5034.70 samples/sec Loss 3.5716 LearningRate 0.0933 Epoch: 11 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:54:35,856-Speed 4910.18 samples/sec Loss 3.6319 LearningRate 0.0933 Epoch: 11 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:54:44,075-Speed 4984.80 samples/sec Loss 3.6468 LearningRate 0.0932 Epoch: 11 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:54:52,269-Speed 4999.30 samples/sec Loss 3.6469 LearningRate 0.0932 Epoch: 11 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:55:00,464-Speed 4998.80 samples/sec Loss 3.6337 LearningRate 0.0931 Epoch: 11 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:55:08,615-Speed 5026.31 samples/sec Loss 3.6296 LearningRate 0.0931 Epoch: 11 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:55:16,720-Speed 5054.18 samples/sec Loss 3.6528 LearningRate 0.0930 Epoch: 11 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-17 08:55:24,894-Speed 5011.20 samples/sec Loss 3.6668 LearningRate 0.0929 Epoch: 11 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:55:33,054-Speed 5020.41 samples/sec Loss 3.6723 LearningRate 0.0929 Epoch: 11 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:55:41,200-Speed 5028.73 samples/sec Loss 3.6495 LearningRate 0.0928 Epoch: 11 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:55:49,275-Speed 5073.86 samples/sec Loss 3.5929 LearningRate 0.0928 Epoch: 11 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:55:57,373-Speed 5058.37 samples/sec Loss 3.6257 LearningRate 0.0927 Epoch: 11 Global Step: 47280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:56:05,555-Speed 5006.63 samples/sec Loss 3.6436 LearningRate 0.0927 Epoch: 11 Global Step: 47290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:56:13,657-Speed 5055.96 samples/sec Loss 3.6406 LearningRate 0.0926 Epoch: 11 Global Step: 47300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:56:21,819-Speed 5019.14 samples/sec Loss 3.6632 LearningRate 0.0926 Epoch: 11 Global Step: 47310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:56:30,184-Speed 4897.00 samples/sec Loss 3.6278 LearningRate 0.0925 Epoch: 11 Global Step: 47320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:56:38,494-Speed 4929.98 samples/sec Loss 3.5971 LearningRate 0.0925 Epoch: 11 Global Step: 47330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:56:46,653-Speed 5021.12 samples/sec Loss 3.6039 LearningRate 0.0924 Epoch: 11 Global Step: 47340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:56:54,805-Speed 5024.62 samples/sec Loss 3.6466 LearningRate 0.0924 Epoch: 11 Global Step: 47350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:57:03,077-Speed 4952.53 
samples/sec Loss 3.6438 LearningRate 0.0923 Epoch: 11 Global Step: 47360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:57:11,330-Speed 4963.96 samples/sec Loss 3.6421 LearningRate 0.0923 Epoch: 11 Global Step: 47370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-17 08:57:19,631-Speed 4935.21 samples/sec Loss 3.6755 LearningRate 0.0922 Epoch: 11 Global Step: 47380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:57:27,691-Speed 5082.51 samples/sec Loss 3.6329 LearningRate 0.0922 Epoch: 11 Global Step: 47390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:57:35,780-Speed 5064.24 samples/sec Loss 3.5725 LearningRate 0.0921 Epoch: 11 Global Step: 47400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:57:43,848-Speed 5077.11 samples/sec Loss 3.5383 LearningRate 0.0921 Epoch: 11 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:57:52,032-Speed 5006.03 samples/sec Loss 3.6404 LearningRate 0.0920 Epoch: 11 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:58:00,330-Speed 4936.60 samples/sec Loss 3.6075 LearningRate 0.0920 Epoch: 11 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:58:08,432-Speed 5056.19 samples/sec Loss 3.6273 LearningRate 0.0919 Epoch: 11 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:58:16,490-Speed 5083.54 samples/sec Loss 3.6390 LearningRate 0.0919 Epoch: 11 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:58:24,676-Speed 5004.46 samples/sec Loss 3.5853 LearningRate 0.0918 Epoch: 11 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:58:32,776-Speed 5057.16 samples/sec Loss 3.6611 LearningRate 0.0918 Epoch: 11 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:58:40,930-Speed 5023.99 samples/sec Loss 3.6373 LearningRate 0.0917 Epoch: 11 
Global Step: 47480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 08:58:49,108-Speed 5009.66 samples/sec Loss 3.6028 LearningRate 0.0917 Epoch: 11 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:58:57,269-Speed 5019.94 samples/sec Loss 3.6664 LearningRate 0.0916 Epoch: 11 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:59:05,437-Speed 5014.75 samples/sec Loss 3.6272 LearningRate 0.0916 Epoch: 11 Global Step: 47510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:59:13,562-Speed 5042.38 samples/sec Loss 3.6044 LearningRate 0.0915 Epoch: 11 Global Step: 47520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:59:21,742-Speed 5007.42 samples/sec Loss 3.6165 LearningRate 0.0915 Epoch: 11 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:59:29,834-Speed 5062.92 samples/sec Loss 3.6121 LearningRate 0.0914 Epoch: 11 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:59:37,899-Speed 5078.92 samples/sec Loss 3.6637 LearningRate 0.0914 Epoch: 11 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:59:45,963-Speed 5080.14 samples/sec Loss 3.6174 LearningRate 0.0913 Epoch: 11 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 08:59:54,232-Speed 4954.46 samples/sec Loss 3.6488 LearningRate 0.0913 Epoch: 11 Global Step: 47570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:00:02,347-Speed 5048.39 samples/sec Loss 3.6230 LearningRate 0.0912 Epoch: 11 Global Step: 47580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:00:10,536-Speed 5002.47 samples/sec Loss 3.6271 LearningRate 0.0912 Epoch: 11 Global Step: 47590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:00:18,657-Speed 5044.35 samples/sec Loss 3.5899 LearningRate 0.0911 Epoch: 11 Global Step: 47600 Fp16 Grad Scale: 131072 Required: 
9 hours Training: 2022-01-17 09:00:26,812-Speed 5023.66 samples/sec Loss 3.5741 LearningRate 0.0911 Epoch: 11 Global Step: 47610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:00:34,884-Speed 5074.95 samples/sec Loss 3.5511 LearningRate 0.0910 Epoch: 11 Global Step: 47620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:00:43,053-Speed 5014.45 samples/sec Loss 3.5957 LearningRate 0.0910 Epoch: 11 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:00:51,301-Speed 4966.72 samples/sec Loss 3.5687 LearningRate 0.0909 Epoch: 11 Global Step: 47640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:00:59,559-Speed 4960.86 samples/sec Loss 3.5773 LearningRate 0.0909 Epoch: 11 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:01:07,757-Speed 4997.29 samples/sec Loss 3.6025 LearningRate 0.0908 Epoch: 11 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:01:15,816-Speed 5083.00 samples/sec Loss 3.5943 LearningRate 0.0908 Epoch: 11 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:01:23,972-Speed 5023.15 samples/sec Loss 3.6127 LearningRate 0.0907 Epoch: 11 Global Step: 47680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:01:32,113-Speed 5031.78 samples/sec Loss 3.5995 LearningRate 0.0907 Epoch: 11 Global Step: 47690 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 09:01:40,228-Speed 5047.95 samples/sec Loss 3.5810 LearningRate 0.0906 Epoch: 11 Global Step: 47700 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-17 09:01:48,388-Speed 5020.42 samples/sec Loss 3.6145 LearningRate 0.0906 Epoch: 11 Global Step: 47710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:01:56,457-Speed 5076.92 samples/sec Loss 3.5781 LearningRate 0.0905 Epoch: 11 Global Step: 47720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 
09:02:04,625-Speed 5015.15 samples/sec Loss 3.6123 LearningRate 0.0904 Epoch: 11 Global Step: 47730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:02:12,720-Speed 5060.61 samples/sec Loss 3.6242 LearningRate 0.0904 Epoch: 11 Global Step: 47740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:02:20,836-Speed 5047.15 samples/sec Loss 3.6130 LearningRate 0.0903 Epoch: 11 Global Step: 47750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:02:29,052-Speed 4986.59 samples/sec Loss 3.5594 LearningRate 0.0903 Epoch: 11 Global Step: 47760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-17 09:02:37,158-Speed 5053.85 samples/sec Loss 3.5900 LearningRate 0.0902 Epoch: 11 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:02:45,265-Speed 5052.93 samples/sec Loss 3.5928 LearningRate 0.0902 Epoch: 11 Global Step: 47780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:02:53,377-Speed 5050.16 samples/sec Loss 3.5900 LearningRate 0.0901 Epoch: 11 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:03:01,489-Speed 5049.57 samples/sec Loss 3.5303 LearningRate 0.0901 Epoch: 11 Global Step: 47800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:03:09,695-Speed 4992.19 samples/sec Loss 3.5784 LearningRate 0.0900 Epoch: 11 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:03:17,812-Speed 5046.69 samples/sec Loss 3.5729 LearningRate 0.0900 Epoch: 11 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:03:26,376-Speed 4783.62 samples/sec Loss 3.5875 LearningRate 0.0899 Epoch: 11 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:03:34,501-Speed 5042.27 samples/sec Loss 3.6052 LearningRate 0.0899 Epoch: 11 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:03:42,681-Speed 5007.97 samples/sec Loss 3.6054 
LearningRate 0.0898 Epoch: 11 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:03:50,916-Speed 4974.18 samples/sec Loss 3.5892 LearningRate 0.0898 Epoch: 11 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:03:59,270-Speed 4904.14 samples/sec Loss 3.6129 LearningRate 0.0897 Epoch: 11 Global Step: 47870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:04:07,643-Speed 4892.72 samples/sec Loss 3.6238 LearningRate 0.0897 Epoch: 11 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-17 09:04:15,934-Speed 4940.47 samples/sec Loss 3.6169 LearningRate 0.0896 Epoch: 11 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:04:24,441-Speed 4815.96 samples/sec Loss 3.5513 LearningRate 0.0896 Epoch: 11 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:04:32,768-Speed 4919.69 samples/sec Loss 3.5663 LearningRate 0.0895 Epoch: 11 Global Step: 47910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:04:41,081-Speed 4928.04 samples/sec Loss 3.5777 LearningRate 0.0895 Epoch: 11 Global Step: 47920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:04:49,473-Speed 4881.37 samples/sec Loss 3.5393 LearningRate 0.0894 Epoch: 11 Global Step: 47930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:04:57,747-Speed 4950.59 samples/sec Loss 3.5720 LearningRate 0.0894 Epoch: 11 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:05:06,115-Speed 4895.37 samples/sec Loss 3.5344 LearningRate 0.0893 Epoch: 11 Global Step: 47950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:05:14,292-Speed 5010.26 samples/sec Loss 3.5938 LearningRate 0.0893 Epoch: 11 Global Step: 47960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:05:22,634-Speed 4910.56 samples/sec Loss 3.5234 LearningRate 0.0892 Epoch: 11 Global Step: 47970 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:05:30,782-Speed 5027.85 samples/sec Loss 3.5673 LearningRate 0.0892 Epoch: 11 Global Step: 47980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:05:38,857-Speed 5073.08 samples/sec Loss 3.5729 LearningRate 0.0891 Epoch: 11 Global Step: 47990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:05:46,931-Speed 5073.40 samples/sec Loss 3.6000 LearningRate 0.0891 Epoch: 11 Global Step: 48000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:05:55,035-Speed 5055.49 samples/sec Loss 3.5624 LearningRate 0.0890 Epoch: 11 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:06:03,127-Speed 5062.68 samples/sec Loss 3.5688 LearningRate 0.0890 Epoch: 11 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:06:11,406-Speed 4947.70 samples/sec Loss 3.6148 LearningRate 0.0889 Epoch: 11 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:06:19,674-Speed 4954.82 samples/sec Loss 3.5744 LearningRate 0.0889 Epoch: 11 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:06:27,816-Speed 5031.14 samples/sec Loss 3.5933 LearningRate 0.0888 Epoch: 11 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:06:35,919-Speed 5055.98 samples/sec Loss 3.5626 LearningRate 0.0888 Epoch: 11 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:06:44,090-Speed 5013.33 samples/sec Loss 3.5620 LearningRate 0.0887 Epoch: 11 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:06:52,226-Speed 5035.44 samples/sec Loss 3.5856 LearningRate 0.0887 Epoch: 11 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:07:00,611-Speed 4885.50 samples/sec Loss 3.5942 LearningRate 0.0886 Epoch: 11 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-01-17 09:07:08,916-Speed 4932.68 samples/sec Loss 3.5604 LearningRate 0.0886 Epoch: 11 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:07:17,230-Speed 4927.40 samples/sec Loss 3.5351 LearningRate 0.0885 Epoch: 11 Global Step: 48110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:07:25,539-Speed 4929.76 samples/sec Loss 3.5308 LearningRate 0.0885 Epoch: 11 Global Step: 48120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:07:33,590-Speed 5088.22 samples/sec Loss 3.5787 LearningRate 0.0884 Epoch: 11 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:07:42,084-Speed 4823.27 samples/sec Loss 3.5343 LearningRate 0.0884 Epoch: 11 Global Step: 48140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:07:50,423-Speed 4912.48 samples/sec Loss 3.5677 LearningRate 0.0883 Epoch: 11 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:07:58,535-Speed 5049.62 samples/sec Loss 3.5727 LearningRate 0.0883 Epoch: 11 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:08:06,910-Speed 4891.48 samples/sec Loss 3.5667 LearningRate 0.0882 Epoch: 11 Global Step: 48170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:08:15,185-Speed 4950.39 samples/sec Loss 3.5334 LearningRate 0.0882 Epoch: 11 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:08:23,342-Speed 5021.85 samples/sec Loss 3.5241 LearningRate 0.0881 Epoch: 11 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:08:31,431-Speed 5064.46 samples/sec Loss 3.5482 LearningRate 0.0881 Epoch: 11 Global Step: 48200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:08:39,662-Speed 4976.49 samples/sec Loss 3.5724 LearningRate 0.0880 Epoch: 11 Global Step: 48210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:08:47,907-Speed 4969.21 
samples/sec Loss 3.5569 LearningRate 0.0880 Epoch: 11 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:08:56,045-Speed 5033.52 samples/sec Loss 3.5950 LearningRate 0.0879 Epoch: 11 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:09:04,318-Speed 4951.75 samples/sec Loss 3.5041 LearningRate 0.0879 Epoch: 11 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:09:12,495-Speed 5009.86 samples/sec Loss 3.5203 LearningRate 0.0878 Epoch: 11 Global Step: 48250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:09:20,587-Speed 5062.88 samples/sec Loss 3.5244 LearningRate 0.0878 Epoch: 11 Global Step: 48260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:09:28,690-Speed 5055.18 samples/sec Loss 3.5219 LearningRate 0.0877 Epoch: 11 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:09:37,260-Speed 4780.46 samples/sec Loss 3.5363 LearningRate 0.0877 Epoch: 11 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:09:46,048-Speed 4660.95 samples/sec Loss 3.5304 LearningRate 0.0876 Epoch: 11 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:09:54,813-Speed 4674.29 samples/sec Loss 3.4931 LearningRate 0.0876 Epoch: 11 Global Step: 48300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:10:03,052-Speed 4971.45 samples/sec Loss 3.4745 LearningRate 0.0875 Epoch: 11 Global Step: 48310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:10:11,270-Speed 4984.66 samples/sec Loss 3.5049 LearningRate 0.0875 Epoch: 11 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:10:19,496-Speed 4980.46 samples/sec Loss 3.5587 LearningRate 0.0874 Epoch: 11 Global Step: 48330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:10:27,670-Speed 5011.31 samples/sec Loss 3.4998 LearningRate 0.0874 Epoch: 
11 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:10:35,796-Speed 5041.57 samples/sec Loss 3.5009 LearningRate 0.0873 Epoch: 11 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:10:43,911-Speed 5048.20 samples/sec Loss 3.5414 LearningRate 0.0873 Epoch: 11 Global Step: 48360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:10:52,197-Speed 4943.65 samples/sec Loss 3.5220 LearningRate 0.0872 Epoch: 11 Global Step: 48370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:11:00,349-Speed 5025.29 samples/sec Loss 3.5386 LearningRate 0.0872 Epoch: 11 Global Step: 48380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:11:08,401-Speed 5087.58 samples/sec Loss 3.5619 LearningRate 0.0871 Epoch: 11 Global Step: 48390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:11:16,531-Speed 5039.31 samples/sec Loss 3.5133 LearningRate 0.0871 Epoch: 11 Global Step: 48400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:11:24,685-Speed 5023.73 samples/sec Loss 3.5237 LearningRate 0.0870 Epoch: 11 Global Step: 48410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:11:32,710-Speed 5105.39 samples/sec Loss 3.5255 LearningRate 0.0870 Epoch: 11 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:11:40,848-Speed 5033.88 samples/sec Loss 3.5010 LearningRate 0.0869 Epoch: 11 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:11:49,197-Speed 4906.51 samples/sec Loss 3.5171 LearningRate 0.0869 Epoch: 11 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:11:57,340-Speed 5030.78 samples/sec Loss 3.5166 LearningRate 0.0868 Epoch: 11 Global Step: 48450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:12:05,493-Speed 5024.50 samples/sec Loss 3.4762 LearningRate 0.0868 Epoch: 11 Global Step: 48460 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-17 09:12:13,616-Speed 5043.28 samples/sec Loss 3.4995 LearningRate 0.0867 Epoch: 11 Global Step: 48470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:12:21,670-Speed 5086.76 samples/sec Loss 3.5469 LearningRate 0.0867 Epoch: 11 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:12:29,868-Speed 4996.90 samples/sec Loss 3.5173 LearningRate 0.0866 Epoch: 11 Global Step: 48490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:12:37,951-Speed 5068.02 samples/sec Loss 3.4970 LearningRate 0.0866 Epoch: 11 Global Step: 48500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:12:46,022-Speed 5075.55 samples/sec Loss 3.4994 LearningRate 0.0865 Epoch: 11 Global Step: 48510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:12:54,103-Speed 5069.06 samples/sec Loss 3.4930 LearningRate 0.0865 Epoch: 11 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:13:02,212-Speed 5052.37 samples/sec Loss 3.4183 LearningRate 0.0864 Epoch: 11 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:13:10,322-Speed 5051.57 samples/sec Loss 3.5057 LearningRate 0.0864 Epoch: 11 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:13:18,665-Speed 4909.73 samples/sec Loss 3.4836 LearningRate 0.0863 Epoch: 11 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:13:26,786-Speed 5044.14 samples/sec Loss 3.4932 LearningRate 0.0863 Epoch: 11 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:13:34,947-Speed 5019.88 samples/sec Loss 3.4935 LearningRate 0.0862 Epoch: 11 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:13:43,167-Speed 4983.49 samples/sec Loss 3.5154 LearningRate 0.0862 Epoch: 11 Global Step: 48580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 
09:13:51,236-Speed 5076.80 samples/sec Loss 3.4697 LearningRate 0.0861 Epoch: 11 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:13:59,433-Speed 4998.14 samples/sec Loss 3.5306 LearningRate 0.0861 Epoch: 11 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:14:07,807-Speed 4892.04 samples/sec Loss 3.5183 LearningRate 0.0860 Epoch: 11 Global Step: 48610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:14:16,380-Speed 4777.81 samples/sec Loss 3.4537 LearningRate 0.0860 Epoch: 11 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:14:24,646-Speed 4956.13 samples/sec Loss 3.5033 LearningRate 0.0859 Epoch: 11 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:14:32,757-Speed 5050.37 samples/sec Loss 3.5208 LearningRate 0.0859 Epoch: 11 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:14:40,864-Speed 5053.32 samples/sec Loss 3.5309 LearningRate 0.0858 Epoch: 11 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:14:48,949-Speed 5066.57 samples/sec Loss 3.4575 LearningRate 0.0858 Epoch: 11 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:14:57,063-Speed 5048.85 samples/sec Loss 3.5236 LearningRate 0.0858 Epoch: 11 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:15:05,551-Speed 4826.35 samples/sec Loss 3.5025 LearningRate 0.0857 Epoch: 11 Global Step: 48680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:15:14,313-Speed 4675.03 samples/sec Loss 3.4990 LearningRate 0.0857 Epoch: 11 Global Step: 48690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:15:23,030-Speed 4699.83 samples/sec Loss 3.4569 LearningRate 0.0856 Epoch: 11 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:15:31,672-Speed 4740.02 samples/sec Loss 3.5116 
LearningRate 0.0856 Epoch: 11 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:15:40,515-Speed 4632.72 samples/sec Loss 3.5060 LearningRate 0.0855 Epoch: 11 Global Step: 48720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:15:49,266-Speed 4681.18 samples/sec Loss 3.5107 LearningRate 0.0855 Epoch: 11 Global Step: 48730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:15:58,047-Speed 4665.37 samples/sec Loss 3.4889 LearningRate 0.0854 Epoch: 11 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:16:06,332-Speed 4945.20 samples/sec Loss 3.5188 LearningRate 0.0854 Epoch: 11 Global Step: 48750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:16:14,447-Speed 5047.62 samples/sec Loss 3.4717 LearningRate 0.0853 Epoch: 11 Global Step: 48760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:16:22,567-Speed 5044.76 samples/sec Loss 3.4675 LearningRate 0.0853 Epoch: 11 Global Step: 48770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:16:30,631-Speed 5080.22 samples/sec Loss 3.4309 LearningRate 0.0852 Epoch: 11 Global Step: 48780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:16:38,914-Speed 4945.85 samples/sec Loss 3.5039 LearningRate 0.0852 Epoch: 11 Global Step: 48790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:16:47,049-Speed 5035.27 samples/sec Loss 3.4566 LearningRate 0.0851 Epoch: 11 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:16:55,160-Speed 5051.12 samples/sec Loss 3.4717 LearningRate 0.0851 Epoch: 11 Global Step: 48810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:17:03,336-Speed 5010.00 samples/sec Loss 3.4649 LearningRate 0.0850 Epoch: 11 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:17:11,450-Speed 5048.90 samples/sec Loss 3.4501 LearningRate 0.0850 Epoch: 11 Global Step: 
48830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:17:19,587-Speed 5034.49 samples/sec Loss 3.4675 LearningRate 0.0849 Epoch: 11 Global Step: 48840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:17:27,717-Speed 5038.84 samples/sec Loss 3.5057 LearningRate 0.0849 Epoch: 11 Global Step: 48850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:17:35,897-Speed 5007.98 samples/sec Loss 3.4746 LearningRate 0.0848 Epoch: 11 Global Step: 48860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:17:43,990-Speed 5061.67 samples/sec Loss 3.4625 LearningRate 0.0848 Epoch: 11 Global Step: 48870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:17:52,084-Speed 5061.17 samples/sec Loss 3.5130 LearningRate 0.0847 Epoch: 11 Global Step: 48880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:18:00,159-Speed 5073.03 samples/sec Loss 3.4531 LearningRate 0.0847 Epoch: 11 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:18:08,354-Speed 4998.73 samples/sec Loss 3.5092 LearningRate 0.0846 Epoch: 11 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:18:16,513-Speed 5021.01 samples/sec Loss 3.4713 LearningRate 0.0846 Epoch: 11 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:18:24,622-Speed 5051.75 samples/sec Loss 3.4524 LearningRate 0.0845 Epoch: 11 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:18:32,885-Speed 4957.73 samples/sec Loss 3.4713 LearningRate 0.0845 Epoch: 11 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:18:41,111-Speed 4980.24 samples/sec Loss 3.4405 LearningRate 0.0844 Epoch: 11 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:18:49,234-Speed 5042.65 samples/sec Loss 3.4107 LearningRate 0.0844 Epoch: 11 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-01-17 09:18:57,311-Speed 5071.86 samples/sec Loss 3.4155 LearningRate 0.0843 Epoch: 11 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:19:05,420-Speed 5052.41 samples/sec Loss 3.4302 LearningRate 0.0843 Epoch: 11 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:19:13,447-Speed 5103.05 samples/sec Loss 3.4390 LearningRate 0.0842 Epoch: 11 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:19:21,634-Speed 5003.76 samples/sec Loss 3.4748 LearningRate 0.0842 Epoch: 11 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:19:29,773-Speed 5033.32 samples/sec Loss 3.4956 LearningRate 0.0841 Epoch: 11 Global Step: 49000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:19:37,837-Speed 5079.53 samples/sec Loss 3.4738 LearningRate 0.0841 Epoch: 11 Global Step: 49010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:19:45,921-Speed 5068.61 samples/sec Loss 3.4960 LearningRate 0.0840 Epoch: 11 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:19:54,170-Speed 4965.34 samples/sec Loss 3.4484 LearningRate 0.0840 Epoch: 11 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:20:02,371-Speed 4995.21 samples/sec Loss 3.4522 LearningRate 0.0839 Epoch: 11 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:20:10,503-Speed 5037.89 samples/sec Loss 3.4703 LearningRate 0.0839 Epoch: 11 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:20:18,705-Speed 4994.34 samples/sec Loss 3.4872 LearningRate 0.0838 Epoch: 11 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:20:26,941-Speed 4974.05 samples/sec Loss 3.4515 LearningRate 0.0838 Epoch: 11 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:20:35,048-Speed 5052.96 
samples/sec Loss 3.4341 LearningRate 0.0837 Epoch: 11 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:20:43,198-Speed 5026.92 samples/sec Loss 3.4591 LearningRate 0.0837 Epoch: 11 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:20:51,297-Speed 5057.98 samples/sec Loss 3.4527 LearningRate 0.0836 Epoch: 11 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:20:59,402-Speed 5053.85 samples/sec Loss 3.5135 LearningRate 0.0836 Epoch: 11 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:21:07,483-Speed 5069.67 samples/sec Loss 3.4381 LearningRate 0.0835 Epoch: 11 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:21:15,728-Speed 4968.48 samples/sec Loss 3.3877 LearningRate 0.0835 Epoch: 11 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:21:23,802-Speed 5073.30 samples/sec Loss 3.3738 LearningRate 0.0834 Epoch: 11 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:21:31,887-Speed 5067.49 samples/sec Loss 3.4562 LearningRate 0.0834 Epoch: 11 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:21:40,103-Speed 4986.34 samples/sec Loss 3.4652 LearningRate 0.0834 Epoch: 11 Global Step: 49160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:21:48,180-Speed 5071.55 samples/sec Loss 3.4036 LearningRate 0.0833 Epoch: 11 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:21:56,263-Speed 5068.05 samples/sec Loss 3.4047 LearningRate 0.0833 Epoch: 11 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:22:04,324-Speed 5081.61 samples/sec Loss 3.4389 LearningRate 0.0832 Epoch: 11 Global Step: 49190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:22:12,414-Speed 5063.77 samples/sec Loss 3.4033 LearningRate 0.0832 Epoch: 11 
Global Step: 49200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:22:20,652-Speed 4972.96 samples/sec Loss 3.4318 LearningRate 0.0831 Epoch: 11 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:22:28,828-Speed 5010.40 samples/sec Loss 3.4431 LearningRate 0.0831 Epoch: 11 Global Step: 49220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:22:37,080-Speed 4964.35 samples/sec Loss 3.4541 LearningRate 0.0830 Epoch: 11 Global Step: 49230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:22:45,213-Speed 5037.25 samples/sec Loss 3.4729 LearningRate 0.0830 Epoch: 11 Global Step: 49240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:22:53,343-Speed 5038.81 samples/sec Loss 3.4237 LearningRate 0.0829 Epoch: 11 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:23:01,494-Speed 5025.95 samples/sec Loss 3.4989 LearningRate 0.0829 Epoch: 11 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:23:09,661-Speed 5015.97 samples/sec Loss 3.4589 LearningRate 0.0828 Epoch: 11 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:23:17,912-Speed 4965.16 samples/sec Loss 3.4058 LearningRate 0.0828 Epoch: 11 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:23:26,115-Speed 4993.45 samples/sec Loss 3.4173 LearningRate 0.0827 Epoch: 11 Global Step: 49290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:23:34,283-Speed 5015.86 samples/sec Loss 3.4301 LearningRate 0.0827 Epoch: 11 Global Step: 49300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:23:42,444-Speed 5019.45 samples/sec Loss 3.4298 LearningRate 0.0826 Epoch: 11 Global Step: 49310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:23:50,550-Speed 5053.30 samples/sec Loss 3.4624 LearningRate 0.0826 Epoch: 11 Global Step: 49320 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-17 09:23:58,717-Speed 5016.30 samples/sec Loss 3.4597 LearningRate 0.0825 Epoch: 11 Global Step: 49330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:24:06,943-Speed 4980.25 samples/sec Loss 3.4044 LearningRate 0.0825 Epoch: 11 Global Step: 49340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:24:15,104-Speed 5019.63 samples/sec Loss 3.4502 LearningRate 0.0824 Epoch: 11 Global Step: 49350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:24:23,216-Speed 5050.21 samples/sec Loss 3.4564 LearningRate 0.0824 Epoch: 11 Global Step: 49360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:24:31,272-Speed 5084.80 samples/sec Loss 3.3884 LearningRate 0.0823 Epoch: 11 Global Step: 49370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:24:39,564-Speed 4940.06 samples/sec Loss 3.4306 LearningRate 0.0823 Epoch: 11 Global Step: 49380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:24:47,574-Speed 5114.46 samples/sec Loss 3.4103 LearningRate 0.0822 Epoch: 11 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:24:55,733-Speed 5021.03 samples/sec Loss 3.4378 LearningRate 0.0822 Epoch: 11 Global Step: 49400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:25:03,821-Speed 5064.76 samples/sec Loss 3.3842 LearningRate 0.0821 Epoch: 11 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:25:12,004-Speed 5005.84 samples/sec Loss 3.3933 LearningRate 0.0821 Epoch: 11 Global Step: 49420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-17 09:25:20,068-Speed 5080.05 samples/sec Loss 3.4166 LearningRate 0.0820 Epoch: 11 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:25:28,259-Speed 5001.75 samples/sec Loss 3.4518 LearningRate 0.0820 Epoch: 11 Global Step: 49440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 
09:25:36,391-Speed 5037.23 samples/sec Loss 3.3847 LearningRate 0.0819 Epoch: 11 Global Step: 49450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:25:44,448-Speed 5084.43 samples/sec Loss 3.3937 LearningRate 0.0819 Epoch: 11 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:25:52,527-Speed 5070.59 samples/sec Loss 3.4213 LearningRate 0.0818 Epoch: 11 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:26:00,619-Speed 5062.68 samples/sec Loss 3.4256 LearningRate 0.0818 Epoch: 11 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:26:08,741-Speed 5043.40 samples/sec Loss 3.4005 LearningRate 0.0818 Epoch: 11 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:26:16,882-Speed 5032.49 samples/sec Loss 3.4129 LearningRate 0.0817 Epoch: 11 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:26:24,959-Speed 5071.52 samples/sec Loss 3.4045 LearningRate 0.0817 Epoch: 11 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:26:33,121-Speed 5018.67 samples/sec Loss 3.3877 LearningRate 0.0816 Epoch: 11 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:26:41,191-Speed 5076.53 samples/sec Loss 3.3966 LearningRate 0.0816 Epoch: 11 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:26:49,244-Speed 5086.83 samples/sec Loss 3.3775 LearningRate 0.0815 Epoch: 11 Global Step: 49540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:26:57,392-Speed 5027.64 samples/sec Loss 3.3607 LearningRate 0.0815 Epoch: 11 Global Step: 49550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:27:05,443-Speed 5088.40 samples/sec Loss 3.4007 LearningRate 0.0814 Epoch: 11 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:27:13,783-Speed 4911.76 samples/sec Loss 3.3352 
LearningRate 0.0814 Epoch: 11 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:27:22,104-Speed 4923.62 samples/sec Loss 3.3842 LearningRate 0.0813 Epoch: 11 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:27:30,484-Speed 4888.32 samples/sec Loss 3.4040 LearningRate 0.0813 Epoch: 11 Global Step: 49590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:27:38,768-Speed 4944.63 samples/sec Loss 3.3768 LearningRate 0.0812 Epoch: 11 Global Step: 49600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:27:47,269-Speed 4819.02 samples/sec Loss 3.3453 LearningRate 0.0812 Epoch: 11 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:27:55,562-Speed 4939.79 samples/sec Loss 3.3817 LearningRate 0.0811 Epoch: 11 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:28:03,885-Speed 4921.83 samples/sec Loss 3.3787 LearningRate 0.0811 Epoch: 11 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:28:12,165-Speed 4947.42 samples/sec Loss 3.3709 LearningRate 0.0810 Epoch: 11 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:28:20,465-Speed 4935.76 samples/sec Loss 3.3378 LearningRate 0.0810 Epoch: 11 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:28:28,925-Speed 4842.60 samples/sec Loss 3.3889 LearningRate 0.0809 Epoch: 11 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:28:37,153-Speed 4978.78 samples/sec Loss 3.3394 LearningRate 0.0809 Epoch: 11 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:28:45,316-Speed 5017.95 samples/sec Loss 3.3256 LearningRate 0.0808 Epoch: 11 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:28:53,568-Speed 4964.70 samples/sec Loss 3.3733 LearningRate 0.0808 Epoch: 11 Global Step: 49690 
Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:29:01,689-Speed 5044.36 samples/sec Loss 3.3974 LearningRate 0.0807 Epoch: 11 Global Step: 49700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:29:09,907-Speed 4984.72 samples/sec Loss 3.4074 LearningRate 0.0807 Epoch: 11 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:29:18,044-Speed 5034.63 samples/sec Loss 3.3995 LearningRate 0.0806 Epoch: 11 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:29:26,478-Speed 4856.92 samples/sec Loss 3.3668 LearningRate 0.0806 Epoch: 11 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:29:35,403-Speed 4589.96 samples/sec Loss 3.3206 LearningRate 0.0806 Epoch: 11 Global Step: 49740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:29:44,127-Speed 4695.67 samples/sec Loss 3.3781 LearningRate 0.0805 Epoch: 11 Global Step: 49750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:29:52,858-Speed 4691.94 samples/sec Loss 3.2868 LearningRate 0.0805 Epoch: 11 Global Step: 49760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:30:01,600-Speed 4685.71 samples/sec Loss 3.3682 LearningRate 0.0804 Epoch: 11 Global Step: 49770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:30:10,284-Speed 4717.69 samples/sec Loss 3.3603 LearningRate 0.0804 Epoch: 11 Global Step: 49780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:30:18,467-Speed 5006.32 samples/sec Loss 3.3757 LearningRate 0.0803 Epoch: 11 Global Step: 49790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:30:26,559-Speed 5062.36 samples/sec Loss 3.3759 LearningRate 0.0803 Epoch: 11 Global Step: 49800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:30:34,694-Speed 5035.87 samples/sec Loss 3.3756 LearningRate 0.0802 Epoch: 11 Global Step: 49810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-01-17 09:30:42,783-Speed 5064.38 samples/sec Loss 3.3425 LearningRate 0.0802 Epoch: 11 Global Step: 49820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:30:50,857-Speed 5073.78 samples/sec Loss 3.3448 LearningRate 0.0801 Epoch: 11 Global Step: 49830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:30:58,934-Speed 5071.26 samples/sec Loss 3.3434 LearningRate 0.0801 Epoch: 11 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:31:06,984-Speed 5089.08 samples/sec Loss 3.3465 LearningRate 0.0800 Epoch: 11 Global Step: 49850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:31:15,013-Speed 5102.50 samples/sec Loss 3.3279 LearningRate 0.0800 Epoch: 11 Global Step: 49860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:31:23,046-Speed 5099.04 samples/sec Loss 3.3839 LearningRate 0.0799 Epoch: 11 Global Step: 49870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:31:31,160-Speed 5048.90 samples/sec Loss 3.3002 LearningRate 0.0799 Epoch: 11 Global Step: 49880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:31:39,359-Speed 4996.45 samples/sec Loss 3.3179 LearningRate 0.0798 Epoch: 11 Global Step: 49890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:31:47,428-Speed 5076.77 samples/sec Loss 3.3335 LearningRate 0.0798 Epoch: 11 Global Step: 49900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:31:55,507-Speed 5070.44 samples/sec Loss 3.3552 LearningRate 0.0797 Epoch: 11 Global Step: 49910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:32:03,598-Speed 5063.39 samples/sec Loss 3.3212 LearningRate 0.0797 Epoch: 11 Global Step: 49920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:32:11,763-Speed 5016.98 samples/sec Loss 3.3583 LearningRate 0.0796 Epoch: 11 Global Step: 49930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:32:19,890-Speed 5040.77 samples/sec Loss 
3.3534 LearningRate 0.0796 Epoch: 11 Global Step: 49940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:32:28,000-Speed 5050.82 samples/sec Loss 3.3322 LearningRate 0.0796 Epoch: 11 Global Step: 49950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:32:36,178-Speed 5009.72 samples/sec Loss 3.3447 LearningRate 0.0795 Epoch: 11 Global Step: 49960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:32:44,222-Speed 5092.28 samples/sec Loss 3.2859 LearningRate 0.0795 Epoch: 11 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:32:52,323-Speed 5056.55 samples/sec Loss 3.3139 LearningRate 0.0794 Epoch: 11 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:33:00,461-Speed 5034.34 samples/sec Loss 3.3527 LearningRate 0.0794 Epoch: 11 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:33:08,555-Speed 5061.18 samples/sec Loss 3.3618 LearningRate 0.0793 Epoch: 11 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:33:55,003-[lfw][50000]XNorm: 22.027126 Training: 2022-01-17 09:33:55,004-[lfw][50000]Accuracy-Flip: 0.99817+-0.00252 Training: 2022-01-17 09:33:55,004-[lfw][50000]Accuracy-Highest: 0.99817 Training: 2022-01-17 09:34:48,849-[cfp_fp][50000]XNorm: 19.478287 Training: 2022-01-17 09:34:48,850-[cfp_fp][50000]Accuracy-Flip: 0.98343+-0.00636 Training: 2022-01-17 09:34:48,850-[cfp_fp][50000]Accuracy-Highest: 0.98486 Training: 2022-01-17 09:35:35,115-[agedb_30][50000]XNorm: 21.661712 Training: 2022-01-17 09:35:35,116-[agedb_30][50000]Accuracy-Flip: 0.98133+-0.00759 Training: 2022-01-17 09:35:35,117-[agedb_30][50000]Accuracy-Highest: 0.98133 Training: 2022-01-17 09:35:43,397-Speed 264.53 samples/sec Loss 3.3331 LearningRate 0.0793 Epoch: 11 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:35:51,586-Speed 5002.23 samples/sec Loss 3.3315 LearningRate 0.0792 Epoch: 11 
Global Step: 50020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:35:59,656-Speed 5076.55 samples/sec Loss 3.3194 LearningRate 0.0792 Epoch: 11 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:36:07,779-Speed 5042.83 samples/sec Loss 3.3855 LearningRate 0.0791 Epoch: 11 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:36:16,111-Speed 4916.90 samples/sec Loss 3.3032 LearningRate 0.0791 Epoch: 11 Global Step: 50050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:36:24,368-Speed 4961.04 samples/sec Loss 3.3153 LearningRate 0.0790 Epoch: 11 Global Step: 50060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:37:17,348-Speed 773.14 samples/sec Loss 3.0260 LearningRate 0.0790 Epoch: 12 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:37:25,524-Speed 5010.97 samples/sec Loss 2.7952 LearningRate 0.0789 Epoch: 12 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:37:33,744-Speed 4983.33 samples/sec Loss 2.7774 LearningRate 0.0789 Epoch: 12 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:37:41,898-Speed 5024.24 samples/sec Loss 2.7874 LearningRate 0.0788 Epoch: 12 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:37:50,125-Speed 4979.41 samples/sec Loss 2.7738 LearningRate 0.0788 Epoch: 12 Global Step: 50110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:37:58,330-Speed 4992.35 samples/sec Loss 2.7531 LearningRate 0.0787 Epoch: 12 Global Step: 50120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:38:06,522-Speed 5000.62 samples/sec Loss 2.7990 LearningRate 0.0787 Epoch: 12 Global Step: 50130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:38:14,740-Speed 4985.39 samples/sec Loss 2.7867 LearningRate 0.0787 Epoch: 12 Global Step: 50140 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-17 09:38:23,144-Speed 4874.37 samples/sec Loss 2.8046 LearningRate 0.0786 Epoch: 12 Global Step: 50150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:38:31,412-Speed 4955.04 samples/sec Loss 2.7663 LearningRate 0.0786 Epoch: 12 Global Step: 50160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:38:39,570-Speed 5021.43 samples/sec Loss 2.7906 LearningRate 0.0785 Epoch: 12 Global Step: 50170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:38:47,722-Speed 5024.80 samples/sec Loss 2.8665 LearningRate 0.0785 Epoch: 12 Global Step: 50180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:38:56,597-Speed 4615.69 samples/sec Loss 2.8134 LearningRate 0.0784 Epoch: 12 Global Step: 50190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:39:05,877-Speed 4414.45 samples/sec Loss 2.8356 LearningRate 0.0784 Epoch: 12 Global Step: 50200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:39:14,658-Speed 4665.19 samples/sec Loss 2.8014 LearningRate 0.0783 Epoch: 12 Global Step: 50210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:39:23,523-Speed 4620.95 samples/sec Loss 2.8377 LearningRate 0.0783 Epoch: 12 Global Step: 50220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:39:31,637-Speed 5048.73 samples/sec Loss 2.8512 LearningRate 0.0782 Epoch: 12 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:39:39,829-Speed 5000.99 samples/sec Loss 2.8541 LearningRate 0.0782 Epoch: 12 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:39:47,953-Speed 5041.97 samples/sec Loss 2.8572 LearningRate 0.0781 Epoch: 12 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:39:56,249-Speed 4938.25 samples/sec Loss 2.8232 LearningRate 0.0781 Epoch: 12 Global Step: 50260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 
09:40:04,366-Speed 5046.86 samples/sec Loss 2.8644 LearningRate 0.0780 Epoch: 12 Global Step: 50270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:40:12,446-Speed 5070.01 samples/sec Loss 2.8748 LearningRate 0.0780 Epoch: 12 Global Step: 50280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:40:20,461-Speed 5110.74 samples/sec Loss 2.9096 LearningRate 0.0779 Epoch: 12 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:40:28,545-Speed 5067.63 samples/sec Loss 2.8966 LearningRate 0.0779 Epoch: 12 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:40:36,611-Speed 5079.09 samples/sec Loss 2.8761 LearningRate 0.0779 Epoch: 12 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:40:44,650-Speed 5095.68 samples/sec Loss 2.8471 LearningRate 0.0778 Epoch: 12 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:40:52,771-Speed 5044.59 samples/sec Loss 2.9275 LearningRate 0.0778 Epoch: 12 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:41:00,851-Speed 5069.40 samples/sec Loss 2.8962 LearningRate 0.0777 Epoch: 12 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:41:09,017-Speed 5017.02 samples/sec Loss 2.9275 LearningRate 0.0777 Epoch: 12 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:41:17,354-Speed 4913.66 samples/sec Loss 2.8944 LearningRate 0.0776 Epoch: 12 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:41:25,445-Speed 5063.11 samples/sec Loss 2.9019 LearningRate 0.0776 Epoch: 12 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:41:33,626-Speed 5007.22 samples/sec Loss 2.9144 LearningRate 0.0775 Epoch: 12 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:41:41,713-Speed 5065.69 samples/sec Loss 2.9074 
LearningRate 0.0775 Epoch: 12 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:41:49,758-Speed 5092.15 samples/sec Loss 2.9475 LearningRate 0.0774 Epoch: 12 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:41:57,812-Speed 5086.07 samples/sec Loss 2.9516 LearningRate 0.0774 Epoch: 12 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:42:05,867-Speed 5085.52 samples/sec Loss 2.9195 LearningRate 0.0773 Epoch: 12 Global Step: 50420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:42:13,919-Speed 5087.37 samples/sec Loss 2.9646 LearningRate 0.0773 Epoch: 12 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:42:22,005-Speed 5066.85 samples/sec Loss 2.9114 LearningRate 0.0772 Epoch: 12 Global Step: 50440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:42:30,049-Speed 5092.44 samples/sec Loss 2.9416 LearningRate 0.0772 Epoch: 12 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:42:38,114-Speed 5079.35 samples/sec Loss 2.9395 LearningRate 0.0771 Epoch: 12 Global Step: 50460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:42:46,163-Speed 5089.42 samples/sec Loss 2.9744 LearningRate 0.0771 Epoch: 12 Global Step: 50470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:42:54,388-Speed 4980.89 samples/sec Loss 2.9565 LearningRate 0.0771 Epoch: 12 Global Step: 50480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:43:02,579-Speed 5000.91 samples/sec Loss 2.9709 LearningRate 0.0770 Epoch: 12 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:43:10,586-Speed 5116.56 samples/sec Loss 2.9968 LearningRate 0.0770 Epoch: 12 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:43:18,653-Speed 5078.40 samples/sec Loss 3.0025 LearningRate 0.0769 Epoch: 12 Global Step: 50510 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:43:26,754-Speed 5056.73 samples/sec Loss 2.9970 LearningRate 0.0769 Epoch: 12 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:43:34,784-Speed 5101.39 samples/sec Loss 2.9857 LearningRate 0.0768 Epoch: 12 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:43:42,999-Speed 4986.65 samples/sec Loss 3.0225 LearningRate 0.0768 Epoch: 12 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:43:51,063-Speed 5080.14 samples/sec Loss 2.9933 LearningRate 0.0767 Epoch: 12 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:43:59,404-Speed 4910.98 samples/sec Loss 3.0285 LearningRate 0.0767 Epoch: 12 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:44:07,519-Speed 5047.89 samples/sec Loss 3.0131 LearningRate 0.0766 Epoch: 12 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:44:15,618-Speed 5058.47 samples/sec Loss 2.9690 LearningRate 0.0766 Epoch: 12 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:44:23,802-Speed 5005.86 samples/sec Loss 3.0264 LearningRate 0.0765 Epoch: 12 Global Step: 50590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:44:31,847-Speed 5091.68 samples/sec Loss 3.0285 LearningRate 0.0765 Epoch: 12 Global Step: 50600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:44:39,945-Speed 5058.93 samples/sec Loss 3.0387 LearningRate 0.0764 Epoch: 12 Global Step: 50610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:44:48,093-Speed 5027.71 samples/sec Loss 2.9742 LearningRate 0.0764 Epoch: 12 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:44:56,175-Speed 5068.60 samples/sec Loss 3.0113 LearningRate 0.0764 Epoch: 12 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-01-17 09:45:04,235-Speed 5082.34 samples/sec Loss 3.0431 LearningRate 0.0763 Epoch: 12 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:45:12,318-Speed 5068.78 samples/sec Loss 3.0274 LearningRate 0.0763 Epoch: 12 Global Step: 50650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:45:20,366-Speed 5089.63 samples/sec Loss 3.0351 LearningRate 0.0762 Epoch: 12 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:45:28,500-Speed 5036.64 samples/sec Loss 3.0667 LearningRate 0.0762 Epoch: 12 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:45:36,533-Speed 5099.78 samples/sec Loss 3.0294 LearningRate 0.0761 Epoch: 12 Global Step: 50680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:45:44,613-Speed 5069.92 samples/sec Loss 3.0585 LearningRate 0.0761 Epoch: 12 Global Step: 50690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:45:52,848-Speed 4974.41 samples/sec Loss 3.0090 LearningRate 0.0760 Epoch: 12 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:46:00,877-Speed 5102.18 samples/sec Loss 3.0697 LearningRate 0.0760 Epoch: 12 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:46:09,027-Speed 5026.72 samples/sec Loss 3.0629 LearningRate 0.0759 Epoch: 12 Global Step: 50720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:46:17,142-Speed 5048.18 samples/sec Loss 3.0941 LearningRate 0.0759 Epoch: 12 Global Step: 50730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:46:25,403-Speed 4959.37 samples/sec Loss 3.0556 LearningRate 0.0758 Epoch: 12 Global Step: 50740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:46:33,536-Speed 5037.15 samples/sec Loss 3.0723 LearningRate 0.0758 Epoch: 12 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:46:41,716-Speed 5007.42 samples/sec 
Loss 3.0317 LearningRate 0.0758 Epoch: 12 Global Step: 50760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:46:49,821-Speed 5054.45 samples/sec Loss 3.0426 LearningRate 0.0757 Epoch: 12 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:46:57,892-Speed 5075.27 samples/sec Loss 3.0380 LearningRate 0.0757 Epoch: 12 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:47:06,071-Speed 5008.85 samples/sec Loss 3.0682 LearningRate 0.0756 Epoch: 12 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:47:14,236-Speed 5017.22 samples/sec Loss 3.1169 LearningRate 0.0756 Epoch: 12 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:47:22,608-Speed 4892.84 samples/sec Loss 3.1017 LearningRate 0.0755 Epoch: 12 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:47:30,938-Speed 4917.74 samples/sec Loss 3.0834 LearningRate 0.0755 Epoch: 12 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:47:39,350-Speed 4870.14 samples/sec Loss 3.0547 LearningRate 0.0754 Epoch: 12 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:47:47,889-Speed 4797.35 samples/sec Loss 3.0853 LearningRate 0.0754 Epoch: 12 Global Step: 50840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:47:56,369-Speed 4830.90 samples/sec Loss 3.1119 LearningRate 0.0753 Epoch: 12 Global Step: 50850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:48:04,652-Speed 4945.71 samples/sec Loss 3.0663 LearningRate 0.0753 Epoch: 12 Global Step: 50860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:48:12,944-Speed 4940.35 samples/sec Loss 3.0497 LearningRate 0.0752 Epoch: 12 Global Step: 50870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:48:21,224-Speed 4947.39 samples/sec Loss 3.0896 LearningRate 0.0752 Epoch: 12 Global 
Step: 50880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:48:29,731-Speed 4815.41 samples/sec Loss 3.0612 LearningRate 0.0751 Epoch: 12 Global Step: 50890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:48:38,450-Speed 4698.60 samples/sec Loss 3.1102 LearningRate 0.0751 Epoch: 12 Global Step: 50900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:48:46,874-Speed 4862.88 samples/sec Loss 3.1041 LearningRate 0.0751 Epoch: 12 Global Step: 50910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:48:54,986-Speed 5050.41 samples/sec Loss 3.0688 LearningRate 0.0750 Epoch: 12 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:49:03,083-Speed 5059.48 samples/sec Loss 3.0638 LearningRate 0.0750 Epoch: 12 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:49:11,204-Speed 5044.63 samples/sec Loss 3.0678 LearningRate 0.0749 Epoch: 12 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:49:19,426-Speed 4982.53 samples/sec Loss 3.0820 LearningRate 0.0749 Epoch: 12 Global Step: 50950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:49:27,596-Speed 5014.23 samples/sec Loss 3.0993 LearningRate 0.0748 Epoch: 12 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:49:35,689-Speed 5061.86 samples/sec Loss 3.1046 LearningRate 0.0748 Epoch: 12 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:49:43,792-Speed 5055.46 samples/sec Loss 3.0845 LearningRate 0.0747 Epoch: 12 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:49:51,981-Speed 5002.67 samples/sec Loss 3.0844 LearningRate 0.0747 Epoch: 12 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:50:00,106-Speed 5041.67 samples/sec Loss 3.0933 LearningRate 0.0746 Epoch: 12 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-01-17 09:50:08,475-Speed 4895.25 samples/sec Loss 3.1045 LearningRate 0.0746 Epoch: 12 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:50:16,563-Speed 5064.39 samples/sec Loss 3.1568 LearningRate 0.0746 Epoch: 12 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:50:24,778-Speed 4987.20 samples/sec Loss 3.1711 LearningRate 0.0745 Epoch: 12 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:50:32,932-Speed 5023.79 samples/sec Loss 3.0958 LearningRate 0.0745 Epoch: 12 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:50:41,168-Speed 4974.28 samples/sec Loss 3.1154 LearningRate 0.0744 Epoch: 12 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:50:49,355-Speed 5003.49 samples/sec Loss 3.1104 LearningRate 0.0744 Epoch: 12 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:50:57,473-Speed 5046.06 samples/sec Loss 3.1891 LearningRate 0.0743 Epoch: 12 Global Step: 51070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:51:05,648-Speed 5010.80 samples/sec Loss 3.1110 LearningRate 0.0743 Epoch: 12 Global Step: 51080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:51:13,797-Speed 5027.34 samples/sec Loss 3.0764 LearningRate 0.0742 Epoch: 12 Global Step: 51090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:51:21,851-Speed 5086.10 samples/sec Loss 3.0899 LearningRate 0.0742 Epoch: 12 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:51:29,954-Speed 5055.77 samples/sec Loss 3.1251 LearningRate 0.0741 Epoch: 12 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:51:38,005-Speed 5088.18 samples/sec Loss 3.1412 LearningRate 0.0741 Epoch: 12 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:51:46,109-Speed 5054.99 
samples/sec Loss 3.1518 LearningRate 0.0740 Epoch: 12 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:51:54,180-Speed 5075.77 samples/sec Loss 3.1126 LearningRate 0.0740 Epoch: 12 Global Step: 51140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:52:02,229-Speed 5089.56 samples/sec Loss 3.0887 LearningRate 0.0740 Epoch: 12 Global Step: 51150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:52:10,306-Speed 5071.46 samples/sec Loss 3.1182 LearningRate 0.0739 Epoch: 12 Global Step: 51160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:52:18,376-Speed 5076.69 samples/sec Loss 3.1210 LearningRate 0.0739 Epoch: 12 Global Step: 51170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:52:26,420-Speed 5092.34 samples/sec Loss 3.1111 LearningRate 0.0738 Epoch: 12 Global Step: 51180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:52:34,569-Speed 5027.18 samples/sec Loss 3.0957 LearningRate 0.0738 Epoch: 12 Global Step: 51190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:52:42,640-Speed 5076.07 samples/sec Loss 3.0902 LearningRate 0.0737 Epoch: 12 Global Step: 51200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:52:50,770-Speed 5038.58 samples/sec Loss 3.1360 LearningRate 0.0737 Epoch: 12 Global Step: 51210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:52:58,865-Speed 5060.42 samples/sec Loss 3.0875 LearningRate 0.0736 Epoch: 12 Global Step: 51220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:53:06,956-Speed 5062.99 samples/sec Loss 3.1345 LearningRate 0.0736 Epoch: 12 Global Step: 51230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:53:15,171-Speed 4987.18 samples/sec Loss 3.1156 LearningRate 0.0735 Epoch: 12 Global Step: 51240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:53:23,198-Speed 5103.50 samples/sec Loss 3.0901 LearningRate 0.0735 Epoch: 12 
Global Step: 51250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-17 09:53:31,341-Speed 5030.21 samples/sec Loss 3.1335 LearningRate 0.0735 Epoch: 12 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:53:39,459-Speed 5046.67 samples/sec Loss 3.1245 LearningRate 0.0734 Epoch: 12 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:53:47,597-Speed 5033.51 samples/sec Loss 3.1198 LearningRate 0.0734 Epoch: 12 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:53:55,740-Speed 5031.14 samples/sec Loss 3.1344 LearningRate 0.0733 Epoch: 12 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:54:03,826-Speed 5066.21 samples/sec Loss 3.1444 LearningRate 0.0733 Epoch: 12 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:54:11,934-Speed 5052.37 samples/sec Loss 3.1005 LearningRate 0.0732 Epoch: 12 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:54:20,075-Speed 5032.49 samples/sec Loss 3.1041 LearningRate 0.0732 Epoch: 12 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:54:28,281-Speed 4991.87 samples/sec Loss 3.0848 LearningRate 0.0731 Epoch: 12 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:54:36,581-Speed 4935.34 samples/sec Loss 3.1104 LearningRate 0.0731 Epoch: 12 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:54:44,646-Speed 5079.95 samples/sec Loss 3.1372 LearningRate 0.0730 Epoch: 12 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:54:52,887-Speed 4970.70 samples/sec Loss 3.1248 LearningRate 0.0730 Epoch: 12 Global Step: 51360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:55:01,110-Speed 4982.19 samples/sec Loss 3.0941 LearningRate 0.0729 Epoch: 12 Global Step: 51370 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-01-17 09:55:09,229-Speed 5045.82 samples/sec Loss 3.1300 LearningRate 0.0729 Epoch: 12 Global Step: 51380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:55:17,306-Speed 5071.16 samples/sec Loss 3.1283 LearningRate 0.0729 Epoch: 12 Global Step: 51390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:55:25,378-Speed 5075.73 samples/sec Loss 3.1403 LearningRate 0.0728 Epoch: 12 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:55:33,525-Speed 5028.28 samples/sec Loss 3.1309 LearningRate 0.0728 Epoch: 12 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:55:41,624-Speed 5057.90 samples/sec Loss 3.1416 LearningRate 0.0727 Epoch: 12 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:55:49,670-Speed 5091.14 samples/sec Loss 3.1163 LearningRate 0.0727 Epoch: 12 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:55:57,743-Speed 5074.37 samples/sec Loss 3.1208 LearningRate 0.0726 Epoch: 12 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:56:05,842-Speed 5058.01 samples/sec Loss 3.1346 LearningRate 0.0726 Epoch: 12 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:56:13,913-Speed 5075.73 samples/sec Loss 3.1540 LearningRate 0.0725 Epoch: 12 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:56:22,043-Speed 5038.84 samples/sec Loss 3.1037 LearningRate 0.0725 Epoch: 12 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:56:30,191-Speed 5027.39 samples/sec Loss 3.1587 LearningRate 0.0725 Epoch: 12 Global Step: 51480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:56:38,297-Speed 5053.79 samples/sec Loss 3.0694 LearningRate 0.0724 Epoch: 12 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:56:46,471-Speed 5011.19 
samples/sec Loss 3.1111 LearningRate 0.0724 Epoch: 12 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:56:54,561-Speed 5064.37 samples/sec Loss 3.0884 LearningRate 0.0723 Epoch: 12 Global Step: 51510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:57:02,748-Speed 5003.79 samples/sec Loss 3.1057 LearningRate 0.0723 Epoch: 12 Global Step: 51520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:57:10,876-Speed 5040.07 samples/sec Loss 3.1660 LearningRate 0.0722 Epoch: 12 Global Step: 51530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:57:18,957-Speed 5069.09 samples/sec Loss 3.1490 LearningRate 0.0722 Epoch: 12 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:57:27,050-Speed 5061.96 samples/sec Loss 3.0873 LearningRate 0.0721 Epoch: 12 Global Step: 51550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:57:35,162-Speed 5049.66 samples/sec Loss 3.1216 LearningRate 0.0721 Epoch: 12 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:57:43,469-Speed 4931.83 samples/sec Loss 3.0606 LearningRate 0.0720 Epoch: 12 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:57:51,609-Speed 5032.41 samples/sec Loss 3.1239 LearningRate 0.0720 Epoch: 12 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:57:59,703-Speed 5061.23 samples/sec Loss 3.0864 LearningRate 0.0720 Epoch: 12 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:58:07,819-Speed 5047.26 samples/sec Loss 3.0860 LearningRate 0.0719 Epoch: 12 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 09:58:16,055-Speed 4974.42 samples/sec Loss 3.0720 LearningRate 0.0719 Epoch: 12 Global Step: 51610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:58:24,184-Speed 5039.31 samples/sec Loss 3.1172 LearningRate 0.0718 Epoch: 12 
Global Step: 51620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:58:32,247-Speed 5080.67 samples/sec Loss 3.1433 LearningRate 0.0718 Epoch: 12 Global Step: 51630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:58:40,303-Speed 5084.95 samples/sec Loss 3.1444 LearningRate 0.0717 Epoch: 12 Global Step: 51640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:58:48,428-Speed 5042.25 samples/sec Loss 3.1483 LearningRate 0.0717 Epoch: 12 Global Step: 51650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:58:56,496-Speed 5077.45 samples/sec Loss 3.1463 LearningRate 0.0716 Epoch: 12 Global Step: 51660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:59:04,551-Speed 5085.64 samples/sec Loss 3.1356 LearningRate 0.0716 Epoch: 12 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:59:12,608-Speed 5084.03 samples/sec Loss 3.1500 LearningRate 0.0715 Epoch: 12 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:59:20,738-Speed 5039.27 samples/sec Loss 3.1101 LearningRate 0.0715 Epoch: 12 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:59:28,762-Speed 5105.42 samples/sec Loss 3.1023 LearningRate 0.0715 Epoch: 12 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:59:36,980-Speed 4985.52 samples/sec Loss 3.1276 LearningRate 0.0714 Epoch: 12 Global Step: 51710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-17 09:59:45,057-Speed 5071.64 samples/sec Loss 3.0943 LearningRate 0.0714 Epoch: 12 Global Step: 51720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 09:59:53,148-Speed 5062.62 samples/sec Loss 3.1345 LearningRate 0.0713 Epoch: 12 Global Step: 51730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:00:01,288-Speed 5033.07 samples/sec Loss 3.1027 LearningRate 0.0713 Epoch: 12 Global Step: 51740 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-17 10:00:09,405-Speed 5047.01 samples/sec Loss 3.1386 LearningRate 0.0712 Epoch: 12 Global Step: 51750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:00:17,489-Speed 5067.09 samples/sec Loss 3.1348 LearningRate 0.0712 Epoch: 12 Global Step: 51760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:00:25,579-Speed 5063.70 samples/sec Loss 3.1234 LearningRate 0.0711 Epoch: 12 Global Step: 51770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:00:33,641-Speed 5081.19 samples/sec Loss 3.0832 LearningRate 0.0711 Epoch: 12 Global Step: 51780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:00:41,948-Speed 4931.63 samples/sec Loss 3.1303 LearningRate 0.0711 Epoch: 12 Global Step: 51790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:00:50,038-Speed 5064.08 samples/sec Loss 3.0695 LearningRate 0.0710 Epoch: 12 Global Step: 51800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:00:58,144-Speed 5053.43 samples/sec Loss 3.1381 LearningRate 0.0710 Epoch: 12 Global Step: 51810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:01:06,213-Speed 5077.03 samples/sec Loss 3.1348 LearningRate 0.0709 Epoch: 12 Global Step: 51820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:01:14,310-Speed 5059.25 samples/sec Loss 3.0997 LearningRate 0.0709 Epoch: 12 Global Step: 51830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:01:22,461-Speed 5025.52 samples/sec Loss 3.1340 LearningRate 0.0708 Epoch: 12 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:01:30,553-Speed 5062.49 samples/sec Loss 3.0833 LearningRate 0.0708 Epoch: 12 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:01:38,676-Speed 5043.37 samples/sec Loss 3.1045 LearningRate 0.0707 Epoch: 12 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 
10:01:46,474-Speed 5253.25 samples/sec Loss 3.1030 LearningRate 0.0707 Epoch: 12 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:01:54,337-Speed 5209.90 samples/sec Loss 3.0902 LearningRate 0.0706 Epoch: 12 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:02:02,596-Speed 4960.12 samples/sec Loss 3.1186 LearningRate 0.0706 Epoch: 12 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:02:10,726-Speed 5038.39 samples/sec Loss 3.1020 LearningRate 0.0706 Epoch: 12 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:02:18,846-Speed 5044.98 samples/sec Loss 3.1525 LearningRate 0.0705 Epoch: 12 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:02:26,880-Speed 5099.55 samples/sec Loss 3.1173 LearningRate 0.0705 Epoch: 12 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:02:34,979-Speed 5057.71 samples/sec Loss 3.1060 LearningRate 0.0704 Epoch: 12 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:02:43,184-Speed 4992.82 samples/sec Loss 3.1499 LearningRate 0.0704 Epoch: 12 Global Step: 51940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:02:51,244-Speed 5082.45 samples/sec Loss 3.1330 LearningRate 0.0703 Epoch: 12 Global Step: 51950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:02:59,348-Speed 5054.76 samples/sec Loss 3.0767 LearningRate 0.0703 Epoch: 12 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:03:07,519-Speed 5013.56 samples/sec Loss 3.0939 LearningRate 0.0702 Epoch: 12 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:03:15,814-Speed 4938.86 samples/sec Loss 3.1318 LearningRate 0.0702 Epoch: 12 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:03:23,940-Speed 5041.10 samples/sec Loss 3.0772 
LearningRate 0.0702 Epoch: 12 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:03:32,016-Speed 5072.22 samples/sec Loss 3.0601 LearningRate 0.0701 Epoch: 12 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:03:40,215-Speed 4997.04 samples/sec Loss 3.0938 LearningRate 0.0701 Epoch: 12 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:03:48,321-Speed 5053.51 samples/sec Loss 3.0840 LearningRate 0.0700 Epoch: 12 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:03:56,443-Speed 5043.72 samples/sec Loss 3.0908 LearningRate 0.0700 Epoch: 12 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:04:04,620-Speed 5009.72 samples/sec Loss 3.0906 LearningRate 0.0699 Epoch: 12 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:04:12,655-Speed 5098.46 samples/sec Loss 3.1026 LearningRate 0.0699 Epoch: 12 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-17 10:04:20,791-Speed 5035.01 samples/sec Loss 3.1362 LearningRate 0.0698 Epoch: 12 Global Step: 52060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:04:28,888-Speed 5059.70 samples/sec Loss 3.0899 LearningRate 0.0698 Epoch: 12 Global Step: 52070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-17 10:04:36,984-Speed 5059.30 samples/sec Loss 3.0984 LearningRate 0.0698 Epoch: 12 Global Step: 52080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:04:45,090-Speed 5054.27 samples/sec Loss 3.1050 LearningRate 0.0697 Epoch: 12 Global Step: 52090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:04:53,211-Speed 5044.63 samples/sec Loss 3.1016 LearningRate 0.0697 Epoch: 12 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:05:01,431-Speed 4983.24 samples/sec Loss 3.0990 LearningRate 0.0696 Epoch: 12 Global Step: 52110 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:05:09,661-Speed 4978.23 samples/sec Loss 3.0832 LearningRate 0.0696 Epoch: 12 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:05:17,769-Speed 5052.58 samples/sec Loss 3.0310 LearningRate 0.0695 Epoch: 12 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:05:25,907-Speed 5033.51 samples/sec Loss 3.0948 LearningRate 0.0695 Epoch: 12 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:05:33,957-Speed 5089.22 samples/sec Loss 3.0728 LearningRate 0.0694 Epoch: 12 Global Step: 52150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:05:42,029-Speed 5074.93 samples/sec Loss 3.0545 LearningRate 0.0694 Epoch: 12 Global Step: 52160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:05:50,072-Speed 5093.37 samples/sec Loss 3.0867 LearningRate 0.0694 Epoch: 12 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:05:58,141-Speed 5076.37 samples/sec Loss 3.0709 LearningRate 0.0693 Epoch: 12 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:06:06,245-Speed 5054.89 samples/sec Loss 3.0385 LearningRate 0.0693 Epoch: 12 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:06:14,305-Speed 5082.69 samples/sec Loss 3.0980 LearningRate 0.0692 Epoch: 12 Global Step: 52200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:06:22,419-Speed 5048.56 samples/sec Loss 3.0633 LearningRate 0.0692 Epoch: 12 Global Step: 52210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:06:30,632-Speed 4988.22 samples/sec Loss 3.0389 LearningRate 0.0691 Epoch: 12 Global Step: 52220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:06:39,265-Speed 4745.11 samples/sec Loss 3.1078 LearningRate 0.0691 Epoch: 12 Global Step: 52230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-01-17 10:06:47,961-Speed 4711.09 samples/sec Loss 3.0536 LearningRate 0.0690 Epoch: 12 Global Step: 52240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:06:56,451-Speed 4824.73 samples/sec Loss 3.0729 LearningRate 0.0690 Epoch: 12 Global Step: 52250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:07:04,542-Speed 5063.49 samples/sec Loss 3.0680 LearningRate 0.0690 Epoch: 12 Global Step: 52260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:07:12,648-Speed 5053.73 samples/sec Loss 3.0505 LearningRate 0.0689 Epoch: 12 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:07:20,714-Speed 5079.00 samples/sec Loss 3.0401 LearningRate 0.0689 Epoch: 12 Global Step: 52280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:07:28,777-Speed 5080.10 samples/sec Loss 3.0588 LearningRate 0.0688 Epoch: 12 Global Step: 52290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:07:36,805-Speed 5103.05 samples/sec Loss 3.0862 LearningRate 0.0688 Epoch: 12 Global Step: 52300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:07:44,866-Speed 5082.16 samples/sec Loss 3.0522 LearningRate 0.0687 Epoch: 12 Global Step: 52310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:07:52,961-Speed 5060.35 samples/sec Loss 3.0670 LearningRate 0.0687 Epoch: 12 Global Step: 52320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:08:01,153-Speed 5001.46 samples/sec Loss 3.0367 LearningRate 0.0686 Epoch: 12 Global Step: 52330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:08:09,462-Speed 4929.80 samples/sec Loss 3.0302 LearningRate 0.0686 Epoch: 12 Global Step: 52340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:08:17,561-Speed 5058.05 samples/sec Loss 3.1061 LearningRate 0.0686 Epoch: 12 Global Step: 52350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:08:25,560-Speed 5121.81 samples/sec 
Loss 3.0374 LearningRate 0.0685 Epoch: 12 Global Step: 52360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:08:33,749-Speed 5002.68 samples/sec Loss 3.0758 LearningRate 0.0685 Epoch: 12 Global Step: 52370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:08:41,787-Speed 5096.24 samples/sec Loss 3.0592 LearningRate 0.0684 Epoch: 12 Global Step: 52380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:08:49,878-Speed 5063.42 samples/sec Loss 3.0996 LearningRate 0.0684 Epoch: 12 Global Step: 52390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:08:57,927-Speed 5089.58 samples/sec Loss 3.1059 LearningRate 0.0683 Epoch: 12 Global Step: 52400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:09:06,261-Speed 4914.94 samples/sec Loss 3.0589 LearningRate 0.0683 Epoch: 12 Global Step: 52410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:09:14,307-Speed 5091.14 samples/sec Loss 3.0509 LearningRate 0.0683 Epoch: 12 Global Step: 52420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:09:22,411-Speed 5055.38 samples/sec Loss 3.0498 LearningRate 0.0682 Epoch: 12 Global Step: 52430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:09:30,550-Speed 5033.19 samples/sec Loss 3.0634 LearningRate 0.0682 Epoch: 12 Global Step: 52440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:09:38,601-Speed 5088.47 samples/sec Loss 3.0136 LearningRate 0.0681 Epoch: 12 Global Step: 52450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:09:46,646-Speed 5091.85 samples/sec Loss 3.0345 LearningRate 0.0681 Epoch: 12 Global Step: 52460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:09:54,727-Speed 5069.72 samples/sec Loss 3.0923 LearningRate 0.0680 Epoch: 12 Global Step: 52470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 10:10:02,873-Speed 5028.89 samples/sec Loss 3.0519 LearningRate 0.0680 Epoch: 12 
Global Step: 52480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:10:10,958-Speed 5066.79 samples/sec Loss 3.0550 LearningRate 0.0679 Epoch: 12 Global Step: 52490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:10:19,137-Speed 5008.18 samples/sec Loss 3.0417 LearningRate 0.0679 Epoch: 12 Global Step: 52500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:10:27,284-Speed 5028.82 samples/sec Loss 3.0580 LearningRate 0.0679 Epoch: 12 Global Step: 52510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:10:35,448-Speed 5017.23 samples/sec Loss 3.0177 LearningRate 0.0678 Epoch: 12 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:10:43,769-Speed 4923.34 samples/sec Loss 3.0132 LearningRate 0.0678 Epoch: 12 Global Step: 52530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:10:52,189-Speed 4865.12 samples/sec Loss 3.0658 LearningRate 0.0677 Epoch: 12 Global Step: 52540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:11:00,581-Speed 4881.51 samples/sec Loss 3.0227 LearningRate 0.0677 Epoch: 12 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:11:08,817-Speed 4974.26 samples/sec Loss 3.0530 LearningRate 0.0676 Epoch: 12 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:11:16,898-Speed 5069.16 samples/sec Loss 3.0536 LearningRate 0.0676 Epoch: 12 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:11:24,964-Speed 5078.96 samples/sec Loss 3.0600 LearningRate 0.0675 Epoch: 12 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:11:33,115-Speed 5025.67 samples/sec Loss 3.0498 LearningRate 0.0675 Epoch: 12 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:11:41,349-Speed 4975.19 samples/sec Loss 3.0326 LearningRate 0.0675 Epoch: 12 Global Step: 52600 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-17 10:11:49,542-Speed 4999.55 samples/sec Loss 3.0389 LearningRate 0.0674 Epoch: 12 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:11:57,718-Speed 5010.99 samples/sec Loss 3.0597 LearningRate 0.0674 Epoch: 12 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:12:05,943-Speed 4980.54 samples/sec Loss 3.0599 LearningRate 0.0673 Epoch: 12 Global Step: 52630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:12:14,115-Speed 5012.20 samples/sec Loss 3.0784 LearningRate 0.0673 Epoch: 12 Global Step: 52640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:12:22,288-Speed 5012.67 samples/sec Loss 2.9863 LearningRate 0.0672 Epoch: 12 Global Step: 52650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:12:30,413-Speed 5042.03 samples/sec Loss 3.0404 LearningRate 0.0672 Epoch: 12 Global Step: 52660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:12:38,687-Speed 4951.17 samples/sec Loss 3.0235 LearningRate 0.0672 Epoch: 12 Global Step: 52670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:12:46,827-Speed 5032.79 samples/sec Loss 3.0521 LearningRate 0.0671 Epoch: 12 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:12:54,908-Speed 5069.23 samples/sec Loss 3.0246 LearningRate 0.0671 Epoch: 12 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:13:03,095-Speed 5003.89 samples/sec Loss 3.0717 LearningRate 0.0670 Epoch: 12 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:13:11,181-Speed 5066.00 samples/sec Loss 3.0470 LearningRate 0.0670 Epoch: 12 Global Step: 52710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:13:19,416-Speed 4974.50 samples/sec Loss 3.0367 LearningRate 0.0669 Epoch: 12 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 
10:13:27,556-Speed 5033.02 samples/sec Loss 3.0274 LearningRate 0.0669 Epoch: 12 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:13:35,672-Speed 5047.22 samples/sec Loss 3.0236 LearningRate 0.0669 Epoch: 12 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:13:43,934-Speed 4957.87 samples/sec Loss 3.0280 LearningRate 0.0668 Epoch: 12 Global Step: 52750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:13:51,971-Speed 5097.87 samples/sec Loss 3.0098 LearningRate 0.0668 Epoch: 12 Global Step: 52760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:14:00,272-Speed 4934.47 samples/sec Loss 3.0353 LearningRate 0.0667 Epoch: 12 Global Step: 52770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:14:08,926-Speed 4734.07 samples/sec Loss 2.9844 LearningRate 0.0667 Epoch: 12 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:14:17,085-Speed 5020.84 samples/sec Loss 2.9855 LearningRate 0.0666 Epoch: 12 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:14:25,326-Speed 4970.48 samples/sec Loss 3.0357 LearningRate 0.0666 Epoch: 12 Global Step: 52800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:14:33,578-Speed 4964.40 samples/sec Loss 3.0215 LearningRate 0.0665 Epoch: 12 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:14:41,690-Speed 5050.18 samples/sec Loss 3.0271 LearningRate 0.0665 Epoch: 12 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:14:49,836-Speed 5028.80 samples/sec Loss 2.9832 LearningRate 0.0665 Epoch: 12 Global Step: 52830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:14:58,105-Speed 4953.76 samples/sec Loss 3.0146 LearningRate 0.0664 Epoch: 12 Global Step: 52840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:15:06,511-Speed 4873.77 samples/sec Loss 2.9997 
LearningRate 0.0664 Epoch: 12 Global Step: 52850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:15:15,340-Speed 4639.71 samples/sec Loss 3.0162 LearningRate 0.0663 Epoch: 12 Global Step: 52860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:15:24,047-Speed 4704.61 samples/sec Loss 3.0082 LearningRate 0.0663 Epoch: 12 Global Step: 52870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:15:32,828-Speed 4665.23 samples/sec Loss 2.9956 LearningRate 0.0662 Epoch: 12 Global Step: 52880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:15:41,583-Speed 4679.46 samples/sec Loss 2.9966 LearningRate 0.0662 Epoch: 12 Global Step: 52890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:15:50,333-Speed 4681.47 samples/sec Loss 2.9731 LearningRate 0.0662 Epoch: 12 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:15:59,025-Speed 4712.86 samples/sec Loss 2.9887 LearningRate 0.0661 Epoch: 12 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:16:07,882-Speed 4625.21 samples/sec Loss 2.9994 LearningRate 0.0661 Epoch: 12 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:16:16,590-Speed 4704.29 samples/sec Loss 2.9994 LearningRate 0.0660 Epoch: 12 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:16:25,282-Speed 4712.94 samples/sec Loss 3.0012 LearningRate 0.0660 Epoch: 12 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:16:34,165-Speed 4611.32 samples/sec Loss 2.9893 LearningRate 0.0659 Epoch: 12 Global Step: 52950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:16:42,969-Speed 4653.60 samples/sec Loss 3.0038 LearningRate 0.0659 Epoch: 12 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:16:51,703-Speed 4689.98 samples/sec Loss 3.0102 LearningRate 0.0659 Epoch: 12 Global Step: 52970 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:17:00,421-Speed 4698.72 samples/sec Loss 3.0036 LearningRate 0.0658 Epoch: 12 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:17:09,176-Speed 4678.92 samples/sec Loss 3.0097 LearningRate 0.0658 Epoch: 12 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:17:17,906-Speed 4692.85 samples/sec Loss 3.0037 LearningRate 0.0657 Epoch: 12 Global Step: 53000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:17:26,657-Speed 4680.57 samples/sec Loss 3.0330 LearningRate 0.0657 Epoch: 12 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:17:35,355-Speed 4710.09 samples/sec Loss 3.0305 LearningRate 0.0656 Epoch: 12 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:17:44,079-Speed 4695.64 samples/sec Loss 3.0184 LearningRate 0.0656 Epoch: 12 Global Step: 53030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:17:52,819-Speed 4687.06 samples/sec Loss 2.9746 LearningRate 0.0656 Epoch: 12 Global Step: 53040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:18:01,653-Speed 4637.29 samples/sec Loss 3.0113 LearningRate 0.0655 Epoch: 12 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:18:10,362-Speed 4703.55 samples/sec Loss 2.9765 LearningRate 0.0655 Epoch: 12 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:18:19,168-Speed 4651.82 samples/sec Loss 2.9569 LearningRate 0.0654 Epoch: 12 Global Step: 53070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:18:27,893-Speed 4695.11 samples/sec Loss 2.9720 LearningRate 0.0654 Epoch: 12 Global Step: 53080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:18:36,552-Speed 4731.04 samples/sec Loss 3.0203 LearningRate 0.0653 Epoch: 12 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-01-17 10:18:44,596-Speed 5092.95 samples/sec Loss 2.9708 LearningRate 0.0653 Epoch: 12 Global Step: 53100 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 10:18:52,675-Speed 5070.22 samples/sec Loss 2.9910 LearningRate 0.0652 Epoch: 12 Global Step: 53110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 10:19:00,750-Speed 5072.92 samples/sec Loss 3.0082 LearningRate 0.0652 Epoch: 12 Global Step: 53120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:19:08,855-Speed 5054.11 samples/sec Loss 3.0274 LearningRate 0.0652 Epoch: 12 Global Step: 53130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:19:16,960-Speed 5054.59 samples/sec Loss 3.0447 LearningRate 0.0651 Epoch: 12 Global Step: 53140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:19:25,018-Speed 5084.13 samples/sec Loss 2.9245 LearningRate 0.0651 Epoch: 12 Global Step: 53150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:19:33,065-Speed 5090.51 samples/sec Loss 2.9816 LearningRate 0.0650 Epoch: 12 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:19:41,163-Speed 5059.12 samples/sec Loss 2.9672 LearningRate 0.0650 Epoch: 12 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:19:49,269-Speed 5053.59 samples/sec Loss 3.0355 LearningRate 0.0649 Epoch: 12 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:19:57,327-Speed 5083.62 samples/sec Loss 2.9706 LearningRate 0.0649 Epoch: 12 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:20:05,506-Speed 5008.95 samples/sec Loss 2.9615 LearningRate 0.0649 Epoch: 12 Global Step: 53200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:20:13,574-Speed 5077.28 samples/sec Loss 2.9483 LearningRate 0.0648 Epoch: 12 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:20:21,681-Speed 
5052.97 samples/sec Loss 2.9471 LearningRate 0.0648 Epoch: 12 Global Step: 53220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:20:29,788-Speed 5053.17 samples/sec Loss 2.9797 LearningRate 0.0647 Epoch: 12 Global Step: 53230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:20:37,948-Speed 5020.45 samples/sec Loss 2.9984 LearningRate 0.0647 Epoch: 12 Global Step: 53240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:20:46,091-Speed 5030.74 samples/sec Loss 2.9821 LearningRate 0.0646 Epoch: 12 Global Step: 53250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:20:54,183-Speed 5062.73 samples/sec Loss 3.0042 LearningRate 0.0646 Epoch: 12 Global Step: 53260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:21:02,420-Speed 4972.94 samples/sec Loss 2.9848 LearningRate 0.0646 Epoch: 12 Global Step: 53270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:21:10,551-Speed 5038.13 samples/sec Loss 2.9788 LearningRate 0.0645 Epoch: 12 Global Step: 53280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:21:18,649-Speed 5058.77 samples/sec Loss 2.9492 LearningRate 0.0645 Epoch: 12 Global Step: 53290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:21:26,696-Speed 5090.76 samples/sec Loss 2.9350 LearningRate 0.0644 Epoch: 12 Global Step: 53300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:21:34,768-Speed 5074.87 samples/sec Loss 2.9479 LearningRate 0.0644 Epoch: 12 Global Step: 53310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:21:42,816-Speed 5090.25 samples/sec Loss 2.9289 LearningRate 0.0643 Epoch: 12 Global Step: 53320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:21:50,917-Speed 5056.73 samples/sec Loss 2.9353 LearningRate 0.0643 Epoch: 12 Global Step: 53330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:21:58,988-Speed 5075.99 samples/sec Loss 2.9090 
LearningRate 0.0643 Epoch: 12 Global Step: 53340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:22:07,095-Speed 5052.49 samples/sec Loss 2.9372 LearningRate 0.0642 Epoch: 12 Global Step: 53350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:22:15,150-Speed 5086.25 samples/sec Loss 2.9549 LearningRate 0.0642 Epoch: 12 Global Step: 53360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:22:23,195-Speed 5091.99 samples/sec Loss 2.9491 LearningRate 0.0641 Epoch: 12 Global Step: 53370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:22:31,232-Speed 5097.36 samples/sec Loss 2.9255 LearningRate 0.0641 Epoch: 12 Global Step: 53380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:22:39,407-Speed 5011.03 samples/sec Loss 2.9660 LearningRate 0.0640 Epoch: 12 Global Step: 53390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:22:47,524-Speed 5046.21 samples/sec Loss 2.9609 LearningRate 0.0640 Epoch: 12 Global Step: 53400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:22:55,630-Speed 5054.28 samples/sec Loss 2.9210 LearningRate 0.0640 Epoch: 12 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:23:03,712-Speed 5068.14 samples/sec Loss 2.9591 LearningRate 0.0639 Epoch: 12 Global Step: 53420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:23:11,800-Speed 5065.29 samples/sec Loss 2.8944 LearningRate 0.0639 Epoch: 12 Global Step: 53430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:23:19,897-Speed 5059.35 samples/sec Loss 2.9586 LearningRate 0.0638 Epoch: 12 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:23:27,952-Speed 5085.55 samples/sec Loss 2.9514 LearningRate 0.0638 Epoch: 12 Global Step: 53450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:23:36,010-Speed 5083.91 samples/sec Loss 2.9583 LearningRate 0.0638 Epoch: 12 Global Step: 
53460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:23:44,045-Speed 5098.54 samples/sec Loss 2.9213 LearningRate 0.0637 Epoch: 12 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:23:52,133-Speed 5064.61 samples/sec Loss 2.9423 LearningRate 0.0637 Epoch: 12 Global Step: 53480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:24:00,229-Speed 5060.49 samples/sec Loss 2.9745 LearningRate 0.0636 Epoch: 12 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:24:08,233-Speed 5117.99 samples/sec Loss 2.9698 LearningRate 0.0636 Epoch: 12 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:24:16,292-Speed 5082.87 samples/sec Loss 2.9321 LearningRate 0.0635 Epoch: 12 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:24:24,398-Speed 5053.87 samples/sec Loss 2.9429 LearningRate 0.0635 Epoch: 12 Global Step: 53520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 10:24:32,424-Speed 5104.15 samples/sec Loss 2.9294 LearningRate 0.0635 Epoch: 12 Global Step: 53530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 10:24:40,452-Speed 5102.71 samples/sec Loss 2.9634 LearningRate 0.0634 Epoch: 12 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:24:48,533-Speed 5069.75 samples/sec Loss 2.9401 LearningRate 0.0634 Epoch: 12 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:24:56,896-Speed 4898.01 samples/sec Loss 2.8804 LearningRate 0.0633 Epoch: 12 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:25:05,211-Speed 4927.07 samples/sec Loss 2.9371 LearningRate 0.0633 Epoch: 12 Global Step: 53570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:25:13,334-Speed 5042.72 samples/sec Loss 2.9729 LearningRate 0.0632 Epoch: 12 Global Step: 53580 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-01-17 10:25:21,425-Speed 5062.97 samples/sec Loss 2.9295 LearningRate 0.0632 Epoch: 12 Global Step: 53590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:25:29,633-Speed 4991.12 samples/sec Loss 2.9526 LearningRate 0.0632 Epoch: 12 Global Step: 53600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:25:37,723-Speed 5063.85 samples/sec Loss 2.9602 LearningRate 0.0631 Epoch: 12 Global Step: 53610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:25:45,778-Speed 5085.31 samples/sec Loss 2.9302 LearningRate 0.0631 Epoch: 12 Global Step: 53620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:25:53,912-Speed 5036.93 samples/sec Loss 2.9101 LearningRate 0.0630 Epoch: 12 Global Step: 53630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:26:01,963-Speed 5088.07 samples/sec Loss 2.9323 LearningRate 0.0630 Epoch: 12 Global Step: 53640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 10:26:10,047-Speed 5067.01 samples/sec Loss 2.9667 LearningRate 0.0629 Epoch: 12 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:26:18,141-Speed 5061.65 samples/sec Loss 2.9198 LearningRate 0.0629 Epoch: 12 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:26:26,218-Speed 5071.58 samples/sec Loss 2.9053 LearningRate 0.0629 Epoch: 12 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:26:34,391-Speed 5013.07 samples/sec Loss 2.9449 LearningRate 0.0628 Epoch: 12 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:26:42,470-Speed 5070.67 samples/sec Loss 2.9295 LearningRate 0.0628 Epoch: 12 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:26:50,562-Speed 5062.66 samples/sec Loss 2.9296 LearningRate 0.0627 Epoch: 12 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:26:58,659-Speed 
5059.29 samples/sec Loss 2.9421 LearningRate 0.0627 Epoch: 12 Global Step: 53710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:27:06,684-Speed 5104.60 samples/sec Loss 2.9649 LearningRate 0.0627 Epoch: 12 Global Step: 53720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:27:14,800-Speed 5047.66 samples/sec Loss 2.8906 LearningRate 0.0626 Epoch: 12 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:27:22,847-Speed 5090.27 samples/sec Loss 2.8882 LearningRate 0.0626 Epoch: 12 Global Step: 53740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:27:30,860-Speed 5113.15 samples/sec Loss 2.8769 LearningRate 0.0625 Epoch: 12 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:27:40,071-Speed 4447.43 samples/sec Loss 2.8817 LearningRate 0.0625 Epoch: 12 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:27:49,023-Speed 4575.76 samples/sec Loss 2.9343 LearningRate 0.0624 Epoch: 12 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:27:57,097-Speed 5073.64 samples/sec Loss 2.9375 LearningRate 0.0624 Epoch: 12 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:28:06,401-Speed 4402.83 samples/sec Loss 2.8719 LearningRate 0.0624 Epoch: 12 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:28:16,344-Speed 4119.87 samples/sec Loss 2.8738 LearningRate 0.0623 Epoch: 12 Global Step: 53800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:28:24,512-Speed 5015.52 samples/sec Loss 2.9154 LearningRate 0.0623 Epoch: 12 Global Step: 53810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:28:32,529-Speed 5109.90 samples/sec Loss 2.9108 LearningRate 0.0622 Epoch: 12 Global Step: 53820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:28:40,528-Speed 5121.18 samples/sec Loss 2.8753 LearningRate 
0.0622 Epoch: 12 Global Step: 53830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:28:48,592-Speed 5080.38 samples/sec Loss 2.9028 LearningRate 0.0621 Epoch: 12 Global Step: 53840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:28:56,607-Speed 5110.53 samples/sec Loss 2.9025 LearningRate 0.0621 Epoch: 12 Global Step: 53850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 10:29:04,624-Speed 5110.39 samples/sec Loss 2.9088 LearningRate 0.0621 Epoch: 12 Global Step: 53860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:29:12,635-Speed 5113.26 samples/sec Loss 2.9008 LearningRate 0.0620 Epoch: 12 Global Step: 53870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:29:20,633-Speed 5121.98 samples/sec Loss 2.9414 LearningRate 0.0620 Epoch: 12 Global Step: 53880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:29:28,670-Speed 5096.97 samples/sec Loss 2.9124 LearningRate 0.0619 Epoch: 12 Global Step: 53890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:29:36,852-Speed 5006.86 samples/sec Loss 2.8769 LearningRate 0.0619 Epoch: 12 Global Step: 53900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:29:45,033-Speed 5007.45 samples/sec Loss 2.8737 LearningRate 0.0619 Epoch: 12 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:29:53,020-Speed 5129.10 samples/sec Loss 2.8814 LearningRate 0.0618 Epoch: 12 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:30:01,152-Speed 5037.94 samples/sec Loss 2.8818 LearningRate 0.0618 Epoch: 12 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:30:09,366-Speed 4987.18 samples/sec Loss 2.8538 LearningRate 0.0617 Epoch: 12 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:30:17,607-Speed 4970.83 samples/sec Loss 2.9360 LearningRate 0.0617 Epoch: 12 Global Step: 53950 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:30:25,895-Speed 4942.89 samples/sec Loss 2.9189 LearningRate 0.0616 Epoch: 12 Global Step: 53960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 10:30:34,079-Speed 5005.69 samples/sec Loss 2.8906 LearningRate 0.0616 Epoch: 12 Global Step: 53970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:30:42,309-Speed 4977.00 samples/sec Loss 2.8639 LearningRate 0.0616 Epoch: 12 Global Step: 53980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:30:50,793-Speed 4828.85 samples/sec Loss 2.8896 LearningRate 0.0615 Epoch: 12 Global Step: 53990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:30:59,016-Speed 4981.74 samples/sec Loss 2.8375 LearningRate 0.0615 Epoch: 12 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:31:07,282-Speed 4955.66 samples/sec Loss 2.8985 LearningRate 0.0614 Epoch: 12 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:31:15,529-Speed 4967.64 samples/sec Loss 2.8630 LearningRate 0.0614 Epoch: 12 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:31:23,803-Speed 4951.17 samples/sec Loss 2.8512 LearningRate 0.0614 Epoch: 12 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:31:32,149-Speed 4907.79 samples/sec Loss 2.8852 LearningRate 0.0613 Epoch: 12 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:31:40,462-Speed 4928.01 samples/sec Loss 2.9290 LearningRate 0.0613 Epoch: 12 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:31:48,726-Speed 4956.76 samples/sec Loss 2.8770 LearningRate 0.0612 Epoch: 12 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:31:57,113-Speed 4884.80 samples/sec Loss 2.8776 LearningRate 0.0612 Epoch: 12 Global Step: 54070 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-01-17 10:32:05,347-Speed 4975.06 samples/sec Loss 2.8577 LearningRate 0.0611 Epoch: 12 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:32:13,719-Speed 4892.74 samples/sec Loss 2.8569 LearningRate 0.0611 Epoch: 12 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:32:22,063-Speed 4909.71 samples/sec Loss 2.8720 LearningRate 0.0611 Epoch: 12 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:32:32,028-Speed 4110.71 samples/sec Loss 2.8953 LearningRate 0.0610 Epoch: 12 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:32:40,324-Speed 4937.95 samples/sec Loss 2.8809 LearningRate 0.0610 Epoch: 12 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:32:48,646-Speed 4922.42 samples/sec Loss 2.8927 LearningRate 0.0609 Epoch: 12 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:32:57,306-Speed 4730.97 samples/sec Loss 2.8548 LearningRate 0.0609 Epoch: 12 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:33:06,382-Speed 4513.75 samples/sec Loss 2.9100 LearningRate 0.0609 Epoch: 12 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:33:15,088-Speed 4711.58 samples/sec Loss 2.8612 LearningRate 0.0608 Epoch: 12 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:33:23,747-Speed 4731.09 samples/sec Loss 2.8787 LearningRate 0.0608 Epoch: 12 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:33:32,353-Speed 4759.96 samples/sec Loss 2.8649 LearningRate 0.0607 Epoch: 12 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:33:40,874-Speed 4807.19 samples/sec Loss 2.8615 LearningRate 0.0607 Epoch: 12 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:33:49,283-Speed 4871.75 
samples/sec Loss 2.8522 LearningRate 0.0606 Epoch: 12 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:33:57,455-Speed 5013.02 samples/sec Loss 2.8498 LearningRate 0.0606 Epoch: 12 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:34:05,625-Speed 5013.96 samples/sec Loss 2.8077 LearningRate 0.0606 Epoch: 12 Global Step: 54220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:34:13,899-Speed 4951.06 samples/sec Loss 2.8341 LearningRate 0.0605 Epoch: 12 Global Step: 54230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:34:50,168-Speed 1129.41 samples/sec Loss 2.6447 LearningRate 0.0605 Epoch: 13 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:34:58,363-Speed 4999.18 samples/sec Loss 2.3246 LearningRate 0.0604 Epoch: 13 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:35:06,409-Speed 5091.06 samples/sec Loss 2.3248 LearningRate 0.0604 Epoch: 13 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:35:14,529-Speed 5045.07 samples/sec Loss 2.3179 LearningRate 0.0604 Epoch: 13 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:35:22,616-Speed 5065.37 samples/sec Loss 2.2847 LearningRate 0.0603 Epoch: 13 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:35:30,654-Speed 5096.67 samples/sec Loss 2.3033 LearningRate 0.0603 Epoch: 13 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:35:39,030-Speed 4890.66 samples/sec Loss 2.3365 LearningRate 0.0602 Epoch: 13 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:35:47,346-Speed 4926.01 samples/sec Loss 2.3613 LearningRate 0.0602 Epoch: 13 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:35:55,736-Speed 4882.57 samples/sec Loss 2.3279 LearningRate 0.0601 Epoch: 
13 Global Step: 54320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:36:03,946-Speed 4989.77 samples/sec Loss 2.3379 LearningRate 0.0601 Epoch: 13 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:36:11,998-Speed 5087.12 samples/sec Loss 2.3355 LearningRate 0.0601 Epoch: 13 Global Step: 54340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:36:20,255-Speed 4961.68 samples/sec Loss 2.3317 LearningRate 0.0600 Epoch: 13 Global Step: 54350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:36:28,447-Speed 5001.05 samples/sec Loss 2.3065 LearningRate 0.0600 Epoch: 13 Global Step: 54360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:36:36,694-Speed 4966.98 samples/sec Loss 2.3686 LearningRate 0.0599 Epoch: 13 Global Step: 54370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:36:44,946-Speed 4964.65 samples/sec Loss 2.3505 LearningRate 0.0599 Epoch: 13 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:36:53,290-Speed 4909.32 samples/sec Loss 2.3391 LearningRate 0.0599 Epoch: 13 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:37:01,513-Speed 4982.35 samples/sec Loss 2.3289 LearningRate 0.0598 Epoch: 13 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:37:09,730-Speed 4985.22 samples/sec Loss 2.3758 LearningRate 0.0598 Epoch: 13 Global Step: 54410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:37:17,954-Speed 4980.97 samples/sec Loss 2.3884 LearningRate 0.0597 Epoch: 13 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:37:26,144-Speed 5001.53 samples/sec Loss 2.3615 LearningRate 0.0597 Epoch: 13 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:37:34,285-Speed 5032.24 samples/sec Loss 2.3864 LearningRate 0.0597 Epoch: 13 Global Step: 54440 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-17 10:37:42,454-Speed 5014.89 samples/sec Loss 2.4364 LearningRate 0.0596 Epoch: 13 Global Step: 54450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:37:50,484-Speed 5101.25 samples/sec Loss 2.3675 LearningRate 0.0596 Epoch: 13 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:37:58,539-Speed 5086.18 samples/sec Loss 2.3790 LearningRate 0.0595 Epoch: 13 Global Step: 54470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:38:06,692-Speed 5024.57 samples/sec Loss 2.3661 LearningRate 0.0595 Epoch: 13 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:38:14,812-Speed 5044.83 samples/sec Loss 2.4168 LearningRate 0.0594 Epoch: 13 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:38:22,908-Speed 5060.87 samples/sec Loss 2.4207 LearningRate 0.0594 Epoch: 13 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:38:30,936-Speed 5103.07 samples/sec Loss 2.4246 LearningRate 0.0594 Epoch: 13 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:38:39,061-Speed 5041.30 samples/sec Loss 2.4127 LearningRate 0.0593 Epoch: 13 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:38:47,363-Speed 4934.77 samples/sec Loss 2.4152 LearningRate 0.0593 Epoch: 13 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:38:55,927-Speed 4783.14 samples/sec Loss 2.4318 LearningRate 0.0592 Epoch: 13 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:39:04,122-Speed 4999.25 samples/sec Loss 2.4001 LearningRate 0.0592 Epoch: 13 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:39:12,093-Speed 5138.98 samples/sec Loss 2.4306 LearningRate 0.0592 Epoch: 13 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 
10:39:20,112-Speed 5108.74 samples/sec Loss 2.4443 LearningRate 0.0591 Epoch: 13 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:39:28,276-Speed 5017.96 samples/sec Loss 2.4300 LearningRate 0.0591 Epoch: 13 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:39:36,209-Speed 5163.94 samples/sec Loss 2.4184 LearningRate 0.0590 Epoch: 13 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:39:44,201-Speed 5125.77 samples/sec Loss 2.4364 LearningRate 0.0590 Epoch: 13 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:39:52,151-Speed 5153.02 samples/sec Loss 2.4497 LearningRate 0.0590 Epoch: 13 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:40:00,155-Speed 5118.13 samples/sec Loss 2.4566 LearningRate 0.0589 Epoch: 13 Global Step: 54620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:40:08,137-Speed 5132.45 samples/sec Loss 2.4363 LearningRate 0.0589 Epoch: 13 Global Step: 54630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:40:16,300-Speed 5017.97 samples/sec Loss 2.4262 LearningRate 0.0588 Epoch: 13 Global Step: 54640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:40:24,477-Speed 5009.94 samples/sec Loss 2.4389 LearningRate 0.0588 Epoch: 13 Global Step: 54650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:40:32,540-Speed 5080.59 samples/sec Loss 2.4886 LearningRate 0.0588 Epoch: 13 Global Step: 54660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:40:40,510-Speed 5140.05 samples/sec Loss 2.4862 LearningRate 0.0587 Epoch: 13 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:40:48,579-Speed 5077.49 samples/sec Loss 2.4607 LearningRate 0.0587 Epoch: 13 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:40:56,724-Speed 5029.25 samples/sec Loss 
2.4513 LearningRate 0.0586 Epoch: 13 Global Step: 54690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:41:04,772-Speed 5090.00 samples/sec Loss 2.4555 LearningRate 0.0586 Epoch: 13 Global Step: 54700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:41:12,883-Speed 5050.62 samples/sec Loss 2.4498 LearningRate 0.0585 Epoch: 13 Global Step: 54710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:41:21,038-Speed 5023.22 samples/sec Loss 2.4768 LearningRate 0.0585 Epoch: 13 Global Step: 54720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:41:29,043-Speed 5119.72 samples/sec Loss 2.5003 LearningRate 0.0585 Epoch: 13 Global Step: 54730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:41:37,244-Speed 4995.30 samples/sec Loss 2.5053 LearningRate 0.0584 Epoch: 13 Global Step: 54740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:41:45,198-Speed 5150.24 samples/sec Loss 2.4772 LearningRate 0.0584 Epoch: 13 Global Step: 54750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:41:53,250-Speed 5087.29 samples/sec Loss 2.4760 LearningRate 0.0583 Epoch: 13 Global Step: 54760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:42:01,401-Speed 5026.25 samples/sec Loss 2.4850 LearningRate 0.0583 Epoch: 13 Global Step: 54770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:42:09,478-Speed 5072.50 samples/sec Loss 2.5145 LearningRate 0.0583 Epoch: 13 Global Step: 54780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:42:17,436-Speed 5147.83 samples/sec Loss 2.4756 LearningRate 0.0582 Epoch: 13 Global Step: 54790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:42:25,427-Speed 5125.77 samples/sec Loss 2.4867 LearningRate 0.0582 Epoch: 13 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:42:33,504-Speed 5071.97 samples/sec Loss 2.5071 LearningRate 0.0581 Epoch: 13 Global 
Step: 54810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:42:41,499-Speed 5124.25 samples/sec Loss 2.5329 LearningRate 0.0581 Epoch: 13 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:42:49,469-Speed 5139.99 samples/sec Loss 2.5279 LearningRate 0.0581 Epoch: 13 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:42:57,522-Speed 5087.16 samples/sec Loss 2.5160 LearningRate 0.0580 Epoch: 13 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:43:05,491-Speed 5140.86 samples/sec Loss 2.5085 LearningRate 0.0580 Epoch: 13 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:43:13,592-Speed 5056.66 samples/sec Loss 2.5163 LearningRate 0.0579 Epoch: 13 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:43:21,608-Speed 5110.76 samples/sec Loss 2.5524 LearningRate 0.0579 Epoch: 13 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:43:29,599-Speed 5126.62 samples/sec Loss 2.5539 LearningRate 0.0579 Epoch: 13 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:43:37,643-Speed 5092.22 samples/sec Loss 2.5184 LearningRate 0.0578 Epoch: 13 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:43:45,657-Speed 5112.26 samples/sec Loss 2.5542 LearningRate 0.0578 Epoch: 13 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:43:53,654-Speed 5122.20 samples/sec Loss 2.5568 LearningRate 0.0577 Epoch: 13 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:44:01,724-Speed 5075.99 samples/sec Loss 2.5639 LearningRate 0.0577 Epoch: 13 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:44:09,716-Speed 5126.29 samples/sec Loss 2.5395 LearningRate 0.0577 Epoch: 13 Global Step: 54930 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-01-17 10:44:17,756-Speed 5095.14 samples/sec Loss 2.5528 LearningRate 0.0576 Epoch: 13 Global Step: 54940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:44:25,719-Speed 5144.43 samples/sec Loss 2.5068 LearningRate 0.0576 Epoch: 13 Global Step: 54950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:44:33,781-Speed 5081.12 samples/sec Loss 2.5342 LearningRate 0.0575 Epoch: 13 Global Step: 54960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:44:41,869-Speed 5065.58 samples/sec Loss 2.5388 LearningRate 0.0575 Epoch: 13 Global Step: 54970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:44:49,961-Speed 5061.84 samples/sec Loss 2.5675 LearningRate 0.0575 Epoch: 13 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:44:58,236-Speed 4950.58 samples/sec Loss 2.4952 LearningRate 0.0574 Epoch: 13 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:45:06,332-Speed 5060.17 samples/sec Loss 2.5751 LearningRate 0.0574 Epoch: 13 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:45:52,996-[lfw][55000]XNorm: 22.803932 Training: 2022-01-17 10:45:52,996-[lfw][55000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-01-17 10:45:52,997-[lfw][55000]Accuracy-Highest: 0.99817 Training: 2022-01-17 10:46:47,195-[cfp_fp][55000]XNorm: 20.987495 Training: 2022-01-17 10:46:47,196-[cfp_fp][55000]Accuracy-Flip: 0.98843+-0.00544 Training: 2022-01-17 10:46:47,197-[cfp_fp][55000]Accuracy-Highest: 0.98843 Training: 2022-01-17 10:47:33,688-[agedb_30][55000]XNorm: 22.470503 Training: 2022-01-17 10:47:33,689-[agedb_30][55000]Accuracy-Flip: 0.98267+-0.00680 Training: 2022-01-17 10:47:33,690-[agedb_30][55000]Accuracy-Highest: 0.98267 Training: 2022-01-17 10:47:41,697-Speed 263.64 samples/sec Loss 2.5728 LearningRate 0.0573 Epoch: 13 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 
10:47:49,710-Speed 5112.46 samples/sec Loss 2.5587 LearningRate 0.0573 Epoch: 13 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:47:57,748-Speed 5096.65 samples/sec Loss 2.5857 LearningRate 0.0572 Epoch: 13 Global Step: 55030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:48:05,764-Speed 5110.48 samples/sec Loss 2.5547 LearningRate 0.0572 Epoch: 13 Global Step: 55040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:48:13,875-Speed 5050.28 samples/sec Loss 2.5880 LearningRate 0.0572 Epoch: 13 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:48:21,924-Speed 5089.56 samples/sec Loss 2.5743 LearningRate 0.0571 Epoch: 13 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:48:29,991-Speed 5078.53 samples/sec Loss 2.5595 LearningRate 0.0571 Epoch: 13 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:48:38,205-Speed 4987.13 samples/sec Loss 2.6045 LearningRate 0.0570 Epoch: 13 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:48:46,373-Speed 5015.10 samples/sec Loss 2.5922 LearningRate 0.0570 Epoch: 13 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:48:54,422-Speed 5090.13 samples/sec Loss 2.5959 LearningRate 0.0570 Epoch: 13 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:49:02,464-Speed 5094.03 samples/sec Loss 2.6016 LearningRate 0.0569 Epoch: 13 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:49:10,519-Speed 5085.33 samples/sec Loss 2.5666 LearningRate 0.0569 Epoch: 13 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:49:18,579-Speed 5082.96 samples/sec Loss 2.5900 LearningRate 0.0568 Epoch: 13 Global Step: 55130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:49:26,659-Speed 5069.53 samples/sec Loss 
2.5926 LearningRate 0.0568 Epoch: 13 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:49:34,725-Speed 5079.13 samples/sec Loss 2.5783 LearningRate 0.0568 Epoch: 13 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:49:42,742-Speed 5109.76 samples/sec Loss 2.5693 LearningRate 0.0567 Epoch: 13 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:49:50,780-Speed 5096.00 samples/sec Loss 2.6161 LearningRate 0.0567 Epoch: 13 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:49:58,860-Speed 5070.45 samples/sec Loss 2.5537 LearningRate 0.0566 Epoch: 13 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:50:07,129-Speed 4953.90 samples/sec Loss 2.5955 LearningRate 0.0566 Epoch: 13 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:50:15,169-Speed 5095.31 samples/sec Loss 2.5843 LearningRate 0.0566 Epoch: 13 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:50:23,270-Speed 5057.79 samples/sec Loss 2.5797 LearningRate 0.0565 Epoch: 13 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:50:31,307-Speed 5096.68 samples/sec Loss 2.6009 LearningRate 0.0565 Epoch: 13 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:50:39,478-Speed 5013.73 samples/sec Loss 2.5971 LearningRate 0.0564 Epoch: 13 Global Step: 55230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:50:47,656-Speed 5009.26 samples/sec Loss 2.5925 LearningRate 0.0564 Epoch: 13 Global Step: 55240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:50:55,735-Speed 5069.98 samples/sec Loss 2.6086 LearningRate 0.0564 Epoch: 13 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:51:03,819-Speed 5068.06 samples/sec Loss 2.6007 LearningRate 0.0563 Epoch: 13 Global Step: 
55260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:51:12,014-Speed 4998.65 samples/sec Loss 2.6253 LearningRate 0.0563 Epoch: 13 Global Step: 55270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:51:20,060-Speed 5091.54 samples/sec Loss 2.6211 LearningRate 0.0562 Epoch: 13 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:51:28,084-Speed 5105.25 samples/sec Loss 2.6013 LearningRate 0.0562 Epoch: 13 Global Step: 55290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:51:36,131-Speed 5091.06 samples/sec Loss 2.6106 LearningRate 0.0562 Epoch: 13 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:51:44,372-Speed 4970.80 samples/sec Loss 2.6067 LearningRate 0.0561 Epoch: 13 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:51:52,419-Speed 5090.87 samples/sec Loss 2.6224 LearningRate 0.0561 Epoch: 13 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:52:00,480-Speed 5081.92 samples/sec Loss 2.5918 LearningRate 0.0560 Epoch: 13 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:52:08,662-Speed 5006.55 samples/sec Loss 2.5959 LearningRate 0.0560 Epoch: 13 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:52:16,757-Speed 5060.60 samples/sec Loss 2.5759 LearningRate 0.0560 Epoch: 13 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:52:25,184-Speed 4860.98 samples/sec Loss 2.6241 LearningRate 0.0559 Epoch: 13 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:52:33,340-Speed 5022.59 samples/sec Loss 2.5978 LearningRate 0.0559 Epoch: 13 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:52:41,556-Speed 4986.27 samples/sec Loss 2.6123 LearningRate 0.0558 Epoch: 13 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-01-17 10:52:49,590-Speed 5099.13 samples/sec Loss 2.6410 LearningRate 0.0558 Epoch: 13 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:52:57,613-Speed 5105.57 samples/sec Loss 2.6248 LearningRate 0.0558 Epoch: 13 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:53:05,699-Speed 5066.82 samples/sec Loss 2.5950 LearningRate 0.0557 Epoch: 13 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:53:13,742-Speed 5093.52 samples/sec Loss 2.5847 LearningRate 0.0557 Epoch: 13 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:53:21,822-Speed 5070.03 samples/sec Loss 2.5844 LearningRate 0.0556 Epoch: 13 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:53:29,957-Speed 5035.77 samples/sec Loss 2.5952 LearningRate 0.0556 Epoch: 13 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:53:37,987-Speed 5101.39 samples/sec Loss 2.6035 LearningRate 0.0556 Epoch: 13 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:53:46,122-Speed 5035.80 samples/sec Loss 2.6276 LearningRate 0.0555 Epoch: 13 Global Step: 55460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:53:54,200-Speed 5071.15 samples/sec Loss 2.6309 LearningRate 0.0555 Epoch: 13 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:54:02,294-Speed 5061.18 samples/sec Loss 2.6121 LearningRate 0.0554 Epoch: 13 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:54:10,479-Speed 5005.29 samples/sec Loss 2.6537 LearningRate 0.0554 Epoch: 13 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:54:18,548-Speed 5077.08 samples/sec Loss 2.5994 LearningRate 0.0554 Epoch: 13 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:54:26,759-Speed 4988.61 
samples/sec Loss 2.6122 LearningRate 0.0553 Epoch: 13 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:54:34,895-Speed 5035.72 samples/sec Loss 2.5984 LearningRate 0.0553 Epoch: 13 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:54:42,930-Speed 5098.76 samples/sec Loss 2.6300 LearningRate 0.0553 Epoch: 13 Global Step: 55530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:54:51,290-Speed 4900.54 samples/sec Loss 2.6014 LearningRate 0.0552 Epoch: 13 Global Step: 55540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:54:59,318-Speed 5102.42 samples/sec Loss 2.6017 LearningRate 0.0552 Epoch: 13 Global Step: 55550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:55:07,365-Speed 5090.97 samples/sec Loss 2.5691 LearningRate 0.0551 Epoch: 13 Global Step: 55560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:55:15,509-Speed 5029.97 samples/sec Loss 2.6006 LearningRate 0.0551 Epoch: 13 Global Step: 55570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:55:23,708-Speed 4996.53 samples/sec Loss 2.5693 LearningRate 0.0551 Epoch: 13 Global Step: 55580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:55:31,843-Speed 5035.82 samples/sec Loss 2.5991 LearningRate 0.0550 Epoch: 13 Global Step: 55590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:55:39,954-Speed 5050.34 samples/sec Loss 2.6133 LearningRate 0.0550 Epoch: 13 Global Step: 55600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:55:48,122-Speed 5015.28 samples/sec Loss 2.6614 LearningRate 0.0549 Epoch: 13 Global Step: 55610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:55:56,188-Speed 5079.05 samples/sec Loss 2.6388 LearningRate 0.0549 Epoch: 13 Global Step: 55620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:56:04,325-Speed 5034.04 samples/sec Loss 2.6077 LearningRate 0.0549 Epoch: 
13 Global Step: 55630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:56:12,403-Speed 5071.87 samples/sec Loss 2.6265 LearningRate 0.0548 Epoch: 13 Global Step: 55640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:56:20,604-Speed 4995.13 samples/sec Loss 2.5855 LearningRate 0.0548 Epoch: 13 Global Step: 55650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:56:28,697-Speed 5061.92 samples/sec Loss 2.6647 LearningRate 0.0547 Epoch: 13 Global Step: 55660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:56:36,763-Speed 5078.84 samples/sec Loss 2.6581 LearningRate 0.0547 Epoch: 13 Global Step: 55670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:56:44,892-Speed 5039.62 samples/sec Loss 2.6578 LearningRate 0.0547 Epoch: 13 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:56:53,025-Speed 5037.33 samples/sec Loss 2.6281 LearningRate 0.0546 Epoch: 13 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:57:01,074-Speed 5089.19 samples/sec Loss 2.6039 LearningRate 0.0546 Epoch: 13 Global Step: 55700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:57:09,038-Speed 5143.87 samples/sec Loss 2.6100 LearningRate 0.0545 Epoch: 13 Global Step: 55710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:57:17,083-Speed 5092.41 samples/sec Loss 2.6370 LearningRate 0.0545 Epoch: 13 Global Step: 55720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:57:25,094-Speed 5113.79 samples/sec Loss 2.6141 LearningRate 0.0545 Epoch: 13 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:57:33,204-Speed 5050.79 samples/sec Loss 2.6626 LearningRate 0.0544 Epoch: 13 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:57:41,385-Speed 5007.36 samples/sec Loss 2.6541 LearningRate 0.0544 Epoch: 13 Global Step: 55750 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-17 10:57:49,440-Speed 5085.82 samples/sec Loss 2.6331 LearningRate 0.0543 Epoch: 13 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:57:57,714-Speed 4951.38 samples/sec Loss 2.6267 LearningRate 0.0543 Epoch: 13 Global Step: 55770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:58:06,121-Speed 4872.49 samples/sec Loss 2.6394 LearningRate 0.0543 Epoch: 13 Global Step: 55780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:58:14,172-Speed 5088.46 samples/sec Loss 2.6067 LearningRate 0.0542 Epoch: 13 Global Step: 55790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:58:22,150-Speed 5134.22 samples/sec Loss 2.6164 LearningRate 0.0542 Epoch: 13 Global Step: 55800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:58:30,157-Speed 5116.47 samples/sec Loss 2.5610 LearningRate 0.0541 Epoch: 13 Global Step: 55810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:58:38,175-Speed 5109.26 samples/sec Loss 2.6219 LearningRate 0.0541 Epoch: 13 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:58:46,401-Speed 4979.89 samples/sec Loss 2.5957 LearningRate 0.0541 Epoch: 13 Global Step: 55830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:58:54,589-Speed 5003.23 samples/sec Loss 2.6228 LearningRate 0.0540 Epoch: 13 Global Step: 55840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 10:59:02,639-Speed 5088.66 samples/sec Loss 2.5532 LearningRate 0.0540 Epoch: 13 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:59:10,672-Speed 5099.67 samples/sec Loss 2.6206 LearningRate 0.0540 Epoch: 13 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:59:18,708-Speed 5097.86 samples/sec Loss 2.5952 LearningRate 0.0539 Epoch: 13 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 
10:59:26,832-Speed 5042.41 samples/sec Loss 2.5884 LearningRate 0.0539 Epoch: 13 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:59:34,828-Speed 5123.58 samples/sec Loss 2.5826 LearningRate 0.0538 Epoch: 13 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:59:42,930-Speed 5055.60 samples/sec Loss 2.5924 LearningRate 0.0538 Epoch: 13 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:59:51,352-Speed 4863.93 samples/sec Loss 2.6222 LearningRate 0.0538 Epoch: 13 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 10:59:59,473-Speed 5044.75 samples/sec Loss 2.6289 LearningRate 0.0537 Epoch: 13 Global Step: 55920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:00:07,508-Speed 5098.13 samples/sec Loss 2.6196 LearningRate 0.0537 Epoch: 13 Global Step: 55930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:00:15,615-Speed 5053.52 samples/sec Loss 2.6165 LearningRate 0.0536 Epoch: 13 Global Step: 55940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:00:23,630-Speed 5111.07 samples/sec Loss 2.6129 LearningRate 0.0536 Epoch: 13 Global Step: 55950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:00:31,692-Speed 5080.85 samples/sec Loss 2.6162 LearningRate 0.0536 Epoch: 13 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:00:39,724-Speed 5100.35 samples/sec Loss 2.6256 LearningRate 0.0535 Epoch: 13 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:00:47,707-Speed 5131.72 samples/sec Loss 2.6271 LearningRate 0.0535 Epoch: 13 Global Step: 55980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:00:55,695-Speed 5128.33 samples/sec Loss 2.6031 LearningRate 0.0534 Epoch: 13 Global Step: 55990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:01:03,711-Speed 5110.43 samples/sec Loss 2.6444 
LearningRate 0.0534 Epoch: 13 Global Step: 56000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:01:11,826-Speed 5048.01 samples/sec Loss 2.6224 LearningRate 0.0534 Epoch: 13 Global Step: 56010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:01:20,033-Speed 4991.67 samples/sec Loss 2.6545 LearningRate 0.0533 Epoch: 13 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:01:28,055-Speed 5106.45 samples/sec Loss 2.6250 LearningRate 0.0533 Epoch: 13 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:01:36,095-Speed 5095.33 samples/sec Loss 2.6004 LearningRate 0.0533 Epoch: 13 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:01:44,450-Speed 4903.12 samples/sec Loss 2.6270 LearningRate 0.0532 Epoch: 13 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:01:52,416-Speed 5142.64 samples/sec Loss 2.5897 LearningRate 0.0532 Epoch: 13 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:02:00,483-Speed 5077.87 samples/sec Loss 2.5929 LearningRate 0.0531 Epoch: 13 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:02:08,680-Speed 4997.89 samples/sec Loss 2.5948 LearningRate 0.0531 Epoch: 13 Global Step: 56080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:02:17,148-Speed 4837.23 samples/sec Loss 2.5824 LearningRate 0.0531 Epoch: 13 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:02:25,434-Speed 4944.02 samples/sec Loss 2.5847 LearningRate 0.0530 Epoch: 13 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-17 11:02:33,445-Speed 5113.79 samples/sec Loss 2.6079 LearningRate 0.0530 Epoch: 13 Global Step: 56110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:02:41,533-Speed 5065.14 samples/sec Loss 2.6277 LearningRate 0.0529 Epoch: 13 Global Step: 56120 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:02:49,573-Speed 5095.24 samples/sec Loss 2.6172 LearningRate 0.0529 Epoch: 13 Global Step: 56130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:02:57,584-Speed 5113.22 samples/sec Loss 2.6068 LearningRate 0.0529 Epoch: 13 Global Step: 56140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:03:05,686-Speed 5056.65 samples/sec Loss 2.6387 LearningRate 0.0528 Epoch: 13 Global Step: 56150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:03:13,710-Speed 5104.96 samples/sec Loss 2.6267 LearningRate 0.0528 Epoch: 13 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:03:21,966-Speed 4962.19 samples/sec Loss 2.6058 LearningRate 0.0527 Epoch: 13 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:03:30,063-Speed 5059.45 samples/sec Loss 2.5905 LearningRate 0.0527 Epoch: 13 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:03:38,086-Speed 5106.26 samples/sec Loss 2.6034 LearningRate 0.0527 Epoch: 13 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:03:46,099-Speed 5111.94 samples/sec Loss 2.5971 LearningRate 0.0526 Epoch: 13 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:03:54,133-Speed 5099.24 samples/sec Loss 2.6183 LearningRate 0.0526 Epoch: 13 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:04:02,208-Speed 5073.09 samples/sec Loss 2.5941 LearningRate 0.0526 Epoch: 13 Global Step: 56220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:04:10,354-Speed 5028.91 samples/sec Loss 2.6043 LearningRate 0.0525 Epoch: 13 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:04:18,378-Speed 5105.67 samples/sec Loss 2.5934 LearningRate 0.0525 Epoch: 13 Global Step: 56240 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-01-17 11:04:26,551-Speed 5011.89 samples/sec Loss 2.5947 LearningRate 0.0524 Epoch: 13 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 11:04:34,608-Speed 5084.57 samples/sec Loss 2.6115 LearningRate 0.0524 Epoch: 13 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:04:42,563-Speed 5149.59 samples/sec Loss 2.6366 LearningRate 0.0524 Epoch: 13 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:04:50,771-Speed 4991.03 samples/sec Loss 2.6199 LearningRate 0.0523 Epoch: 13 Global Step: 56280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:04:58,831-Speed 5082.45 samples/sec Loss 2.6398 LearningRate 0.0523 Epoch: 13 Global Step: 56290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:05:07,013-Speed 5006.79 samples/sec Loss 2.5888 LearningRate 0.0522 Epoch: 13 Global Step: 56300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:05:15,316-Speed 4933.62 samples/sec Loss 2.5935 LearningRate 0.0522 Epoch: 13 Global Step: 56310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:05:23,580-Speed 4957.01 samples/sec Loss 2.6095 LearningRate 0.0522 Epoch: 13 Global Step: 56320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:05:31,866-Speed 4944.61 samples/sec Loss 2.5961 LearningRate 0.0521 Epoch: 13 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:05:39,979-Speed 5049.39 samples/sec Loss 2.5642 LearningRate 0.0521 Epoch: 13 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:05:48,022-Speed 5093.38 samples/sec Loss 2.6265 LearningRate 0.0521 Epoch: 13 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:05:56,116-Speed 5061.19 samples/sec Loss 2.5700 LearningRate 0.0520 Epoch: 13 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:06:04,205-Speed 5064.07 
samples/sec Loss 2.6103 LearningRate 0.0520 Epoch: 13 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:06:12,245-Speed 5095.40 samples/sec Loss 2.5641 LearningRate 0.0519 Epoch: 13 Global Step: 56380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:06:20,292-Speed 5090.91 samples/sec Loss 2.5702 LearningRate 0.0519 Epoch: 13 Global Step: 56390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:06:28,440-Speed 5027.44 samples/sec Loss 2.5941 LearningRate 0.0519 Epoch: 13 Global Step: 56400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:06:36,814-Speed 4891.92 samples/sec Loss 2.5460 LearningRate 0.0518 Epoch: 13 Global Step: 56410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:06:44,930-Speed 5048.09 samples/sec Loss 2.5877 LearningRate 0.0518 Epoch: 13 Global Step: 56420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:06:52,994-Speed 5079.52 samples/sec Loss 2.5862 LearningRate 0.0517 Epoch: 13 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:07:00,989-Speed 5124.25 samples/sec Loss 2.5798 LearningRate 0.0517 Epoch: 13 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:07:09,210-Speed 4983.19 samples/sec Loss 2.6207 LearningRate 0.0517 Epoch: 13 Global Step: 56450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:07:17,396-Speed 5004.23 samples/sec Loss 2.5920 LearningRate 0.0516 Epoch: 13 Global Step: 56460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:07:25,461-Speed 5079.48 samples/sec Loss 2.5883 LearningRate 0.0516 Epoch: 13 Global Step: 56470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:07:33,565-Speed 5055.10 samples/sec Loss 2.5917 LearningRate 0.0516 Epoch: 13 Global Step: 56480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:07:41,697-Speed 5037.50 samples/sec Loss 2.5940 LearningRate 0.0515 
Epoch: 13 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:07:49,737-Speed 5095.15 samples/sec Loss 2.5962 LearningRate 0.0515 Epoch: 13 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:07:57,739-Speed 5119.55 samples/sec Loss 2.5813 LearningRate 0.0514 Epoch: 13 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:08:06,029-Speed 4941.41 samples/sec Loss 2.5602 LearningRate 0.0514 Epoch: 13 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:08:14,352-Speed 4922.04 samples/sec Loss 2.5630 LearningRate 0.0514 Epoch: 13 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:08:22,446-Speed 5061.09 samples/sec Loss 2.5780 LearningRate 0.0513 Epoch: 13 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:08:30,545-Speed 5058.03 samples/sec Loss 2.5663 LearningRate 0.0513 Epoch: 13 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:08:38,742-Speed 4997.96 samples/sec Loss 2.6098 LearningRate 0.0512 Epoch: 13 Global Step: 56560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:08:46,972-Speed 4977.34 samples/sec Loss 2.6118 LearningRate 0.0512 Epoch: 13 Global Step: 56570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:08:55,003-Speed 5100.97 samples/sec Loss 2.5872 LearningRate 0.0512 Epoch: 13 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:09:03,054-Speed 5088.49 samples/sec Loss 2.5802 LearningRate 0.0511 Epoch: 13 Global Step: 56590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:09:11,072-Speed 5109.04 samples/sec Loss 2.5830 LearningRate 0.0511 Epoch: 13 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:09:19,103-Speed 5101.03 samples/sec Loss 2.5462 LearningRate 0.0511 Epoch: 13 Global Step: 56610 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-01-17 11:09:27,360-Speed 4961.06 samples/sec Loss 2.5522 LearningRate 0.0510 Epoch: 13 Global Step: 56620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:09:35,782-Speed 4864.00 samples/sec Loss 2.5578 LearningRate 0.0510 Epoch: 13 Global Step: 56630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:09:44,162-Speed 4888.33 samples/sec Loss 2.5757 LearningRate 0.0509 Epoch: 13 Global Step: 56640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:09:52,447-Speed 4944.82 samples/sec Loss 2.5820 LearningRate 0.0509 Epoch: 13 Global Step: 56650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:10:00,587-Speed 5032.56 samples/sec Loss 2.5939 LearningRate 0.0509 Epoch: 13 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:10:08,817-Speed 4977.58 samples/sec Loss 2.5898 LearningRate 0.0508 Epoch: 13 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:10:16,926-Speed 5051.61 samples/sec Loss 2.5414 LearningRate 0.0508 Epoch: 13 Global Step: 56680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:10:25,008-Speed 5069.25 samples/sec Loss 2.5923 LearningRate 0.0508 Epoch: 13 Global Step: 56690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:10:33,175-Speed 5015.60 samples/sec Loss 2.6082 LearningRate 0.0507 Epoch: 13 Global Step: 56700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:10:41,235-Speed 5082.53 samples/sec Loss 2.5555 LearningRate 0.0507 Epoch: 13 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:10:49,313-Speed 5071.36 samples/sec Loss 2.5815 LearningRate 0.0506 Epoch: 13 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:10:57,499-Speed 5004.49 samples/sec Loss 2.5873 LearningRate 0.0506 Epoch: 13 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 
11:11:05,535-Speed 5097.86 samples/sec Loss 2.5823 LearningRate 0.0506 Epoch: 13 Global Step: 56740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:11:13,554-Speed 5108.39 samples/sec Loss 2.5495 LearningRate 0.0505 Epoch: 13 Global Step: 56750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:11:21,547-Speed 5125.65 samples/sec Loss 2.5739 LearningRate 0.0505 Epoch: 13 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:11:29,549-Speed 5119.32 samples/sec Loss 2.5225 LearningRate 0.0505 Epoch: 13 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:11:37,561-Speed 5112.83 samples/sec Loss 2.5568 LearningRate 0.0504 Epoch: 13 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:11:45,528-Speed 5142.55 samples/sec Loss 2.5751 LearningRate 0.0504 Epoch: 13 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:11:53,531-Speed 5118.72 samples/sec Loss 2.5719 LearningRate 0.0503 Epoch: 13 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:12:01,601-Speed 5076.05 samples/sec Loss 2.5778 LearningRate 0.0503 Epoch: 13 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:12:09,685-Speed 5067.56 samples/sec Loss 2.5778 LearningRate 0.0503 Epoch: 13 Global Step: 56820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:12:17,743-Speed 5083.45 samples/sec Loss 2.5429 LearningRate 0.0502 Epoch: 13 Global Step: 56830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:12:25,828-Speed 5067.11 samples/sec Loss 2.5311 LearningRate 0.0502 Epoch: 13 Global Step: 56840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:12:33,885-Speed 5084.84 samples/sec Loss 2.5699 LearningRate 0.0501 Epoch: 13 Global Step: 56850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:12:42,020-Speed 5035.42 samples/sec Loss 
2.5377 LearningRate 0.0501 Epoch: 13 Global Step: 56860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:12:50,108-Speed 5064.58 samples/sec Loss 2.5240 LearningRate 0.0501 Epoch: 13 Global Step: 56870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:12:58,143-Speed 5099.34 samples/sec Loss 2.5502 LearningRate 0.0500 Epoch: 13 Global Step: 56880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:13:06,153-Speed 5114.56 samples/sec Loss 2.5800 LearningRate 0.0500 Epoch: 13 Global Step: 56890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:13:14,531-Speed 4889.42 samples/sec Loss 2.5725 LearningRate 0.0500 Epoch: 13 Global Step: 56900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:13:22,584-Speed 5087.71 samples/sec Loss 2.5548 LearningRate 0.0499 Epoch: 13 Global Step: 56910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:13:30,620-Speed 5097.59 samples/sec Loss 2.5873 LearningRate 0.0499 Epoch: 13 Global Step: 56920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:13:38,702-Speed 5069.26 samples/sec Loss 2.5579 LearningRate 0.0498 Epoch: 13 Global Step: 56930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:13:46,758-Speed 5084.46 samples/sec Loss 2.5706 LearningRate 0.0498 Epoch: 13 Global Step: 56940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:13:54,899-Speed 5032.28 samples/sec Loss 2.5527 LearningRate 0.0498 Epoch: 13 Global Step: 56950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:14:03,273-Speed 4892.33 samples/sec Loss 2.5362 LearningRate 0.0497 Epoch: 13 Global Step: 56960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:14:11,384-Speed 5050.54 samples/sec Loss 2.5396 LearningRate 0.0497 Epoch: 13 Global Step: 56970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:14:19,453-Speed 5076.60 samples/sec Loss 2.5298 LearningRate 0.0497 Epoch: 13 Global Step: 
56980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:14:27,484-Speed 5100.63 samples/sec Loss 2.5266 LearningRate 0.0496 Epoch: 13 Global Step: 56990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:14:35,636-Speed 5025.00 samples/sec Loss 2.5452 LearningRate 0.0496 Epoch: 13 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:14:43,757-Speed 5045.01 samples/sec Loss 2.5757 LearningRate 0.0495 Epoch: 13 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:14:51,909-Speed 5025.26 samples/sec Loss 2.5673 LearningRate 0.0495 Epoch: 13 Global Step: 57020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:15:00,036-Speed 5040.39 samples/sec Loss 2.5696 LearningRate 0.0495 Epoch: 13 Global Step: 57030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:15:08,060-Speed 5105.66 samples/sec Loss 2.5570 LearningRate 0.0494 Epoch: 13 Global Step: 57040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:15:16,220-Speed 5020.24 samples/sec Loss 2.5848 LearningRate 0.0494 Epoch: 13 Global Step: 57050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:15:24,933-Speed 4701.14 samples/sec Loss 2.5592 LearningRate 0.0494 Epoch: 13 Global Step: 57060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:15:33,614-Speed 4718.96 samples/sec Loss 2.4877 LearningRate 0.0493 Epoch: 13 Global Step: 57070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:15:42,217-Speed 4762.55 samples/sec Loss 2.5427 LearningRate 0.0493 Epoch: 13 Global Step: 57080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:15:51,007-Speed 4659.96 samples/sec Loss 2.5084 LearningRate 0.0492 Epoch: 13 Global Step: 57090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:15:59,803-Speed 4657.31 samples/sec Loss 2.5256 LearningRate 0.0492 Epoch: 13 Global Step: 57100 Fp16 Grad Scale: 32768 Required: 6 hours 
Training: 2022-01-17 11:16:08,209-Speed 4873.60 samples/sec Loss 2.5161 LearningRate 0.0492 Epoch: 13 Global Step: 57110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:16:16,420-Speed 4988.98 samples/sec Loss 2.5225 LearningRate 0.0491 Epoch: 13 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:16:24,754-Speed 4915.17 samples/sec Loss 2.5302 LearningRate 0.0491 Epoch: 13 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:16:33,124-Speed 4894.70 samples/sec Loss 2.5185 LearningRate 0.0491 Epoch: 13 Global Step: 57140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:16:41,278-Speed 5024.21 samples/sec Loss 2.5303 LearningRate 0.0490 Epoch: 13 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:16:49,509-Speed 4976.62 samples/sec Loss 2.5082 LearningRate 0.0490 Epoch: 13 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:16:57,602-Speed 5061.69 samples/sec Loss 2.5196 LearningRate 0.0489 Epoch: 13 Global Step: 57170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:17:05,672-Speed 5076.67 samples/sec Loss 2.5495 LearningRate 0.0489 Epoch: 13 Global Step: 57180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:17:13,885-Speed 4987.87 samples/sec Loss 2.5232 LearningRate 0.0489 Epoch: 13 Global Step: 57190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:17:21,982-Speed 5058.95 samples/sec Loss 2.5164 LearningRate 0.0488 Epoch: 13 Global Step: 57200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:17:30,037-Speed 5085.59 samples/sec Loss 2.5272 LearningRate 0.0488 Epoch: 13 Global Step: 57210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:17:38,217-Speed 5008.33 samples/sec Loss 2.4856 LearningRate 0.0488 Epoch: 13 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:17:46,311-Speed 5061.21 
samples/sec Loss 2.5241 LearningRate 0.0487 Epoch: 13 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:17:54,584-Speed 4951.65 samples/sec Loss 2.5113 LearningRate 0.0487 Epoch: 13 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:18:02,670-Speed 5066.63 samples/sec Loss 2.5129 LearningRate 0.0487 Epoch: 13 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:18:10,844-Speed 5011.48 samples/sec Loss 2.5489 LearningRate 0.0486 Epoch: 13 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:18:18,881-Speed 5097.38 samples/sec Loss 2.5435 LearningRate 0.0486 Epoch: 13 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:18:27,007-Speed 5041.20 samples/sec Loss 2.5252 LearningRate 0.0485 Epoch: 13 Global Step: 57280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:18:35,081-Speed 5073.28 samples/sec Loss 2.5254 LearningRate 0.0485 Epoch: 13 Global Step: 57290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:18:43,727-Speed 4737.84 samples/sec Loss 2.4914 LearningRate 0.0485 Epoch: 13 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:18:52,461-Speed 4690.42 samples/sec Loss 2.5429 LearningRate 0.0484 Epoch: 13 Global Step: 57310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:19:01,193-Speed 4691.32 samples/sec Loss 2.4948 LearningRate 0.0484 Epoch: 13 Global Step: 57320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:19:09,629-Speed 4855.83 samples/sec Loss 2.5430 LearningRate 0.0484 Epoch: 13 Global Step: 57330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:19:17,808-Speed 5008.61 samples/sec Loss 2.4841 LearningRate 0.0483 Epoch: 13 Global Step: 57340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:19:25,894-Speed 5066.35 samples/sec Loss 2.5109 LearningRate 0.0483 
Epoch: 13 Global Step: 57350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:19:33,948-Speed 5085.96 samples/sec Loss 2.5093 LearningRate 0.0482 Epoch: 13 Global Step: 57360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:19:41,950-Speed 5119.74 samples/sec Loss 2.5308 LearningRate 0.0482 Epoch: 13 Global Step: 57370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:19:50,134-Speed 5005.35 samples/sec Loss 2.5373 LearningRate 0.0482 Epoch: 13 Global Step: 57380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:19:58,149-Speed 5111.60 samples/sec Loss 2.5184 LearningRate 0.0481 Epoch: 13 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:20:06,472-Speed 4921.63 samples/sec Loss 2.4834 LearningRate 0.0481 Epoch: 13 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:20:14,534-Speed 5081.18 samples/sec Loss 2.5196 LearningRate 0.0481 Epoch: 13 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:20:22,551-Speed 5109.90 samples/sec Loss 2.5073 LearningRate 0.0480 Epoch: 13 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:20:30,717-Speed 5016.18 samples/sec Loss 2.5171 LearningRate 0.0480 Epoch: 13 Global Step: 57430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:20:38,791-Speed 5074.54 samples/sec Loss 2.5175 LearningRate 0.0479 Epoch: 13 Global Step: 57440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:20:47,049-Speed 4960.35 samples/sec Loss 2.4668 LearningRate 0.0479 Epoch: 13 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:20:55,110-Speed 5081.65 samples/sec Loss 2.4797 LearningRate 0.0479 Epoch: 13 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:21:03,100-Speed 5127.25 samples/sec Loss 2.4826 LearningRate 0.0478 Epoch: 13 Global Step: 57470 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-17 11:21:11,221-Speed 5044.68 samples/sec Loss 2.4718 LearningRate 0.0478 Epoch: 13 Global Step: 57480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:21:19,366-Speed 5029.33 samples/sec Loss 2.5032 LearningRate 0.0478 Epoch: 13 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:21:27,432-Speed 5078.74 samples/sec Loss 2.4940 LearningRate 0.0477 Epoch: 13 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:21:35,416-Speed 5130.80 samples/sec Loss 2.4535 LearningRate 0.0477 Epoch: 13 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:21:43,471-Speed 5086.25 samples/sec Loss 2.4765 LearningRate 0.0477 Epoch: 13 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:21:51,558-Speed 5065.16 samples/sec Loss 2.4895 LearningRate 0.0476 Epoch: 13 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:21:59,625-Speed 5078.35 samples/sec Loss 2.5164 LearningRate 0.0476 Epoch: 13 Global Step: 57540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:22:08,004-Speed 4888.85 samples/sec Loss 2.4584 LearningRate 0.0475 Epoch: 13 Global Step: 57550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:22:16,155-Speed 5025.97 samples/sec Loss 2.5201 LearningRate 0.0475 Epoch: 13 Global Step: 57560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:22:24,220-Speed 5079.21 samples/sec Loss 2.4691 LearningRate 0.0475 Epoch: 13 Global Step: 57570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:22:32,308-Speed 5065.15 samples/sec Loss 2.5015 LearningRate 0.0474 Epoch: 13 Global Step: 57580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:22:40,403-Speed 5060.31 samples/sec Loss 2.5020 LearningRate 0.0474 Epoch: 13 Global Step: 57590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 
11:22:48,409-Speed 5116.97 samples/sec Loss 2.4673 LearningRate 0.0474 Epoch: 13 Global Step: 57600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:22:56,531-Speed 5043.60 samples/sec Loss 2.5164 LearningRate 0.0473 Epoch: 13 Global Step: 57610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:23:04,583-Speed 5088.29 samples/sec Loss 2.4787 LearningRate 0.0473 Epoch: 13 Global Step: 57620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:23:12,643-Speed 5082.59 samples/sec Loss 2.4567 LearningRate 0.0473 Epoch: 13 Global Step: 57630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:23:20,993-Speed 4906.27 samples/sec Loss 2.5128 LearningRate 0.0472 Epoch: 13 Global Step: 57640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:23:29,433-Speed 4853.34 samples/sec Loss 2.4877 LearningRate 0.0472 Epoch: 13 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:23:37,849-Speed 4867.90 samples/sec Loss 2.4871 LearningRate 0.0471 Epoch: 13 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:23:46,065-Speed 4985.91 samples/sec Loss 2.4903 LearningRate 0.0471 Epoch: 13 Global Step: 57670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:23:54,286-Speed 4982.86 samples/sec Loss 2.4444 LearningRate 0.0471 Epoch: 13 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:24:02,664-Speed 4889.93 samples/sec Loss 2.5152 LearningRate 0.0470 Epoch: 13 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:24:10,808-Speed 5029.97 samples/sec Loss 2.5238 LearningRate 0.0470 Epoch: 13 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:24:18,791-Speed 5131.58 samples/sec Loss 2.4839 LearningRate 0.0470 Epoch: 13 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:24:26,828-Speed 5097.05 samples/sec Loss 2.4770 
LearningRate 0.0469 Epoch: 13 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:24:34,804-Speed 5135.72 samples/sec Loss 2.4937 LearningRate 0.0469 Epoch: 13 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:24:42,771-Speed 5142.41 samples/sec Loss 2.4815 LearningRate 0.0468 Epoch: 13 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:24:50,749-Speed 5134.89 samples/sec Loss 2.4880 LearningRate 0.0468 Epoch: 13 Global Step: 57750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:24:58,897-Speed 5027.10 samples/sec Loss 2.4589 LearningRate 0.0468 Epoch: 13 Global Step: 57760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:25:06,929-Speed 5100.50 samples/sec Loss 2.5017 LearningRate 0.0467 Epoch: 13 Global Step: 57770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:25:15,036-Speed 5053.38 samples/sec Loss 2.4289 LearningRate 0.0467 Epoch: 13 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:25:23,067-Speed 5100.44 samples/sec Loss 2.4581 LearningRate 0.0467 Epoch: 13 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:25:31,120-Speed 5087.16 samples/sec Loss 2.4227 LearningRate 0.0466 Epoch: 13 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:25:39,141-Speed 5108.23 samples/sec Loss 2.4724 LearningRate 0.0466 Epoch: 13 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:25:47,167-Speed 5103.80 samples/sec Loss 2.4533 LearningRate 0.0466 Epoch: 13 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:25:55,217-Speed 5089.29 samples/sec Loss 2.4189 LearningRate 0.0465 Epoch: 13 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:26:03,243-Speed 5104.01 samples/sec Loss 2.4247 LearningRate 0.0465 Epoch: 13 Global Step: 
57840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:26:11,512-Speed 4953.97 samples/sec Loss 2.4543 LearningRate 0.0464 Epoch: 13 Global Step: 57850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:26:19,516-Speed 5118.33 samples/sec Loss 2.4631 LearningRate 0.0464 Epoch: 13 Global Step: 57860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:26:27,657-Speed 5031.76 samples/sec Loss 2.4691 LearningRate 0.0464 Epoch: 13 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:26:35,734-Speed 5071.92 samples/sec Loss 2.4439 LearningRate 0.0463 Epoch: 13 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:26:43,788-Speed 5086.61 samples/sec Loss 2.4286 LearningRate 0.0463 Epoch: 13 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:26:51,871-Speed 5068.10 samples/sec Loss 2.4376 LearningRate 0.0463 Epoch: 13 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:26:59,949-Speed 5071.42 samples/sec Loss 2.4778 LearningRate 0.0462 Epoch: 13 Global Step: 57910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:27:08,199-Speed 4965.31 samples/sec Loss 2.4412 LearningRate 0.0462 Epoch: 13 Global Step: 57920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:27:16,356-Speed 5021.80 samples/sec Loss 2.4574 LearningRate 0.0462 Epoch: 13 Global Step: 57930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:27:24,498-Speed 5031.63 samples/sec Loss 2.4438 LearningRate 0.0461 Epoch: 13 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:27:32,529-Speed 5100.79 samples/sec Loss 2.4486 LearningRate 0.0461 Epoch: 13 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:27:40,628-Speed 5057.99 samples/sec Loss 2.4487 LearningRate 0.0460 Epoch: 13 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-17 11:27:48,711-Speed 5068.09 samples/sec Loss 2.4477 LearningRate 0.0460 Epoch: 13 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:27:56,736-Speed 5104.26 samples/sec Loss 2.4236 LearningRate 0.0460 Epoch: 13 Global Step: 57980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:28:04,923-Speed 5004.06 samples/sec Loss 2.4550 LearningRate 0.0459 Epoch: 13 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:28:13,000-Speed 5072.00 samples/sec Loss 2.4611 LearningRate 0.0459 Epoch: 13 Global Step: 58000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:28:21,135-Speed 5035.68 samples/sec Loss 2.4247 LearningRate 0.0459 Epoch: 13 Global Step: 58010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:28:29,255-Speed 5045.00 samples/sec Loss 2.4387 LearningRate 0.0458 Epoch: 13 Global Step: 58020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:28:37,381-Speed 5041.31 samples/sec Loss 2.4705 LearningRate 0.0458 Epoch: 13 Global Step: 58030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:28:45,554-Speed 5012.43 samples/sec Loss 2.4824 LearningRate 0.0458 Epoch: 13 Global Step: 58040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:28:54,014-Speed 4842.12 samples/sec Loss 2.4258 LearningRate 0.0457 Epoch: 13 Global Step: 58050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:29:02,401-Speed 4884.54 samples/sec Loss 2.4075 LearningRate 0.0457 Epoch: 13 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:29:10,703-Speed 4934.50 samples/sec Loss 2.4515 LearningRate 0.0457 Epoch: 13 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:29:18,874-Speed 5013.38 samples/sec Loss 2.4090 LearningRate 0.0456 Epoch: 13 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:29:26,938-Speed 
5080.36 samples/sec Loss 2.4161 LearningRate 0.0456 Epoch: 13 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:29:35,043-Speed 5054.04 samples/sec Loss 2.4705 LearningRate 0.0455 Epoch: 13 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:29:43,242-Speed 4996.60 samples/sec Loss 2.4418 LearningRate 0.0455 Epoch: 13 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:29:51,307-Speed 5079.31 samples/sec Loss 2.4213 LearningRate 0.0455 Epoch: 13 Global Step: 58120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:29:59,465-Speed 5021.86 samples/sec Loss 2.4415 LearningRate 0.0454 Epoch: 13 Global Step: 58130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:30:07,648-Speed 5005.71 samples/sec Loss 2.4114 LearningRate 0.0454 Epoch: 13 Global Step: 58140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:30:15,705-Speed 5084.82 samples/sec Loss 2.4287 LearningRate 0.0454 Epoch: 13 Global Step: 58150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:30:23,919-Speed 4986.65 samples/sec Loss 2.4387 LearningRate 0.0453 Epoch: 13 Global Step: 58160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:30:31,978-Speed 5083.55 samples/sec Loss 2.4151 LearningRate 0.0453 Epoch: 13 Global Step: 58170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:30:40,024-Speed 5091.63 samples/sec Loss 2.4213 LearningRate 0.0453 Epoch: 13 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:30:48,039-Speed 5111.03 samples/sec Loss 2.4142 LearningRate 0.0452 Epoch: 13 Global Step: 58190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:30:56,154-Speed 5047.75 samples/sec Loss 2.4426 LearningRate 0.0452 Epoch: 13 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:31:04,251-Speed 5059.52 samples/sec Loss 2.4102 LearningRate 0.0452 
Epoch: 13 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:31:12,389-Speed 5033.69 samples/sec Loss 2.4219 LearningRate 0.0451 Epoch: 13 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:31:20,514-Speed 5041.96 samples/sec Loss 2.4246 LearningRate 0.0451 Epoch: 13 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:31:28,558-Speed 5092.55 samples/sec Loss 2.4016 LearningRate 0.0450 Epoch: 13 Global Step: 58240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:31:36,542-Speed 5131.39 samples/sec Loss 2.4138 LearningRate 0.0450 Epoch: 13 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:31:44,572-Speed 5101.04 samples/sec Loss 2.4321 LearningRate 0.0450 Epoch: 13 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:31:52,640-Speed 5077.79 samples/sec Loss 2.4194 LearningRate 0.0449 Epoch: 13 Global Step: 58270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:32:00,802-Speed 5018.81 samples/sec Loss 2.4010 LearningRate 0.0449 Epoch: 13 Global Step: 58280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:32:08,878-Speed 5072.79 samples/sec Loss 2.3775 LearningRate 0.0449 Epoch: 13 Global Step: 58290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:32:17,303-Speed 4862.40 samples/sec Loss 2.4127 LearningRate 0.0448 Epoch: 13 Global Step: 58300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:32:25,787-Speed 4828.47 samples/sec Loss 2.3708 LearningRate 0.0448 Epoch: 13 Global Step: 58310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:32:33,797-Speed 5114.23 samples/sec Loss 2.3887 LearningRate 0.0448 Epoch: 13 Global Step: 58320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:32:42,047-Speed 4965.49 samples/sec Loss 2.4096 LearningRate 0.0447 Epoch: 13 Global Step: 58330 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-17 11:32:50,426-Speed 4889.00 samples/sec Loss 2.3751 LearningRate 0.0447 Epoch: 13 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:32:58,520-Speed 5061.19 samples/sec Loss 2.4292 LearningRate 0.0447 Epoch: 13 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:33:06,599-Speed 5070.53 samples/sec Loss 2.3961 LearningRate 0.0446 Epoch: 13 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:33:14,754-Speed 5023.43 samples/sec Loss 2.4106 LearningRate 0.0446 Epoch: 13 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:33:22,811-Speed 5084.29 samples/sec Loss 2.4034 LearningRate 0.0445 Epoch: 13 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:33:30,996-Speed 5004.98 samples/sec Loss 2.4041 LearningRate 0.0445 Epoch: 13 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:33:39,301-Speed 4933.23 samples/sec Loss 2.3696 LearningRate 0.0445 Epoch: 13 Global Step: 58400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:34:14,860-Speed 1151.92 samples/sec Loss 2.2804 LearningRate 0.0444 Epoch: 14 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:34:23,075-Speed 4987.09 samples/sec Loss 1.8751 LearningRate 0.0444 Epoch: 14 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:34:31,256-Speed 5008.32 samples/sec Loss 1.8514 LearningRate 0.0444 Epoch: 14 Global Step: 58430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:34:39,456-Speed 4995.26 samples/sec Loss 1.8815 LearningRate 0.0443 Epoch: 14 Global Step: 58440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:34:47,693-Speed 4973.88 samples/sec Loss 1.8701 LearningRate 0.0443 Epoch: 14 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 
11:34:55,947-Speed 4962.81 samples/sec Loss 1.8782 LearningRate 0.0443 Epoch: 14 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:35:04,130-Speed 5006.26 samples/sec Loss 1.8787 LearningRate 0.0442 Epoch: 14 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:35:12,166-Speed 5097.96 samples/sec Loss 1.9031 LearningRate 0.0442 Epoch: 14 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:35:20,182-Speed 5110.04 samples/sec Loss 1.9048 LearningRate 0.0442 Epoch: 14 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:35:28,303-Speed 5044.62 samples/sec Loss 1.8579 LearningRate 0.0441 Epoch: 14 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:35:36,410-Speed 5053.16 samples/sec Loss 1.8614 LearningRate 0.0441 Epoch: 14 Global Step: 58510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:35:44,412-Speed 5118.99 samples/sec Loss 1.9153 LearningRate 0.0440 Epoch: 14 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:35:52,558-Speed 5029.00 samples/sec Loss 1.8670 LearningRate 0.0440 Epoch: 14 Global Step: 58530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:36:00,548-Speed 5127.29 samples/sec Loss 1.8673 LearningRate 0.0440 Epoch: 14 Global Step: 58540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 11:36:08,921-Speed 4892.67 samples/sec Loss 1.8951 LearningRate 0.0439 Epoch: 14 Global Step: 58550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:36:16,880-Speed 5147.37 samples/sec Loss 1.8994 LearningRate 0.0439 Epoch: 14 Global Step: 58560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:36:24,856-Speed 5135.64 samples/sec Loss 1.9309 LearningRate 0.0439 Epoch: 14 Global Step: 58570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:36:32,946-Speed 5064.16 samples/sec Loss 
1.8964 LearningRate 0.0438 Epoch: 14 Global Step: 58580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:36:41,071-Speed 5042.01 samples/sec Loss 1.9204 LearningRate 0.0438 Epoch: 14 Global Step: 58590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:36:49,294-Speed 4981.59 samples/sec Loss 1.9167 LearningRate 0.0438 Epoch: 14 Global Step: 58600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:36:57,325-Speed 5100.75 samples/sec Loss 1.9520 LearningRate 0.0437 Epoch: 14 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:37:05,440-Speed 5047.72 samples/sec Loss 1.9247 LearningRate 0.0437 Epoch: 14 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:37:13,526-Speed 5066.69 samples/sec Loss 1.9096 LearningRate 0.0437 Epoch: 14 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:37:21,649-Speed 5042.84 samples/sec Loss 1.8930 LearningRate 0.0436 Epoch: 14 Global Step: 58640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:37:29,590-Speed 5158.83 samples/sec Loss 1.8939 LearningRate 0.0436 Epoch: 14 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:37:37,536-Speed 5155.63 samples/sec Loss 1.9308 LearningRate 0.0436 Epoch: 14 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:37:45,578-Speed 5093.84 samples/sec Loss 1.9219 LearningRate 0.0435 Epoch: 14 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:37:53,712-Speed 5036.74 samples/sec Loss 1.9215 LearningRate 0.0435 Epoch: 14 Global Step: 58680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:38:01,988-Speed 4949.91 samples/sec Loss 1.9472 LearningRate 0.0434 Epoch: 14 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:38:09,961-Speed 5137.91 samples/sec Loss 1.9754 LearningRate 0.0434 Epoch: 14 Global 
Step: 58700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:38:17,974-Speed 5112.56 samples/sec Loss 1.9464 LearningRate 0.0434 Epoch: 14 Global Step: 58710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:38:25,991-Speed 5109.93 samples/sec Loss 1.9081 LearningRate 0.0433 Epoch: 14 Global Step: 58720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:38:34,044-Speed 5087.14 samples/sec Loss 1.9446 LearningRate 0.0433 Epoch: 14 Global Step: 58730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:38:42,264-Speed 4983.93 samples/sec Loss 1.9386 LearningRate 0.0433 Epoch: 14 Global Step: 58740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:38:50,383-Speed 5044.88 samples/sec Loss 1.9515 LearningRate 0.0432 Epoch: 14 Global Step: 58750 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 11:38:58,406-Speed 5106.43 samples/sec Loss 1.9358 LearningRate 0.0432 Epoch: 14 Global Step: 58760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:39:06,408-Speed 5119.24 samples/sec Loss 1.9474 LearningRate 0.0432 Epoch: 14 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:39:14,549-Speed 5032.02 samples/sec Loss 1.9793 LearningRate 0.0431 Epoch: 14 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:39:22,648-Speed 5058.64 samples/sec Loss 1.9209 LearningRate 0.0431 Epoch: 14 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:39:30,685-Speed 5096.99 samples/sec Loss 1.9569 LearningRate 0.0431 Epoch: 14 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:39:38,755-Speed 5076.23 samples/sec Loss 2.0157 LearningRate 0.0430 Epoch: 14 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:39:46,699-Speed 5157.02 samples/sec Loss 1.9592 LearningRate 0.0430 Epoch: 14 Global Step: 58820 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-01-17 11:39:54,672-Speed 5138.06 samples/sec Loss 1.9901 LearningRate 0.0430 Epoch: 14 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:40:02,809-Speed 5034.86 samples/sec Loss 1.9711 LearningRate 0.0429 Epoch: 14 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:40:10,804-Speed 5123.30 samples/sec Loss 1.9977 LearningRate 0.0429 Epoch: 14 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:40:18,790-Speed 5129.70 samples/sec Loss 2.0006 LearningRate 0.0429 Epoch: 14 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:40:26,738-Speed 5154.75 samples/sec Loss 1.9736 LearningRate 0.0428 Epoch: 14 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:40:34,793-Speed 5085.26 samples/sec Loss 1.9949 LearningRate 0.0428 Epoch: 14 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:40:42,912-Speed 5045.96 samples/sec Loss 1.9864 LearningRate 0.0427 Epoch: 14 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:40:50,897-Speed 5130.06 samples/sec Loss 2.0150 LearningRate 0.0427 Epoch: 14 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:40:58,992-Speed 5060.63 samples/sec Loss 2.0036 LearningRate 0.0427 Epoch: 14 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:41:07,075-Speed 5068.31 samples/sec Loss 2.0201 LearningRate 0.0426 Epoch: 14 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:41:15,217-Speed 5031.34 samples/sec Loss 2.0173 LearningRate 0.0426 Epoch: 14 Global Step: 58930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:41:23,559-Speed 4910.67 samples/sec Loss 2.0332 LearningRate 0.0426 Epoch: 14 Global Step: 58940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 
11:41:31,830-Speed 4952.98 samples/sec Loss 1.9943 LearningRate 0.0425 Epoch: 14 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:41:39,788-Speed 5147.33 samples/sec Loss 2.0156 LearningRate 0.0425 Epoch: 14 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:41:47,754-Speed 5143.16 samples/sec Loss 2.0257 LearningRate 0.0425 Epoch: 14 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:41:55,687-Speed 5163.51 samples/sec Loss 1.9812 LearningRate 0.0424 Epoch: 14 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:42:03,889-Speed 4994.72 samples/sec Loss 2.0158 LearningRate 0.0424 Epoch: 14 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:42:12,080-Speed 5000.82 samples/sec Loss 2.0253 LearningRate 0.0424 Epoch: 14 Global Step: 59000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:42:20,184-Speed 5055.09 samples/sec Loss 2.0182 LearningRate 0.0423 Epoch: 14 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:42:28,135-Speed 5152.54 samples/sec Loss 2.0388 LearningRate 0.0423 Epoch: 14 Global Step: 59020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:42:36,181-Speed 5091.31 samples/sec Loss 2.0361 LearningRate 0.0423 Epoch: 14 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:42:44,118-Speed 5160.70 samples/sec Loss 2.0209 LearningRate 0.0422 Epoch: 14 Global Step: 59040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:42:52,063-Speed 5156.42 samples/sec Loss 2.0621 LearningRate 0.0422 Epoch: 14 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:43:00,017-Speed 5150.45 samples/sec Loss 2.0258 LearningRate 0.0422 Epoch: 14 Global Step: 59060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:43:08,056-Speed 5095.51 samples/sec Loss 2.0296 
LearningRate 0.0421 Epoch: 14 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:43:16,021-Speed 5143.79 samples/sec Loss 2.0218 LearningRate 0.0421 Epoch: 14 Global Step: 59080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:43:24,193-Speed 5013.29 samples/sec Loss 2.0433 LearningRate 0.0421 Epoch: 14 Global Step: 59090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:43:32,367-Speed 5011.97 samples/sec Loss 2.0249 LearningRate 0.0420 Epoch: 14 Global Step: 59100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:43:40,311-Speed 5156.73 samples/sec Loss 2.0346 LearningRate 0.0420 Epoch: 14 Global Step: 59110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:43:48,379-Speed 5077.36 samples/sec Loss 2.0508 LearningRate 0.0420 Epoch: 14 Global Step: 59120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:43:56,324-Speed 5155.93 samples/sec Loss 2.0299 LearningRate 0.0419 Epoch: 14 Global Step: 59130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:44:04,391-Speed 5078.17 samples/sec Loss 2.0354 LearningRate 0.0419 Epoch: 14 Global Step: 59140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:44:12,415-Speed 5105.65 samples/sec Loss 2.0400 LearningRate 0.0418 Epoch: 14 Global Step: 59150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:44:20,809-Speed 4879.94 samples/sec Loss 2.0408 LearningRate 0.0418 Epoch: 14 Global Step: 59160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:44:29,156-Speed 4907.85 samples/sec Loss 2.0604 LearningRate 0.0418 Epoch: 14 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:44:37,340-Speed 5006.11 samples/sec Loss 2.0446 LearningRate 0.0417 Epoch: 14 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:44:45,284-Speed 5156.11 samples/sec Loss 2.0265 LearningRate 0.0417 Epoch: 14 Global Step: 59190 
Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:44:53,289-Speed 5118.06 samples/sec Loss 2.0401 LearningRate 0.0417 Epoch: 14 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:45:01,285-Speed 5123.19 samples/sec Loss 2.0739 LearningRate 0.0416 Epoch: 14 Global Step: 59210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:45:09,196-Speed 5178.48 samples/sec Loss 2.0524 LearningRate 0.0416 Epoch: 14 Global Step: 59220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:45:17,158-Speed 5144.95 samples/sec Loss 2.0378 LearningRate 0.0416 Epoch: 14 Global Step: 59230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:45:25,053-Speed 5188.70 samples/sec Loss 2.0323 LearningRate 0.0415 Epoch: 14 Global Step: 59240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:45:32,980-Speed 5168.11 samples/sec Loss 2.0386 LearningRate 0.0415 Epoch: 14 Global Step: 59250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:45:40,930-Speed 5153.07 samples/sec Loss 2.0203 LearningRate 0.0415 Epoch: 14 Global Step: 59260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:45:49,005-Speed 5072.70 samples/sec Loss 2.0799 LearningRate 0.0414 Epoch: 14 Global Step: 59270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:45:57,068-Speed 5080.88 samples/sec Loss 2.0505 LearningRate 0.0414 Epoch: 14 Global Step: 59280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:46:05,043-Speed 5136.91 samples/sec Loss 2.0708 LearningRate 0.0414 Epoch: 14 Global Step: 59290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:46:13,114-Speed 5075.49 samples/sec Loss 2.0538 LearningRate 0.0413 Epoch: 14 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:46:21,086-Speed 5138.84 samples/sec Loss 2.0353 LearningRate 0.0413 Epoch: 14 Global Step: 59310 Fp16 Grad Scale: 262144 Required: 6 hours 
Training: 2022-01-17 11:46:29,030-Speed 5156.88 samples/sec Loss 2.0600 LearningRate 0.0413 Epoch: 14 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:46:36,965-Speed 5162.60 samples/sec Loss 2.0876 LearningRate 0.0412 Epoch: 14 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:46:44,961-Speed 5123.37 samples/sec Loss 2.0674 LearningRate 0.0412 Epoch: 14 Global Step: 59340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:46:52,964-Speed 5118.81 samples/sec Loss 2.0956 LearningRate 0.0412 Epoch: 14 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:47:01,052-Speed 5065.03 samples/sec Loss 2.0998 LearningRate 0.0411 Epoch: 14 Global Step: 59360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:47:08,997-Speed 5156.27 samples/sec Loss 2.0909 LearningRate 0.0411 Epoch: 14 Global Step: 59370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:47:17,024-Speed 5102.96 samples/sec Loss 2.0731 LearningRate 0.0411 Epoch: 14 Global Step: 59380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:47:25,028-Speed 5118.58 samples/sec Loss 2.0660 LearningRate 0.0410 Epoch: 14 Global Step: 59390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:47:33,053-Speed 5104.27 samples/sec Loss 2.0880 LearningRate 0.0410 Epoch: 14 Global Step: 59400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:47:41,104-Speed 5088.76 samples/sec Loss 2.0689 LearningRate 0.0410 Epoch: 14 Global Step: 59410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:47:49,174-Speed 5076.18 samples/sec Loss 2.0669 LearningRate 0.0409 Epoch: 14 Global Step: 59420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:47:57,672-Speed 4820.32 samples/sec Loss 2.0696 LearningRate 0.0409 Epoch: 14 Global Step: 59430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:48:06,075-Speed 
4875.15 samples/sec Loss 2.0802 LearningRate 0.0409 Epoch: 14 Global Step: 59440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:48:13,994-Speed 5173.31 samples/sec Loss 2.0761 LearningRate 0.0408 Epoch: 14 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:48:21,954-Speed 5146.53 samples/sec Loss 2.0733 LearningRate 0.0408 Epoch: 14 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:48:29,978-Speed 5105.27 samples/sec Loss 2.0891 LearningRate 0.0408 Epoch: 14 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:48:37,998-Speed 5107.99 samples/sec Loss 2.0789 LearningRate 0.0407 Epoch: 14 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:48:46,021-Speed 5106.43 samples/sec Loss 2.0933 LearningRate 0.0407 Epoch: 14 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:48:54,074-Speed 5086.43 samples/sec Loss 2.0661 LearningRate 0.0407 Epoch: 14 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:49:02,131-Speed 5084.66 samples/sec Loss 2.0895 LearningRate 0.0406 Epoch: 14 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:49:10,175-Speed 5092.39 samples/sec Loss 2.0766 LearningRate 0.0406 Epoch: 14 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:49:18,222-Speed 5091.08 samples/sec Loss 2.1064 LearningRate 0.0405 Epoch: 14 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:49:26,299-Speed 5071.41 samples/sec Loss 2.1078 LearningRate 0.0405 Epoch: 14 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:49:34,422-Speed 5043.29 samples/sec Loss 2.1343 LearningRate 0.0405 Epoch: 14 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:49:42,377-Speed 5149.59 samples/sec Loss 2.1047 LearningRate 0.0404 
Epoch: 14 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:49:50,596-Speed 4984.34 samples/sec Loss 2.0881 LearningRate 0.0404 Epoch: 14 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:49:58,558-Speed 5145.32 samples/sec Loss 2.1042 LearningRate 0.0404 Epoch: 14 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:50:06,564-Speed 5117.26 samples/sec Loss 2.0834 LearningRate 0.0403 Epoch: 14 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:50:14,539-Speed 5136.66 samples/sec Loss 2.0914 LearningRate 0.0403 Epoch: 14 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:50:22,551-Speed 5113.01 samples/sec Loss 2.0770 LearningRate 0.0403 Epoch: 14 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:50:30,497-Speed 5155.60 samples/sec Loss 2.1121 LearningRate 0.0402 Epoch: 14 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:50:38,511-Speed 5111.68 samples/sec Loss 2.1007 LearningRate 0.0402 Epoch: 14 Global Step: 59630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:50:46,559-Speed 5090.28 samples/sec Loss 2.0980 LearningRate 0.0402 Epoch: 14 Global Step: 59640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:50:54,658-Speed 5057.96 samples/sec Loss 2.1057 LearningRate 0.0401 Epoch: 14 Global Step: 59650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:51:02,771-Speed 5049.14 samples/sec Loss 2.1128 LearningRate 0.0401 Epoch: 14 Global Step: 59660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:51:11,105-Speed 4915.65 samples/sec Loss 2.1415 LearningRate 0.0401 Epoch: 14 Global Step: 59670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:51:19,121-Speed 5110.32 samples/sec Loss 2.0955 LearningRate 0.0400 Epoch: 14 Global Step: 59680 Fp16 Grad Scale: 
32768 Required: 6 hours Training: 2022-01-17 11:51:27,217-Speed 5060.00 samples/sec Loss 2.1516 LearningRate 0.0400 Epoch: 14 Global Step: 59690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:51:35,269-Speed 5087.76 samples/sec Loss 2.0988 LearningRate 0.0400 Epoch: 14 Global Step: 59700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:51:43,261-Speed 5125.73 samples/sec Loss 2.0967 LearningRate 0.0399 Epoch: 14 Global Step: 59710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:51:51,315-Speed 5086.73 samples/sec Loss 2.1048 LearningRate 0.0399 Epoch: 14 Global Step: 59720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:51:59,418-Speed 5055.18 samples/sec Loss 2.1186 LearningRate 0.0399 Epoch: 14 Global Step: 59730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:52:07,538-Speed 5045.37 samples/sec Loss 2.1000 LearningRate 0.0398 Epoch: 14 Global Step: 59740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:52:15,583-Speed 5092.07 samples/sec Loss 2.1182 LearningRate 0.0398 Epoch: 14 Global Step: 59750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:52:23,651-Speed 5077.63 samples/sec Loss 2.1257 LearningRate 0.0398 Epoch: 14 Global Step: 59760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:52:31,623-Speed 5138.30 samples/sec Loss 2.1150 LearningRate 0.0397 Epoch: 14 Global Step: 59770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-17 11:52:39,569-Speed 5155.42 samples/sec Loss 2.1046 LearningRate 0.0397 Epoch: 14 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:52:47,547-Speed 5134.60 samples/sec Loss 2.1189 LearningRate 0.0397 Epoch: 14 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:52:55,824-Speed 4949.97 samples/sec Loss 2.0735 LearningRate 0.0396 Epoch: 14 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 
11:53:03,834-Speed 5113.93 samples/sec Loss 2.1188 LearningRate 0.0396 Epoch: 14 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:53:11,843-Speed 5114.92 samples/sec Loss 2.1466 LearningRate 0.0396 Epoch: 14 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:53:19,794-Speed 5152.80 samples/sec Loss 2.0927 LearningRate 0.0395 Epoch: 14 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:53:27,914-Speed 5044.57 samples/sec Loss 2.1314 LearningRate 0.0395 Epoch: 14 Global Step: 59840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:53:35,871-Speed 5148.53 samples/sec Loss 2.0962 LearningRate 0.0395 Epoch: 14 Global Step: 59850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:53:43,914-Speed 5093.15 samples/sec Loss 2.1039 LearningRate 0.0394 Epoch: 14 Global Step: 59860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:53:52,088-Speed 5011.95 samples/sec Loss 2.1123 LearningRate 0.0394 Epoch: 14 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:54:00,279-Speed 5001.14 samples/sec Loss 2.1150 LearningRate 0.0394 Epoch: 14 Global Step: 59880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:54:08,409-Speed 5038.82 samples/sec Loss 2.1492 LearningRate 0.0393 Epoch: 14 Global Step: 59890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:54:16,551-Speed 5031.18 samples/sec Loss 2.1108 LearningRate 0.0393 Epoch: 14 Global Step: 59900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:54:24,635-Speed 5067.66 samples/sec Loss 2.0746 LearningRate 0.0393 Epoch: 14 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:54:32,659-Speed 5105.00 samples/sec Loss 2.1247 LearningRate 0.0392 Epoch: 14 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:54:40,805-Speed 5029.20 samples/sec Loss 2.1362 
LearningRate 0.0392 Epoch: 14 Global Step: 59930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:54:49,353-Speed 4792.82 samples/sec Loss 2.0959 LearningRate 0.0392 Epoch: 14 Global Step: 59940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:54:57,882-Speed 4802.89 samples/sec Loss 2.1116 LearningRate 0.0391 Epoch: 14 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:55:06,124-Speed 4970.19 samples/sec Loss 2.1390 LearningRate 0.0391 Epoch: 14 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:55:14,568-Speed 4851.27 samples/sec Loss 2.1338 LearningRate 0.0391 Epoch: 14 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:55:23,048-Speed 4831.17 samples/sec Loss 2.1126 LearningRate 0.0390 Epoch: 14 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:55:31,401-Speed 4904.16 samples/sec Loss 2.1116 LearningRate 0.0390 Epoch: 14 Global Step: 59990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:55:39,628-Speed 4979.22 samples/sec Loss 2.1278 LearningRate 0.0390 Epoch: 14 Global Step: 60000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:56:30,439-[lfw][60000]XNorm: 23.184466 Training: 2022-01-17 11:56:30,439-[lfw][60000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-01-17 11:56:30,440-[lfw][60000]Accuracy-Highest: 0.99817 Training: 2022-01-17 11:57:25,297-[cfp_fp][60000]XNorm: 21.475831 Training: 2022-01-17 11:57:25,297-[cfp_fp][60000]Accuracy-Flip: 0.98829+-0.00451 Training: 2022-01-17 11:57:25,298-[cfp_fp][60000]Accuracy-Highest: 0.98843 Training: 2022-01-17 11:58:12,203-[agedb_30][60000]XNorm: 22.865230 Training: 2022-01-17 11:58:12,203-[agedb_30][60000]Accuracy-Flip: 0.98050+-0.00615 Training: 2022-01-17 11:58:12,204-[agedb_30][60000]Accuracy-Highest: 0.98267 Training: 2022-01-17 11:58:20,791-Speed 254.15 samples/sec Loss 2.1152 LearningRate 0.0389 Epoch: 14 Global 
Step: 60010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 11:58:28,962-Speed 5013.51 samples/sec Loss 2.1253 LearningRate 0.0389 Epoch: 14 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:58:36,876-Speed 5176.42 samples/sec Loss 2.0964 LearningRate 0.0389 Epoch: 14 Global Step: 60030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:58:44,863-Speed 5129.16 samples/sec Loss 2.1065 LearningRate 0.0388 Epoch: 14 Global Step: 60040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:58:52,822-Speed 5147.10 samples/sec Loss 2.1825 LearningRate 0.0388 Epoch: 14 Global Step: 60050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:59:00,752-Speed 5165.89 samples/sec Loss 2.1010 LearningRate 0.0388 Epoch: 14 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:59:08,801-Speed 5089.97 samples/sec Loss 2.1175 LearningRate 0.0387 Epoch: 14 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:59:16,889-Speed 5064.41 samples/sec Loss 2.0758 LearningRate 0.0387 Epoch: 14 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:59:24,973-Speed 5067.42 samples/sec Loss 2.1080 LearningRate 0.0387 Epoch: 14 Global Step: 60090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:59:33,033-Speed 5083.08 samples/sec Loss 2.1144 LearningRate 0.0386 Epoch: 14 Global Step: 60100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:59:40,978-Speed 5155.89 samples/sec Loss 2.1219 LearningRate 0.0386 Epoch: 14 Global Step: 60110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 11:59:48,965-Speed 5129.02 samples/sec Loss 2.1258 LearningRate 0.0386 Epoch: 14 Global Step: 60120 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 11:59:57,088-Speed 5043.14 samples/sec Loss 2.1218 LearningRate 0.0385 Epoch: 14 Global Step: 60130 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-01-17 12:00:05,073-Speed 5130.64 samples/sec Loss 2.1163 LearningRate 0.0385 Epoch: 14 Global Step: 60140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:00:13,005-Speed 5164.38 samples/sec Loss 2.1445 LearningRate 0.0385 Epoch: 14 Global Step: 60150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:00:21,436-Speed 4858.98 samples/sec Loss 2.1085 LearningRate 0.0384 Epoch: 14 Global Step: 60160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:00:29,524-Speed 5065.07 samples/sec Loss 2.0690 LearningRate 0.0384 Epoch: 14 Global Step: 60170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:00:37,611-Speed 5065.71 samples/sec Loss 2.0859 LearningRate 0.0384 Epoch: 14 Global Step: 60180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:00:45,614-Speed 5119.15 samples/sec Loss 2.0960 LearningRate 0.0383 Epoch: 14 Global Step: 60190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:00:53,778-Speed 5017.65 samples/sec Loss 2.0936 LearningRate 0.0383 Epoch: 14 Global Step: 60200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:01:01,750-Speed 5138.96 samples/sec Loss 2.1239 LearningRate 0.0383 Epoch: 14 Global Step: 60210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:01:09,777-Speed 5103.12 samples/sec Loss 2.1043 LearningRate 0.0382 Epoch: 14 Global Step: 60220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:01:17,777-Speed 5120.74 samples/sec Loss 2.0926 LearningRate 0.0382 Epoch: 14 Global Step: 60230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:01:25,868-Speed 5063.20 samples/sec Loss 2.1096 LearningRate 0.0382 Epoch: 14 Global Step: 60240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:01:33,875-Speed 5115.65 samples/sec Loss 2.1317 LearningRate 0.0381 Epoch: 14 Global Step: 60250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 
12:01:42,007-Speed 5037.90 samples/sec Loss 2.1238 LearningRate 0.0381 Epoch: 14 Global Step: 60260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:01:49,973-Speed 5142.61 samples/sec Loss 2.0863 LearningRate 0.0381 Epoch: 14 Global Step: 60270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:01:57,911-Speed 5160.35 samples/sec Loss 2.1032 LearningRate 0.0380 Epoch: 14 Global Step: 60280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:02:05,872-Speed 5145.79 samples/sec Loss 2.1412 LearningRate 0.0380 Epoch: 14 Global Step: 60290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:02:13,948-Speed 5072.50 samples/sec Loss 2.1125 LearningRate 0.0380 Epoch: 14 Global Step: 60300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:02:21,863-Speed 5175.95 samples/sec Loss 2.1212 LearningRate 0.0379 Epoch: 14 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:02:29,795-Speed 5164.48 samples/sec Loss 2.1284 LearningRate 0.0379 Epoch: 14 Global Step: 60320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 12:02:37,719-Speed 5169.99 samples/sec Loss 2.1202 LearningRate 0.0379 Epoch: 14 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:02:45,676-Speed 5148.08 samples/sec Loss 2.1375 LearningRate 0.0378 Epoch: 14 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:02:53,705-Speed 5102.46 samples/sec Loss 2.1795 LearningRate 0.0378 Epoch: 14 Global Step: 60350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:03:01,914-Speed 4989.92 samples/sec Loss 2.1002 LearningRate 0.0378 Epoch: 14 Global Step: 60360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:03:09,858-Speed 5157.18 samples/sec Loss 2.1050 LearningRate 0.0378 Epoch: 14 Global Step: 60370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:03:17,901-Speed 5093.32 samples/sec Loss 2.1216 
LearningRate 0.0377 Epoch: 14 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:03:25,936-Speed 5097.90 samples/sec Loss 2.0966 LearningRate 0.0377 Epoch: 14 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:03:33,975-Speed 5096.32 samples/sec Loss 2.1350 LearningRate 0.0377 Epoch: 14 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:03:41,978-Speed 5119.01 samples/sec Loss 2.1080 LearningRate 0.0376 Epoch: 14 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:03:50,050-Speed 5074.78 samples/sec Loss 2.1104 LearningRate 0.0376 Epoch: 14 Global Step: 60420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-17 12:03:58,113-Speed 5080.74 samples/sec Loss 2.1193 LearningRate 0.0376 Epoch: 14 Global Step: 60430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:04:06,289-Speed 5010.56 samples/sec Loss 2.1254 LearningRate 0.0375 Epoch: 14 Global Step: 60440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:04:14,321-Speed 5100.13 samples/sec Loss 2.1458 LearningRate 0.0375 Epoch: 14 Global Step: 60450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:04:22,235-Speed 5176.77 samples/sec Loss 2.1470 LearningRate 0.0375 Epoch: 14 Global Step: 60460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:04:30,198-Speed 5144.45 samples/sec Loss 2.1270 LearningRate 0.0374 Epoch: 14 Global Step: 60470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:04:38,157-Speed 5146.65 samples/sec Loss 2.0966 LearningRate 0.0374 Epoch: 14 Global Step: 60480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:04:46,113-Speed 5149.09 samples/sec Loss 2.1056 LearningRate 0.0374 Epoch: 14 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:04:54,188-Speed 5072.84 samples/sec Loss 2.1177 LearningRate 0.0373 Epoch: 14 Global Step: 60500 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:05:02,126-Speed 5161.09 samples/sec Loss 2.1194 LearningRate 0.0373 Epoch: 14 Global Step: 60510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:05:10,057-Speed 5165.15 samples/sec Loss 2.0967 LearningRate 0.0373 Epoch: 14 Global Step: 60520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:05:18,127-Speed 5076.10 samples/sec Loss 2.1412 LearningRate 0.0372 Epoch: 14 Global Step: 60530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:05:26,251-Speed 5042.48 samples/sec Loss 2.1321 LearningRate 0.0372 Epoch: 14 Global Step: 60540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:05:34,255-Speed 5117.92 samples/sec Loss 2.0815 LearningRate 0.0372 Epoch: 14 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:05:42,203-Speed 5154.55 samples/sec Loss 2.1135 LearningRate 0.0371 Epoch: 14 Global Step: 60560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:05:50,182-Speed 5134.15 samples/sec Loss 2.1096 LearningRate 0.0371 Epoch: 14 Global Step: 60570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:05:58,185-Speed 5118.98 samples/sec Loss 2.1295 LearningRate 0.0371 Epoch: 14 Global Step: 60580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:06:06,216-Speed 5100.82 samples/sec Loss 2.1358 LearningRate 0.0370 Epoch: 14 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:06:14,142-Speed 5168.07 samples/sec Loss 2.1231 LearningRate 0.0370 Epoch: 14 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:06:22,150-Speed 5115.90 samples/sec Loss 2.1216 LearningRate 0.0370 Epoch: 14 Global Step: 60610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:06:30,169-Speed 5108.36 samples/sec Loss 2.1331 LearningRate 0.0369 Epoch: 14 Global Step: 60620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-01-17 12:06:38,162-Speed 5125.58 samples/sec Loss 2.0934 LearningRate 0.0369 Epoch: 14 Global Step: 60630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:06:46,341-Speed 5008.78 samples/sec Loss 2.1355 LearningRate 0.0369 Epoch: 14 Global Step: 60640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:06:54,314-Speed 5137.86 samples/sec Loss 2.1580 LearningRate 0.0368 Epoch: 14 Global Step: 60650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:07:02,361-Speed 5090.88 samples/sec Loss 2.1282 LearningRate 0.0368 Epoch: 14 Global Step: 60660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:07:10,611-Speed 4965.12 samples/sec Loss 2.1074 LearningRate 0.0368 Epoch: 14 Global Step: 60670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:07:18,637-Speed 5104.07 samples/sec Loss 2.0681 LearningRate 0.0367 Epoch: 14 Global Step: 60680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:07:26,563-Speed 5168.64 samples/sec Loss 2.1046 LearningRate 0.0367 Epoch: 14 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:07:34,581-Speed 5109.37 samples/sec Loss 2.1084 LearningRate 0.0367 Epoch: 14 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:07:42,728-Speed 5028.27 samples/sec Loss 2.0913 LearningRate 0.0366 Epoch: 14 Global Step: 60710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:07:50,812-Speed 5067.65 samples/sec Loss 2.1165 LearningRate 0.0366 Epoch: 14 Global Step: 60720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:07:59,005-Speed 4999.60 samples/sec Loss 2.1040 LearningRate 0.0366 Epoch: 14 Global Step: 60730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:08:07,131-Speed 5041.11 samples/sec Loss 2.0618 LearningRate 0.0365 Epoch: 14 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:08:15,269-Speed 5033.83 samples/sec 
Loss 2.0931 LearningRate 0.0365 Epoch: 14 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:08:23,514-Speed 4969.06 samples/sec Loss 2.0958 LearningRate 0.0365 Epoch: 14 Global Step: 60760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:08:31,726-Speed 4988.00 samples/sec Loss 2.1240 LearningRate 0.0365 Epoch: 14 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:08:40,141-Speed 4868.19 samples/sec Loss 2.0926 LearningRate 0.0364 Epoch: 14 Global Step: 60780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:08:48,742-Speed 4762.76 samples/sec Loss 2.0981 LearningRate 0.0364 Epoch: 14 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:08:56,699-Speed 5148.20 samples/sec Loss 2.1414 LearningRate 0.0364 Epoch: 14 Global Step: 60800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:09:04,869-Speed 5014.46 samples/sec Loss 2.1457 LearningRate 0.0363 Epoch: 14 Global Step: 60810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:09:12,892-Speed 5105.57 samples/sec Loss 2.0832 LearningRate 0.0363 Epoch: 14 Global Step: 60820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:09:20,929-Speed 5097.48 samples/sec Loss 2.0842 LearningRate 0.0363 Epoch: 14 Global Step: 60830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:09:28,904-Speed 5136.59 samples/sec Loss 2.0527 LearningRate 0.0362 Epoch: 14 Global Step: 60840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:09:36,872-Speed 5141.43 samples/sec Loss 2.1290 LearningRate 0.0362 Epoch: 14 Global Step: 60850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:09:44,966-Speed 5061.30 samples/sec Loss 2.0781 LearningRate 0.0362 Epoch: 14 Global Step: 60860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:09:53,266-Speed 4935.72 samples/sec Loss 2.0818 LearningRate 0.0361 Epoch: 14 Global Step: 
60870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:10:01,332-Speed 5079.12 samples/sec Loss 2.1122 LearningRate 0.0361 Epoch: 14 Global Step: 60880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:10:09,287-Speed 5149.31 samples/sec Loss 2.1093 LearningRate 0.0361 Epoch: 14 Global Step: 60890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:10:17,239-Speed 5151.52 samples/sec Loss 2.1071 LearningRate 0.0360 Epoch: 14 Global Step: 60900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:10:25,639-Speed 4877.08 samples/sec Loss 2.0632 LearningRate 0.0360 Epoch: 14 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:10:34,575-Speed 4584.45 samples/sec Loss 2.0778 LearningRate 0.0360 Epoch: 14 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:10:43,484-Speed 4598.02 samples/sec Loss 2.0940 LearningRate 0.0359 Epoch: 14 Global Step: 60930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:10:52,507-Speed 4540.13 samples/sec Loss 2.1039 LearningRate 0.0359 Epoch: 14 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:11:01,397-Speed 4607.75 samples/sec Loss 2.0670 LearningRate 0.0359 Epoch: 14 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:11:10,203-Speed 4651.88 samples/sec Loss 2.0918 LearningRate 0.0358 Epoch: 14 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:11:18,990-Speed 4662.16 samples/sec Loss 2.0839 LearningRate 0.0358 Epoch: 14 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:11:27,080-Speed 5063.70 samples/sec Loss 2.0675 LearningRate 0.0358 Epoch: 14 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:11:35,268-Speed 5003.07 samples/sec Loss 2.0929 LearningRate 0.0357 Epoch: 14 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-01-17 12:11:43,255-Speed 5129.39 samples/sec Loss 2.0693 LearningRate 0.0357 Epoch: 14 Global Step: 61000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:11:51,306-Speed 5088.11 samples/sec Loss 2.0722 LearningRate 0.0357 Epoch: 14 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:11:59,406-Speed 5057.58 samples/sec Loss 2.0986 LearningRate 0.0357 Epoch: 14 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:12:07,330-Speed 5169.86 samples/sec Loss 2.1149 LearningRate 0.0356 Epoch: 14 Global Step: 61030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:12:15,248-Speed 5173.41 samples/sec Loss 2.0925 LearningRate 0.0356 Epoch: 14 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:12:23,234-Speed 5129.52 samples/sec Loss 2.0987 LearningRate 0.0356 Epoch: 14 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:12:31,175-Speed 5158.85 samples/sec Loss 2.0897 LearningRate 0.0355 Epoch: 14 Global Step: 61060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:12:39,735-Speed 4785.90 samples/sec Loss 2.0903 LearningRate 0.0355 Epoch: 14 Global Step: 61070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:12:48,463-Speed 4693.35 samples/sec Loss 2.0872 LearningRate 0.0355 Epoch: 14 Global Step: 61080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:12:56,897-Speed 4857.06 samples/sec Loss 2.1066 LearningRate 0.0354 Epoch: 14 Global Step: 61090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:13:05,165-Speed 4954.72 samples/sec Loss 2.0800 LearningRate 0.0354 Epoch: 14 Global Step: 61100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:13:13,224-Speed 5083.47 samples/sec Loss 2.0905 LearningRate 0.0354 Epoch: 14 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:13:21,223-Speed 
5120.76 samples/sec Loss 2.0779 LearningRate 0.0353 Epoch: 14 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:13:29,326-Speed 5055.83 samples/sec Loss 2.0600 LearningRate 0.0353 Epoch: 14 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:13:37,533-Speed 4991.32 samples/sec Loss 2.0742 LearningRate 0.0353 Epoch: 14 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:13:45,638-Speed 5054.72 samples/sec Loss 2.0799 LearningRate 0.0352 Epoch: 14 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:13:53,731-Speed 5061.52 samples/sec Loss 2.0376 LearningRate 0.0352 Epoch: 14 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:14:01,763-Speed 5100.47 samples/sec Loss 2.0838 LearningRate 0.0352 Epoch: 14 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:14:09,849-Speed 5066.03 samples/sec Loss 2.0555 LearningRate 0.0351 Epoch: 14 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:14:17,871-Speed 5106.62 samples/sec Loss 2.0895 LearningRate 0.0351 Epoch: 14 Global Step: 61190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:14:25,979-Speed 5053.22 samples/sec Loss 2.0942 LearningRate 0.0351 Epoch: 14 Global Step: 61200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:14:33,995-Speed 5110.23 samples/sec Loss 2.0779 LearningRate 0.0351 Epoch: 14 Global Step: 61210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:14:42,312-Speed 4925.18 samples/sec Loss 2.1217 LearningRate 0.0350 Epoch: 14 Global Step: 61220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:14:50,282-Speed 5140.17 samples/sec Loss 2.0728 LearningRate 0.0350 Epoch: 14 Global Step: 61230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:14:58,233-Speed 5152.37 samples/sec Loss 2.0815 LearningRate 
0.0350 Epoch: 14 Global Step: 61240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:15:06,174-Speed 5158.86 samples/sec Loss 2.0995 LearningRate 0.0349 Epoch: 14 Global Step: 61250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:15:14,144-Speed 5140.04 samples/sec Loss 2.0679 LearningRate 0.0349 Epoch: 14 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:15:22,099-Speed 5149.45 samples/sec Loss 2.0937 LearningRate 0.0349 Epoch: 14 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:15:30,111-Speed 5113.22 samples/sec Loss 2.0966 LearningRate 0.0348 Epoch: 14 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:15:38,105-Speed 5125.22 samples/sec Loss 2.0682 LearningRate 0.0348 Epoch: 14 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:15:46,289-Speed 5005.75 samples/sec Loss 2.0558 LearningRate 0.0348 Epoch: 14 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:15:54,391-Speed 5056.27 samples/sec Loss 2.0696 LearningRate 0.0347 Epoch: 14 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:16:02,378-Speed 5128.85 samples/sec Loss 2.0407 LearningRate 0.0347 Epoch: 14 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:16:10,465-Speed 5065.65 samples/sec Loss 2.0605 LearningRate 0.0347 Epoch: 14 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:16:18,639-Speed 5011.70 samples/sec Loss 2.0674 LearningRate 0.0346 Epoch: 14 Global Step: 61340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:16:26,838-Speed 4996.05 samples/sec Loss 2.0730 LearningRate 0.0346 Epoch: 14 Global Step: 61350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:16:35,108-Speed 4953.20 samples/sec Loss 2.0674 LearningRate 0.0346 Epoch: 14 Global Step: 61360 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-01-17 12:16:43,160-Speed 5087.76 samples/sec Loss 2.0654 LearningRate 0.0345 Epoch: 14 Global Step: 61370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:16:51,252-Speed 5062.81 samples/sec Loss 2.0528 LearningRate 0.0345 Epoch: 14 Global Step: 61380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:16:59,379-Speed 5040.28 samples/sec Loss 2.0389 LearningRate 0.0345 Epoch: 14 Global Step: 61390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:17:07,431-Speed 5088.25 samples/sec Loss 2.0490 LearningRate 0.0345 Epoch: 14 Global Step: 61400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:17:15,560-Speed 5039.20 samples/sec Loss 2.0229 LearningRate 0.0344 Epoch: 14 Global Step: 61410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:17:23,716-Speed 5022.32 samples/sec Loss 2.0555 LearningRate 0.0344 Epoch: 14 Global Step: 61420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:17:31,826-Speed 5051.41 samples/sec Loss 2.0505 LearningRate 0.0344 Epoch: 14 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:17:39,798-Speed 5138.83 samples/sec Loss 2.0188 LearningRate 0.0343 Epoch: 14 Global Step: 61440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:17:47,901-Speed 5055.80 samples/sec Loss 2.0597 LearningRate 0.0343 Epoch: 14 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:17:55,850-Speed 5153.14 samples/sec Loss 2.0378 LearningRate 0.0343 Epoch: 14 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:18:03,953-Speed 5055.79 samples/sec Loss 2.0688 LearningRate 0.0342 Epoch: 14 Global Step: 61470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:18:12,254-Speed 4935.16 samples/sec Loss 2.0594 LearningRate 0.0342 Epoch: 14 Global Step: 61480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-01-17 12:18:20,569-Speed 4926.40 samples/sec Loss 2.0625 LearningRate 0.0342 Epoch: 14 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:18:28,911-Speed 4910.52 samples/sec Loss 2.0419 LearningRate 0.0341 Epoch: 14 Global Step: 61500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:18:37,164-Speed 4963.99 samples/sec Loss 2.0643 LearningRate 0.0341 Epoch: 14 Global Step: 61510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:18:45,141-Speed 5135.36 samples/sec Loss 2.0189 LearningRate 0.0341 Epoch: 14 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:18:53,210-Speed 5076.98 samples/sec Loss 2.0523 LearningRate 0.0340 Epoch: 14 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:19:01,264-Speed 5086.17 samples/sec Loss 2.0221 LearningRate 0.0340 Epoch: 14 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:19:09,481-Speed 4985.26 samples/sec Loss 2.0188 LearningRate 0.0340 Epoch: 14 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:19:17,866-Speed 4885.65 samples/sec Loss 2.0488 LearningRate 0.0340 Epoch: 14 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:19:25,854-Speed 5128.55 samples/sec Loss 2.0427 LearningRate 0.0339 Epoch: 14 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:19:33,916-Speed 5081.19 samples/sec Loss 2.0328 LearningRate 0.0339 Epoch: 14 Global Step: 61580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:19:41,900-Speed 5130.56 samples/sec Loss 2.0553 LearningRate 0.0339 Epoch: 14 Global Step: 61590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:19:49,906-Speed 5117.19 samples/sec Loss 2.0464 LearningRate 0.0338 Epoch: 14 Global Step: 61600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:19:57,986-Speed 5069.93 samples/sec 
Loss 2.0526 LearningRate 0.0338 Epoch: 14 Global Step: 61610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:20:05,903-Speed 5174.62 samples/sec Loss 2.0222 LearningRate 0.0338 Epoch: 14 Global Step: 61620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:20:13,901-Speed 5121.86 samples/sec Loss 2.0340 LearningRate 0.0337 Epoch: 14 Global Step: 61630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:20:21,931-Speed 5101.39 samples/sec Loss 2.0197 LearningRate 0.0337 Epoch: 14 Global Step: 61640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:20:29,988-Speed 5084.51 samples/sec Loss 2.0331 LearningRate 0.0337 Epoch: 14 Global Step: 61650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:20:38,073-Speed 5066.75 samples/sec Loss 2.0319 LearningRate 0.0336 Epoch: 14 Global Step: 61660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:20:46,160-Speed 5065.84 samples/sec Loss 2.0229 LearningRate 0.0336 Epoch: 14 Global Step: 61670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:20:54,284-Speed 5042.29 samples/sec Loss 2.0203 LearningRate 0.0336 Epoch: 14 Global Step: 61680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:21:02,273-Speed 5127.71 samples/sec Loss 2.0662 LearningRate 0.0336 Epoch: 14 Global Step: 61690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:21:10,248-Speed 5137.00 samples/sec Loss 2.0546 LearningRate 0.0335 Epoch: 14 Global Step: 61700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:21:18,279-Speed 5100.70 samples/sec Loss 2.0519 LearningRate 0.0335 Epoch: 14 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:21:26,315-Speed 5098.05 samples/sec Loss 2.0094 LearningRate 0.0335 Epoch: 14 Global Step: 61720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:21:34,377-Speed 5081.20 samples/sec Loss 2.0424 LearningRate 0.0334 Epoch: 14 Global 
Step: 61730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:21:42,383-Speed 5116.46 samples/sec Loss 2.0410 LearningRate 0.0334 Epoch: 14 Global Step: 61740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:21:50,360-Speed 5135.56 samples/sec Loss 2.0286 LearningRate 0.0334 Epoch: 14 Global Step: 61750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:21:58,430-Speed 5076.23 samples/sec Loss 2.0187 LearningRate 0.0333 Epoch: 14 Global Step: 61760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:22:06,403-Speed 5138.19 samples/sec Loss 2.0529 LearningRate 0.0333 Epoch: 14 Global Step: 61770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:22:14,386-Speed 5131.76 samples/sec Loss 2.0527 LearningRate 0.0333 Epoch: 14 Global Step: 61780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:22:22,355-Speed 5140.91 samples/sec Loss 2.0251 LearningRate 0.0332 Epoch: 14 Global Step: 61790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:22:30,280-Speed 5169.24 samples/sec Loss 2.0070 LearningRate 0.0332 Epoch: 14 Global Step: 61800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:22:38,281-Speed 5119.89 samples/sec Loss 2.0024 LearningRate 0.0332 Epoch: 14 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:22:46,312-Speed 5101.48 samples/sec Loss 2.0292 LearningRate 0.0332 Epoch: 14 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:22:54,363-Speed 5087.96 samples/sec Loss 2.0272 LearningRate 0.0331 Epoch: 14 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:23:02,665-Speed 4934.16 samples/sec Loss 1.9935 LearningRate 0.0331 Epoch: 14 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:23:10,994-Speed 4918.65 samples/sec Loss 2.0143 LearningRate 0.0331 Epoch: 14 Global Step: 61850 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 12:23:19,277-Speed 4945.96 samples/sec Loss 2.0404 LearningRate 0.0330 Epoch: 14 Global Step: 61860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:23:27,552-Speed 4950.14 samples/sec Loss 2.0160 LearningRate 0.0330 Epoch: 14 Global Step: 61870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:23:35,683-Speed 5038.43 samples/sec Loss 2.0026 LearningRate 0.0330 Epoch: 14 Global Step: 61880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:23:43,700-Speed 5110.26 samples/sec Loss 2.0263 LearningRate 0.0329 Epoch: 14 Global Step: 61890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:23:51,701-Speed 5120.02 samples/sec Loss 2.0034 LearningRate 0.0329 Epoch: 14 Global Step: 61900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:23:59,971-Speed 4953.71 samples/sec Loss 2.0411 LearningRate 0.0329 Epoch: 14 Global Step: 61910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:24:08,486-Speed 4811.16 samples/sec Loss 2.0174 LearningRate 0.0328 Epoch: 14 Global Step: 61920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:24:17,074-Speed 4769.75 samples/sec Loss 2.0079 LearningRate 0.0328 Epoch: 14 Global Step: 61930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:24:26,123-Speed 4527.12 samples/sec Loss 2.0247 LearningRate 0.0328 Epoch: 14 Global Step: 61940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:24:34,670-Speed 4793.30 samples/sec Loss 2.0104 LearningRate 0.0328 Epoch: 14 Global Step: 61950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:24:42,699-Speed 5102.15 samples/sec Loss 1.9976 LearningRate 0.0327 Epoch: 14 Global Step: 61960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:24:50,774-Speed 5072.99 samples/sec Loss 1.9921 LearningRate 0.0327 Epoch: 14 Global Step: 61970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 
12:24:58,766-Speed 5125.15 samples/sec Loss 2.0093 LearningRate 0.0327 Epoch: 14 Global Step: 61980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:25:06,736-Speed 5140.48 samples/sec Loss 1.9822 LearningRate 0.0326 Epoch: 14 Global Step: 61990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:25:14,732-Speed 5123.60 samples/sec Loss 1.9985 LearningRate 0.0326 Epoch: 14 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:25:22,677-Speed 5155.84 samples/sec Loss 2.0076 LearningRate 0.0326 Epoch: 14 Global Step: 62010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:25:30,669-Speed 5125.64 samples/sec Loss 1.9983 LearningRate 0.0325 Epoch: 14 Global Step: 62020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:25:38,679-Speed 5114.53 samples/sec Loss 2.0023 LearningRate 0.0325 Epoch: 14 Global Step: 62030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:25:46,646-Speed 5141.96 samples/sec Loss 2.0096 LearningRate 0.0325 Epoch: 14 Global Step: 62040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:25:54,643-Speed 5122.57 samples/sec Loss 1.9713 LearningRate 0.0325 Epoch: 14 Global Step: 62050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:26:02,671-Speed 5103.10 samples/sec Loss 2.0426 LearningRate 0.0324 Epoch: 14 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:26:10,628-Speed 5147.62 samples/sec Loss 1.9953 LearningRate 0.0324 Epoch: 14 Global Step: 62070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:26:18,856-Speed 4979.36 samples/sec Loss 1.9821 LearningRate 0.0324 Epoch: 14 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:26:26,817-Speed 5145.59 samples/sec Loss 1.9813 LearningRate 0.0323 Epoch: 14 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:26:34,855-Speed 5096.46 samples/sec Loss 1.9759 
LearningRate 0.0323 Epoch: 14 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:26:42,809-Speed 5150.50 samples/sec Loss 2.0014 LearningRate 0.0323 Epoch: 14 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:26:50,776-Speed 5141.94 samples/sec Loss 2.0017 LearningRate 0.0322 Epoch: 14 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:26:58,831-Speed 5085.33 samples/sec Loss 2.0183 LearningRate 0.0322 Epoch: 14 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:27:06,869-Speed 5096.63 samples/sec Loss 1.9942 LearningRate 0.0322 Epoch: 14 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:27:14,884-Speed 5111.26 samples/sec Loss 1.9864 LearningRate 0.0321 Epoch: 14 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:27:22,897-Speed 5112.61 samples/sec Loss 2.0062 LearningRate 0.0321 Epoch: 14 Global Step: 62160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:27:30,919-Speed 5106.56 samples/sec Loss 1.9823 LearningRate 0.0321 Epoch: 14 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:27:39,017-Speed 5058.73 samples/sec Loss 2.0009 LearningRate 0.0321 Epoch: 14 Global Step: 62180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:27:46,968-Speed 5152.16 samples/sec Loss 2.0053 LearningRate 0.0320 Epoch: 14 Global Step: 62190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:27:55,132-Speed 5018.18 samples/sec Loss 2.0086 LearningRate 0.0320 Epoch: 14 Global Step: 62200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:28:03,324-Speed 5000.24 samples/sec Loss 1.9866 LearningRate 0.0320 Epoch: 14 Global Step: 62210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:28:11,588-Speed 4956.89 samples/sec Loss 1.9959 LearningRate 0.0319 Epoch: 14 Global Step: 62220 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:28:19,628-Speed 5095.54 samples/sec Loss 1.9673 LearningRate 0.0319 Epoch: 14 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:28:27,676-Speed 5090.05 samples/sec Loss 1.9835 LearningRate 0.0319 Epoch: 14 Global Step: 62240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:28:35,713-Speed 5097.58 samples/sec Loss 1.9691 LearningRate 0.0318 Epoch: 14 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:28:43,774-Speed 5081.62 samples/sec Loss 1.9752 LearningRate 0.0318 Epoch: 14 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:28:51,836-Speed 5081.42 samples/sec Loss 2.0069 LearningRate 0.0318 Epoch: 14 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:28:59,951-Speed 5048.68 samples/sec Loss 1.9942 LearningRate 0.0318 Epoch: 14 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:29:07,900-Speed 5152.75 samples/sec Loss 1.9552 LearningRate 0.0317 Epoch: 14 Global Step: 62290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:29:16,022-Speed 5044.19 samples/sec Loss 1.9609 LearningRate 0.0317 Epoch: 14 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:29:24,032-Speed 5113.97 samples/sec Loss 1.9947 LearningRate 0.0317 Epoch: 14 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:29:32,094-Speed 5081.60 samples/sec Loss 1.9652 LearningRate 0.0316 Epoch: 14 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:29:40,140-Speed 5091.39 samples/sec Loss 1.9748 LearningRate 0.0316 Epoch: 14 Global Step: 62330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:29:48,154-Speed 5111.68 samples/sec Loss 1.9796 LearningRate 0.0316 Epoch: 14 Global Step: 62340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-01-17 12:29:56,084-Speed 5165.55 samples/sec Loss 1.9875 LearningRate 0.0315 Epoch: 14 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:30:04,031-Speed 5155.40 samples/sec Loss 1.9520 LearningRate 0.0315 Epoch: 14 Global Step: 62360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:30:11,997-Speed 5142.26 samples/sec Loss 1.9684 LearningRate 0.0315 Epoch: 14 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:30:20,023-Speed 5104.07 samples/sec Loss 1.9545 LearningRate 0.0315 Epoch: 14 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:30:28,063-Speed 5095.44 samples/sec Loss 1.9489 LearningRate 0.0314 Epoch: 14 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:30:36,106-Speed 5093.01 samples/sec Loss 1.9595 LearningRate 0.0314 Epoch: 14 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:30:44,173-Speed 5078.25 samples/sec Loss 1.9381 LearningRate 0.0314 Epoch: 14 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:30:52,424-Speed 4964.62 samples/sec Loss 1.9796 LearningRate 0.0313 Epoch: 14 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:30:59,914-Speed 5469.70 samples/sec Loss 1.9583 LearningRate 0.0313 Epoch: 14 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:31:07,613-Speed 5320.93 samples/sec Loss 1.9755 LearningRate 0.0313 Epoch: 14 Global Step: 62440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:31:15,703-Speed 5063.27 samples/sec Loss 1.9733 LearningRate 0.0313 Epoch: 14 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:31:23,752-Speed 5089.33 samples/sec Loss 1.9329 LearningRate 0.0312 Epoch: 14 Global Step: 62460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:31:31,860-Speed 5052.81 samples/sec 
Loss 1.9686 LearningRate 0.0312 Epoch: 14 Global Step: 62470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:31:39,838-Speed 5134.95 samples/sec Loss 1.9551 LearningRate 0.0312 Epoch: 14 Global Step: 62480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:31:47,782-Speed 5156.15 samples/sec Loss 1.9696 LearningRate 0.0311 Epoch: 14 Global Step: 62490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:31:55,829-Speed 5091.56 samples/sec Loss 1.9222 LearningRate 0.0311 Epoch: 14 Global Step: 62500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:32:03,829-Speed 5121.03 samples/sec Loss 1.9117 LearningRate 0.0311 Epoch: 14 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:32:11,817-Speed 5128.70 samples/sec Loss 1.9467 LearningRate 0.0310 Epoch: 14 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:32:19,789-Speed 5138.56 samples/sec Loss 1.9891 LearningRate 0.0310 Epoch: 14 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:32:27,820-Speed 5100.82 samples/sec Loss 1.9703 LearningRate 0.0310 Epoch: 14 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:32:35,848-Speed 5102.55 samples/sec Loss 1.9542 LearningRate 0.0310 Epoch: 14 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:32:43,938-Speed 5063.80 samples/sec Loss 1.9197 LearningRate 0.0309 Epoch: 14 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:32:52,216-Speed 4948.96 samples/sec Loss 1.9437 LearningRate 0.0309 Epoch: 14 Global Step: 62570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:33:00,202-Speed 5129.57 samples/sec Loss 1.9360 LearningRate 0.0309 Epoch: 14 Global Step: 62580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:33:36,119-Speed 1140.45 samples/sec Loss 1.4532 LearningRate 0.0308 Epoch: 15 Global 
Step: 62590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:33:44,137-Speed 5109.54 samples/sec Loss 1.4939 LearningRate 0.0308 Epoch: 15 Global Step: 62600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:33:52,208-Speed 5075.89 samples/sec Loss 1.4535 LearningRate 0.0308 Epoch: 15 Global Step: 62610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:34:00,284-Speed 5072.10 samples/sec Loss 1.4213 LearningRate 0.0307 Epoch: 15 Global Step: 62620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:34:08,350-Speed 5079.08 samples/sec Loss 1.4324 LearningRate 0.0307 Epoch: 15 Global Step: 62630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:34:16,594-Speed 4969.10 samples/sec Loss 1.4499 LearningRate 0.0307 Epoch: 15 Global Step: 62640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:34:24,778-Speed 5005.90 samples/sec Loss 1.4473 LearningRate 0.0307 Epoch: 15 Global Step: 62650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:34:32,854-Speed 5072.39 samples/sec Loss 1.4484 LearningRate 0.0306 Epoch: 15 Global Step: 62660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:34:40,941-Speed 5064.98 samples/sec Loss 1.4590 LearningRate 0.0306 Epoch: 15 Global Step: 62670 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 12:34:49,334-Speed 4881.43 samples/sec Loss 1.4608 LearningRate 0.0306 Epoch: 15 Global Step: 62680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:34:57,346-Speed 5112.98 samples/sec Loss 1.4714 LearningRate 0.0305 Epoch: 15 Global Step: 62690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:35:05,427-Speed 5068.80 samples/sec Loss 1.4548 LearningRate 0.0305 Epoch: 15 Global Step: 62700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:35:13,505-Speed 5071.68 samples/sec Loss 1.4488 LearningRate 0.0305 Epoch: 15 Global Step: 62710 Fp16 Grad Scale: 65536 Required: 
5 hours Training: 2022-01-17 12:35:21,459-Speed 5150.73 samples/sec Loss 1.4572 LearningRate 0.0305 Epoch: 15 Global Step: 62720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:35:29,467-Speed 5115.11 samples/sec Loss 1.4528 LearningRate 0.0304 Epoch: 15 Global Step: 62730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:35:37,481-Speed 5111.94 samples/sec Loss 1.4299 LearningRate 0.0304 Epoch: 15 Global Step: 62740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:35:45,476-Speed 5123.79 samples/sec Loss 1.4504 LearningRate 0.0304 Epoch: 15 Global Step: 62750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:35:53,901-Speed 4862.86 samples/sec Loss 1.4699 LearningRate 0.0303 Epoch: 15 Global Step: 62760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:36:02,018-Speed 5046.86 samples/sec Loss 1.4767 LearningRate 0.0303 Epoch: 15 Global Step: 62770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:36:09,991-Speed 5137.92 samples/sec Loss 1.4655 LearningRate 0.0303 Epoch: 15 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:36:17,995-Speed 5118.06 samples/sec Loss 1.4724 LearningRate 0.0302 Epoch: 15 Global Step: 62790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:36:26,495-Speed 4819.84 samples/sec Loss 1.4410 LearningRate 0.0302 Epoch: 15 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:36:34,479-Speed 5130.19 samples/sec Loss 1.4641 LearningRate 0.0302 Epoch: 15 Global Step: 62810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:36:42,422-Speed 5157.91 samples/sec Loss 1.4892 LearningRate 0.0302 Epoch: 15 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:36:50,391-Speed 5140.65 samples/sec Loss 1.4818 LearningRate 0.0301 Epoch: 15 Global Step: 62830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:36:58,647-Speed 
4961.61 samples/sec Loss 1.4804 LearningRate 0.0301 Epoch: 15 Global Step: 62840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:37:07,467-Speed 4644.53 samples/sec Loss 1.4673 LearningRate 0.0301 Epoch: 15 Global Step: 62850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:37:16,218-Speed 4681.62 samples/sec Loss 1.4997 LearningRate 0.0300 Epoch: 15 Global Step: 62860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:37:25,026-Speed 4650.71 samples/sec Loss 1.4879 LearningRate 0.0300 Epoch: 15 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:37:33,832-Speed 4652.00 samples/sec Loss 1.5067 LearningRate 0.0300 Epoch: 15 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:37:42,635-Speed 4653.19 samples/sec Loss 1.5153 LearningRate 0.0300 Epoch: 15 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:37:51,481-Speed 4631.22 samples/sec Loss 1.4676 LearningRate 0.0299 Epoch: 15 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:38:00,327-Speed 4630.86 samples/sec Loss 1.4853 LearningRate 0.0299 Epoch: 15 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:38:09,178-Speed 4628.24 samples/sec Loss 1.5148 LearningRate 0.0299 Epoch: 15 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:38:17,936-Speed 4677.57 samples/sec Loss 1.4885 LearningRate 0.0298 Epoch: 15 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:38:26,683-Speed 4682.83 samples/sec Loss 1.4911 LearningRate 0.0298 Epoch: 15 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:38:35,421-Speed 4688.45 samples/sec Loss 1.4869 LearningRate 0.0298 Epoch: 15 Global Step: 62950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:38:44,224-Speed 4653.22 samples/sec Loss 1.4992 LearningRate 0.0297 
Epoch: 15 Global Step: 62960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:38:52,137-Speed 5177.23 samples/sec Loss 1.5036 LearningRate 0.0297 Epoch: 15 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:39:00,225-Speed 5065.07 samples/sec Loss 1.5151 LearningRate 0.0297 Epoch: 15 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:39:08,592-Speed 4896.17 samples/sec Loss 1.5050 LearningRate 0.0297 Epoch: 15 Global Step: 62990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:39:16,870-Speed 4948.59 samples/sec Loss 1.5012 LearningRate 0.0296 Epoch: 15 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:39:24,934-Speed 5079.94 samples/sec Loss 1.5128 LearningRate 0.0296 Epoch: 15 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:39:33,115-Speed 5007.43 samples/sec Loss 1.5433 LearningRate 0.0296 Epoch: 15 Global Step: 63020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:39:41,251-Speed 5035.41 samples/sec Loss 1.5228 LearningRate 0.0295 Epoch: 15 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:39:49,304-Speed 5086.50 samples/sec Loss 1.5159 LearningRate 0.0295 Epoch: 15 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:39:57,230-Speed 5168.72 samples/sec Loss 1.5442 LearningRate 0.0295 Epoch: 15 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:40:05,197-Speed 5142.27 samples/sec Loss 1.5310 LearningRate 0.0295 Epoch: 15 Global Step: 63060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:40:13,254-Speed 5084.40 samples/sec Loss 1.5275 LearningRate 0.0294 Epoch: 15 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:40:21,439-Speed 5004.92 samples/sec Loss 1.5197 LearningRate 0.0294 Epoch: 15 Global Step: 63080 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-01-17 12:40:29,430-Speed 5126.20 samples/sec Loss 1.5008 LearningRate 0.0294 Epoch: 15 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:40:37,424-Speed 5124.29 samples/sec Loss 1.5283 LearningRate 0.0293 Epoch: 15 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:40:45,400-Speed 5136.66 samples/sec Loss 1.5068 LearningRate 0.0293 Epoch: 15 Global Step: 63110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:40:53,333-Speed 5163.76 samples/sec Loss 1.5316 LearningRate 0.0293 Epoch: 15 Global Step: 63120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:41:01,314-Speed 5132.80 samples/sec Loss 1.5472 LearningRate 0.0293 Epoch: 15 Global Step: 63130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:41:09,293-Speed 5134.27 samples/sec Loss 1.5211 LearningRate 0.0292 Epoch: 15 Global Step: 63140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:41:16,844-Speed 5425.26 samples/sec Loss 1.5439 LearningRate 0.0292 Epoch: 15 Global Step: 63150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:41:24,910-Speed 5079.15 samples/sec Loss 1.5462 LearningRate 0.0292 Epoch: 15 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:41:32,997-Speed 5065.45 samples/sec Loss 1.5391 LearningRate 0.0291 Epoch: 15 Global Step: 63170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:41:41,206-Speed 4990.12 samples/sec Loss 1.5370 LearningRate 0.0291 Epoch: 15 Global Step: 63180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:41:49,202-Speed 5124.08 samples/sec Loss 1.5643 LearningRate 0.0291 Epoch: 15 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:41:57,174-Speed 5138.14 samples/sec Loss 1.5645 LearningRate 0.0291 Epoch: 15 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 
12:42:05,111-Speed 5161.79 samples/sec Loss 1.5645 LearningRate 0.0290 Epoch: 15 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:42:13,027-Speed 5175.32 samples/sec Loss 1.5536 LearningRate 0.0290 Epoch: 15 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:42:21,083-Speed 5084.65 samples/sec Loss 1.5549 LearningRate 0.0290 Epoch: 15 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:42:29,084-Speed 5120.38 samples/sec Loss 1.5524 LearningRate 0.0289 Epoch: 15 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:42:37,124-Speed 5095.06 samples/sec Loss 1.5286 LearningRate 0.0289 Epoch: 15 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:42:45,106-Speed 5131.66 samples/sec Loss 1.5581 LearningRate 0.0289 Epoch: 15 Global Step: 63260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:42:53,213-Speed 5053.64 samples/sec Loss 1.5491 LearningRate 0.0289 Epoch: 15 Global Step: 63270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:43:01,158-Speed 5156.04 samples/sec Loss 1.5410 LearningRate 0.0288 Epoch: 15 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:43:09,132-Speed 5137.63 samples/sec Loss 1.5773 LearningRate 0.0288 Epoch: 15 Global Step: 63290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:43:17,049-Speed 5174.35 samples/sec Loss 1.5556 LearningRate 0.0288 Epoch: 15 Global Step: 63300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:43:25,027-Speed 5134.76 samples/sec Loss 1.5778 LearningRate 0.0287 Epoch: 15 Global Step: 63310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:43:33,105-Speed 5070.90 samples/sec Loss 1.5881 LearningRate 0.0287 Epoch: 15 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:43:41,031-Speed 5169.15 samples/sec Loss 1.5560 
LearningRate 0.0287 Epoch: 15 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:43:48,993-Speed 5145.42 samples/sec Loss 1.5560 LearningRate 0.0287 Epoch: 15 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:43:56,946-Speed 5150.40 samples/sec Loss 1.5342 LearningRate 0.0286 Epoch: 15 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:44:05,097-Speed 5025.85 samples/sec Loss 1.5483 LearningRate 0.0286 Epoch: 15 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:44:13,041-Speed 5157.34 samples/sec Loss 1.5496 LearningRate 0.0286 Epoch: 15 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:44:21,322-Speed 4947.09 samples/sec Loss 1.5605 LearningRate 0.0285 Epoch: 15 Global Step: 63380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:44:29,619-Speed 4937.29 samples/sec Loss 1.6134 LearningRate 0.0285 Epoch: 15 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:44:37,932-Speed 4927.67 samples/sec Loss 1.5682 LearningRate 0.0285 Epoch: 15 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:44:45,967-Speed 5098.96 samples/sec Loss 1.6084 LearningRate 0.0285 Epoch: 15 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:44:54,013-Speed 5091.50 samples/sec Loss 1.5746 LearningRate 0.0284 Epoch: 15 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:45:01,981-Speed 5140.58 samples/sec Loss 1.5873 LearningRate 0.0284 Epoch: 15 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:45:10,035-Speed 5086.39 samples/sec Loss 1.5643 LearningRate 0.0284 Epoch: 15 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:45:18,051-Speed 5111.35 samples/sec Loss 1.5648 LearningRate 0.0283 Epoch: 15 Global Step: 63450 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:45:25,981-Speed 5165.46 samples/sec Loss 1.5864 LearningRate 0.0283 Epoch: 15 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:45:33,916-Speed 5162.48 samples/sec Loss 1.5591 LearningRate 0.0283 Epoch: 15 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:45:41,869-Speed 5151.46 samples/sec Loss 1.5879 LearningRate 0.0283 Epoch: 15 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:45:49,950-Speed 5069.05 samples/sec Loss 1.5718 LearningRate 0.0282 Epoch: 15 Global Step: 63490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:45:58,008-Speed 5083.96 samples/sec Loss 1.5886 LearningRate 0.0282 Epoch: 15 Global Step: 63500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:46:06,138-Speed 5038.70 samples/sec Loss 1.5916 LearningRate 0.0282 Epoch: 15 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:46:14,160-Speed 5106.43 samples/sec Loss 1.6090 LearningRate 0.0281 Epoch: 15 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:46:22,198-Speed 5096.41 samples/sec Loss 1.5957 LearningRate 0.0281 Epoch: 15 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:46:30,109-Speed 5178.34 samples/sec Loss 1.5951 LearningRate 0.0281 Epoch: 15 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:46:38,350-Speed 4970.92 samples/sec Loss 1.5886 LearningRate 0.0281 Epoch: 15 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:46:46,275-Speed 5169.76 samples/sec Loss 1.6040 LearningRate 0.0280 Epoch: 15 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:46:54,223-Speed 5153.82 samples/sec Loss 1.6012 LearningRate 0.0280 Epoch: 15 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-01-17 12:47:02,217-Speed 5124.92 samples/sec Loss 1.5902 LearningRate 0.0280 Epoch: 15 Global Step: 63580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:47:10,176-Speed 5146.66 samples/sec Loss 1.6111 LearningRate 0.0279 Epoch: 15 Global Step: 63590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:47:18,253-Speed 5072.25 samples/sec Loss 1.5665 LearningRate 0.0279 Epoch: 15 Global Step: 63600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:47:26,226-Speed 5138.09 samples/sec Loss 1.6087 LearningRate 0.0279 Epoch: 15 Global Step: 63610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:47:34,144-Speed 5173.62 samples/sec Loss 1.5999 LearningRate 0.0279 Epoch: 15 Global Step: 63620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:47:42,326-Speed 5006.54 samples/sec Loss 1.6080 LearningRate 0.0278 Epoch: 15 Global Step: 63630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:47:50,726-Speed 4876.55 samples/sec Loss 1.5772 LearningRate 0.0278 Epoch: 15 Global Step: 63640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:47:58,733-Speed 5116.03 samples/sec Loss 1.5748 LearningRate 0.0278 Epoch: 15 Global Step: 63650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:48:06,565-Speed 5231.29 samples/sec Loss 1.6094 LearningRate 0.0278 Epoch: 15 Global Step: 63660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-17 12:48:14,695-Speed 5038.25 samples/sec Loss 1.6347 LearningRate 0.0277 Epoch: 15 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:48:22,671-Speed 5136.07 samples/sec Loss 1.5873 LearningRate 0.0277 Epoch: 15 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:48:30,660-Speed 5128.24 samples/sec Loss 1.5942 LearningRate 0.0277 Epoch: 15 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:48:38,714-Speed 5086.40 samples/sec Loss 
1.5862 LearningRate 0.0276 Epoch: 15 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:48:46,680-Speed 5142.93 samples/sec Loss 1.5988 LearningRate 0.0276 Epoch: 15 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:48:54,654-Speed 5137.31 samples/sec Loss 1.6131 LearningRate 0.0276 Epoch: 15 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:49:02,609-Speed 5149.03 samples/sec Loss 1.6066 LearningRate 0.0276 Epoch: 15 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:49:10,548-Speed 5160.26 samples/sec Loss 1.5842 LearningRate 0.0275 Epoch: 15 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:49:18,486-Speed 5160.67 samples/sec Loss 1.5978 LearningRate 0.0275 Epoch: 15 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:49:26,399-Speed 5177.10 samples/sec Loss 1.5867 LearningRate 0.0275 Epoch: 15 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:49:34,516-Speed 5046.56 samples/sec Loss 1.5708 LearningRate 0.0274 Epoch: 15 Global Step: 63770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:49:42,756-Speed 4971.79 samples/sec Loss 1.6061 LearningRate 0.0274 Epoch: 15 Global Step: 63780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:49:50,674-Speed 5173.80 samples/sec Loss 1.6217 LearningRate 0.0274 Epoch: 15 Global Step: 63790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:49:58,766-Speed 5063.55 samples/sec Loss 1.5998 LearningRate 0.0274 Epoch: 15 Global Step: 63800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:50:06,628-Speed 5210.06 samples/sec Loss 1.5843 LearningRate 0.0273 Epoch: 15 Global Step: 63810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:50:14,603-Speed 5137.31 samples/sec Loss 1.5798 LearningRate 0.0273 Epoch: 15 Global Step: 
63820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:50:22,561-Speed 5147.55 samples/sec Loss 1.6093 LearningRate 0.0273 Epoch: 15 Global Step: 63830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:50:30,594-Speed 5099.70 samples/sec Loss 1.5958 LearningRate 0.0272 Epoch: 15 Global Step: 63840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:50:38,582-Speed 5128.22 samples/sec Loss 1.6182 LearningRate 0.0272 Epoch: 15 Global Step: 63850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:50:46,875-Speed 4939.86 samples/sec Loss 1.6135 LearningRate 0.0272 Epoch: 15 Global Step: 63860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:50:54,855-Speed 5133.38 samples/sec Loss 1.6261 LearningRate 0.0272 Epoch: 15 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:51:02,723-Speed 5206.59 samples/sec Loss 1.5776 LearningRate 0.0271 Epoch: 15 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:51:10,667-Speed 5157.04 samples/sec Loss 1.6213 LearningRate 0.0271 Epoch: 15 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:51:18,673-Speed 5117.06 samples/sec Loss 1.5850 LearningRate 0.0271 Epoch: 15 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:51:26,590-Speed 5173.66 samples/sec Loss 1.6231 LearningRate 0.0271 Epoch: 15 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:51:34,674-Speed 5068.12 samples/sec Loss 1.5998 LearningRate 0.0270 Epoch: 15 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:51:42,653-Speed 5133.90 samples/sec Loss 1.5995 LearningRate 0.0270 Epoch: 15 Global Step: 63930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:51:50,789-Speed 5035.51 samples/sec Loss 1.6085 LearningRate 0.0270 Epoch: 15 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-01-17 12:51:58,782-Speed 5124.71 samples/sec Loss 1.5993 LearningRate 0.0269 Epoch: 15 Global Step: 63950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:52:06,760-Speed 5134.72 samples/sec Loss 1.6201 LearningRate 0.0269 Epoch: 15 Global Step: 63960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:52:14,706-Speed 5155.95 samples/sec Loss 1.6269 LearningRate 0.0269 Epoch: 15 Global Step: 63970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:52:22,721-Speed 5110.76 samples/sec Loss 1.6005 LearningRate 0.0269 Epoch: 15 Global Step: 63980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:52:30,616-Speed 5188.89 samples/sec Loss 1.6352 LearningRate 0.0268 Epoch: 15 Global Step: 63990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:52:38,533-Speed 5174.16 samples/sec Loss 1.6454 LearningRate 0.0268 Epoch: 15 Global Step: 64000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:52:46,422-Speed 5192.89 samples/sec Loss 1.6029 LearningRate 0.0268 Epoch: 15 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:52:54,383-Speed 5145.89 samples/sec Loss 1.6096 LearningRate 0.0268 Epoch: 15 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:53:02,325-Speed 5157.70 samples/sec Loss 1.6113 LearningRate 0.0267 Epoch: 15 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:53:10,224-Speed 5186.63 samples/sec Loss 1.6160 LearningRate 0.0267 Epoch: 15 Global Step: 64040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:53:18,117-Speed 5190.11 samples/sec Loss 1.6413 LearningRate 0.0267 Epoch: 15 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:53:26,055-Speed 5160.71 samples/sec Loss 1.6541 LearningRate 0.0266 Epoch: 15 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:53:34,109-Speed 5086.30 
samples/sec Loss 1.5994 LearningRate 0.0266 Epoch: 15 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:53:42,048-Speed 5160.10 samples/sec Loss 1.6190 LearningRate 0.0266 Epoch: 15 Global Step: 64080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:53:49,930-Speed 5197.20 samples/sec Loss 1.6287 LearningRate 0.0266 Epoch: 15 Global Step: 64090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:53:57,952-Speed 5107.09 samples/sec Loss 1.6036 LearningRate 0.0265 Epoch: 15 Global Step: 64100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:54:06,139-Speed 5003.31 samples/sec Loss 1.6281 LearningRate 0.0265 Epoch: 15 Global Step: 64110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:54:14,334-Speed 4998.54 samples/sec Loss 1.6006 LearningRate 0.0265 Epoch: 15 Global Step: 64120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:54:22,549-Speed 4986.58 samples/sec Loss 1.5915 LearningRate 0.0264 Epoch: 15 Global Step: 64130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:54:30,567-Speed 5109.61 samples/sec Loss 1.6452 LearningRate 0.0264 Epoch: 15 Global Step: 64140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:54:39,106-Speed 4797.32 samples/sec Loss 1.6481 LearningRate 0.0264 Epoch: 15 Global Step: 64150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:54:47,144-Speed 5096.82 samples/sec Loss 1.6173 LearningRate 0.0264 Epoch: 15 Global Step: 64160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:54:55,056-Speed 5178.01 samples/sec Loss 1.6025 LearningRate 0.0263 Epoch: 15 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:55:03,037-Speed 5132.82 samples/sec Loss 1.6142 LearningRate 0.0263 Epoch: 15 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:55:11,056-Speed 5108.25 samples/sec Loss 1.5999 LearningRate 0.0263 
Epoch: 15 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:55:19,108-Speed 5087.79 samples/sec Loss 1.6068 LearningRate 0.0263 Epoch: 15 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:55:27,089-Speed 5132.83 samples/sec Loss 1.5926 LearningRate 0.0262 Epoch: 15 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:55:34,921-Speed 5230.75 samples/sec Loss 1.6251 LearningRate 0.0262 Epoch: 15 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:55:42,871-Speed 5152.67 samples/sec Loss 1.6325 LearningRate 0.0262 Epoch: 15 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:55:50,851-Speed 5133.58 samples/sec Loss 1.6108 LearningRate 0.0261 Epoch: 15 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:55:58,863-Speed 5113.81 samples/sec Loss 1.6017 LearningRate 0.0261 Epoch: 15 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:56:06,809-Speed 5155.70 samples/sec Loss 1.6089 LearningRate 0.0261 Epoch: 15 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:56:14,739-Speed 5165.29 samples/sec Loss 1.6086 LearningRate 0.0261 Epoch: 15 Global Step: 64270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:56:22,935-Speed 4998.30 samples/sec Loss 1.6101 LearningRate 0.0260 Epoch: 15 Global Step: 64280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:56:30,959-Speed 5105.67 samples/sec Loss 1.6043 LearningRate 0.0260 Epoch: 15 Global Step: 64290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:56:39,014-Speed 5085.76 samples/sec Loss 1.6229 LearningRate 0.0260 Epoch: 15 Global Step: 64300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:56:47,155-Speed 5031.73 samples/sec Loss 1.6232 LearningRate 0.0260 Epoch: 15 Global Step: 64310 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-01-17 12:56:55,285-Speed 5039.10 samples/sec Loss 1.6434 LearningRate 0.0259 Epoch: 15 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:57:03,656-Speed 4893.47 samples/sec Loss 1.6242 LearningRate 0.0259 Epoch: 15 Global Step: 64330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:57:11,931-Speed 4950.65 samples/sec Loss 1.6177 LearningRate 0.0259 Epoch: 15 Global Step: 64340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:57:19,909-Speed 5135.03 samples/sec Loss 1.6168 LearningRate 0.0258 Epoch: 15 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:57:28,011-Speed 5055.93 samples/sec Loss 1.5994 LearningRate 0.0258 Epoch: 15 Global Step: 64360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:57:36,112-Speed 5056.83 samples/sec Loss 1.6526 LearningRate 0.0258 Epoch: 15 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:57:44,049-Speed 5161.66 samples/sec Loss 1.5969 LearningRate 0.0258 Epoch: 15 Global Step: 64380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:57:52,403-Speed 4903.21 samples/sec Loss 1.6289 LearningRate 0.0257 Epoch: 15 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:58:00,657-Speed 4963.53 samples/sec Loss 1.5981 LearningRate 0.0257 Epoch: 15 Global Step: 64400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:58:08,670-Speed 5112.23 samples/sec Loss 1.6043 LearningRate 0.0257 Epoch: 15 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:58:16,611-Speed 5158.62 samples/sec Loss 1.5896 LearningRate 0.0257 Epoch: 15 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:58:24,562-Speed 5152.65 samples/sec Loss 1.6505 LearningRate 0.0256 Epoch: 15 Global Step: 64430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 
12:58:32,540-Speed 5134.68 samples/sec Loss 1.6206 LearningRate 0.0256 Epoch: 15 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:58:40,626-Speed 5065.53 samples/sec Loss 1.6397 LearningRate 0.0256 Epoch: 15 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:58:48,665-Speed 5096.08 samples/sec Loss 1.6242 LearningRate 0.0256 Epoch: 15 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:58:56,712-Speed 5091.13 samples/sec Loss 1.6479 LearningRate 0.0255 Epoch: 15 Global Step: 64470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:59:04,736-Speed 5105.32 samples/sec Loss 1.6054 LearningRate 0.0255 Epoch: 15 Global Step: 64480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:59:12,796-Speed 5082.74 samples/sec Loss 1.6312 LearningRate 0.0255 Epoch: 15 Global Step: 64490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:59:20,835-Speed 5095.97 samples/sec Loss 1.6271 LearningRate 0.0254 Epoch: 15 Global Step: 64500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 12:59:28,790-Speed 5149.31 samples/sec Loss 1.6349 LearningRate 0.0254 Epoch: 15 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:59:36,742-Speed 5151.89 samples/sec Loss 1.6188 LearningRate 0.0254 Epoch: 15 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:59:44,696-Speed 5150.04 samples/sec Loss 1.6225 LearningRate 0.0254 Epoch: 15 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 12:59:52,602-Speed 5182.05 samples/sec Loss 1.6202 LearningRate 0.0253 Epoch: 15 Global Step: 64540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 13:00:00,520-Speed 5173.08 samples/sec Loss 1.6287 LearningRate 0.0253 Epoch: 15 Global Step: 64550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 13:00:08,502-Speed 5132.54 samples/sec Loss 1.6187 
LearningRate 0.0253 Epoch: 15 Global Step: 64560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:00:16,550-Speed 5090.25 samples/sec Loss 1.6168 LearningRate 0.0253 Epoch: 15 Global Step: 64570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:00:24,582-Speed 5100.65 samples/sec Loss 1.6067 LearningRate 0.0252 Epoch: 15 Global Step: 64580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:00:32,605-Speed 5105.77 samples/sec Loss 1.6268 LearningRate 0.0252 Epoch: 15 Global Step: 64590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:00:41,025-Speed 4864.98 samples/sec Loss 1.6190 LearningRate 0.0252 Epoch: 15 Global Step: 64600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:00:49,308-Speed 4945.43 samples/sec Loss 1.6151 LearningRate 0.0251 Epoch: 15 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:00:57,344-Speed 5098.38 samples/sec Loss 1.6418 LearningRate 0.0251 Epoch: 15 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:01:05,453-Speed 5051.22 samples/sec Loss 1.6097 LearningRate 0.0251 Epoch: 15 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:01:13,396-Speed 5158.07 samples/sec Loss 1.6097 LearningRate 0.0251 Epoch: 15 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:01:21,426-Speed 5101.52 samples/sec Loss 1.6166 LearningRate 0.0250 Epoch: 15 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:01:29,405-Speed 5133.91 samples/sec Loss 1.6111 LearningRate 0.0250 Epoch: 15 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:01:37,144-Speed 5293.61 samples/sec Loss 1.6237 LearningRate 0.0250 Epoch: 15 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:01:44,998-Speed 5215.52 samples/sec Loss 1.6452 LearningRate 0.0250 Epoch: 15 Global Step: 64680 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:01:53,020-Speed 5106.98 samples/sec Loss 1.6422 LearningRate 0.0249 Epoch: 15 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:02:00,974-Speed 5150.06 samples/sec Loss 1.6120 LearningRate 0.0249 Epoch: 15 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:02:09,017-Speed 5093.75 samples/sec Loss 1.6405 LearningRate 0.0249 Epoch: 15 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:02:17,212-Speed 4998.69 samples/sec Loss 1.6160 LearningRate 0.0249 Epoch: 15 Global Step: 64720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:02:25,349-Speed 5034.59 samples/sec Loss 1.5983 LearningRate 0.0248 Epoch: 15 Global Step: 64730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:02:33,295-Speed 5155.22 samples/sec Loss 1.5982 LearningRate 0.0248 Epoch: 15 Global Step: 64740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:02:41,322-Speed 5103.68 samples/sec Loss 1.6104 LearningRate 0.0248 Epoch: 15 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:02:49,426-Speed 5054.65 samples/sec Loss 1.6048 LearningRate 0.0248 Epoch: 15 Global Step: 64760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:02:57,600-Speed 5011.87 samples/sec Loss 1.6124 LearningRate 0.0247 Epoch: 15 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:03:06,182-Speed 4773.48 samples/sec Loss 1.6247 LearningRate 0.0247 Epoch: 15 Global Step: 64780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:03:14,759-Speed 4776.33 samples/sec Loss 1.6202 LearningRate 0.0247 Epoch: 15 Global Step: 64790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:03:23,413-Speed 4733.35 samples/sec Loss 1.5889 LearningRate 0.0246 Epoch: 15 Global Step: 64800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-17 13:03:32,012-Speed 4764.63 samples/sec Loss 1.6411 LearningRate 0.0246 Epoch: 15 Global Step: 64810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:03:40,679-Speed 4726.49 samples/sec Loss 1.5816 LearningRate 0.0246 Epoch: 15 Global Step: 64820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:03:49,349-Speed 4724.93 samples/sec Loss 1.5938 LearningRate 0.0246 Epoch: 15 Global Step: 64830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:03:57,924-Speed 4777.21 samples/sec Loss 1.6145 LearningRate 0.0245 Epoch: 15 Global Step: 64840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:04:06,777-Speed 4627.47 samples/sec Loss 1.5966 LearningRate 0.0245 Epoch: 15 Global Step: 64850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:04:15,441-Speed 4727.91 samples/sec Loss 1.5984 LearningRate 0.0245 Epoch: 15 Global Step: 64860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:04:24,064-Speed 4750.82 samples/sec Loss 1.5946 LearningRate 0.0245 Epoch: 15 Global Step: 64870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:04:32,660-Speed 4765.81 samples/sec Loss 1.6300 LearningRate 0.0244 Epoch: 15 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:04:41,246-Speed 4771.09 samples/sec Loss 1.5886 LearningRate 0.0244 Epoch: 15 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:04:49,846-Speed 4763.58 samples/sec Loss 1.6134 LearningRate 0.0244 Epoch: 15 Global Step: 64900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:04:58,334-Speed 4825.62 samples/sec Loss 1.6082 LearningRate 0.0244 Epoch: 15 Global Step: 64910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:05:06,382-Speed 5090.65 samples/sec Loss 1.6156 LearningRate 0.0243 Epoch: 15 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:05:14,416-Speed 5098.75 samples/sec 
Loss 1.6086 LearningRate 0.0243 Epoch: 15 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:05:22,478-Speed 5081.39 samples/sec Loss 1.5745 LearningRate 0.0243 Epoch: 15 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:05:30,438-Speed 5146.62 samples/sec Loss 1.6339 LearningRate 0.0242 Epoch: 15 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:05:38,649-Speed 4989.19 samples/sec Loss 1.5847 LearningRate 0.0242 Epoch: 15 Global Step: 64960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:05:46,909-Speed 4959.60 samples/sec Loss 1.6080 LearningRate 0.0242 Epoch: 15 Global Step: 64970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:05:55,021-Speed 5049.64 samples/sec Loss 1.6104 LearningRate 0.0242 Epoch: 15 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:06:03,092-Speed 5075.14 samples/sec Loss 1.5908 LearningRate 0.0241 Epoch: 15 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:06:11,168-Speed 5073.30 samples/sec Loss 1.6110 LearningRate 0.0241 Epoch: 15 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:06:57,797-[lfw][65000]XNorm: 21.732957 Training: 2022-01-17 13:06:57,798-[lfw][65000]Accuracy-Flip: 0.99750+-0.00291 Training: 2022-01-17 13:06:57,799-[lfw][65000]Accuracy-Highest: 0.99817 Training: 2022-01-17 13:07:51,968-[cfp_fp][65000]XNorm: 20.671926 Training: 2022-01-17 13:07:51,969-[cfp_fp][65000]Accuracy-Flip: 0.98957+-0.00483 Training: 2022-01-17 13:07:51,969-[cfp_fp][65000]Accuracy-Highest: 0.98957 Training: 2022-01-17 13:08:38,606-[agedb_30][65000]XNorm: 22.084170 Training: 2022-01-17 13:08:38,607-[agedb_30][65000]Accuracy-Flip: 0.98333+-0.00730 Training: 2022-01-17 13:08:38,607-[agedb_30][65000]Accuracy-Highest: 0.98333 Training: 2022-01-17 13:08:46,676-Speed 263.40 samples/sec Loss 1.6245 LearningRate 0.0241 Epoch: 15 
Global Step: 65010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:08:54,756-Speed 5069.64 samples/sec Loss 1.5948 LearningRate 0.0241 Epoch: 15 Global Step: 65020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:09:02,733-Speed 5135.70 samples/sec Loss 1.6137 LearningRate 0.0240 Epoch: 15 Global Step: 65030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:09:10,657-Speed 5169.57 samples/sec Loss 1.6079 LearningRate 0.0240 Epoch: 15 Global Step: 65040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:09:18,611-Speed 5150.43 samples/sec Loss 1.5930 LearningRate 0.0240 Epoch: 15 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:09:26,651-Speed 5095.31 samples/sec Loss 1.5951 LearningRate 0.0240 Epoch: 15 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:09:34,753-Speed 5056.17 samples/sec Loss 1.6216 LearningRate 0.0239 Epoch: 15 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:09:42,792-Speed 5095.88 samples/sec Loss 1.6250 LearningRate 0.0239 Epoch: 15 Global Step: 65080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:09:50,598-Speed 5247.93 samples/sec Loss 1.6081 LearningRate 0.0239 Epoch: 15 Global Step: 65090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:09:58,545-Speed 5154.81 samples/sec Loss 1.6148 LearningRate 0.0239 Epoch: 15 Global Step: 65100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:10:06,727-Speed 5006.82 samples/sec Loss 1.6060 LearningRate 0.0238 Epoch: 15 Global Step: 65110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:10:14,878-Speed 5025.70 samples/sec Loss 1.5899 LearningRate 0.0238 Epoch: 15 Global Step: 65120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:10:23,160-Speed 4946.38 samples/sec Loss 1.5835 LearningRate 0.0238 Epoch: 15 Global Step: 65130 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-01-17 13:10:31,247-Speed 5065.51 samples/sec Loss 1.6272 LearningRate 0.0238 Epoch: 15 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:10:39,472-Speed 4980.68 samples/sec Loss 1.6106 LearningRate 0.0237 Epoch: 15 Global Step: 65150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:10:47,608-Speed 5034.73 samples/sec Loss 1.5920 LearningRate 0.0237 Epoch: 15 Global Step: 65160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:10:55,639-Speed 5101.04 samples/sec Loss 1.6086 LearningRate 0.0237 Epoch: 15 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:11:03,731-Speed 5062.30 samples/sec Loss 1.5828 LearningRate 0.0236 Epoch: 15 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:11:11,735-Speed 5118.20 samples/sec Loss 1.5958 LearningRate 0.0236 Epoch: 15 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:11:19,771-Speed 5097.65 samples/sec Loss 1.6294 LearningRate 0.0236 Epoch: 15 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:11:27,801-Speed 5101.70 samples/sec Loss 1.6476 LearningRate 0.0236 Epoch: 15 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:11:35,853-Speed 5087.82 samples/sec Loss 1.6135 LearningRate 0.0235 Epoch: 15 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:11:43,920-Speed 5077.93 samples/sec Loss 1.5774 LearningRate 0.0235 Epoch: 15 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:11:52,290-Speed 4894.33 samples/sec Loss 1.5665 LearningRate 0.0235 Epoch: 15 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:12:00,464-Speed 5011.99 samples/sec Loss 1.6149 LearningRate 0.0235 Epoch: 15 Global Step: 65250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:12:08,690-Speed 
4980.37 samples/sec Loss 1.5906 LearningRate 0.0234 Epoch: 15 Global Step: 65260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:12:16,767-Speed 5071.53 samples/sec Loss 1.5810 LearningRate 0.0234 Epoch: 15 Global Step: 65270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:12:24,816-Speed 5089.80 samples/sec Loss 1.5746 LearningRate 0.0234 Epoch: 15 Global Step: 65280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:12:33,089-Speed 4951.79 samples/sec Loss 1.5924 LearningRate 0.0234 Epoch: 15 Global Step: 65290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:12:41,851-Speed 4675.32 samples/sec Loss 1.6076 LearningRate 0.0233 Epoch: 15 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:12:50,610-Speed 4676.47 samples/sec Loss 1.5986 LearningRate 0.0233 Epoch: 15 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:12:59,420-Speed 4650.17 samples/sec Loss 1.6069 LearningRate 0.0233 Epoch: 15 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:13:08,275-Speed 4626.44 samples/sec Loss 1.5752 LearningRate 0.0233 Epoch: 15 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:13:17,104-Speed 4639.56 samples/sec Loss 1.6076 LearningRate 0.0232 Epoch: 15 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:13:25,913-Speed 4650.26 samples/sec Loss 1.5989 LearningRate 0.0232 Epoch: 15 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:13:34,806-Speed 4606.27 samples/sec Loss 1.5702 LearningRate 0.0232 Epoch: 15 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:13:43,630-Speed 4642.55 samples/sec Loss 1.5858 LearningRate 0.0232 Epoch: 15 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:13:52,022-Speed 4881.69 samples/sec Loss 1.6131 LearningRate 0.0231 
Epoch: 15 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:13:59,986-Speed 5143.75 samples/sec Loss 1.5886 LearningRate 0.0231 Epoch: 15 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:14:08,223-Speed 4973.17 samples/sec Loss 1.5893 LearningRate 0.0231 Epoch: 15 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:14:16,395-Speed 5013.00 samples/sec Loss 1.6173 LearningRate 0.0231 Epoch: 15 Global Step: 65410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:14:24,401-Speed 5117.72 samples/sec Loss 1.6000 LearningRate 0.0230 Epoch: 15 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:14:32,386-Speed 5130.12 samples/sec Loss 1.5785 LearningRate 0.0230 Epoch: 15 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:14:40,374-Speed 5128.05 samples/sec Loss 1.5515 LearningRate 0.0230 Epoch: 15 Global Step: 65440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:14:48,404-Speed 5101.80 samples/sec Loss 1.5987 LearningRate 0.0230 Epoch: 15 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:14:56,627-Speed 4981.70 samples/sec Loss 1.5898 LearningRate 0.0229 Epoch: 15 Global Step: 65460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:15:04,639-Speed 5113.72 samples/sec Loss 1.5680 LearningRate 0.0229 Epoch: 15 Global Step: 65470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:15:12,707-Speed 5077.65 samples/sec Loss 1.5901 LearningRate 0.0229 Epoch: 15 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:15:20,775-Speed 5077.76 samples/sec Loss 1.5808 LearningRate 0.0229 Epoch: 15 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:15:28,851-Speed 5072.41 samples/sec Loss 1.5795 LearningRate 0.0228 Epoch: 15 Global Step: 65500 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-01-17 13:15:36,989-Speed 5033.41 samples/sec Loss 1.5832 LearningRate 0.0228 Epoch: 15 Global Step: 65510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:15:45,084-Speed 5061.05 samples/sec Loss 1.5674 LearningRate 0.0228 Epoch: 15 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:15:53,150-Speed 5078.38 samples/sec Loss 1.5851 LearningRate 0.0228 Epoch: 15 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:16:01,223-Speed 5074.18 samples/sec Loss 1.5731 LearningRate 0.0227 Epoch: 15 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:16:09,231-Speed 5115.54 samples/sec Loss 1.5714 LearningRate 0.0227 Epoch: 15 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:16:17,165-Speed 5163.46 samples/sec Loss 1.5930 LearningRate 0.0227 Epoch: 15 Global Step: 65560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:16:25,247-Speed 5068.56 samples/sec Loss 1.5705 LearningRate 0.0227 Epoch: 15 Global Step: 65570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:16:33,181-Speed 5163.32 samples/sec Loss 1.6041 LearningRate 0.0226 Epoch: 15 Global Step: 65580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:16:41,184-Speed 5119.10 samples/sec Loss 1.5881 LearningRate 0.0226 Epoch: 15 Global Step: 65590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:16:48,989-Speed 5248.78 samples/sec Loss 1.5717 LearningRate 0.0226 Epoch: 15 Global Step: 65600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:16:56,918-Speed 5166.19 samples/sec Loss 1.5716 LearningRate 0.0225 Epoch: 15 Global Step: 65610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:17:04,885-Speed 5142.14 samples/sec Loss 1.5852 LearningRate 0.0225 Epoch: 15 Global Step: 65620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
13:17:12,856-Speed 5139.28 samples/sec Loss 1.5973 LearningRate 0.0225 Epoch: 15 Global Step: 65630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:17:20,908-Speed 5087.88 samples/sec Loss 1.5826 LearningRate 0.0225 Epoch: 15 Global Step: 65640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:17:28,917-Speed 5114.84 samples/sec Loss 1.5866 LearningRate 0.0224 Epoch: 15 Global Step: 65650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:17:37,030-Speed 5049.38 samples/sec Loss 1.5739 LearningRate 0.0224 Epoch: 15 Global Step: 65660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 13:17:45,299-Speed 4953.76 samples/sec Loss 1.5801 LearningRate 0.0224 Epoch: 15 Global Step: 65670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:17:53,322-Speed 5106.31 samples/sec Loss 1.6009 LearningRate 0.0224 Epoch: 15 Global Step: 65680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:18:01,450-Speed 5039.79 samples/sec Loss 1.5833 LearningRate 0.0223 Epoch: 15 Global Step: 65690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:18:09,280-Speed 5231.65 samples/sec Loss 1.5551 LearningRate 0.0223 Epoch: 15 Global Step: 65700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:18:17,368-Speed 5065.17 samples/sec Loss 1.5812 LearningRate 0.0223 Epoch: 15 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:18:25,402-Speed 5099.23 samples/sec Loss 1.5817 LearningRate 0.0223 Epoch: 15 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:18:33,323-Speed 5171.15 samples/sec Loss 1.5873 LearningRate 0.0222 Epoch: 15 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:18:41,598-Speed 4950.52 samples/sec Loss 1.5521 LearningRate 0.0222 Epoch: 15 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:18:49,802-Speed 4993.42 samples/sec Loss 
1.5701 LearningRate 0.0222 Epoch: 15 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:18:58,010-Speed 4991.07 samples/sec Loss 1.5718 LearningRate 0.0222 Epoch: 15 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:19:06,034-Speed 5105.05 samples/sec Loss 1.5748 LearningRate 0.0221 Epoch: 15 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:19:14,096-Speed 5081.73 samples/sec Loss 1.6080 LearningRate 0.0221 Epoch: 15 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:19:22,042-Speed 5155.63 samples/sec Loss 1.5782 LearningRate 0.0221 Epoch: 15 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:19:30,185-Speed 5030.73 samples/sec Loss 1.5493 LearningRate 0.0221 Epoch: 15 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:19:38,158-Speed 5137.80 samples/sec Loss 1.5863 LearningRate 0.0220 Epoch: 15 Global Step: 65810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:19:46,124-Speed 5142.24 samples/sec Loss 1.5551 LearningRate 0.0220 Epoch: 15 Global Step: 65820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:19:54,009-Speed 5196.06 samples/sec Loss 1.5523 LearningRate 0.0220 Epoch: 15 Global Step: 65830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:20:02,029-Speed 5107.95 samples/sec Loss 1.5596 LearningRate 0.0220 Epoch: 15 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:20:10,163-Speed 5035.58 samples/sec Loss 1.5602 LearningRate 0.0219 Epoch: 15 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:20:18,138-Speed 5137.09 samples/sec Loss 1.5899 LearningRate 0.0219 Epoch: 15 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:20:26,750-Speed 4757.23 samples/sec Loss 1.5563 LearningRate 0.0219 Epoch: 15 Global Step: 
65870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:20:35,683-Speed 4585.51 samples/sec Loss 1.5950 LearningRate 0.0219 Epoch: 15 Global Step: 65880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:20:44,552-Speed 4619.30 samples/sec Loss 1.5653 LearningRate 0.0218 Epoch: 15 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:20:53,419-Speed 4619.93 samples/sec Loss 1.5576 LearningRate 0.0218 Epoch: 15 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:21:02,263-Speed 4631.83 samples/sec Loss 1.5709 LearningRate 0.0218 Epoch: 15 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:21:11,135-Speed 4617.69 samples/sec Loss 1.5661 LearningRate 0.0218 Epoch: 15 Global Step: 65920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:21:19,521-Speed 4884.92 samples/sec Loss 1.5751 LearningRate 0.0217 Epoch: 15 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:21:27,548-Speed 5103.62 samples/sec Loss 1.5400 LearningRate 0.0217 Epoch: 15 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:21:35,592-Speed 5092.36 samples/sec Loss 1.5589 LearningRate 0.0217 Epoch: 15 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:21:43,630-Speed 5096.34 samples/sec Loss 1.5652 LearningRate 0.0217 Epoch: 15 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:21:51,624-Speed 5124.68 samples/sec Loss 1.5437 LearningRate 0.0216 Epoch: 15 Global Step: 65970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:21:59,761-Speed 5034.48 samples/sec Loss 1.5572 LearningRate 0.0216 Epoch: 15 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:22:07,790-Speed 5102.59 samples/sec Loss 1.5471 LearningRate 0.0216 Epoch: 15 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 13:22:15,760-Speed 5139.76 samples/sec Loss 1.5759 LearningRate 0.0216 Epoch: 15 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:22:23,856-Speed 5059.79 samples/sec Loss 1.5336 LearningRate 0.0215 Epoch: 15 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:22:32,218-Speed 4899.54 samples/sec Loss 1.5406 LearningRate 0.0215 Epoch: 15 Global Step: 66020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:22:40,626-Speed 4872.26 samples/sec Loss 1.5396 LearningRate 0.0215 Epoch: 15 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:22:48,884-Speed 4960.57 samples/sec Loss 1.5363 LearningRate 0.0215 Epoch: 15 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:22:56,854-Speed 5140.43 samples/sec Loss 1.5735 LearningRate 0.0214 Epoch: 15 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:23:04,927-Speed 5074.01 samples/sec Loss 1.5789 LearningRate 0.0214 Epoch: 15 Global Step: 66060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:23:12,980-Speed 5087.18 samples/sec Loss 1.5945 LearningRate 0.0214 Epoch: 15 Global Step: 66070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:23:20,963-Speed 5131.74 samples/sec Loss 1.5582 LearningRate 0.0214 Epoch: 15 Global Step: 66080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:23:29,066-Speed 5055.29 samples/sec Loss 1.5282 LearningRate 0.0214 Epoch: 15 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:23:37,250-Speed 5006.07 samples/sec Loss 1.5523 LearningRate 0.0213 Epoch: 15 Global Step: 66100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:23:45,402-Speed 5025.52 samples/sec Loss 1.5681 LearningRate 0.0213 Epoch: 15 Global Step: 66110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:23:53,371-Speed 5140.56 
samples/sec Loss 1.5550 LearningRate 0.0213 Epoch: 15 Global Step: 66120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:24:01,322-Speed 5151.90 samples/sec Loss 1.5248 LearningRate 0.0213 Epoch: 15 Global Step: 66130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:24:09,452-Speed 5039.20 samples/sec Loss 1.5368 LearningRate 0.0212 Epoch: 15 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:24:17,951-Speed 4820.07 samples/sec Loss 1.5408 LearningRate 0.0212 Epoch: 15 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:24:26,722-Speed 4670.46 samples/sec Loss 1.5473 LearningRate 0.0212 Epoch: 15 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:24:35,486-Speed 4674.03 samples/sec Loss 1.5348 LearningRate 0.0212 Epoch: 15 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:24:43,814-Speed 4919.24 samples/sec Loss 1.5312 LearningRate 0.0211 Epoch: 15 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:24:52,067-Speed 4963.57 samples/sec Loss 1.5417 LearningRate 0.0211 Epoch: 15 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:25:00,207-Speed 5032.40 samples/sec Loss 1.5337 LearningRate 0.0211 Epoch: 15 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:25:08,183-Speed 5136.29 samples/sec Loss 1.5437 LearningRate 0.0211 Epoch: 15 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:25:16,589-Speed 4873.52 samples/sec Loss 1.5433 LearningRate 0.0210 Epoch: 15 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:25:24,913-Speed 4921.50 samples/sec Loss 1.5098 LearningRate 0.0210 Epoch: 15 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:25:32,851-Speed 5160.42 samples/sec Loss 1.5326 LearningRate 0.0210 Epoch: 15 
Global Step: 66240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:25:40,969-Speed 5046.08 samples/sec Loss 1.5339 LearningRate 0.0210 Epoch: 15 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:25:49,011-Speed 5094.38 samples/sec Loss 1.5409 LearningRate 0.0209 Epoch: 15 Global Step: 66260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:25:57,047-Speed 5097.06 samples/sec Loss 1.5510 LearningRate 0.0209 Epoch: 15 Global Step: 66270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:26:05,332-Speed 4944.58 samples/sec Loss 1.5380 LearningRate 0.0209 Epoch: 15 Global Step: 66280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:26:13,308-Speed 5136.46 samples/sec Loss 1.5264 LearningRate 0.0209 Epoch: 15 Global Step: 66290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:26:21,323-Speed 5111.26 samples/sec Loss 1.5084 LearningRate 0.0208 Epoch: 15 Global Step: 66300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:26:29,471-Speed 5027.90 samples/sec Loss 1.5468 LearningRate 0.0208 Epoch: 15 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:26:37,388-Speed 5174.49 samples/sec Loss 1.5263 LearningRate 0.0208 Epoch: 15 Global Step: 66320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:26:45,706-Speed 4924.61 samples/sec Loss 1.5192 LearningRate 0.0208 Epoch: 15 Global Step: 66330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:26:53,725-Speed 5108.82 samples/sec Loss 1.5404 LearningRate 0.0207 Epoch: 15 Global Step: 66340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:27:01,746-Speed 5107.33 samples/sec Loss 1.5098 LearningRate 0.0207 Epoch: 15 Global Step: 66350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:27:09,869-Speed 5042.84 samples/sec Loss 1.5342 LearningRate 0.0207 Epoch: 15 Global Step: 66360 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-17 13:27:18,003-Speed 5036.43 samples/sec Loss 1.5079 LearningRate 0.0207 Epoch: 15 Global Step: 66370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:27:26,124-Speed 5044.29 samples/sec Loss 1.5499 LearningRate 0.0206 Epoch: 15 Global Step: 66380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:27:34,147-Speed 5105.78 samples/sec Loss 1.5388 LearningRate 0.0206 Epoch: 15 Global Step: 66390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:27:42,138-Speed 5126.24 samples/sec Loss 1.5132 LearningRate 0.0206 Epoch: 15 Global Step: 66400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:27:50,297-Speed 5021.35 samples/sec Loss 1.5296 LearningRate 0.0206 Epoch: 15 Global Step: 66410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:27:58,388-Speed 5062.98 samples/sec Loss 1.5220 LearningRate 0.0205 Epoch: 15 Global Step: 66420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:28:06,359-Speed 5139.17 samples/sec Loss 1.5051 LearningRate 0.0205 Epoch: 15 Global Step: 66430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:28:14,265-Speed 5181.30 samples/sec Loss 1.5242 LearningRate 0.0205 Epoch: 15 Global Step: 66440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:28:22,424-Speed 5021.44 samples/sec Loss 1.5083 LearningRate 0.0205 Epoch: 15 Global Step: 66450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:28:30,664-Speed 4971.48 samples/sec Loss 1.5209 LearningRate 0.0205 Epoch: 15 Global Step: 66460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-17 13:28:38,741-Speed 5071.26 samples/sec Loss 1.5548 LearningRate 0.0204 Epoch: 15 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:28:46,703-Speed 5145.82 samples/sec Loss 1.5275 LearningRate 0.0204 Epoch: 15 Global Step: 66480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 
13:28:54,970-Speed 4955.25 samples/sec Loss 1.5015 LearningRate 0.0204 Epoch: 15 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:29:03,174-Speed 4992.86 samples/sec Loss 1.4974 LearningRate 0.0204 Epoch: 15 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:29:11,190-Speed 5110.48 samples/sec Loss 1.4902 LearningRate 0.0203 Epoch: 15 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:29:19,292-Speed 5056.56 samples/sec Loss 1.5054 LearningRate 0.0203 Epoch: 15 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:29:27,254-Speed 5145.20 samples/sec Loss 1.4772 LearningRate 0.0203 Epoch: 15 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:29:35,417-Speed 5018.41 samples/sec Loss 1.4790 LearningRate 0.0203 Epoch: 15 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:29:43,478-Speed 5082.28 samples/sec Loss 1.5174 LearningRate 0.0202 Epoch: 15 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:29:51,553-Speed 5073.03 samples/sec Loss 1.4839 LearningRate 0.0202 Epoch: 15 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:29:59,760-Speed 4991.56 samples/sec Loss 1.4767 LearningRate 0.0202 Epoch: 15 Global Step: 66570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:30:08,058-Speed 4936.78 samples/sec Loss 1.4902 LearningRate 0.0202 Epoch: 15 Global Step: 66580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:30:16,044-Speed 5128.80 samples/sec Loss 1.5071 LearningRate 0.0201 Epoch: 15 Global Step: 66590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:30:24,061-Speed 5110.18 samples/sec Loss 1.4958 LearningRate 0.0201 Epoch: 15 Global Step: 66600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:30:32,110-Speed 5089.76 samples/sec Loss 1.4850 
LearningRate 0.0201 Epoch: 15 Global Step: 66610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:30:41,020-Speed 4597.56 samples/sec Loss 1.5000 LearningRate 0.0201 Epoch: 15 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:30:49,839-Speed 4644.90 samples/sec Loss 1.4894 LearningRate 0.0200 Epoch: 15 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:30:58,595-Speed 4678.92 samples/sec Loss 1.4956 LearningRate 0.0200 Epoch: 15 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:31:07,380-Speed 4662.74 samples/sec Loss 1.5180 LearningRate 0.0200 Epoch: 15 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:31:16,293-Speed 4596.27 samples/sec Loss 1.5000 LearningRate 0.0200 Epoch: 15 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:31:25,278-Speed 4559.18 samples/sec Loss 1.4967 LearningRate 0.0199 Epoch: 15 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:31:34,038-Speed 4676.81 samples/sec Loss 1.4773 LearningRate 0.0199 Epoch: 15 Global Step: 66680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:31:42,907-Speed 4618.36 samples/sec Loss 1.5090 LearningRate 0.0199 Epoch: 15 Global Step: 66690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:31:51,680-Speed 4669.24 samples/sec Loss 1.5046 LearningRate 0.0199 Epoch: 15 Global Step: 66700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:32:00,375-Speed 4711.67 samples/sec Loss 1.4859 LearningRate 0.0199 Epoch: 15 Global Step: 66710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:32:08,374-Speed 5121.26 samples/sec Loss 1.4686 LearningRate 0.0198 Epoch: 15 Global Step: 66720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:32:16,345-Speed 5139.51 samples/sec Loss 1.4976 LearningRate 0.0198 Epoch: 15 Global Step: 66730 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:32:24,470-Speed 5041.55 samples/sec Loss 1.4948 LearningRate 0.0198 Epoch: 15 Global Step: 66740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:32:32,705-Speed 4974.69 samples/sec Loss 1.4879 LearningRate 0.0198 Epoch: 15 Global Step: 66750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:33:07,235-Speed 1186.28 samples/sec Loss 1.1564 LearningRate 0.0197 Epoch: 16 Global Step: 66760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:33:15,225-Speed 5127.48 samples/sec Loss 1.0527 LearningRate 0.0197 Epoch: 16 Global Step: 66770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:33:23,204-Speed 5134.01 samples/sec Loss 1.0354 LearningRate 0.0197 Epoch: 16 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:33:31,164-Speed 5146.54 samples/sec Loss 1.0284 LearningRate 0.0197 Epoch: 16 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:33:39,101-Speed 5161.21 samples/sec Loss 1.0173 LearningRate 0.0196 Epoch: 16 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:33:47,165-Speed 5080.56 samples/sec Loss 1.0192 LearningRate 0.0196 Epoch: 16 Global Step: 66810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:33:55,153-Speed 5128.56 samples/sec Loss 1.0259 LearningRate 0.0196 Epoch: 16 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:34:03,117-Speed 5144.43 samples/sec Loss 1.0441 LearningRate 0.0196 Epoch: 16 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:34:11,127-Speed 5114.24 samples/sec Loss 1.0180 LearningRate 0.0195 Epoch: 16 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:34:19,235-Speed 5052.19 samples/sec Loss 1.0365 LearningRate 0.0195 Epoch: 16 Global Step: 66850 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 13:34:27,257-Speed 5107.05 samples/sec Loss 1.0242 LearningRate 0.0195 Epoch: 16 Global Step: 66860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:34:35,257-Speed 5120.79 samples/sec Loss 1.0404 LearningRate 0.0195 Epoch: 16 Global Step: 66870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:34:43,443-Speed 5004.56 samples/sec Loss 1.0307 LearningRate 0.0195 Epoch: 16 Global Step: 66880 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 13:34:51,419-Speed 5136.21 samples/sec Loss 1.0216 LearningRate 0.0194 Epoch: 16 Global Step: 66890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 13:34:59,397-Speed 5134.22 samples/sec Loss 1.0473 LearningRate 0.0194 Epoch: 16 Global Step: 66900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:35:07,555-Speed 5022.03 samples/sec Loss 1.0416 LearningRate 0.0194 Epoch: 16 Global Step: 66910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:35:15,776-Speed 4982.85 samples/sec Loss 1.0303 LearningRate 0.0194 Epoch: 16 Global Step: 66920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:35:23,936-Speed 5020.47 samples/sec Loss 1.0105 LearningRate 0.0193 Epoch: 16 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:35:31,923-Speed 5129.17 samples/sec Loss 1.0377 LearningRate 0.0193 Epoch: 16 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:35:39,849-Speed 5168.64 samples/sec Loss 1.0506 LearningRate 0.0193 Epoch: 16 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:35:47,807-Speed 5147.88 samples/sec Loss 1.0514 LearningRate 0.0193 Epoch: 16 Global Step: 66960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:35:55,748-Speed 5159.06 samples/sec Loss 1.0182 LearningRate 0.0192 Epoch: 16 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:36:03,849-Speed 5056.36 
samples/sec Loss 1.0258 LearningRate 0.0192 Epoch: 16 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:36:11,929-Speed 5070.40 samples/sec Loss 1.0426 LearningRate 0.0192 Epoch: 16 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:36:19,880-Speed 5152.41 samples/sec Loss 1.0656 LearningRate 0.0192 Epoch: 16 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:36:27,908-Speed 5103.17 samples/sec Loss 1.0393 LearningRate 0.0191 Epoch: 16 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:36:36,194-Speed 4944.14 samples/sec Loss 1.0341 LearningRate 0.0191 Epoch: 16 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:36:44,370-Speed 5013.49 samples/sec Loss 1.0591 LearningRate 0.0191 Epoch: 16 Global Step: 67030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:36:52,511-Speed 5032.40 samples/sec Loss 1.0121 LearningRate 0.0191 Epoch: 16 Global Step: 67040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:37:00,514-Speed 5118.61 samples/sec Loss 1.0456 LearningRate 0.0191 Epoch: 16 Global Step: 67050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:37:08,547-Speed 5099.80 samples/sec Loss 1.0553 LearningRate 0.0190 Epoch: 16 Global Step: 67060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:37:16,545-Speed 5121.86 samples/sec Loss 1.0691 LearningRate 0.0190 Epoch: 16 Global Step: 67070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:37:24,476-Speed 5165.03 samples/sec Loss 1.0721 LearningRate 0.0190 Epoch: 16 Global Step: 67080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:37:32,545-Speed 5076.85 samples/sec Loss 1.0443 LearningRate 0.0190 Epoch: 16 Global Step: 67090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:37:40,654-Speed 5052.09 samples/sec Loss 1.0564 LearningRate 0.0189 
Epoch: 16 Global Step: 67100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:37:48,477-Speed 5237.15 samples/sec Loss 1.0581 LearningRate 0.0189 Epoch: 16 Global Step: 67110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:37:56,467-Speed 5126.66 samples/sec Loss 1.0576 LearningRate 0.0189 Epoch: 16 Global Step: 67120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:38:04,487-Speed 5108.00 samples/sec Loss 1.0629 LearningRate 0.0189 Epoch: 16 Global Step: 67130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:38:12,298-Speed 5245.00 samples/sec Loss 1.0706 LearningRate 0.0188 Epoch: 16 Global Step: 67140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:38:20,128-Speed 5232.08 samples/sec Loss 1.0631 LearningRate 0.0188 Epoch: 16 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:38:28,264-Speed 5034.51 samples/sec Loss 1.0821 LearningRate 0.0188 Epoch: 16 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:38:36,248-Speed 5130.99 samples/sec Loss 1.0678 LearningRate 0.0188 Epoch: 16 Global Step: 67170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:38:44,589-Speed 4911.25 samples/sec Loss 1.0477 LearningRate 0.0188 Epoch: 16 Global Step: 67180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:38:52,304-Speed 5310.12 samples/sec Loss 1.0640 LearningRate 0.0187 Epoch: 16 Global Step: 67190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:39:00,222-Speed 5174.05 samples/sec Loss 1.0652 LearningRate 0.0187 Epoch: 16 Global Step: 67200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:39:07,989-Speed 5273.84 samples/sec Loss 1.0892 LearningRate 0.0187 Epoch: 16 Global Step: 67210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:39:16,094-Speed 5054.43 samples/sec Loss 1.0635 LearningRate 0.0187 Epoch: 16 Global Step: 67220 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 13:39:24,091-Speed 5122.75 samples/sec Loss 1.0641 LearningRate 0.0186 Epoch: 16 Global Step: 67230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:39:32,323-Speed 4976.15 samples/sec Loss 1.0647 LearningRate 0.0186 Epoch: 16 Global Step: 67240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:39:40,585-Speed 4958.60 samples/sec Loss 1.0718 LearningRate 0.0186 Epoch: 16 Global Step: 67250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:39:48,918-Speed 4916.23 samples/sec Loss 1.0706 LearningRate 0.0186 Epoch: 16 Global Step: 67260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:39:57,043-Speed 5041.65 samples/sec Loss 1.0709 LearningRate 0.0185 Epoch: 16 Global Step: 67270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:40:05,249-Speed 4992.11 samples/sec Loss 1.0935 LearningRate 0.0185 Epoch: 16 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:40:13,264-Speed 5111.28 samples/sec Loss 1.0888 LearningRate 0.0185 Epoch: 16 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:40:21,229-Speed 5143.26 samples/sec Loss 1.0797 LearningRate 0.0185 Epoch: 16 Global Step: 67300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:40:29,177-Speed 5153.94 samples/sec Loss 1.0791 LearningRate 0.0185 Epoch: 16 Global Step: 67310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:40:37,196-Speed 5109.12 samples/sec Loss 1.0752 LearningRate 0.0184 Epoch: 16 Global Step: 67320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:40:45,201-Speed 5117.49 samples/sec Loss 1.0872 LearningRate 0.0184 Epoch: 16 Global Step: 67330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:40:53,188-Speed 5128.86 samples/sec Loss 1.0930 LearningRate 0.0184 Epoch: 16 Global Step: 67340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-17 13:41:01,183-Speed 5124.13 samples/sec Loss 1.0919 LearningRate 0.0184 Epoch: 16 Global Step: 67350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:41:09,208-Speed 5104.71 samples/sec Loss 1.0956 LearningRate 0.0183 Epoch: 16 Global Step: 67360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:41:17,371-Speed 5018.27 samples/sec Loss 1.1032 LearningRate 0.0183 Epoch: 16 Global Step: 67370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:41:25,338-Speed 5142.10 samples/sec Loss 1.1087 LearningRate 0.0183 Epoch: 16 Global Step: 67380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:41:33,416-Speed 5071.33 samples/sec Loss 1.0950 LearningRate 0.0183 Epoch: 16 Global Step: 67390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:41:41,320-Speed 5183.25 samples/sec Loss 1.1136 LearningRate 0.0182 Epoch: 16 Global Step: 67400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:41:49,428-Speed 5052.59 samples/sec Loss 1.1122 LearningRate 0.0182 Epoch: 16 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:41:57,490-Speed 5081.62 samples/sec Loss 1.1158 LearningRate 0.0182 Epoch: 16 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:42:05,513-Speed 5105.84 samples/sec Loss 1.1072 LearningRate 0.0182 Epoch: 16 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:42:13,560-Speed 5090.78 samples/sec Loss 1.1160 LearningRate 0.0182 Epoch: 16 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:42:21,515-Speed 5149.92 samples/sec Loss 1.1049 LearningRate 0.0181 Epoch: 16 Global Step: 67450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:42:29,698-Speed 5005.84 samples/sec Loss 1.0998 LearningRate 0.0181 Epoch: 16 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:42:37,705-Speed 5116.74 
samples/sec Loss 1.0994 LearningRate 0.0181 Epoch: 16 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:42:45,660-Speed 5149.66 samples/sec Loss 1.1073 LearningRate 0.0181 Epoch: 16 Global Step: 67480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:42:53,656-Speed 5122.88 samples/sec Loss 1.1199 LearningRate 0.0180 Epoch: 16 Global Step: 67490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:43:01,524-Speed 5206.64 samples/sec Loss 1.1018 LearningRate 0.0180 Epoch: 16 Global Step: 67500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:43:09,481-Speed 5148.67 samples/sec Loss 1.0871 LearningRate 0.0180 Epoch: 16 Global Step: 67510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:43:17,371-Speed 5191.81 samples/sec Loss 1.0892 LearningRate 0.0180 Epoch: 16 Global Step: 67520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:43:25,220-Speed 5219.19 samples/sec Loss 1.1004 LearningRate 0.0180 Epoch: 16 Global Step: 67530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:43:33,311-Speed 5063.35 samples/sec Loss 1.0866 LearningRate 0.0179 Epoch: 16 Global Step: 67540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:43:41,504-Speed 5000.21 samples/sec Loss 1.1194 LearningRate 0.0179 Epoch: 16 Global Step: 67550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:43:49,379-Speed 5202.27 samples/sec Loss 1.0782 LearningRate 0.0179 Epoch: 16 Global Step: 67560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:43:57,731-Speed 4904.32 samples/sec Loss 1.1031 LearningRate 0.0179 Epoch: 16 Global Step: 67570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:44:05,991-Speed 4959.61 samples/sec Loss 1.0912 LearningRate 0.0178 Epoch: 16 Global Step: 67580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:44:14,216-Speed 4981.15 samples/sec Loss 1.1402 LearningRate 0.0178 
Epoch: 16 Global Step: 67590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:44:22,285-Speed 5077.24 samples/sec Loss 1.0976 LearningRate 0.0178 Epoch: 16 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:44:30,286-Speed 5120.35 samples/sec Loss 1.1175 LearningRate 0.0178 Epoch: 16 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:44:38,294-Speed 5115.55 samples/sec Loss 1.1237 LearningRate 0.0178 Epoch: 16 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:44:46,340-Speed 5091.52 samples/sec Loss 1.0957 LearningRate 0.0177 Epoch: 16 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:44:54,309-Speed 5141.23 samples/sec Loss 1.0982 LearningRate 0.0177 Epoch: 16 Global Step: 67640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:45:02,526-Speed 4985.02 samples/sec Loss 1.1090 LearningRate 0.0177 Epoch: 16 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:45:10,901-Speed 4891.49 samples/sec Loss 1.0966 LearningRate 0.0177 Epoch: 16 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:45:18,879-Speed 5134.89 samples/sec Loss 1.1090 LearningRate 0.0176 Epoch: 16 Global Step: 67670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:45:27,205-Speed 4920.12 samples/sec Loss 1.1108 LearningRate 0.0176 Epoch: 16 Global Step: 67680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:45:36,084-Speed 4613.74 samples/sec Loss 1.1136 LearningRate 0.0176 Epoch: 16 Global Step: 67690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:45:44,866-Speed 4665.07 samples/sec Loss 1.1313 LearningRate 0.0176 Epoch: 16 Global Step: 67700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:45:53,563-Speed 4710.60 samples/sec Loss 1.0973 LearningRate 0.0176 Epoch: 16 Global Step: 67710 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 13:46:01,568-Speed 5117.21 samples/sec Loss 1.1195 LearningRate 0.0175 Epoch: 16 Global Step: 67720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:46:09,621-Speed 5087.03 samples/sec Loss 1.1107 LearningRate 0.0175 Epoch: 16 Global Step: 67730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:46:17,664-Speed 5093.52 samples/sec Loss 1.1128 LearningRate 0.0175 Epoch: 16 Global Step: 67740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:46:25,900-Speed 4973.83 samples/sec Loss 1.1482 LearningRate 0.0175 Epoch: 16 Global Step: 67750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:46:33,993-Speed 5062.11 samples/sec Loss 1.1210 LearningRate 0.0174 Epoch: 16 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:46:41,977-Speed 5131.24 samples/sec Loss 1.1247 LearningRate 0.0174 Epoch: 16 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:46:50,187-Speed 4990.05 samples/sec Loss 1.1399 LearningRate 0.0174 Epoch: 16 Global Step: 67780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:46:58,239-Speed 5087.16 samples/sec Loss 1.1435 LearningRate 0.0174 Epoch: 16 Global Step: 67790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:47:06,283-Speed 5093.01 samples/sec Loss 1.1303 LearningRate 0.0174 Epoch: 16 Global Step: 67800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:47:14,301-Speed 5109.29 samples/sec Loss 1.1005 LearningRate 0.0173 Epoch: 16 Global Step: 67810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:47:22,302-Speed 5120.18 samples/sec Loss 1.1089 LearningRate 0.0173 Epoch: 16 Global Step: 67820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:47:30,212-Speed 5179.12 samples/sec Loss 1.1071 LearningRate 0.0173 Epoch: 16 Global Step: 67830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-17 13:47:38,343-Speed 5038.06 samples/sec Loss 1.1226 LearningRate 0.0173 Epoch: 16 Global Step: 67840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:47:46,415-Speed 5074.95 samples/sec Loss 1.1390 LearningRate 0.0172 Epoch: 16 Global Step: 67850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:47:54,524-Speed 5051.85 samples/sec Loss 1.1152 LearningRate 0.0172 Epoch: 16 Global Step: 67860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:48:02,551-Speed 5103.75 samples/sec Loss 1.1301 LearningRate 0.0172 Epoch: 16 Global Step: 67870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:48:10,618-Speed 5078.25 samples/sec Loss 1.1048 LearningRate 0.0172 Epoch: 16 Global Step: 67880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:48:18,636-Speed 5109.89 samples/sec Loss 1.1298 LearningRate 0.0172 Epoch: 16 Global Step: 67890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:48:26,895-Speed 4960.37 samples/sec Loss 1.1202 LearningRate 0.0171 Epoch: 16 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:48:34,953-Speed 5083.80 samples/sec Loss 1.1148 LearningRate 0.0171 Epoch: 16 Global Step: 67910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:48:42,967-Speed 5112.08 samples/sec Loss 1.1143 LearningRate 0.0171 Epoch: 16 Global Step: 67920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:48:51,268-Speed 4935.39 samples/sec Loss 1.1180 LearningRate 0.0171 Epoch: 16 Global Step: 67930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:48:58,909-Speed 5361.15 samples/sec Loss 1.1363 LearningRate 0.0170 Epoch: 16 Global Step: 67940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:49:06,866-Speed 5148.43 samples/sec Loss 1.1420 LearningRate 0.0170 Epoch: 16 Global Step: 67950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:49:14,782-Speed 5174.94 
samples/sec Loss 1.1474 LearningRate 0.0170 Epoch: 16 Global Step: 67960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:49:22,902-Speed 5045.05 samples/sec Loss 1.1132 LearningRate 0.0170 Epoch: 16 Global Step: 67970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:49:30,757-Speed 5215.39 samples/sec Loss 1.1219 LearningRate 0.0170 Epoch: 16 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:49:38,561-Speed 5249.20 samples/sec Loss 1.1213 LearningRate 0.0169 Epoch: 16 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:49:46,560-Speed 5121.54 samples/sec Loss 1.1260 LearningRate 0.0169 Epoch: 16 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:49:54,550-Speed 5127.05 samples/sec Loss 1.1214 LearningRate 0.0169 Epoch: 16 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:50:02,738-Speed 5003.51 samples/sec Loss 1.1194 LearningRate 0.0169 Epoch: 16 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:50:10,842-Speed 5055.02 samples/sec Loss 1.1192 LearningRate 0.0168 Epoch: 16 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:50:19,205-Speed 4898.50 samples/sec Loss 1.1416 LearningRate 0.0168 Epoch: 16 Global Step: 68040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:50:27,450-Speed 4968.56 samples/sec Loss 1.1469 LearningRate 0.0168 Epoch: 16 Global Step: 68050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:50:35,443-Speed 5125.12 samples/sec Loss 1.1182 LearningRate 0.0168 Epoch: 16 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:50:43,342-Speed 5186.54 samples/sec Loss 1.1135 LearningRate 0.0168 Epoch: 16 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:50:51,494-Speed 5025.22 samples/sec Loss 1.1268 LearningRate 0.0167 Epoch: 16 
Global Step: 68080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:50:59,578-Speed 5067.68 samples/sec Loss 1.1456 LearningRate 0.0167 Epoch: 16 Global Step: 68090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:51:07,804-Speed 4979.98 samples/sec Loss 1.1400 LearningRate 0.0167 Epoch: 16 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:51:15,857-Speed 5087.20 samples/sec Loss 1.0950 LearningRate 0.0167 Epoch: 16 Global Step: 68110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:51:24,400-Speed 4795.17 samples/sec Loss 1.1226 LearningRate 0.0166 Epoch: 16 Global Step: 68120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:51:32,709-Speed 4930.10 samples/sec Loss 1.1554 LearningRate 0.0166 Epoch: 16 Global Step: 68130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:51:41,286-Speed 4776.24 samples/sec Loss 1.1343 LearningRate 0.0166 Epoch: 16 Global Step: 68140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:51:50,065-Speed 4666.65 samples/sec Loss 1.1469 LearningRate 0.0166 Epoch: 16 Global Step: 68150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:51:58,808-Speed 4684.96 samples/sec Loss 1.1545 LearningRate 0.0166 Epoch: 16 Global Step: 68160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:52:06,637-Speed 5232.72 samples/sec Loss 1.1265 LearningRate 0.0165 Epoch: 16 Global Step: 68170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:52:14,774-Speed 5035.10 samples/sec Loss 1.1190 LearningRate 0.0165 Epoch: 16 Global Step: 68180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:52:22,902-Speed 5039.97 samples/sec Loss 1.1303 LearningRate 0.0165 Epoch: 16 Global Step: 68190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:52:31,009-Speed 5053.40 samples/sec Loss 1.1517 LearningRate 0.0165 Epoch: 16 Global Step: 68200 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-01-17 13:52:39,002-Speed 5125.07 samples/sec Loss 1.1243 LearningRate 0.0165 Epoch: 16 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:52:47,285-Speed 4945.52 samples/sec Loss 1.1442 LearningRate 0.0164 Epoch: 16 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:52:55,509-Speed 4981.45 samples/sec Loss 1.1298 LearningRate 0.0164 Epoch: 16 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:53:03,862-Speed 4904.39 samples/sec Loss 1.1296 LearningRate 0.0164 Epoch: 16 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:53:12,253-Speed 4881.99 samples/sec Loss 1.1423 LearningRate 0.0164 Epoch: 16 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:53:20,455-Speed 4994.30 samples/sec Loss 1.1309 LearningRate 0.0163 Epoch: 16 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:53:28,419-Speed 5144.50 samples/sec Loss 1.1377 LearningRate 0.0163 Epoch: 16 Global Step: 68270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:53:36,458-Speed 5095.95 samples/sec Loss 1.1192 LearningRate 0.0163 Epoch: 16 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:53:44,914-Speed 4844.13 samples/sec Loss 1.1267 LearningRate 0.0163 Epoch: 16 Global Step: 68290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:53:52,984-Speed 5076.21 samples/sec Loss 1.1260 LearningRate 0.0163 Epoch: 16 Global Step: 68300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:54:00,779-Speed 5255.83 samples/sec Loss 1.1351 LearningRate 0.0162 Epoch: 16 Global Step: 68310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:54:08,875-Speed 5060.07 samples/sec Loss 1.1309 LearningRate 0.0162 Epoch: 16 Global Step: 68320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 
13:54:16,837-Speed 5145.08 samples/sec Loss 1.1181 LearningRate 0.0162 Epoch: 16 Global Step: 68330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:54:25,051-Speed 4987.05 samples/sec Loss 1.1450 LearningRate 0.0162 Epoch: 16 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:54:32,770-Speed 5307.49 samples/sec Loss 1.1325 LearningRate 0.0162 Epoch: 16 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:54:41,068-Speed 4937.01 samples/sec Loss 1.1439 LearningRate 0.0161 Epoch: 16 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:54:49,172-Speed 5054.91 samples/sec Loss 1.1388 LearningRate 0.0161 Epoch: 16 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:54:57,220-Speed 5089.49 samples/sec Loss 1.1364 LearningRate 0.0161 Epoch: 16 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:55:05,240-Speed 5108.36 samples/sec Loss 1.1341 LearningRate 0.0161 Epoch: 16 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:55:13,069-Speed 5232.55 samples/sec Loss 1.1188 LearningRate 0.0160 Epoch: 16 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:55:20,758-Speed 5327.75 samples/sec Loss 1.1284 LearningRate 0.0160 Epoch: 16 Global Step: 68410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:55:28,759-Speed 5120.60 samples/sec Loss 1.1254 LearningRate 0.0160 Epoch: 16 Global Step: 68420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:55:37,012-Speed 4963.84 samples/sec Loss 1.1456 LearningRate 0.0160 Epoch: 16 Global Step: 68430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:55:44,896-Speed 5195.43 samples/sec Loss 1.1162 LearningRate 0.0160 Epoch: 16 Global Step: 68440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:55:52,563-Speed 5343.75 samples/sec Loss 1.1194 
LearningRate 0.0159 Epoch: 16 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:56:00,785-Speed 4982.36 samples/sec Loss 1.1286 LearningRate 0.0159 Epoch: 16 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:56:08,801-Speed 5110.30 samples/sec Loss 1.1428 LearningRate 0.0159 Epoch: 16 Global Step: 68470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:56:16,973-Speed 5013.25 samples/sec Loss 1.1185 LearningRate 0.0159 Epoch: 16 Global Step: 68480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:56:24,943-Speed 5139.80 samples/sec Loss 1.1333 LearningRate 0.0159 Epoch: 16 Global Step: 68490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:56:32,848-Speed 5182.12 samples/sec Loss 1.1413 LearningRate 0.0158 Epoch: 16 Global Step: 68500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:56:40,965-Speed 5047.42 samples/sec Loss 1.1391 LearningRate 0.0158 Epoch: 16 Global Step: 68510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:56:48,981-Speed 5110.39 samples/sec Loss 1.1375 LearningRate 0.0158 Epoch: 16 Global Step: 68520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:56:56,700-Speed 5306.51 samples/sec Loss 1.1295 LearningRate 0.0158 Epoch: 16 Global Step: 68530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:57:04,680-Speed 5133.81 samples/sec Loss 1.1204 LearningRate 0.0157 Epoch: 16 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:57:12,684-Speed 5118.45 samples/sec Loss 1.1392 LearningRate 0.0157 Epoch: 16 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:57:20,717-Speed 5099.48 samples/sec Loss 1.1352 LearningRate 0.0157 Epoch: 16 Global Step: 68560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:57:28,705-Speed 5128.51 samples/sec Loss 1.1363 LearningRate 0.0157 Epoch: 16 Global Step: 68570 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:57:36,682-Speed 5135.71 samples/sec Loss 1.1280 LearningRate 0.0157 Epoch: 16 Global Step: 68580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:57:44,668-Speed 5129.08 samples/sec Loss 1.1348 LearningRate 0.0156 Epoch: 16 Global Step: 68590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:57:52,655-Speed 5129.34 samples/sec Loss 1.1378 LearningRate 0.0156 Epoch: 16 Global Step: 68600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:58:00,721-Speed 5078.80 samples/sec Loss 1.1302 LearningRate 0.0156 Epoch: 16 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:58:08,707-Speed 5130.26 samples/sec Loss 1.1273 LearningRate 0.0156 Epoch: 16 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:58:16,742-Speed 5098.04 samples/sec Loss 1.1239 LearningRate 0.0156 Epoch: 16 Global Step: 68630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:58:24,748-Speed 5116.95 samples/sec Loss 1.1287 LearningRate 0.0155 Epoch: 16 Global Step: 68640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:58:32,684-Speed 5162.41 samples/sec Loss 1.1323 LearningRate 0.0155 Epoch: 16 Global Step: 68650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:58:40,755-Speed 5076.03 samples/sec Loss 1.1255 LearningRate 0.0155 Epoch: 16 Global Step: 68660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 13:58:48,898-Speed 5031.01 samples/sec Loss 1.1235 LearningRate 0.0155 Epoch: 16 Global Step: 68670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:58:56,930-Speed 5099.84 samples/sec Loss 1.1436 LearningRate 0.0155 Epoch: 16 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:59:04,945-Speed 5110.87 samples/sec Loss 1.1101 LearningRate 0.0154 Epoch: 16 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-17 13:59:12,918-Speed 5138.64 samples/sec Loss 1.1396 LearningRate 0.0154 Epoch: 16 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:59:20,645-Speed 5301.01 samples/sec Loss 1.1251 LearningRate 0.0154 Epoch: 16 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:59:28,857-Speed 4988.83 samples/sec Loss 1.1097 LearningRate 0.0154 Epoch: 16 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:59:36,604-Speed 5288.02 samples/sec Loss 1.1472 LearningRate 0.0153 Epoch: 16 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:59:44,623-Speed 5108.59 samples/sec Loss 1.1449 LearningRate 0.0153 Epoch: 16 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 13:59:52,431-Speed 5246.54 samples/sec Loss 1.1436 LearningRate 0.0153 Epoch: 16 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:00:00,517-Speed 5066.61 samples/sec Loss 1.1308 LearningRate 0.0153 Epoch: 16 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:00:08,677-Speed 5020.54 samples/sec Loss 1.1412 LearningRate 0.0153 Epoch: 16 Global Step: 68770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:00:16,831-Speed 5023.84 samples/sec Loss 1.1091 LearningRate 0.0152 Epoch: 16 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:00:24,759-Speed 5166.86 samples/sec Loss 1.1176 LearningRate 0.0152 Epoch: 16 Global Step: 68790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:00:32,810-Speed 5088.78 samples/sec Loss 1.1010 LearningRate 0.0152 Epoch: 16 Global Step: 68800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:00:40,877-Speed 5078.12 samples/sec Loss 1.1197 LearningRate 0.0152 Epoch: 16 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:00:49,189-Speed 4928.54 samples/sec 
Loss 1.1300 LearningRate 0.0152 Epoch: 16 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:00:57,257-Speed 5076.89 samples/sec Loss 1.1137 LearningRate 0.0151 Epoch: 16 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:01:05,419-Speed 5019.25 samples/sec Loss 1.1024 LearningRate 0.0151 Epoch: 16 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:01:13,526-Speed 5053.48 samples/sec Loss 1.1354 LearningRate 0.0151 Epoch: 16 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:01:21,542-Speed 5110.77 samples/sec Loss 1.1457 LearningRate 0.0151 Epoch: 16 Global Step: 68860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:01:29,146-Speed 5387.38 samples/sec Loss 1.1326 LearningRate 0.0151 Epoch: 16 Global Step: 68870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:01:36,948-Speed 5250.77 samples/sec Loss 1.1473 LearningRate 0.0150 Epoch: 16 Global Step: 68880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:01:45,036-Speed 5064.54 samples/sec Loss 1.1307 LearningRate 0.0150 Epoch: 16 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:01:53,024-Speed 5128.31 samples/sec Loss 1.1610 LearningRate 0.0150 Epoch: 16 Global Step: 68900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:02:00,704-Speed 5334.29 samples/sec Loss 1.1419 LearningRate 0.0150 Epoch: 16 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:02:08,609-Speed 5182.68 samples/sec Loss 1.1387 LearningRate 0.0150 Epoch: 16 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:02:16,695-Speed 5065.88 samples/sec Loss 1.1245 LearningRate 0.0149 Epoch: 16 Global Step: 68930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:02:24,682-Speed 5129.47 samples/sec Loss 1.1458 LearningRate 0.0149 Epoch: 16 Global 
Step: 68940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:02:32,702-Speed 5107.61 samples/sec Loss 1.1508 LearningRate 0.0149 Epoch: 16 Global Step: 68950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:02:40,480-Speed 5267.12 samples/sec Loss 1.1314 LearningRate 0.0149 Epoch: 16 Global Step: 68960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:02:48,285-Speed 5248.24 samples/sec Loss 1.1355 LearningRate 0.0149 Epoch: 16 Global Step: 68970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:02:56,379-Speed 5061.09 samples/sec Loss 1.1344 LearningRate 0.0148 Epoch: 16 Global Step: 68980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:03:04,464-Speed 5067.17 samples/sec Loss 1.1593 LearningRate 0.0148 Epoch: 16 Global Step: 68990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:03:12,430-Speed 5142.51 samples/sec Loss 1.1189 LearningRate 0.0148 Epoch: 16 Global Step: 69000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:03:20,409-Speed 5134.33 samples/sec Loss 1.1224 LearningRate 0.0148 Epoch: 16 Global Step: 69010 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 14:03:28,460-Speed 5088.20 samples/sec Loss 1.1224 LearningRate 0.0147 Epoch: 16 Global Step: 69020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:03:36,423-Speed 5144.43 samples/sec Loss 1.1143 LearningRate 0.0147 Epoch: 16 Global Step: 69030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:03:44,394-Speed 5139.37 samples/sec Loss 1.1129 LearningRate 0.0147 Epoch: 16 Global Step: 69040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:03:52,375-Speed 5132.95 samples/sec Loss 1.1119 LearningRate 0.0147 Epoch: 16 Global Step: 69050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:04:00,374-Speed 5121.77 samples/sec Loss 1.1280 LearningRate 0.0147 Epoch: 16 Global Step: 69060 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-17 14:04:08,460-Speed 5066.16 samples/sec Loss 1.1125 LearningRate 0.0146 Epoch: 16 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:04:16,544-Speed 5067.30 samples/sec Loss 1.1099 LearningRate 0.0146 Epoch: 16 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:04:24,583-Speed 5095.69 samples/sec Loss 1.1341 LearningRate 0.0146 Epoch: 16 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:04:32,785-Speed 4994.51 samples/sec Loss 1.1211 LearningRate 0.0146 Epoch: 16 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:04:41,033-Speed 4967.39 samples/sec Loss 1.1187 LearningRate 0.0146 Epoch: 16 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:04:49,116-Speed 5067.80 samples/sec Loss 1.1086 LearningRate 0.0145 Epoch: 16 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:04:57,164-Speed 5090.09 samples/sec Loss 1.1298 LearningRate 0.0145 Epoch: 16 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:05:05,174-Speed 5114.38 samples/sec Loss 1.1057 LearningRate 0.0145 Epoch: 16 Global Step: 69140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:05:13,126-Speed 5151.76 samples/sec Loss 1.1438 LearningRate 0.0145 Epoch: 16 Global Step: 69150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:05:21,074-Speed 5153.81 samples/sec Loss 1.1283 LearningRate 0.0145 Epoch: 16 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:05:29,181-Speed 5053.21 samples/sec Loss 1.1159 LearningRate 0.0144 Epoch: 16 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:05:37,129-Speed 5153.84 samples/sec Loss 1.1425 LearningRate 0.0144 Epoch: 16 Global Step: 69180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 
14:05:45,007-Speed 5200.67 samples/sec Loss 1.1385 LearningRate 0.0144 Epoch: 16 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:05:52,841-Speed 5228.49 samples/sec Loss 1.1183 LearningRate 0.0144 Epoch: 16 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:06:00,826-Speed 5130.48 samples/sec Loss 1.1386 LearningRate 0.0144 Epoch: 16 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:06:08,779-Speed 5151.19 samples/sec Loss 1.1176 LearningRate 0.0143 Epoch: 16 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:06:16,732-Speed 5150.70 samples/sec Loss 1.1145 LearningRate 0.0143 Epoch: 16 Global Step: 69230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:06:24,756-Speed 5105.25 samples/sec Loss 1.0989 LearningRate 0.0143 Epoch: 16 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:06:32,818-Speed 5081.56 samples/sec Loss 1.1257 LearningRate 0.0143 Epoch: 16 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:06:40,815-Speed 5122.60 samples/sec Loss 1.1124 LearningRate 0.0143 Epoch: 16 Global Step: 69260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:06:49,026-Speed 4989.18 samples/sec Loss 1.1128 LearningRate 0.0142 Epoch: 16 Global Step: 69270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:06:57,297-Speed 4952.92 samples/sec Loss 1.1005 LearningRate 0.0142 Epoch: 16 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:07:05,550-Speed 4963.67 samples/sec Loss 1.1393 LearningRate 0.0142 Epoch: 16 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:07:13,923-Speed 4892.78 samples/sec Loss 1.1078 LearningRate 0.0142 Epoch: 16 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:07:22,040-Speed 5046.73 samples/sec Loss 1.1039 
LearningRate 0.0142 Epoch: 16 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:07:30,061-Speed 5107.62 samples/sec Loss 1.0989 LearningRate 0.0141 Epoch: 16 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:07:38,000-Speed 5159.79 samples/sec Loss 1.1142 LearningRate 0.0141 Epoch: 16 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:07:46,135-Speed 5036.26 samples/sec Loss 1.1277 LearningRate 0.0141 Epoch: 16 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:07:54,167-Speed 5099.62 samples/sec Loss 1.1120 LearningRate 0.0141 Epoch: 16 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:08:02,260-Speed 5062.50 samples/sec Loss 1.1066 LearningRate 0.0141 Epoch: 16 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:08:10,222-Speed 5145.21 samples/sec Loss 1.1423 LearningRate 0.0140 Epoch: 16 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:08:18,584-Speed 4898.77 samples/sec Loss 1.1152 LearningRate 0.0140 Epoch: 16 Global Step: 69380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:08:26,740-Speed 5022.58 samples/sec Loss 1.1049 LearningRate 0.0140 Epoch: 16 Global Step: 69390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:08:34,863-Speed 5043.08 samples/sec Loss 1.1155 LearningRate 0.0140 Epoch: 16 Global Step: 69400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:08:44,250-Speed 4364.17 samples/sec Loss 1.1012 LearningRate 0.0140 Epoch: 16 Global Step: 69410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:08:52,778-Speed 4803.54 samples/sec Loss 1.1179 LearningRate 0.0139 Epoch: 16 Global Step: 69420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:09:00,741-Speed 5144.89 samples/sec Loss 1.1301 LearningRate 0.0139 Epoch: 16 Global Step: 69430 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:09:10,941-Speed 4015.97 samples/sec Loss 1.0951 LearningRate 0.0139 Epoch: 16 Global Step: 69440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:09:19,703-Speed 4675.29 samples/sec Loss 1.1258 LearningRate 0.0139 Epoch: 16 Global Step: 69450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:09:28,188-Speed 4828.13 samples/sec Loss 1.1138 LearningRate 0.0139 Epoch: 16 Global Step: 69460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:09:35,891-Speed 5317.98 samples/sec Loss 1.0950 LearningRate 0.0138 Epoch: 16 Global Step: 69470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:09:43,670-Speed 5266.78 samples/sec Loss 1.1147 LearningRate 0.0138 Epoch: 16 Global Step: 69480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:09:51,531-Speed 5211.49 samples/sec Loss 1.1109 LearningRate 0.0138 Epoch: 16 Global Step: 69490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:09:59,107-Speed 5407.08 samples/sec Loss 1.1145 LearningRate 0.0138 Epoch: 16 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:10:06,826-Speed 5306.99 samples/sec Loss 1.1418 LearningRate 0.0138 Epoch: 16 Global Step: 69510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:10:14,533-Speed 5315.53 samples/sec Loss 1.1247 LearningRate 0.0137 Epoch: 16 Global Step: 69520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:10:22,326-Speed 5256.31 samples/sec Loss 1.1446 LearningRate 0.0137 Epoch: 16 Global Step: 69530 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 14:10:29,727-Speed 5535.59 samples/sec Loss 1.1122 LearningRate 0.0137 Epoch: 16 Global Step: 69540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:10:37,069-Speed 5579.11 samples/sec Loss 1.1041 LearningRate 0.0137 Epoch: 16 Global Step: 69550 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-01-17 14:10:44,403-Speed 5585.67 samples/sec Loss 1.1001 LearningRate 0.0137 Epoch: 16 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:10:51,764-Speed 5566.07 samples/sec Loss 1.0829 LearningRate 0.0136 Epoch: 16 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:10:59,116-Speed 5571.89 samples/sec Loss 1.0940 LearningRate 0.0136 Epoch: 16 Global Step: 69580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:11:06,686-Speed 5411.48 samples/sec Loss 1.0869 LearningRate 0.0136 Epoch: 16 Global Step: 69590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:11:14,562-Speed 5201.91 samples/sec Loss 1.1078 LearningRate 0.0136 Epoch: 16 Global Step: 69600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:11:22,166-Speed 5386.83 samples/sec Loss 1.0865 LearningRate 0.0136 Epoch: 16 Global Step: 69610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:11:29,512-Speed 5576.96 samples/sec Loss 1.0938 LearningRate 0.0135 Epoch: 16 Global Step: 69620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:11:36,928-Speed 5523.66 samples/sec Loss 1.0865 LearningRate 0.0135 Epoch: 16 Global Step: 69630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:11:44,214-Speed 5622.68 samples/sec Loss 1.1097 LearningRate 0.0135 Epoch: 16 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:11:51,708-Speed 5466.86 samples/sec Loss 1.1129 LearningRate 0.0135 Epoch: 16 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:11:59,270-Speed 5416.74 samples/sec Loss 1.0976 LearningRate 0.0135 Epoch: 16 Global Step: 69660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:12:06,612-Speed 5580.13 samples/sec Loss 1.0890 LearningRate 0.0134 Epoch: 16 Global Step: 69670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:12:14,150-Speed 5434.19 
samples/sec Loss 1.1044 LearningRate 0.0134 Epoch: 16 Global Step: 69680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:12:21,899-Speed 5286.67 samples/sec Loss 1.0940 LearningRate 0.0134 Epoch: 16 Global Step: 69690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:12:29,476-Speed 5407.27 samples/sec Loss 1.1128 LearningRate 0.0134 Epoch: 16 Global Step: 69700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:12:36,870-Speed 5540.31 samples/sec Loss 1.0803 LearningRate 0.0134 Epoch: 16 Global Step: 69710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:12:44,430-Speed 5418.30 samples/sec Loss 1.1150 LearningRate 0.0134 Epoch: 16 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:12:51,982-Speed 5425.39 samples/sec Loss 1.0905 LearningRate 0.0133 Epoch: 16 Global Step: 69730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:12:59,327-Speed 5577.48 samples/sec Loss 1.1243 LearningRate 0.0133 Epoch: 16 Global Step: 69740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:13:06,662-Speed 5584.96 samples/sec Loss 1.1197 LearningRate 0.0133 Epoch: 16 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:13:15,796-Speed 4484.90 samples/sec Loss 1.0946 LearningRate 0.0133 Epoch: 16 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:13:23,126-Speed 5589.10 samples/sec Loss 1.0874 LearningRate 0.0133 Epoch: 16 Global Step: 69770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:13:30,495-Speed 5559.36 samples/sec Loss 1.0886 LearningRate 0.0132 Epoch: 16 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:13:38,105-Speed 5383.33 samples/sec Loss 1.0792 LearningRate 0.0132 Epoch: 16 Global Step: 69790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:13:45,537-Speed 5512.27 samples/sec Loss 1.1066 LearningRate 0.0132 
Epoch: 16 Global Step: 69800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:13:53,309-Speed 5271.03 samples/sec Loss 1.1000 LearningRate 0.0132 Epoch: 16 Global Step: 69810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:14:00,903-Speed 5394.88 samples/sec Loss 1.0937 LearningRate 0.0132 Epoch: 16 Global Step: 69820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:14:08,442-Speed 5433.67 samples/sec Loss 1.0939 LearningRate 0.0131 Epoch: 16 Global Step: 69830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:14:15,934-Speed 5467.85 samples/sec Loss 1.0803 LearningRate 0.0131 Epoch: 16 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:14:23,538-Speed 5387.53 samples/sec Loss 1.0940 LearningRate 0.0131 Epoch: 16 Global Step: 69850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:14:30,926-Speed 5545.31 samples/sec Loss 1.0862 LearningRate 0.0131 Epoch: 16 Global Step: 69860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:14:38,287-Speed 5565.16 samples/sec Loss 1.1107 LearningRate 0.0131 Epoch: 16 Global Step: 69870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:14:45,603-Speed 5599.54 samples/sec Loss 1.0881 LearningRate 0.0130 Epoch: 16 Global Step: 69880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:14:52,954-Speed 5573.17 samples/sec Loss 1.0804 LearningRate 0.0130 Epoch: 16 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:15:00,367-Speed 5526.23 samples/sec Loss 1.0939 LearningRate 0.0130 Epoch: 16 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:15:07,703-Speed 5583.87 samples/sec Loss 1.0755 LearningRate 0.0130 Epoch: 16 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:15:15,171-Speed 5485.22 samples/sec Loss 1.0987 LearningRate 0.0130 Epoch: 16 Global Step: 69920 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-17 14:15:22,535-Speed 5563.37 samples/sec Loss 1.0969 LearningRate 0.0129 Epoch: 16 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:15:29,861-Speed 5591.87 samples/sec Loss 1.0715 LearningRate 0.0129 Epoch: 16 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:15:37,280-Speed 5521.45 samples/sec Loss 1.1245 LearningRate 0.0129 Epoch: 16 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:15:44,776-Speed 5465.23 samples/sec Loss 1.0899 LearningRate 0.0129 Epoch: 16 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:15:52,092-Speed 5599.28 samples/sec Loss 1.0980 LearningRate 0.0129 Epoch: 16 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:15:59,590-Speed 5463.75 samples/sec Loss 1.0846 LearningRate 0.0129 Epoch: 16 Global Step: 69980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:16:07,483-Speed 5189.89 samples/sec Loss 1.0738 LearningRate 0.0128 Epoch: 16 Global Step: 69990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:16:15,058-Speed 5407.92 samples/sec Loss 1.1070 LearningRate 0.0128 Epoch: 16 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:17:01,530-[lfw][70000]XNorm: 22.691336 Training: 2022-01-17 14:17:01,531-[lfw][70000]Accuracy-Flip: 0.99750+-0.00271 Training: 2022-01-17 14:17:01,531-[lfw][70000]Accuracy-Highest: 0.99817 Training: 2022-01-17 14:17:55,520-[cfp_fp][70000]XNorm: 21.548959 Training: 2022-01-17 14:17:55,520-[cfp_fp][70000]Accuracy-Flip: 0.99200+-0.00395 Training: 2022-01-17 14:17:55,521-[cfp_fp][70000]Accuracy-Highest: 0.99200 Training: 2022-01-17 14:18:41,956-[agedb_30][70000]XNorm: 22.840177 Training: 2022-01-17 14:18:41,957-[agedb_30][70000]Accuracy-Flip: 0.98417+-0.00696 Training: 2022-01-17 14:18:41,957-[agedb_30][70000]Accuracy-Highest: 0.98417 Training: 2022-01-17 
14:18:49,653-Speed 264.95 samples/sec Loss 1.0913 LearningRate 0.0128 Epoch: 16 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:18:57,186-Speed 5437.89 samples/sec Loss 1.0901 LearningRate 0.0128 Epoch: 16 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:19:04,775-Speed 5397.94 samples/sec Loss 1.0802 LearningRate 0.0128 Epoch: 16 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:19:12,259-Speed 5474.27 samples/sec Loss 1.0649 LearningRate 0.0127 Epoch: 16 Global Step: 70040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:19:19,775-Speed 5450.78 samples/sec Loss 1.0868 LearningRate 0.0127 Epoch: 16 Global Step: 70050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:19:27,349-Speed 5408.49 samples/sec Loss 1.0918 LearningRate 0.0127 Epoch: 16 Global Step: 70060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:19:34,762-Speed 5526.03 samples/sec Loss 1.0711 LearningRate 0.0127 Epoch: 16 Global Step: 70070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:19:42,058-Speed 5615.18 samples/sec Loss 1.0975 LearningRate 0.0127 Epoch: 16 Global Step: 70080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:19:49,482-Speed 5517.90 samples/sec Loss 1.0962 LearningRate 0.0126 Epoch: 16 Global Step: 70090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:19:56,860-Speed 5552.36 samples/sec Loss 1.0840 LearningRate 0.0126 Epoch: 16 Global Step: 70100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:20:04,840-Speed 5133.54 samples/sec Loss 1.0680 LearningRate 0.0126 Epoch: 16 Global Step: 70110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:20:12,305-Speed 5487.95 samples/sec Loss 1.0873 LearningRate 0.0126 Epoch: 16 Global Step: 70120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:20:19,964-Speed 5349.06 samples/sec Loss 1.0925 
LearningRate 0.0126 Epoch: 16 Global Step: 70130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:20:28,082-Speed 5045.43 samples/sec Loss 1.0805 LearningRate 0.0125 Epoch: 16 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:20:35,614-Speed 5439.14 samples/sec Loss 1.0706 LearningRate 0.0125 Epoch: 16 Global Step: 70150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:20:43,071-Speed 5493.71 samples/sec Loss 1.0714 LearningRate 0.0125 Epoch: 16 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:20:50,741-Speed 5341.12 samples/sec Loss 1.0788 LearningRate 0.0125 Epoch: 16 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:20:58,213-Speed 5483.29 samples/sec Loss 1.0766 LearningRate 0.0125 Epoch: 16 Global Step: 70180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:21:05,649-Speed 5508.74 samples/sec Loss 1.0687 LearningRate 0.0125 Epoch: 16 Global Step: 70190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:21:13,034-Speed 5547.30 samples/sec Loss 1.0602 LearningRate 0.0124 Epoch: 16 Global Step: 70200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:21:20,570-Speed 5436.11 samples/sec Loss 1.0603 LearningRate 0.0124 Epoch: 16 Global Step: 70210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:21:27,967-Speed 5538.34 samples/sec Loss 1.1057 LearningRate 0.0124 Epoch: 16 Global Step: 70220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:21:35,393-Speed 5516.33 samples/sec Loss 1.0664 LearningRate 0.0124 Epoch: 16 Global Step: 70230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:21:42,798-Speed 5532.13 samples/sec Loss 1.0752 LearningRate 0.0124 Epoch: 16 Global Step: 70240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:21:50,134-Speed 5584.51 samples/sec Loss 1.0594 LearningRate 0.0123 Epoch: 16 Global Step: 70250 
Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:21:57,586-Speed 5497.60 samples/sec Loss 1.0746 LearningRate 0.0123 Epoch: 16 Global Step: 70260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:22:05,030-Speed 5503.07 samples/sec Loss 1.0804 LearningRate 0.0123 Epoch: 16 Global Step: 70270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:22:12,400-Speed 5558.10 samples/sec Loss 1.0786 LearningRate 0.0123 Epoch: 16 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:22:19,892-Speed 5468.47 samples/sec Loss 1.0511 LearningRate 0.0123 Epoch: 16 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:22:27,200-Speed 5605.38 samples/sec Loss 1.0718 LearningRate 0.0122 Epoch: 16 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:22:34,609-Speed 5529.97 samples/sec Loss 1.0806 LearningRate 0.0122 Epoch: 16 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:22:41,930-Speed 5595.36 samples/sec Loss 1.0723 LearningRate 0.0122 Epoch: 16 Global Step: 70320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:22:49,314-Speed 5548.15 samples/sec Loss 1.0615 LearningRate 0.0122 Epoch: 16 Global Step: 70330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:22:56,754-Speed 5506.43 samples/sec Loss 1.0741 LearningRate 0.0122 Epoch: 16 Global Step: 70340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:23:04,131-Speed 5553.35 samples/sec Loss 1.0780 LearningRate 0.0122 Epoch: 16 Global Step: 70350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:23:11,575-Speed 5503.52 samples/sec Loss 1.0631 LearningRate 0.0121 Epoch: 16 Global Step: 70360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:23:18,969-Speed 5540.23 samples/sec Loss 1.0735 LearningRate 0.0121 Epoch: 16 Global Step: 70370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-17 14:23:26,582-Speed 5380.98 samples/sec Loss 1.0505 LearningRate 0.0121 Epoch: 16 Global Step: 70380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:23:34,058-Speed 5480.14 samples/sec Loss 1.0546 LearningRate 0.0121 Epoch: 16 Global Step: 70390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:23:41,668-Speed 5382.60 samples/sec Loss 1.0528 LearningRate 0.0121 Epoch: 16 Global Step: 70400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:23:49,018-Speed 5574.13 samples/sec Loss 1.0750 LearningRate 0.0120 Epoch: 16 Global Step: 70410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:23:56,411-Speed 5541.62 samples/sec Loss 1.0650 LearningRate 0.0120 Epoch: 16 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:24:03,771-Speed 5565.75 samples/sec Loss 1.0365 LearningRate 0.0120 Epoch: 16 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:24:11,086-Speed 5600.25 samples/sec Loss 1.0465 LearningRate 0.0120 Epoch: 16 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:24:18,551-Speed 5487.77 samples/sec Loss 1.0582 LearningRate 0.0120 Epoch: 16 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:24:26,062-Speed 5454.14 samples/sec Loss 1.0654 LearningRate 0.0120 Epoch: 16 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:24:33,915-Speed 5216.71 samples/sec Loss 1.0681 LearningRate 0.0119 Epoch: 16 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:24:41,412-Speed 5464.25 samples/sec Loss 1.0410 LearningRate 0.0119 Epoch: 16 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:24:48,700-Speed 5620.93 samples/sec Loss 1.0629 LearningRate 0.0119 Epoch: 16 Global Step: 70490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:24:56,096-Speed 5539.12 samples/sec Loss 
1.0426 LearningRate 0.0119 Epoch: 16 Global Step: 70500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:25:03,485-Speed 5544.28 samples/sec Loss 1.0578 LearningRate 0.0119 Epoch: 16 Global Step: 70510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:25:10,974-Speed 5470.02 samples/sec Loss 1.0469 LearningRate 0.0118 Epoch: 16 Global Step: 70520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:25:18,506-Speed 5438.82 samples/sec Loss 1.0537 LearningRate 0.0118 Epoch: 16 Global Step: 70530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:25:25,823-Speed 5598.88 samples/sec Loss 1.0448 LearningRate 0.0118 Epoch: 16 Global Step: 70540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:25:33,095-Speed 5633.77 samples/sec Loss 1.0632 LearningRate 0.0118 Epoch: 16 Global Step: 70550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:25:40,452-Speed 5568.46 samples/sec Loss 1.0465 LearningRate 0.0118 Epoch: 16 Global Step: 70560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:25:47,970-Speed 5449.03 samples/sec Loss 1.0639 LearningRate 0.0117 Epoch: 16 Global Step: 70570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:25:55,484-Speed 5452.10 samples/sec Loss 1.0569 LearningRate 0.0117 Epoch: 16 Global Step: 70580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:26:03,163-Speed 5334.23 samples/sec Loss 1.0505 LearningRate 0.0117 Epoch: 16 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:26:10,482-Speed 5597.89 samples/sec Loss 1.0514 LearningRate 0.0117 Epoch: 16 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:26:17,821-Speed 5581.93 samples/sec Loss 1.0323 LearningRate 0.0117 Epoch: 16 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:26:25,301-Speed 5476.21 samples/sec Loss 1.0491 LearningRate 0.0117 Epoch: 16 Global Step: 70620 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:26:32,706-Speed 5532.77 samples/sec Loss 1.0508 LearningRate 0.0116 Epoch: 16 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:26:40,063-Speed 5568.23 samples/sec Loss 1.0488 LearningRate 0.0116 Epoch: 16 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:26:47,319-Speed 5645.45 samples/sec Loss 1.0434 LearningRate 0.0116 Epoch: 16 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:26:54,652-Speed 5586.82 samples/sec Loss 1.0368 LearningRate 0.0116 Epoch: 16 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:27:02,262-Speed 5383.52 samples/sec Loss 1.0391 LearningRate 0.0116 Epoch: 16 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:27:09,706-Speed 5502.96 samples/sec Loss 1.0436 LearningRate 0.0115 Epoch: 16 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:27:17,030-Speed 5593.18 samples/sec Loss 1.0413 LearningRate 0.0115 Epoch: 16 Global Step: 70690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:27:24,335-Speed 5608.79 samples/sec Loss 1.0506 LearningRate 0.0115 Epoch: 16 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:27:31,897-Speed 5417.77 samples/sec Loss 1.0749 LearningRate 0.0115 Epoch: 16 Global Step: 70710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:27:39,344-Speed 5501.20 samples/sec Loss 1.0410 LearningRate 0.0115 Epoch: 16 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:27:47,084-Speed 5292.71 samples/sec Loss 1.0319 LearningRate 0.0115 Epoch: 16 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:27:55,483-Speed 4877.44 samples/sec Loss 1.0306 LearningRate 0.0114 Epoch: 16 Global Step: 70740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-17 14:28:03,844-Speed 4899.99 samples/sec Loss 1.0520 LearningRate 0.0114 Epoch: 16 Global Step: 70750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:28:11,943-Speed 5058.36 samples/sec Loss 1.0465 LearningRate 0.0114 Epoch: 16 Global Step: 70760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:28:20,119-Speed 5009.86 samples/sec Loss 1.0437 LearningRate 0.0114 Epoch: 16 Global Step: 70770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:28:27,503-Speed 5548.44 samples/sec Loss 1.0641 LearningRate 0.0114 Epoch: 16 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:28:34,831-Speed 5589.92 samples/sec Loss 1.0388 LearningRate 0.0114 Epoch: 16 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:28:42,188-Speed 5569.05 samples/sec Loss 1.0470 LearningRate 0.0113 Epoch: 16 Global Step: 70800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:28:49,577-Speed 5543.87 samples/sec Loss 1.0241 LearningRate 0.0113 Epoch: 16 Global Step: 70810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:28:57,050-Speed 5481.56 samples/sec Loss 1.0457 LearningRate 0.0113 Epoch: 16 Global Step: 70820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:29:04,502-Speed 5497.59 samples/sec Loss 1.0388 LearningRate 0.0113 Epoch: 16 Global Step: 70830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:29:11,949-Speed 5501.48 samples/sec Loss 1.0612 LearningRate 0.0113 Epoch: 16 Global Step: 70840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:29:19,350-Speed 5534.51 samples/sec Loss 1.0448 LearningRate 0.0112 Epoch: 16 Global Step: 70850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:29:26,730-Speed 5551.48 samples/sec Loss 1.0512 LearningRate 0.0112 Epoch: 16 Global Step: 70860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:29:34,075-Speed 5576.82 samples/sec 
Loss 1.0509 LearningRate 0.0112 Epoch: 16 Global Step: 70870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:29:41,475-Speed 5536.39 samples/sec Loss 1.0470 LearningRate 0.0112 Epoch: 16 Global Step: 70880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:29:48,906-Speed 5512.53 samples/sec Loss 1.0194 LearningRate 0.0112 Epoch: 16 Global Step: 70890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:29:56,409-Speed 5460.11 samples/sec Loss 1.0206 LearningRate 0.0112 Epoch: 16 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:30:03,930-Speed 5446.91 samples/sec Loss 1.0260 LearningRate 0.0111 Epoch: 16 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:30:11,535-Speed 5387.20 samples/sec Loss 1.0323 LearningRate 0.0111 Epoch: 16 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:30:47,271-Speed 1146.23 samples/sec Loss 0.8291 LearningRate 0.0111 Epoch: 17 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:30:54,668-Speed 5537.63 samples/sec Loss 0.6549 LearningRate 0.0111 Epoch: 17 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:31:02,142-Speed 5481.44 samples/sec Loss 0.6483 LearningRate 0.0111 Epoch: 17 Global Step: 70950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:31:09,734-Speed 5396.10 samples/sec Loss 0.6456 LearningRate 0.0110 Epoch: 17 Global Step: 70960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:31:17,423-Speed 5327.35 samples/sec Loss 0.6477 LearningRate 0.0110 Epoch: 17 Global Step: 70970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:31:25,050-Speed 5371.41 samples/sec Loss 0.6464 LearningRate 0.0110 Epoch: 17 Global Step: 70980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:31:32,500-Speed 5499.36 samples/sec Loss 0.6363 LearningRate 0.0110 Epoch: 17 Global 
Step: 70990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:31:39,833-Speed 5585.70 samples/sec Loss 0.6304 LearningRate 0.0110 Epoch: 17 Global Step: 71000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:31:47,153-Speed 5596.86 samples/sec Loss 0.6428 LearningRate 0.0110 Epoch: 17 Global Step: 71010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:31:55,264-Speed 5050.85 samples/sec Loss 0.6427 LearningRate 0.0109 Epoch: 17 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:32:03,317-Speed 5087.03 samples/sec Loss 0.6374 LearningRate 0.0109 Epoch: 17 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:32:11,370-Speed 5087.64 samples/sec Loss 0.6392 LearningRate 0.0109 Epoch: 17 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:32:19,319-Speed 5153.95 samples/sec Loss 0.6604 LearningRate 0.0109 Epoch: 17 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:32:26,726-Speed 5530.63 samples/sec Loss 0.6421 LearningRate 0.0109 Epoch: 17 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:32:34,159-Speed 5511.72 samples/sec Loss 0.6584 LearningRate 0.0109 Epoch: 17 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:32:41,563-Speed 5532.79 samples/sec Loss 0.6524 LearningRate 0.0108 Epoch: 17 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:32:49,064-Speed 5461.54 samples/sec Loss 0.6283 LearningRate 0.0108 Epoch: 17 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:32:56,551-Speed 5471.09 samples/sec Loss 0.6456 LearningRate 0.0108 Epoch: 17 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:33:04,081-Speed 5441.08 samples/sec Loss 0.6496 LearningRate 0.0108 Epoch: 17 Global Step: 71110 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-01-17 14:33:11,483-Speed 5534.65 samples/sec Loss 0.6571 LearningRate 0.0108 Epoch: 17 Global Step: 71120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:33:18,932-Speed 5499.38 samples/sec Loss 0.6593 LearningRate 0.0107 Epoch: 17 Global Step: 71130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:33:26,413-Speed 5475.39 samples/sec Loss 0.6476 LearningRate 0.0107 Epoch: 17 Global Step: 71140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:33:33,790-Speed 5553.73 samples/sec Loss 0.6495 LearningRate 0.0107 Epoch: 17 Global Step: 71150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:33:41,109-Speed 5596.98 samples/sec Loss 0.6552 LearningRate 0.0107 Epoch: 17 Global Step: 71160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:33:48,365-Speed 5645.98 samples/sec Loss 0.6463 LearningRate 0.0107 Epoch: 17 Global Step: 71170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:33:55,624-Speed 5643.32 samples/sec Loss 0.6598 LearningRate 0.0107 Epoch: 17 Global Step: 71180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:34:02,858-Speed 5662.75 samples/sec Loss 0.6452 LearningRate 0.0106 Epoch: 17 Global Step: 71190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:34:10,141-Speed 5625.45 samples/sec Loss 0.6500 LearningRate 0.0106 Epoch: 17 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:34:17,466-Speed 5592.69 samples/sec Loss 0.6312 LearningRate 0.0106 Epoch: 17 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:34:25,105-Speed 5362.38 samples/sec Loss 0.6418 LearningRate 0.0106 Epoch: 17 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:34:32,692-Speed 5399.61 samples/sec Loss 0.6563 LearningRate 0.0106 Epoch: 17 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:34:39,971-Speed 5627.57 
samples/sec Loss 0.6601 LearningRate 0.0106 Epoch: 17 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:34:47,489-Speed 5449.48 samples/sec Loss 0.6444 LearningRate 0.0105 Epoch: 17 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:34:54,930-Speed 5505.46 samples/sec Loss 0.6567 LearningRate 0.0105 Epoch: 17 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:35:02,227-Speed 5613.72 samples/sec Loss 0.6624 LearningRate 0.0105 Epoch: 17 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:35:09,533-Speed 5607.42 samples/sec Loss 0.6623 LearningRate 0.0105 Epoch: 17 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:35:16,853-Speed 5596.51 samples/sec Loss 0.6578 LearningRate 0.0105 Epoch: 17 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:35:24,167-Speed 5600.92 samples/sec Loss 0.6507 LearningRate 0.0105 Epoch: 17 Global Step: 71300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:35:31,511-Speed 5577.72 samples/sec Loss 0.6661 LearningRate 0.0104 Epoch: 17 Global Step: 71310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:35:38,760-Speed 5651.40 samples/sec Loss 0.6684 LearningRate 0.0104 Epoch: 17 Global Step: 71320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:35:46,181-Speed 5520.32 samples/sec Loss 0.6691 LearningRate 0.0104 Epoch: 17 Global Step: 71330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:35:53,539-Speed 5567.99 samples/sec Loss 0.6533 LearningRate 0.0104 Epoch: 17 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:00,889-Speed 5573.44 samples/sec Loss 0.6573 LearningRate 0.0104 Epoch: 17 Global Step: 71350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:08,243-Speed 5570.35 samples/sec Loss 0.6628 LearningRate 0.0104 Epoch: 
17 Global Step: 71360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:15,517-Speed 5631.58 samples/sec Loss 0.6510 LearningRate 0.0103 Epoch: 17 Global Step: 71370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:22,912-Speed 5540.03 samples/sec Loss 0.6696 LearningRate 0.0103 Epoch: 17 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:30,210-Speed 5613.45 samples/sec Loss 0.6531 LearningRate 0.0103 Epoch: 17 Global Step: 71390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:37,484-Speed 5631.96 samples/sec Loss 0.6525 LearningRate 0.0103 Epoch: 17 Global Step: 71400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:44,984-Speed 5461.93 samples/sec Loss 0.6441 LearningRate 0.0103 Epoch: 17 Global Step: 71410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:52,454-Speed 5483.92 samples/sec Loss 0.6707 LearningRate 0.0102 Epoch: 17 Global Step: 71420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:36:59,749-Speed 5616.24 samples/sec Loss 0.6688 LearningRate 0.0102 Epoch: 17 Global Step: 71430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:37:07,135-Speed 5546.09 samples/sec Loss 0.6591 LearningRate 0.0102 Epoch: 17 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:37:14,773-Speed 5363.71 samples/sec Loss 0.6583 LearningRate 0.0102 Epoch: 17 Global Step: 71450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:37:22,060-Speed 5621.67 samples/sec Loss 0.6806 LearningRate 0.0102 Epoch: 17 Global Step: 71460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:37:29,368-Speed 5605.46 samples/sec Loss 0.6647 LearningRate 0.0102 Epoch: 17 Global Step: 71470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:37:36,637-Speed 5636.67 samples/sec Loss 0.6632 LearningRate 0.0101 Epoch: 17 Global Step: 71480 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-01-17 14:37:43,917-Speed 5626.94 samples/sec Loss 0.6699 LearningRate 0.0101 Epoch: 17 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:37:51,220-Speed 5609.67 samples/sec Loss 0.6702 LearningRate 0.0101 Epoch: 17 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:37:58,533-Speed 5601.51 samples/sec Loss 0.6706 LearningRate 0.0101 Epoch: 17 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:38:05,849-Speed 5599.71 samples/sec Loss 0.6666 LearningRate 0.0101 Epoch: 17 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:38:13,175-Speed 5591.87 samples/sec Loss 0.6578 LearningRate 0.0101 Epoch: 17 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:38:20,496-Speed 5596.36 samples/sec Loss 0.6431 LearningRate 0.0100 Epoch: 17 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:38:27,896-Speed 5535.81 samples/sec Loss 0.6707 LearningRate 0.0100 Epoch: 17 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:38:35,307-Speed 5527.85 samples/sec Loss 0.6658 LearningRate 0.0100 Epoch: 17 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:38:42,676-Speed 5558.99 samples/sec Loss 0.6722 LearningRate 0.0100 Epoch: 17 Global Step: 71570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:38:50,135-Speed 5492.46 samples/sec Loss 0.6745 LearningRate 0.0100 Epoch: 17 Global Step: 71580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:38:57,420-Speed 5622.47 samples/sec Loss 0.6671 LearningRate 0.0100 Epoch: 17 Global Step: 71590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:39:04,738-Speed 5598.81 samples/sec Loss 0.6636 LearningRate 0.0099 Epoch: 17 Global Step: 71600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 
14:39:12,110-Speed 5556.96 samples/sec Loss 0.6667 LearningRate 0.0099 Epoch: 17 Global Step: 71610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:39:19,542-Speed 5512.12 samples/sec Loss 0.6655 LearningRate 0.0099 Epoch: 17 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:39:26,918-Speed 5553.86 samples/sec Loss 0.6665 LearningRate 0.0099 Epoch: 17 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:39:34,287-Speed 5559.63 samples/sec Loss 0.6673 LearningRate 0.0099 Epoch: 17 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:39:41,593-Speed 5606.61 samples/sec Loss 0.6732 LearningRate 0.0099 Epoch: 17 Global Step: 71650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:39:49,017-Speed 5518.26 samples/sec Loss 0.6563 LearningRate 0.0098 Epoch: 17 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:39:56,422-Speed 5531.70 samples/sec Loss 0.6839 LearningRate 0.0098 Epoch: 17 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:40:04,044-Speed 5374.82 samples/sec Loss 0.6675 LearningRate 0.0098 Epoch: 17 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:40:11,356-Speed 5603.11 samples/sec Loss 0.6837 LearningRate 0.0098 Epoch: 17 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:40:18,716-Speed 5565.67 samples/sec Loss 0.6808 LearningRate 0.0098 Epoch: 17 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:40:26,111-Speed 5539.52 samples/sec Loss 0.6628 LearningRate 0.0098 Epoch: 17 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:40:33,459-Speed 5575.29 samples/sec Loss 0.6599 LearningRate 0.0097 Epoch: 17 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:40:40,735-Speed 5630.51 samples/sec Loss 0.6651 
LearningRate 0.0097 Epoch: 17 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:40:48,304-Speed 5412.43 samples/sec Loss 0.6768 LearningRate 0.0097 Epoch: 17 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:40:55,730-Speed 5515.82 samples/sec Loss 0.6569 LearningRate 0.0097 Epoch: 17 Global Step: 71750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:41:03,142-Speed 5527.50 samples/sec Loss 0.6887 LearningRate 0.0097 Epoch: 17 Global Step: 71760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:41:10,617-Speed 5480.81 samples/sec Loss 0.6709 LearningRate 0.0097 Epoch: 17 Global Step: 71770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:41:17,996-Speed 5551.78 samples/sec Loss 0.6728 LearningRate 0.0096 Epoch: 17 Global Step: 71780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:41:25,286-Speed 5619.35 samples/sec Loss 0.7014 LearningRate 0.0096 Epoch: 17 Global Step: 71790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:41:32,769-Speed 5474.82 samples/sec Loss 0.6688 LearningRate 0.0096 Epoch: 17 Global Step: 71800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:41:40,418-Speed 5355.82 samples/sec Loss 0.6971 LearningRate 0.0096 Epoch: 17 Global Step: 71810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:41:48,104-Speed 5329.89 samples/sec Loss 0.6674 LearningRate 0.0096 Epoch: 17 Global Step: 71820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:41:55,635-Speed 5439.62 samples/sec Loss 0.6734 LearningRate 0.0096 Epoch: 17 Global Step: 71830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:42:02,934-Speed 5612.04 samples/sec Loss 0.6725 LearningRate 0.0095 Epoch: 17 Global Step: 71840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-17 14:42:10,384-Speed 5499.05 samples/sec Loss 0.6798 LearningRate 0.0095 Epoch: 17 Global Step: 71850 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:42:17,859-Speed 5480.57 samples/sec Loss 0.6766 LearningRate 0.0095 Epoch: 17 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:42:25,529-Speed 5340.78 samples/sec Loss 0.6828 LearningRate 0.0095 Epoch: 17 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:42:33,165-Speed 5364.71 samples/sec Loss 0.6676 LearningRate 0.0095 Epoch: 17 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:42:40,655-Speed 5469.35 samples/sec Loss 0.6597 LearningRate 0.0095 Epoch: 17 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:42:47,961-Speed 5607.05 samples/sec Loss 0.6727 LearningRate 0.0094 Epoch: 17 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:42:55,299-Speed 5582.96 samples/sec Loss 0.6810 LearningRate 0.0094 Epoch: 17 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:43:02,665-Speed 5561.52 samples/sec Loss 0.6918 LearningRate 0.0094 Epoch: 17 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:43:09,940-Speed 5630.94 samples/sec Loss 0.6805 LearningRate 0.0094 Epoch: 17 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:43:17,303-Speed 5564.18 samples/sec Loss 0.6778 LearningRate 0.0094 Epoch: 17 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:43:24,766-Speed 5488.58 samples/sec Loss 0.6899 LearningRate 0.0094 Epoch: 17 Global Step: 71950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:43:32,081-Speed 5600.43 samples/sec Loss 0.6791 LearningRate 0.0093 Epoch: 17 Global Step: 71960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:43:39,406-Speed 5592.66 samples/sec Loss 0.6814 LearningRate 0.0093 Epoch: 17 Global Step: 71970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-17 14:43:46,719-Speed 5601.81 samples/sec Loss 0.6617 LearningRate 0.0093 Epoch: 17 Global Step: 71980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:43:53,965-Speed 5653.80 samples/sec Loss 0.6865 LearningRate 0.0093 Epoch: 17 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:44:01,222-Speed 5644.48 samples/sec Loss 0.6617 LearningRate 0.0093 Epoch: 17 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:44:08,548-Speed 5592.59 samples/sec Loss 0.6845 LearningRate 0.0093 Epoch: 17 Global Step: 72010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:44:15,818-Speed 5635.44 samples/sec Loss 0.6668 LearningRate 0.0093 Epoch: 17 Global Step: 72020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:44:23,077-Speed 5642.74 samples/sec Loss 0.6801 LearningRate 0.0092 Epoch: 17 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:44:30,344-Speed 5637.30 samples/sec Loss 0.6648 LearningRate 0.0092 Epoch: 17 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:44:37,922-Speed 5405.86 samples/sec Loss 0.6753 LearningRate 0.0092 Epoch: 17 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:44:45,486-Speed 5415.91 samples/sec Loss 0.6828 LearningRate 0.0092 Epoch: 17 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:44:53,084-Speed 5391.97 samples/sec Loss 0.6508 LearningRate 0.0092 Epoch: 17 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:45:00,672-Speed 5398.36 samples/sec Loss 0.6779 LearningRate 0.0092 Epoch: 17 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:45:07,962-Speed 5619.70 samples/sec Loss 0.6748 LearningRate 0.0091 Epoch: 17 Global Step: 72090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:45:15,413-Speed 5498.54 samples/sec 
Loss 0.6892 LearningRate 0.0091 Epoch: 17 Global Step: 72100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:45:22,782-Speed 5559.02 samples/sec Loss 0.6807 LearningRate 0.0091 Epoch: 17 Global Step: 72110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:45:30,202-Speed 5521.67 samples/sec Loss 0.6792 LearningRate 0.0091 Epoch: 17 Global Step: 72120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:45:37,594-Speed 5542.52 samples/sec Loss 0.6862 LearningRate 0.0091 Epoch: 17 Global Step: 72130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:45:45,105-Speed 5454.63 samples/sec Loss 0.6997 LearningRate 0.0091 Epoch: 17 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:45:52,662-Speed 5421.02 samples/sec Loss 0.6880 LearningRate 0.0090 Epoch: 17 Global Step: 72150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:00,101-Speed 5507.10 samples/sec Loss 0.6839 LearningRate 0.0090 Epoch: 17 Global Step: 72160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:07,613-Speed 5453.26 samples/sec Loss 0.6706 LearningRate 0.0090 Epoch: 17 Global Step: 72170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:14,959-Speed 5577.23 samples/sec Loss 0.6763 LearningRate 0.0090 Epoch: 17 Global Step: 72180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:22,326-Speed 5560.81 samples/sec Loss 0.6680 LearningRate 0.0090 Epoch: 17 Global Step: 72190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:29,629-Speed 5609.66 samples/sec Loss 0.6722 LearningRate 0.0090 Epoch: 17 Global Step: 72200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:36,979-Speed 5574.30 samples/sec Loss 0.6718 LearningRate 0.0089 Epoch: 17 Global Step: 72210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:44,329-Speed 5573.43 samples/sec Loss 0.6630 LearningRate 0.0089 Epoch: 17 Global 
Step: 72220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:51,719-Speed 5543.77 samples/sec Loss 0.6901 LearningRate 0.0089 Epoch: 17 Global Step: 72230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:46:59,011-Speed 5617.84 samples/sec Loss 0.6908 LearningRate 0.0089 Epoch: 17 Global Step: 72240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:47:06,341-Speed 5588.60 samples/sec Loss 0.6727 LearningRate 0.0089 Epoch: 17 Global Step: 72250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:47:13,711-Speed 5558.91 samples/sec Loss 0.7018 LearningRate 0.0089 Epoch: 17 Global Step: 72260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:47:21,033-Speed 5595.32 samples/sec Loss 0.6669 LearningRate 0.0088 Epoch: 17 Global Step: 72270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:47:28,493-Speed 5491.36 samples/sec Loss 0.6887 LearningRate 0.0088 Epoch: 17 Global Step: 72280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:47:35,758-Speed 5638.64 samples/sec Loss 0.6663 LearningRate 0.0088 Epoch: 17 Global Step: 72290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:47:43,063-Speed 5608.32 samples/sec Loss 0.6717 LearningRate 0.0088 Epoch: 17 Global Step: 72300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:47:50,614-Speed 5425.30 samples/sec Loss 0.6780 LearningRate 0.0088 Epoch: 17 Global Step: 72310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:47:58,134-Speed 5447.44 samples/sec Loss 0.6657 LearningRate 0.0088 Epoch: 17 Global Step: 72320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:48:05,445-Speed 5603.08 samples/sec Loss 0.6887 LearningRate 0.0088 Epoch: 17 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:48:12,745-Speed 5612.29 samples/sec Loss 0.6882 LearningRate 0.0087 Epoch: 17 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-01-17 14:48:20,067-Speed 5595.74 samples/sec Loss 0.7005 LearningRate 0.0087 Epoch: 17 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:48:27,346-Speed 5627.57 samples/sec Loss 0.6628 LearningRate 0.0087 Epoch: 17 Global Step: 72360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:48:34,681-Speed 5585.04 samples/sec Loss 0.6915 LearningRate 0.0087 Epoch: 17 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:48:42,040-Speed 5566.96 samples/sec Loss 0.6868 LearningRate 0.0087 Epoch: 17 Global Step: 72380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:48:49,449-Speed 5529.45 samples/sec Loss 0.6713 LearningRate 0.0087 Epoch: 17 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:48:56,857-Speed 5529.94 samples/sec Loss 0.6816 LearningRate 0.0086 Epoch: 17 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:49:04,438-Speed 5404.04 samples/sec Loss 0.6746 LearningRate 0.0086 Epoch: 17 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:49:11,881-Speed 5504.70 samples/sec Loss 0.6723 LearningRate 0.0086 Epoch: 17 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:49:19,215-Speed 5585.80 samples/sec Loss 0.7020 LearningRate 0.0086 Epoch: 17 Global Step: 72430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:49:26,628-Speed 5525.84 samples/sec Loss 0.6848 LearningRate 0.0086 Epoch: 17 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:49:33,915-Speed 5622.01 samples/sec Loss 0.6722 LearningRate 0.0086 Epoch: 17 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:49:41,251-Speed 5584.42 samples/sec Loss 0.6642 LearningRate 0.0086 Epoch: 17 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:49:48,743-Speed 5467.90 
samples/sec Loss 0.6907 LearningRate 0.0085 Epoch: 17 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:49:56,123-Speed 5551.17 samples/sec Loss 0.6914 LearningRate 0.0085 Epoch: 17 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:50:03,427-Speed 5608.51 samples/sec Loss 0.6823 LearningRate 0.0085 Epoch: 17 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:50:10,705-Speed 5628.65 samples/sec Loss 0.6754 LearningRate 0.0085 Epoch: 17 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:50:18,207-Speed 5460.32 samples/sec Loss 0.6810 LearningRate 0.0085 Epoch: 17 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:50:25,617-Speed 5528.53 samples/sec Loss 0.6906 LearningRate 0.0085 Epoch: 17 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:50:33,149-Speed 5439.30 samples/sec Loss 0.6920 LearningRate 0.0084 Epoch: 17 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:50:40,681-Speed 5439.06 samples/sec Loss 0.6786 LearningRate 0.0084 Epoch: 17 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:50:48,035-Speed 5570.72 samples/sec Loss 0.7013 LearningRate 0.0084 Epoch: 17 Global Step: 72550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:50:55,430-Speed 5539.57 samples/sec Loss 0.6791 LearningRate 0.0084 Epoch: 17 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:51:02,885-Speed 5495.24 samples/sec Loss 0.6825 LearningRate 0.0084 Epoch: 17 Global Step: 72570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:51:10,275-Speed 5543.48 samples/sec Loss 0.6863 LearningRate 0.0084 Epoch: 17 Global Step: 72580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:51:17,784-Speed 5455.32 samples/sec Loss 0.6707 LearningRate 0.0083 Epoch: 
17 Global Step: 72590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:51:25,148-Speed 5563.16 samples/sec Loss 0.6840 LearningRate 0.0083 Epoch: 17 Global Step: 72600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:51:32,491-Speed 5579.20 samples/sec Loss 0.6784 LearningRate 0.0083 Epoch: 17 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:51:39,882-Speed 5542.11 samples/sec Loss 0.6947 LearningRate 0.0083 Epoch: 17 Global Step: 72620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:51:47,184-Speed 5611.00 samples/sec Loss 0.6983 LearningRate 0.0083 Epoch: 17 Global Step: 72630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:51:54,466-Speed 5625.36 samples/sec Loss 0.6872 LearningRate 0.0083 Epoch: 17 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:52:01,843-Speed 5553.46 samples/sec Loss 0.6805 LearningRate 0.0083 Epoch: 17 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:52:09,158-Speed 5600.27 samples/sec Loss 0.6572 LearningRate 0.0082 Epoch: 17 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:52:16,445-Speed 5621.78 samples/sec Loss 0.6722 LearningRate 0.0082 Epoch: 17 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:52:23,800-Speed 5569.68 samples/sec Loss 0.6785 LearningRate 0.0082 Epoch: 17 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:52:31,231-Speed 5513.20 samples/sec Loss 0.6901 LearningRate 0.0082 Epoch: 17 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:52:38,550-Speed 5596.78 samples/sec Loss 0.6985 LearningRate 0.0082 Epoch: 17 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:52:45,892-Speed 5580.32 samples/sec Loss 0.6676 LearningRate 0.0082 Epoch: 17 Global Step: 72710 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-01-17 14:52:53,524-Speed 5367.42 samples/sec Loss 0.6950 LearningRate 0.0082 Epoch: 17 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:53:00,966-Speed 5505.39 samples/sec Loss 0.6780 LearningRate 0.0081 Epoch: 17 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:53:08,270-Speed 5608.51 samples/sec Loss 0.6746 LearningRate 0.0081 Epoch: 17 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:53:15,627-Speed 5567.85 samples/sec Loss 0.6776 LearningRate 0.0081 Epoch: 17 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:53:22,927-Speed 5612.57 samples/sec Loss 0.6848 LearningRate 0.0081 Epoch: 17 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:53:30,231-Speed 5608.00 samples/sec Loss 0.6639 LearningRate 0.0081 Epoch: 17 Global Step: 72770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:53:37,688-Speed 5493.98 samples/sec Loss 0.6869 LearningRate 0.0081 Epoch: 17 Global Step: 72780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:53:45,311-Speed 5374.24 samples/sec Loss 0.6895 LearningRate 0.0080 Epoch: 17 Global Step: 72790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:53:52,690-Speed 5551.46 samples/sec Loss 0.6709 LearningRate 0.0080 Epoch: 17 Global Step: 72800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:54:00,042-Speed 5572.17 samples/sec Loss 0.6745 LearningRate 0.0080 Epoch: 17 Global Step: 72810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 14:54:07,340-Speed 5613.86 samples/sec Loss 0.6814 LearningRate 0.0080 Epoch: 17 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:54:14,647-Speed 5605.94 samples/sec Loss 0.6813 LearningRate 0.0080 Epoch: 17 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 
14:54:21,999-Speed 5572.24 samples/sec Loss 0.6793 LearningRate 0.0080 Epoch: 17 Global Step: 72840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:54:29,399-Speed 5536.70 samples/sec Loss 0.6672 LearningRate 0.0080 Epoch: 17 Global Step: 72850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:54:36,774-Speed 5554.00 samples/sec Loss 0.6849 LearningRate 0.0079 Epoch: 17 Global Step: 72860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:54:44,177-Speed 5533.51 samples/sec Loss 0.6734 LearningRate 0.0079 Epoch: 17 Global Step: 72870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:54:51,584-Speed 5531.23 samples/sec Loss 0.7026 LearningRate 0.0079 Epoch: 17 Global Step: 72880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:54:58,925-Speed 5580.57 samples/sec Loss 0.6786 LearningRate 0.0079 Epoch: 17 Global Step: 72890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:55:06,473-Speed 5426.68 samples/sec Loss 0.6696 LearningRate 0.0079 Epoch: 17 Global Step: 72900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 14:55:13,837-Speed 5563.49 samples/sec Loss 0.6679 LearningRate 0.0079 Epoch: 17 Global Step: 72910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:55:21,235-Speed 5537.90 samples/sec Loss 0.6942 LearningRate 0.0078 Epoch: 17 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:55:28,555-Speed 5596.30 samples/sec Loss 0.6598 LearningRate 0.0078 Epoch: 17 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:55:35,919-Speed 5563.09 samples/sec Loss 0.6684 LearningRate 0.0078 Epoch: 17 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:55:43,310-Speed 5542.41 samples/sec Loss 0.6932 LearningRate 0.0078 Epoch: 17 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:55:50,625-Speed 5600.28 samples/sec Loss 
0.7008 LearningRate 0.0078 Epoch: 17 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:55:57,941-Speed 5599.98 samples/sec Loss 0.6680 LearningRate 0.0078 Epoch: 17 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:56:05,210-Speed 5635.34 samples/sec Loss 0.6669 LearningRate 0.0078 Epoch: 17 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:56:12,574-Speed 5563.08 samples/sec Loss 0.6659 LearningRate 0.0077 Epoch: 17 Global Step: 72990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:56:20,043-Speed 5485.27 samples/sec Loss 0.6819 LearningRate 0.0077 Epoch: 17 Global Step: 73000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:56:27,480-Speed 5509.95 samples/sec Loss 0.6972 LearningRate 0.0077 Epoch: 17 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:56:34,796-Speed 5599.77 samples/sec Loss 0.6741 LearningRate 0.0077 Epoch: 17 Global Step: 73020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:56:42,035-Speed 5659.57 samples/sec Loss 0.6965 LearningRate 0.0077 Epoch: 17 Global Step: 73030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:56:49,275-Speed 5657.91 samples/sec Loss 0.6603 LearningRate 0.0077 Epoch: 17 Global Step: 73040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:56:56,720-Speed 5502.93 samples/sec Loss 0.6812 LearningRate 0.0077 Epoch: 17 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:57:04,014-Speed 5616.39 samples/sec Loss 0.6949 LearningRate 0.0076 Epoch: 17 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:57:11,285-Speed 5633.81 samples/sec Loss 0.6899 LearningRate 0.0076 Epoch: 17 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:57:18,897-Speed 5381.53 samples/sec Loss 0.6847 LearningRate 0.0076 Epoch: 17 Global Step: 
73080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:57:26,482-Speed 5401.28 samples/sec Loss 0.6663 LearningRate 0.0076 Epoch: 17 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:57:33,898-Speed 5523.84 samples/sec Loss 0.6665 LearningRate 0.0076 Epoch: 17 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:57:41,229-Speed 5588.12 samples/sec Loss 0.6880 LearningRate 0.0076 Epoch: 17 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:57:48,627-Speed 5537.86 samples/sec Loss 0.6672 LearningRate 0.0076 Epoch: 17 Global Step: 73120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:57:56,148-Speed 5446.62 samples/sec Loss 0.6859 LearningRate 0.0075 Epoch: 17 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:58:03,402-Speed 5647.93 samples/sec Loss 0.6861 LearningRate 0.0075 Epoch: 17 Global Step: 73140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 14:58:10,694-Speed 5618.11 samples/sec Loss 0.6769 LearningRate 0.0075 Epoch: 17 Global Step: 73150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:58:18,011-Speed 5598.57 samples/sec Loss 0.6741 LearningRate 0.0075 Epoch: 17 Global Step: 73160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:58:25,448-Speed 5508.80 samples/sec Loss 0.6800 LearningRate 0.0075 Epoch: 17 Global Step: 73170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:58:32,791-Speed 5578.69 samples/sec Loss 0.6665 LearningRate 0.0075 Epoch: 17 Global Step: 73180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:58:40,255-Speed 5488.35 samples/sec Loss 0.6623 LearningRate 0.0075 Epoch: 17 Global Step: 73190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:58:47,765-Speed 5455.24 samples/sec Loss 0.6848 LearningRate 0.0074 Epoch: 17 Global Step: 73200 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-01-17 14:58:55,185-Speed 5521.01 samples/sec Loss 0.6676 LearningRate 0.0074 Epoch: 17 Global Step: 73210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:59:02,484-Speed 5611.89 samples/sec Loss 0.6761 LearningRate 0.0074 Epoch: 17 Global Step: 73220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:59:09,752-Speed 5636.58 samples/sec Loss 0.6869 LearningRate 0.0074 Epoch: 17 Global Step: 73230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:59:17,373-Speed 5380.66 samples/sec Loss 0.6575 LearningRate 0.0074 Epoch: 17 Global Step: 73240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:59:24,617-Speed 5654.36 samples/sec Loss 0.6553 LearningRate 0.0074 Epoch: 17 Global Step: 73250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:59:31,895-Speed 5629.13 samples/sec Loss 0.6615 LearningRate 0.0074 Epoch: 17 Global Step: 73260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:59:39,205-Speed 5604.86 samples/sec Loss 0.6798 LearningRate 0.0073 Epoch: 17 Global Step: 73270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:59:46,466-Speed 5641.24 samples/sec Loss 0.6718 LearningRate 0.0073 Epoch: 17 Global Step: 73280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 14:59:53,768-Speed 5610.20 samples/sec Loss 0.6906 LearningRate 0.0073 Epoch: 17 Global Step: 73290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:00:01,373-Speed 5388.33 samples/sec Loss 0.6537 LearningRate 0.0073 Epoch: 17 Global Step: 73300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:00:08,821-Speed 5499.39 samples/sec Loss 0.6769 LearningRate 0.0073 Epoch: 17 Global Step: 73310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:00:16,182-Speed 5565.79 samples/sec Loss 0.6655 LearningRate 0.0073 Epoch: 17 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:00:23,659-Speed 
5478.64 samples/sec Loss 0.6718 LearningRate 0.0072 Epoch: 17 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:00:31,017-Speed 5567.87 samples/sec Loss 0.6722 LearningRate 0.0072 Epoch: 17 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:00:38,420-Speed 5533.54 samples/sec Loss 0.6646 LearningRate 0.0072 Epoch: 17 Global Step: 73350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:00:45,757-Speed 5583.58 samples/sec Loss 0.6765 LearningRate 0.0072 Epoch: 17 Global Step: 73360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:00:53,093-Speed 5585.58 samples/sec Loss 0.6748 LearningRate 0.0072 Epoch: 17 Global Step: 73370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:01:00,367-Speed 5631.58 samples/sec Loss 0.6869 LearningRate 0.0072 Epoch: 17 Global Step: 73380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:01:07,635-Speed 5636.63 samples/sec Loss 0.6546 LearningRate 0.0072 Epoch: 17 Global Step: 73390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:01:15,140-Speed 5458.22 samples/sec Loss 0.6878 LearningRate 0.0071 Epoch: 17 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:01:22,670-Speed 5440.86 samples/sec Loss 0.6505 LearningRate 0.0071 Epoch: 17 Global Step: 73410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:01:29,938-Speed 5636.14 samples/sec Loss 0.6754 LearningRate 0.0071 Epoch: 17 Global Step: 73420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:01:37,249-Speed 5603.75 samples/sec Loss 0.6550 LearningRate 0.0071 Epoch: 17 Global Step: 73430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:01:44,481-Speed 5664.32 samples/sec Loss 0.6657 LearningRate 0.0071 Epoch: 17 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:01:51,869-Speed 5544.62 samples/sec Loss 0.6518 LearningRate 0.0071 
Epoch: 17 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:01:59,153-Speed 5624.66 samples/sec Loss 0.6793 LearningRate 0.0071 Epoch: 17 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:02:06,460-Speed 5605.92 samples/sec Loss 0.6647 LearningRate 0.0071 Epoch: 17 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:02:13,766-Speed 5607.58 samples/sec Loss 0.6614 LearningRate 0.0070 Epoch: 17 Global Step: 73480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:02:21,065-Speed 5611.93 samples/sec Loss 0.6506 LearningRate 0.0070 Epoch: 17 Global Step: 73490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:02:28,378-Speed 5602.49 samples/sec Loss 0.6619 LearningRate 0.0070 Epoch: 17 Global Step: 73500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:02:35,706-Speed 5589.99 samples/sec Loss 0.6667 LearningRate 0.0070 Epoch: 17 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:02:43,001-Speed 5615.58 samples/sec Loss 0.6642 LearningRate 0.0070 Epoch: 17 Global Step: 73520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:02:50,323-Speed 5594.76 samples/sec Loss 0.6657 LearningRate 0.0070 Epoch: 17 Global Step: 73530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:02:57,873-Speed 5426.00 samples/sec Loss 0.6729 LearningRate 0.0070 Epoch: 17 Global Step: 73540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:03:05,259-Speed 5546.65 samples/sec Loss 0.6875 LearningRate 0.0069 Epoch: 17 Global Step: 73550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:03:12,984-Speed 5302.97 samples/sec Loss 0.6721 LearningRate 0.0069 Epoch: 17 Global Step: 73560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:03:20,643-Speed 5348.38 samples/sec Loss 0.6489 LearningRate 0.0069 Epoch: 17 Global Step: 73570 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-17 15:03:28,026-Speed 5548.96 samples/sec Loss 0.6656 LearningRate 0.0069 Epoch: 17 Global Step: 73580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:03:35,364-Speed 5582.67 samples/sec Loss 0.6764 LearningRate 0.0069 Epoch: 17 Global Step: 73590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:03:42,926-Speed 5417.36 samples/sec Loss 0.6563 LearningRate 0.0069 Epoch: 17 Global Step: 73600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:03:50,679-Speed 5283.51 samples/sec Loss 0.6661 LearningRate 0.0069 Epoch: 17 Global Step: 73610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:03:58,085-Speed 5531.90 samples/sec Loss 0.6699 LearningRate 0.0068 Epoch: 17 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:04:05,493-Speed 5529.69 samples/sec Loss 0.6525 LearningRate 0.0068 Epoch: 17 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:04:12,799-Speed 5607.20 samples/sec Loss 0.6607 LearningRate 0.0068 Epoch: 17 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:04:20,092-Speed 5616.96 samples/sec Loss 0.6486 LearningRate 0.0068 Epoch: 17 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:04:27,482-Speed 5543.39 samples/sec Loss 0.6578 LearningRate 0.0068 Epoch: 17 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:04:34,900-Speed 5522.30 samples/sec Loss 0.6514 LearningRate 0.0068 Epoch: 17 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:04:42,481-Speed 5404.21 samples/sec Loss 0.6739 LearningRate 0.0068 Epoch: 17 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:04:50,117-Speed 5364.46 samples/sec Loss 0.6539 LearningRate 0.0067 Epoch: 17 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 
15:04:57,760-Speed 5360.52 samples/sec Loss 0.6526 LearningRate 0.0067 Epoch: 17 Global Step: 73700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:05:05,243-Speed 5474.05 samples/sec Loss 0.6670 LearningRate 0.0067 Epoch: 17 Global Step: 73710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:05:12,561-Speed 5597.97 samples/sec Loss 0.6587 LearningRate 0.0067 Epoch: 17 Global Step: 73720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:05:19,990-Speed 5514.73 samples/sec Loss 0.6644 LearningRate 0.0067 Epoch: 17 Global Step: 73730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:05:27,517-Speed 5442.53 samples/sec Loss 0.6546 LearningRate 0.0067 Epoch: 17 Global Step: 73740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:05:34,925-Speed 5529.43 samples/sec Loss 0.6709 LearningRate 0.0067 Epoch: 17 Global Step: 73750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:05:42,371-Speed 5501.96 samples/sec Loss 0.6492 LearningRate 0.0066 Epoch: 17 Global Step: 73760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:05:49,702-Speed 5588.14 samples/sec Loss 0.6528 LearningRate 0.0066 Epoch: 17 Global Step: 73770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:05:57,061-Speed 5566.96 samples/sec Loss 0.6513 LearningRate 0.0066 Epoch: 17 Global Step: 73780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:06:04,629-Speed 5413.00 samples/sec Loss 0.6499 LearningRate 0.0066 Epoch: 17 Global Step: 73790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:06:11,977-Speed 5574.79 samples/sec Loss 0.6520 LearningRate 0.0066 Epoch: 17 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:06:19,981-Speed 5118.12 samples/sec Loss 0.6608 LearningRate 0.0066 Epoch: 17 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:06:28,010-Speed 5102.78 samples/sec Loss 
0.6702 LearningRate 0.0066 Epoch: 17 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:06:35,975-Speed 5142.45 samples/sec Loss 0.6677 LearningRate 0.0066 Epoch: 17 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:06:44,079-Speed 5055.40 samples/sec Loss 0.6572 LearningRate 0.0065 Epoch: 17 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:06:52,088-Speed 5114.92 samples/sec Loss 0.6357 LearningRate 0.0065 Epoch: 17 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:07:00,035-Speed 5154.98 samples/sec Loss 0.6511 LearningRate 0.0065 Epoch: 17 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:07:08,147-Speed 5050.32 samples/sec Loss 0.6435 LearningRate 0.0065 Epoch: 17 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:07:16,187-Speed 5094.95 samples/sec Loss 0.6494 LearningRate 0.0065 Epoch: 17 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:07:24,189-Speed 5119.84 samples/sec Loss 0.6520 LearningRate 0.0065 Epoch: 17 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:07:32,213-Speed 5105.59 samples/sec Loss 0.6647 LearningRate 0.0065 Epoch: 17 Global Step: 73900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:07:40,454-Speed 4970.92 samples/sec Loss 0.6526 LearningRate 0.0064 Epoch: 17 Global Step: 73910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:07:48,260-Speed 5247.53 samples/sec Loss 0.6577 LearningRate 0.0064 Epoch: 17 Global Step: 73920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:07:55,569-Speed 5605.19 samples/sec Loss 0.6503 LearningRate 0.0064 Epoch: 17 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:08:03,041-Speed 5482.68 samples/sec Loss 0.6500 LearningRate 0.0064 Epoch: 17 Global Step: 
73940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:08:10,627-Speed 5399.98 samples/sec Loss 0.6511 LearningRate 0.0064 Epoch: 17 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:08:18,138-Speed 5454.17 samples/sec Loss 0.6524 LearningRate 0.0064 Epoch: 17 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:08:25,789-Speed 5354.35 samples/sec Loss 0.6486 LearningRate 0.0064 Epoch: 17 Global Step: 73970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:08:33,138-Speed 5574.49 samples/sec Loss 0.6545 LearningRate 0.0063 Epoch: 17 Global Step: 73980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:08:40,415-Speed 5629.61 samples/sec Loss 0.6514 LearningRate 0.0063 Epoch: 17 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:08:47,734-Speed 5597.83 samples/sec Loss 0.6639 LearningRate 0.0063 Epoch: 17 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:08:55,026-Speed 5617.63 samples/sec Loss 0.6324 LearningRate 0.0063 Epoch: 17 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:09:02,341-Speed 5600.26 samples/sec Loss 0.6551 LearningRate 0.0063 Epoch: 17 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:09:09,633-Speed 5617.70 samples/sec Loss 0.6519 LearningRate 0.0063 Epoch: 17 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:09:17,207-Speed 5408.75 samples/sec Loss 0.6255 LearningRate 0.0063 Epoch: 17 Global Step: 74040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:09:24,791-Speed 5401.69 samples/sec Loss 0.6464 LearningRate 0.0063 Epoch: 17 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:09:32,207-Speed 5524.27 samples/sec Loss 0.6607 LearningRate 0.0062 Epoch: 17 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-01-17 15:09:39,517-Speed 5604.50 samples/sec Loss 0.6579 LearningRate 0.0062 Epoch: 17 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:09:46,871-Speed 5569.93 samples/sec Loss 0.6421 LearningRate 0.0062 Epoch: 17 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:09:54,158-Speed 5622.30 samples/sec Loss 0.6464 LearningRate 0.0062 Epoch: 17 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:10:01,609-Speed 5498.11 samples/sec Loss 0.6357 LearningRate 0.0062 Epoch: 17 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:10:08,952-Speed 5578.94 samples/sec Loss 0.6519 LearningRate 0.0062 Epoch: 17 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:10:16,330-Speed 5552.17 samples/sec Loss 0.6427 LearningRate 0.0062 Epoch: 17 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:10:23,791-Speed 5490.70 samples/sec Loss 0.6392 LearningRate 0.0061 Epoch: 17 Global Step: 74130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:10:31,117-Speed 5591.51 samples/sec Loss 0.6354 LearningRate 0.0061 Epoch: 17 Global Step: 74140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:10:38,529-Speed 5527.14 samples/sec Loss 0.6614 LearningRate 0.0061 Epoch: 17 Global Step: 74150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:10:45,827-Speed 5614.02 samples/sec Loss 0.6410 LearningRate 0.0061 Epoch: 17 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:10:53,156-Speed 5588.96 samples/sec Loss 0.6579 LearningRate 0.0061 Epoch: 17 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:11:00,473-Speed 5599.46 samples/sec Loss 0.6545 LearningRate 0.0061 Epoch: 17 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:11:07,747-Speed 5631.63 
samples/sec Loss 0.6612 LearningRate 0.0061 Epoch: 17 Global Step: 74190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:11:15,156-Speed 5529.14 samples/sec Loss 0.6357 LearningRate 0.0061 Epoch: 17 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:11:22,755-Speed 5390.61 samples/sec Loss 0.6480 LearningRate 0.0060 Epoch: 17 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:11:30,251-Speed 5465.27 samples/sec Loss 0.6526 LearningRate 0.0060 Epoch: 17 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:11:37,842-Speed 5397.06 samples/sec Loss 0.6345 LearningRate 0.0060 Epoch: 17 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:11:45,260-Speed 5521.88 samples/sec Loss 0.6325 LearningRate 0.0060 Epoch: 17 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:11:52,675-Speed 5524.73 samples/sec Loss 0.6494 LearningRate 0.0060 Epoch: 17 Global Step: 74250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:00,058-Speed 5548.56 samples/sec Loss 0.6425 LearningRate 0.0060 Epoch: 17 Global Step: 74260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:07,379-Speed 5596.01 samples/sec Loss 0.6478 LearningRate 0.0060 Epoch: 17 Global Step: 74270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:14,730-Speed 5572.45 samples/sec Loss 0.6490 LearningRate 0.0060 Epoch: 17 Global Step: 74280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:22,406-Speed 5336.87 samples/sec Loss 0.6535 LearningRate 0.0059 Epoch: 17 Global Step: 74290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:30,017-Speed 5382.80 samples/sec Loss 0.6447 LearningRate 0.0059 Epoch: 17 Global Step: 74300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:37,326-Speed 5604.37 samples/sec Loss 0.6501 LearningRate 0.0059 
Epoch: 17 Global Step: 74310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:44,705-Speed 5552.75 samples/sec Loss 0.6471 LearningRate 0.0059 Epoch: 17 Global Step: 74320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:52,045-Speed 5580.93 samples/sec Loss 0.6538 LearningRate 0.0059 Epoch: 17 Global Step: 74330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:12:59,479-Speed 5510.57 samples/sec Loss 0.6366 LearningRate 0.0059 Epoch: 17 Global Step: 74340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:13:06,885-Speed 5531.22 samples/sec Loss 0.6532 LearningRate 0.0059 Epoch: 17 Global Step: 74350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:13:14,188-Speed 5609.67 samples/sec Loss 0.6520 LearningRate 0.0058 Epoch: 17 Global Step: 74360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:13:21,458-Speed 5634.68 samples/sec Loss 0.6300 LearningRate 0.0058 Epoch: 17 Global Step: 74370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:13:28,777-Speed 5597.72 samples/sec Loss 0.6359 LearningRate 0.0058 Epoch: 17 Global Step: 74380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:13:36,020-Speed 5655.19 samples/sec Loss 0.6455 LearningRate 0.0058 Epoch: 17 Global Step: 74390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:13:43,468-Speed 5500.21 samples/sec Loss 0.6529 LearningRate 0.0058 Epoch: 17 Global Step: 74400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:13:50,764-Speed 5615.62 samples/sec Loss 0.6471 LearningRate 0.0058 Epoch: 17 Global Step: 74410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:13:58,069-Speed 5608.12 samples/sec Loss 0.6144 LearningRate 0.0058 Epoch: 17 Global Step: 74420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:14:05,344-Speed 5630.77 samples/sec Loss 0.6523 LearningRate 0.0058 Epoch: 17 Global Step: 74430 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 15:14:12,651-Speed 5606.31 samples/sec Loss 0.6481 LearningRate 0.0057 Epoch: 17 Global Step: 74440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:14:20,116-Speed 5487.96 samples/sec Loss 0.6368 LearningRate 0.0057 Epoch: 17 Global Step: 74450 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 15:14:27,648-Speed 5439.27 samples/sec Loss 0.6448 LearningRate 0.0057 Epoch: 17 Global Step: 74460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:14:34,950-Speed 5610.33 samples/sec Loss 0.6318 LearningRate 0.0057 Epoch: 17 Global Step: 74470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:14:42,200-Speed 5650.06 samples/sec Loss 0.6497 LearningRate 0.0057 Epoch: 17 Global Step: 74480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:14:49,503-Speed 5609.98 samples/sec Loss 0.6554 LearningRate 0.0057 Epoch: 17 Global Step: 74490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:14:56,770-Speed 5636.63 samples/sec Loss 0.6495 LearningRate 0.0057 Epoch: 17 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:15:04,023-Speed 5648.37 samples/sec Loss 0.6361 LearningRate 0.0057 Epoch: 17 Global Step: 74510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:15:11,407-Speed 5548.27 samples/sec Loss 0.6366 LearningRate 0.0056 Epoch: 17 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:15:18,726-Speed 5597.68 samples/sec Loss 0.6367 LearningRate 0.0056 Epoch: 17 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:15:26,079-Speed 5570.71 samples/sec Loss 0.6269 LearningRate 0.0056 Epoch: 17 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:15:33,407-Speed 5590.38 samples/sec Loss 0.6344 LearningRate 0.0056 Epoch: 17 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-01-17 15:15:40,769-Speed 5564.91 samples/sec Loss 0.6192 LearningRate 0.0056 Epoch: 17 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:15:48,101-Speed 5587.43 samples/sec Loss 0.6282 LearningRate 0.0056 Epoch: 17 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:15:55,617-Speed 5450.02 samples/sec Loss 0.6540 LearningRate 0.0056 Epoch: 17 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:16:02,858-Speed 5657.32 samples/sec Loss 0.6387 LearningRate 0.0056 Epoch: 17 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:16:10,176-Speed 5598.72 samples/sec Loss 0.6460 LearningRate 0.0055 Epoch: 17 Global Step: 74600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:16:17,751-Speed 5408.01 samples/sec Loss 0.6196 LearningRate 0.0055 Epoch: 17 Global Step: 74610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:16:25,024-Speed 5632.58 samples/sec Loss 0.6459 LearningRate 0.0055 Epoch: 17 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:16:32,363-Speed 5581.80 samples/sec Loss 0.6138 LearningRate 0.0055 Epoch: 17 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:16:39,757-Speed 5540.82 samples/sec Loss 0.6359 LearningRate 0.0055 Epoch: 17 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:16:47,040-Speed 5624.27 samples/sec Loss 0.6403 LearningRate 0.0055 Epoch: 17 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:16:54,493-Speed 5497.04 samples/sec Loss 0.6284 LearningRate 0.0055 Epoch: 17 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:17:01,758-Speed 5638.66 samples/sec Loss 0.6167 LearningRate 0.0055 Epoch: 17 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:17:09,000-Speed 5656.71 samples/sec 
Loss 0.6389 LearningRate 0.0054 Epoch: 17 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:17:16,687-Speed 5329.23 samples/sec Loss 0.6430 LearningRate 0.0054 Epoch: 17 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:17:24,061-Speed 5555.93 samples/sec Loss 0.6240 LearningRate 0.0054 Epoch: 17 Global Step: 74700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:17:31,342-Speed 5626.25 samples/sec Loss 0.6245 LearningRate 0.0054 Epoch: 17 Global Step: 74710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:17:38,613-Speed 5634.32 samples/sec Loss 0.6444 LearningRate 0.0054 Epoch: 17 Global Step: 74720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:17:45,908-Speed 5615.35 samples/sec Loss 0.6247 LearningRate 0.0054 Epoch: 17 Global Step: 74730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:17:53,309-Speed 5534.96 samples/sec Loss 0.6398 LearningRate 0.0054 Epoch: 17 Global Step: 74740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:18:00,694-Speed 5547.42 samples/sec Loss 0.6429 LearningRate 0.0054 Epoch: 17 Global Step: 74750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:18:08,091-Speed 5538.08 samples/sec Loss 0.6024 LearningRate 0.0053 Epoch: 17 Global Step: 74760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:18:15,412-Speed 5595.82 samples/sec Loss 0.6110 LearningRate 0.0053 Epoch: 17 Global Step: 74770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:18:22,806-Speed 5540.77 samples/sec Loss 0.6331 LearningRate 0.0053 Epoch: 17 Global Step: 74780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:18:30,329-Speed 5445.02 samples/sec Loss 0.6029 LearningRate 0.0053 Epoch: 17 Global Step: 74790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-17 15:18:37,745-Speed 5524.45 samples/sec Loss 0.6152 LearningRate 0.0053 Epoch: 17 Global Step: 
74800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:18:45,110-Speed 5561.72 samples/sec Loss 0.6127 LearningRate 0.0053 Epoch: 17 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:18:52,484-Speed 5555.74 samples/sec Loss 0.6264 LearningRate 0.0053 Epoch: 17 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:18:59,829-Speed 5578.46 samples/sec Loss 0.6256 LearningRate 0.0053 Epoch: 17 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:19:07,078-Speed 5650.90 samples/sec Loss 0.6316 LearningRate 0.0052 Epoch: 17 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:19:14,332-Speed 5647.32 samples/sec Loss 0.6226 LearningRate 0.0052 Epoch: 17 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:19:21,600-Speed 5636.83 samples/sec Loss 0.6296 LearningRate 0.0052 Epoch: 17 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:19:29,031-Speed 5513.34 samples/sec Loss 0.6375 LearningRate 0.0052 Epoch: 17 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:19:36,647-Speed 5378.61 samples/sec Loss 0.6199 LearningRate 0.0052 Epoch: 17 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:19:44,056-Speed 5529.26 samples/sec Loss 0.6108 LearningRate 0.0052 Epoch: 17 Global Step: 74890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:19:51,423-Speed 5560.73 samples/sec Loss 0.6237 LearningRate 0.0052 Epoch: 17 Global Step: 74900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:19:59,087-Speed 5345.11 samples/sec Loss 0.6336 LearningRate 0.0052 Epoch: 17 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:20:06,485-Speed 5537.56 samples/sec Loss 0.6205 LearningRate 0.0051 Epoch: 17 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-01-17 15:20:14,126-Speed 5361.31 samples/sec Loss 0.6122 LearningRate 0.0051 Epoch: 17 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:20:21,568-Speed 5504.50 samples/sec Loss 0.6167 LearningRate 0.0051 Epoch: 17 Global Step: 74940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:20:28,886-Speed 5599.02 samples/sec Loss 0.6148 LearningRate 0.0051 Epoch: 17 Global Step: 74950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:20:36,176-Speed 5619.53 samples/sec Loss 0.6154 LearningRate 0.0051 Epoch: 17 Global Step: 74960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:20:43,666-Speed 5469.03 samples/sec Loss 0.6295 LearningRate 0.0051 Epoch: 17 Global Step: 74970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:20:51,075-Speed 5530.08 samples/sec Loss 0.6099 LearningRate 0.0051 Epoch: 17 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:20:58,485-Speed 5528.26 samples/sec Loss 0.6207 LearningRate 0.0051 Epoch: 17 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:21:06,154-Speed 5341.91 samples/sec Loss 0.6131 LearningRate 0.0051 Epoch: 17 Global Step: 75000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:21:52,698-[lfw][75000]XNorm: 22.030906 Training: 2022-01-17 15:21:52,698-[lfw][75000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-17 15:21:52,699-[lfw][75000]Accuracy-Highest: 0.99817 Training: 2022-01-17 15:22:46,814-[cfp_fp][75000]XNorm: 21.346256 Training: 2022-01-17 15:22:46,815-[cfp_fp][75000]Accuracy-Flip: 0.99329+-0.00404 Training: 2022-01-17 15:22:46,815-[cfp_fp][75000]Accuracy-Highest: 0.99329 Training: 2022-01-17 15:23:33,393-[agedb_30][75000]XNorm: 22.524024 Training: 2022-01-17 15:23:33,394-[agedb_30][75000]Accuracy-Flip: 0.98450+-0.00633 Training: 2022-01-17 15:23:33,395-[agedb_30][75000]Accuracy-Highest: 0.98450 Training: 2022-01-17 15:23:41,110-Speed 
264.33 samples/sec Loss 0.6289 LearningRate 0.0050 Epoch: 17 Global Step: 75010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:23:48,350-Speed 5658.55 samples/sec Loss 0.6200 LearningRate 0.0050 Epoch: 17 Global Step: 75020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:23:55,849-Speed 5463.16 samples/sec Loss 0.6132 LearningRate 0.0050 Epoch: 17 Global Step: 75030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:24:03,112-Speed 5640.40 samples/sec Loss 0.6314 LearningRate 0.0050 Epoch: 17 Global Step: 75040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:24:10,372-Speed 5642.40 samples/sec Loss 0.6245 LearningRate 0.0050 Epoch: 17 Global Step: 75050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:24:17,734-Speed 5564.81 samples/sec Loss 0.6229 LearningRate 0.0050 Epoch: 17 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:24:24,991-Speed 5645.20 samples/sec Loss 0.6234 LearningRate 0.0050 Epoch: 17 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:24:32,286-Speed 5615.65 samples/sec Loss 0.6069 LearningRate 0.0050 Epoch: 17 Global Step: 75080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:24:40,125-Speed 5225.59 samples/sec Loss 0.6097 LearningRate 0.0049 Epoch: 17 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:25:33,192-Speed 771.88 samples/sec Loss 0.5159 LearningRate 0.0049 Epoch: 18 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:25:40,541-Speed 5574.55 samples/sec Loss 0.3663 LearningRate 0.0049 Epoch: 18 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:25:47,949-Speed 5530.18 samples/sec Loss 0.3715 LearningRate 0.0049 Epoch: 18 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:25:55,307-Speed 5567.55 samples/sec Loss 0.3637 LearningRate 0.0049 
Epoch: 18 Global Step: 75130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:26:02,711-Speed 5533.02 samples/sec Loss 0.3532 LearningRate 0.0049 Epoch: 18 Global Step: 75140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:26:10,158-Speed 5500.63 samples/sec Loss 0.3497 LearningRate 0.0049 Epoch: 18 Global Step: 75150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:26:17,505-Speed 5576.66 samples/sec Loss 0.3489 LearningRate 0.0049 Epoch: 18 Global Step: 75160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:26:25,063-Speed 5420.06 samples/sec Loss 0.3535 LearningRate 0.0049 Epoch: 18 Global Step: 75170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:26:32,830-Speed 5274.36 samples/sec Loss 0.3511 LearningRate 0.0048 Epoch: 18 Global Step: 75180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:26:40,260-Speed 5513.48 samples/sec Loss 0.3594 LearningRate 0.0048 Epoch: 18 Global Step: 75190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:26:47,880-Speed 5375.84 samples/sec Loss 0.3563 LearningRate 0.0048 Epoch: 18 Global Step: 75200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:26:55,307-Speed 5516.36 samples/sec Loss 0.3518 LearningRate 0.0048 Epoch: 18 Global Step: 75210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:27:02,711-Speed 5532.61 samples/sec Loss 0.3615 LearningRate 0.0048 Epoch: 18 Global Step: 75220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:27:10,251-Speed 5433.12 samples/sec Loss 0.3436 LearningRate 0.0048 Epoch: 18 Global Step: 75230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:27:17,678-Speed 5516.89 samples/sec Loss 0.3451 LearningRate 0.0048 Epoch: 18 Global Step: 75240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:27:25,122-Speed 5503.09 samples/sec Loss 0.3457 LearningRate 0.0048 Epoch: 18 Global Step: 75250 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 15:27:32,644-Speed 5445.92 samples/sec Loss 0.3500 LearningRate 0.0047 Epoch: 18 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:27:40,170-Speed 5443.06 samples/sec Loss 0.3416 LearningRate 0.0047 Epoch: 18 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:27:47,572-Speed 5534.41 samples/sec Loss 0.3459 LearningRate 0.0047 Epoch: 18 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:27:55,029-Speed 5493.79 samples/sec Loss 0.3456 LearningRate 0.0047 Epoch: 18 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:28:02,561-Speed 5439.01 samples/sec Loss 0.3448 LearningRate 0.0047 Epoch: 18 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:28:09,988-Speed 5515.55 samples/sec Loss 0.3475 LearningRate 0.0047 Epoch: 18 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:28:17,454-Speed 5487.39 samples/sec Loss 0.3574 LearningRate 0.0047 Epoch: 18 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:28:25,196-Speed 5291.34 samples/sec Loss 0.3421 LearningRate 0.0047 Epoch: 18 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:28:32,974-Speed 5266.85 samples/sec Loss 0.3535 LearningRate 0.0047 Epoch: 18 Global Step: 75340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:28:40,891-Speed 5174.23 samples/sec Loss 0.3585 LearningRate 0.0046 Epoch: 18 Global Step: 75350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:28:48,515-Speed 5373.62 samples/sec Loss 0.3560 LearningRate 0.0046 Epoch: 18 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:28:56,599-Speed 5067.19 samples/sec Loss 0.3458 LearningRate 0.0046 Epoch: 18 Global Step: 75370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
15:29:04,235-Speed 5364.72 samples/sec Loss 0.3546 LearningRate 0.0046 Epoch: 18 Global Step: 75380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:29:11,634-Speed 5536.98 samples/sec Loss 0.3602 LearningRate 0.0046 Epoch: 18 Global Step: 75390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:29:19,271-Speed 5363.69 samples/sec Loss 0.3541 LearningRate 0.0046 Epoch: 18 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:29:26,724-Speed 5496.98 samples/sec Loss 0.3480 LearningRate 0.0046 Epoch: 18 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:29:34,433-Speed 5314.60 samples/sec Loss 0.3428 LearningRate 0.0046 Epoch: 18 Global Step: 75420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:29:42,126-Speed 5324.35 samples/sec Loss 0.3541 LearningRate 0.0046 Epoch: 18 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:29:50,033-Speed 5181.46 samples/sec Loss 0.3524 LearningRate 0.0045 Epoch: 18 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:29:57,437-Speed 5532.85 samples/sec Loss 0.3550 LearningRate 0.0045 Epoch: 18 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:30:04,909-Speed 5482.17 samples/sec Loss 0.3523 LearningRate 0.0045 Epoch: 18 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:30:12,328-Speed 5521.82 samples/sec Loss 0.3454 LearningRate 0.0045 Epoch: 18 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:30:19,754-Speed 5516.63 samples/sec Loss 0.3490 LearningRate 0.0045 Epoch: 18 Global Step: 75480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:30:27,447-Speed 5325.68 samples/sec Loss 0.3542 LearningRate 0.0045 Epoch: 18 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:30:34,857-Speed 5528.13 samples/sec Loss 0.3567 
LearningRate 0.0045 Epoch: 18 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:30:42,392-Speed 5436.57 samples/sec Loss 0.3464 LearningRate 0.0045 Epoch: 18 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:30:50,106-Speed 5310.20 samples/sec Loss 0.3497 LearningRate 0.0044 Epoch: 18 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:30:58,432-Speed 4920.63 samples/sec Loss 0.3545 LearningRate 0.0044 Epoch: 18 Global Step: 75530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:31:06,106-Speed 5337.80 samples/sec Loss 0.3544 LearningRate 0.0044 Epoch: 18 Global Step: 75540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:31:13,841-Speed 5295.94 samples/sec Loss 0.3493 LearningRate 0.0044 Epoch: 18 Global Step: 75550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:31:21,343-Speed 5461.30 samples/sec Loss 0.3521 LearningRate 0.0044 Epoch: 18 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:31:28,719-Speed 5553.89 samples/sec Loss 0.3593 LearningRate 0.0044 Epoch: 18 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:31:36,131-Speed 5527.32 samples/sec Loss 0.3350 LearningRate 0.0044 Epoch: 18 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:31:43,530-Speed 5536.51 samples/sec Loss 0.3608 LearningRate 0.0044 Epoch: 18 Global Step: 75590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:31:51,041-Speed 5454.04 samples/sec Loss 0.3550 LearningRate 0.0044 Epoch: 18 Global Step: 75600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:31:58,627-Speed 5399.77 samples/sec Loss 0.3603 LearningRate 0.0043 Epoch: 18 Global Step: 75610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:32:06,153-Speed 5443.58 samples/sec Loss 0.3505 LearningRate 0.0043 Epoch: 18 Global Step: 75620 
Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:32:13,715-Speed 5417.47 samples/sec Loss 0.3475 LearningRate 0.0043 Epoch: 18 Global Step: 75630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:32:21,204-Speed 5469.80 samples/sec Loss 0.3554 LearningRate 0.0043 Epoch: 18 Global Step: 75640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:32:28,686-Speed 5475.59 samples/sec Loss 0.3492 LearningRate 0.0043 Epoch: 18 Global Step: 75650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:32:36,337-Speed 5354.11 samples/sec Loss 0.3561 LearningRate 0.0043 Epoch: 18 Global Step: 75660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:32:43,713-Speed 5554.08 samples/sec Loss 0.3425 LearningRate 0.0043 Epoch: 18 Global Step: 75670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:32:51,105-Speed 5541.35 samples/sec Loss 0.3522 LearningRate 0.0043 Epoch: 18 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:32:58,469-Speed 5564.03 samples/sec Loss 0.3510 LearningRate 0.0043 Epoch: 18 Global Step: 75690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:33:06,014-Speed 5429.60 samples/sec Loss 0.3548 LearningRate 0.0042 Epoch: 18 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:33:13,378-Speed 5563.26 samples/sec Loss 0.3542 LearningRate 0.0042 Epoch: 18 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:33:20,837-Speed 5491.69 samples/sec Loss 0.3559 LearningRate 0.0042 Epoch: 18 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:33:28,313-Speed 5480.22 samples/sec Loss 0.3426 LearningRate 0.0042 Epoch: 18 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:33:35,637-Speed 5593.16 samples/sec Loss 0.3498 LearningRate 0.0042 Epoch: 18 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-01-17 15:33:43,079-Speed 5504.77 samples/sec Loss 0.3517 LearningRate 0.0042 Epoch: 18 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:33:50,506-Speed 5515.36 samples/sec Loss 0.3495 LearningRate 0.0042 Epoch: 18 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:33:57,908-Speed 5535.05 samples/sec Loss 0.3469 LearningRate 0.0042 Epoch: 18 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:34:05,309-Speed 5534.81 samples/sec Loss 0.3486 LearningRate 0.0042 Epoch: 18 Global Step: 75780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:34:12,677-Speed 5560.21 samples/sec Loss 0.3397 LearningRate 0.0042 Epoch: 18 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:34:20,051-Speed 5555.19 samples/sec Loss 0.3512 LearningRate 0.0041 Epoch: 18 Global Step: 75800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:34:27,423-Speed 5557.39 samples/sec Loss 0.3594 LearningRate 0.0041 Epoch: 18 Global Step: 75810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:34:34,828-Speed 5532.08 samples/sec Loss 0.3472 LearningRate 0.0041 Epoch: 18 Global Step: 75820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:34:42,271-Speed 5503.93 samples/sec Loss 0.3465 LearningRate 0.0041 Epoch: 18 Global Step: 75830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:34:49,914-Speed 5359.79 samples/sec Loss 0.3472 LearningRate 0.0041 Epoch: 18 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:34:57,707-Speed 5257.16 samples/sec Loss 0.3599 LearningRate 0.0041 Epoch: 18 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:35:05,256-Speed 5426.49 samples/sec Loss 0.3513 LearningRate 0.0041 Epoch: 18 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:35:12,663-Speed 5530.28 
samples/sec Loss 0.3522 LearningRate 0.0041 Epoch: 18 Global Step: 75870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:35:20,100-Speed 5508.61 samples/sec Loss 0.3522 LearningRate 0.0041 Epoch: 18 Global Step: 75880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:35:27,634-Speed 5437.61 samples/sec Loss 0.3404 LearningRate 0.0040 Epoch: 18 Global Step: 75890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:35:35,031-Speed 5537.47 samples/sec Loss 0.3532 LearningRate 0.0040 Epoch: 18 Global Step: 75900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:35:42,446-Speed 5524.88 samples/sec Loss 0.3632 LearningRate 0.0040 Epoch: 18 Global Step: 75910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:35:49,863-Speed 5523.34 samples/sec Loss 0.3542 LearningRate 0.0040 Epoch: 18 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:35:57,402-Speed 5434.02 samples/sec Loss 0.3634 LearningRate 0.0040 Epoch: 18 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:36:04,803-Speed 5534.97 samples/sec Loss 0.3673 LearningRate 0.0040 Epoch: 18 Global Step: 75940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:36:12,169-Speed 5561.86 samples/sec Loss 0.3428 LearningRate 0.0040 Epoch: 18 Global Step: 75950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:36:19,541-Speed 5556.46 samples/sec Loss 0.3509 LearningRate 0.0040 Epoch: 18 Global Step: 75960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:36:26,902-Speed 5565.36 samples/sec Loss 0.3545 LearningRate 0.0040 Epoch: 18 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:36:34,301-Speed 5537.11 samples/sec Loss 0.3561 LearningRate 0.0039 Epoch: 18 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:36:41,721-Speed 5521.66 samples/sec Loss 0.3692 LearningRate 0.0039 Epoch: 
18 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:36:49,190-Speed 5484.65 samples/sec Loss 0.3511 LearningRate 0.0039 Epoch: 18 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:36:56,664-Speed 5480.70 samples/sec Loss 0.3472 LearningRate 0.0039 Epoch: 18 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:37:04,088-Speed 5518.76 samples/sec Loss 0.3538 LearningRate 0.0039 Epoch: 18 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:37:11,479-Speed 5542.76 samples/sec Loss 0.3579 LearningRate 0.0039 Epoch: 18 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:37:18,885-Speed 5530.87 samples/sec Loss 0.3553 LearningRate 0.0039 Epoch: 18 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:37:26,298-Speed 5526.53 samples/sec Loss 0.3444 LearningRate 0.0039 Epoch: 18 Global Step: 76050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:37:33,737-Speed 5507.17 samples/sec Loss 0.3508 LearningRate 0.0039 Epoch: 18 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:37:41,166-Speed 5514.29 samples/sec Loss 0.3631 LearningRate 0.0039 Epoch: 18 Global Step: 76070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:37:48,585-Speed 5521.55 samples/sec Loss 0.3520 LearningRate 0.0038 Epoch: 18 Global Step: 76080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:37:55,946-Speed 5565.29 samples/sec Loss 0.3573 LearningRate 0.0038 Epoch: 18 Global Step: 76090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:38:03,341-Speed 5539.12 samples/sec Loss 0.3519 LearningRate 0.0038 Epoch: 18 Global Step: 76100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:38:10,704-Speed 5564.25 samples/sec Loss 0.3563 LearningRate 0.0038 Epoch: 18 Global Step: 76110 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-01-17 15:38:18,092-Speed 5544.67 samples/sec Loss 0.3599 LearningRate 0.0038 Epoch: 18 Global Step: 76120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:38:25,592-Speed 5462.30 samples/sec Loss 0.3707 LearningRate 0.0038 Epoch: 18 Global Step: 76130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:38:33,075-Speed 5475.72 samples/sec Loss 0.3617 LearningRate 0.0038 Epoch: 18 Global Step: 76140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:38:40,459-Speed 5547.55 samples/sec Loss 0.3617 LearningRate 0.0038 Epoch: 18 Global Step: 76150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:38:47,877-Speed 5521.82 samples/sec Loss 0.3434 LearningRate 0.0038 Epoch: 18 Global Step: 76160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:38:55,336-Speed 5492.81 samples/sec Loss 0.3591 LearningRate 0.0037 Epoch: 18 Global Step: 76170 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 15:39:02,829-Speed 5467.32 samples/sec Loss 0.3616 LearningRate 0.0037 Epoch: 18 Global Step: 76180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:39:10,298-Speed 5484.31 samples/sec Loss 0.3631 LearningRate 0.0037 Epoch: 18 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:39:17,899-Speed 5389.68 samples/sec Loss 0.3678 LearningRate 0.0037 Epoch: 18 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:39:25,358-Speed 5492.28 samples/sec Loss 0.3564 LearningRate 0.0037 Epoch: 18 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:39:32,863-Speed 5458.32 samples/sec Loss 0.3554 LearningRate 0.0037 Epoch: 18 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:39:40,433-Speed 5411.65 samples/sec Loss 0.3558 LearningRate 0.0037 Epoch: 18 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 
15:39:48,112-Speed 5334.48 samples/sec Loss 0.3602 LearningRate 0.0037 Epoch: 18 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:39:55,717-Speed 5386.45 samples/sec Loss 0.3433 LearningRate 0.0037 Epoch: 18 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:40:03,375-Speed 5349.61 samples/sec Loss 0.3576 LearningRate 0.0037 Epoch: 18 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:40:10,744-Speed 5559.26 samples/sec Loss 0.3624 LearningRate 0.0036 Epoch: 18 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:40:18,174-Speed 5512.96 samples/sec Loss 0.3578 LearningRate 0.0036 Epoch: 18 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:40:25,931-Speed 5281.61 samples/sec Loss 0.3572 LearningRate 0.0036 Epoch: 18 Global Step: 76290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:40:33,596-Speed 5344.33 samples/sec Loss 0.3521 LearningRate 0.0036 Epoch: 18 Global Step: 76300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:40:41,137-Speed 5432.37 samples/sec Loss 0.3505 LearningRate 0.0036 Epoch: 18 Global Step: 76310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:40:48,603-Speed 5486.29 samples/sec Loss 0.3467 LearningRate 0.0036 Epoch: 18 Global Step: 76320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:40:55,976-Speed 5556.30 samples/sec Loss 0.3457 LearningRate 0.0036 Epoch: 18 Global Step: 76330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:41:03,626-Speed 5355.41 samples/sec Loss 0.3594 LearningRate 0.0036 Epoch: 18 Global Step: 76340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:41:11,162-Speed 5435.47 samples/sec Loss 0.3594 LearningRate 0.0036 Epoch: 18 Global Step: 76350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:41:18,529-Speed 5560.47 samples/sec Loss 
0.3596 LearningRate 0.0036 Epoch: 18 Global Step: 76360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:41:25,952-Speed 5519.47 samples/sec Loss 0.3516 LearningRate 0.0035 Epoch: 18 Global Step: 76370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:41:33,378-Speed 5517.08 samples/sec Loss 0.3540 LearningRate 0.0035 Epoch: 18 Global Step: 76380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:41:40,864-Speed 5472.26 samples/sec Loss 0.3603 LearningRate 0.0035 Epoch: 18 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:41:48,363-Speed 5462.76 samples/sec Loss 0.3644 LearningRate 0.0035 Epoch: 18 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:41:55,714-Speed 5572.35 samples/sec Loss 0.3447 LearningRate 0.0035 Epoch: 18 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:42:03,123-Speed 5529.55 samples/sec Loss 0.3515 LearningRate 0.0035 Epoch: 18 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:42:10,567-Speed 5503.46 samples/sec Loss 0.3652 LearningRate 0.0035 Epoch: 18 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:42:17,979-Speed 5526.89 samples/sec Loss 0.3595 LearningRate 0.0035 Epoch: 18 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:42:25,668-Speed 5327.90 samples/sec Loss 0.3501 LearningRate 0.0035 Epoch: 18 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:42:33,034-Speed 5561.40 samples/sec Loss 0.3512 LearningRate 0.0035 Epoch: 18 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:42:40,520-Speed 5472.67 samples/sec Loss 0.3511 LearningRate 0.0034 Epoch: 18 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:42:48,006-Speed 5471.64 samples/sec Loss 0.3601 LearningRate 0.0034 Epoch: 18 Global Step: 
76480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:42:55,401-Speed 5540.30 samples/sec Loss 0.3534 LearningRate 0.0034 Epoch: 18 Global Step: 76490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:43:02,951-Speed 5426.02 samples/sec Loss 0.3551 LearningRate 0.0034 Epoch: 18 Global Step: 76500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:43:10,401-Speed 5498.72 samples/sec Loss 0.3448 LearningRate 0.0034 Epoch: 18 Global Step: 76510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:43:17,802-Speed 5534.89 samples/sec Loss 0.3507 LearningRate 0.0034 Epoch: 18 Global Step: 76520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:43:25,227-Speed 5516.84 samples/sec Loss 0.3588 LearningRate 0.0034 Epoch: 18 Global Step: 76530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:43:32,762-Speed 5437.11 samples/sec Loss 0.3576 LearningRate 0.0034 Epoch: 18 Global Step: 76540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:43:40,177-Speed 5525.13 samples/sec Loss 0.3563 LearningRate 0.0034 Epoch: 18 Global Step: 76550 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:43:47,533-Speed 5568.57 samples/sec Loss 0.3566 LearningRate 0.0034 Epoch: 18 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:43:54,944-Speed 5527.73 samples/sec Loss 0.3483 LearningRate 0.0033 Epoch: 18 Global Step: 76570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:44:02,392-Speed 5500.00 samples/sec Loss 0.3625 LearningRate 0.0033 Epoch: 18 Global Step: 76580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:44:09,795-Speed 5533.59 samples/sec Loss 0.3534 LearningRate 0.0033 Epoch: 18 Global Step: 76590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:44:17,171-Speed 5554.21 samples/sec Loss 0.3525 LearningRate 0.0033 Epoch: 18 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-01-17 15:44:24,734-Speed 5416.57 samples/sec Loss 0.3419 LearningRate 0.0033 Epoch: 18 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:44:32,082-Speed 5575.34 samples/sec Loss 0.3511 LearningRate 0.0033 Epoch: 18 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:44:39,415-Speed 5586.39 samples/sec Loss 0.3470 LearningRate 0.0033 Epoch: 18 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:44:46,767-Speed 5572.15 samples/sec Loss 0.3608 LearningRate 0.0033 Epoch: 18 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:44:54,237-Speed 5483.58 samples/sec Loss 0.3550 LearningRate 0.0033 Epoch: 18 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:45:01,563-Speed 5591.90 samples/sec Loss 0.3410 LearningRate 0.0033 Epoch: 18 Global Step: 76660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:45:08,898-Speed 5585.35 samples/sec Loss 0.3582 LearningRate 0.0033 Epoch: 18 Global Step: 76670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:45:16,210-Speed 5602.62 samples/sec Loss 0.3498 LearningRate 0.0032 Epoch: 18 Global Step: 76680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:45:23,608-Speed 5537.07 samples/sec Loss 0.3602 LearningRate 0.0032 Epoch: 18 Global Step: 76690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:45:31,102-Speed 5466.94 samples/sec Loss 0.3517 LearningRate 0.0032 Epoch: 18 Global Step: 76700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:45:38,429-Speed 5590.96 samples/sec Loss 0.3609 LearningRate 0.0032 Epoch: 18 Global Step: 76710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:45:46,011-Speed 5403.35 samples/sec Loss 0.3560 LearningRate 0.0032 Epoch: 18 Global Step: 76720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:45:53,377-Speed 5561.35 
samples/sec Loss 0.3652 LearningRate 0.0032 Epoch: 18 Global Step: 76730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:46:00,717-Speed 5581.02 samples/sec Loss 0.3533 LearningRate 0.0032 Epoch: 18 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:46:08,067-Speed 5573.63 samples/sec Loss 0.3696 LearningRate 0.0032 Epoch: 18 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:46:15,447-Speed 5551.21 samples/sec Loss 0.3492 LearningRate 0.0032 Epoch: 18 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:46:22,798-Speed 5572.30 samples/sec Loss 0.3444 LearningRate 0.0032 Epoch: 18 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:46:30,118-Speed 5596.90 samples/sec Loss 0.3559 LearningRate 0.0031 Epoch: 18 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:46:37,439-Speed 5595.90 samples/sec Loss 0.3500 LearningRate 0.0031 Epoch: 18 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:46:44,762-Speed 5593.83 samples/sec Loss 0.3605 LearningRate 0.0031 Epoch: 18 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:46:52,148-Speed 5547.17 samples/sec Loss 0.3490 LearningRate 0.0031 Epoch: 18 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:46:59,591-Speed 5503.71 samples/sec Loss 0.3620 LearningRate 0.0031 Epoch: 18 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:47:06,906-Speed 5600.70 samples/sec Loss 0.3515 LearningRate 0.0031 Epoch: 18 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:47:14,292-Speed 5546.34 samples/sec Loss 0.3514 LearningRate 0.0031 Epoch: 18 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:47:21,948-Speed 5350.55 samples/sec Loss 0.3389 LearningRate 0.0031 Epoch: 18 
Global Step: 76850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:47:29,462-Speed 5452.23 samples/sec Loss 0.3559 LearningRate 0.0031 Epoch: 18 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:47:36,809-Speed 5575.77 samples/sec Loss 0.3622 LearningRate 0.0031 Epoch: 18 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:47:44,185-Speed 5554.52 samples/sec Loss 0.3530 LearningRate 0.0031 Epoch: 18 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:47:51,509-Speed 5593.94 samples/sec Loss 0.3471 LearningRate 0.0030 Epoch: 18 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:47:58,866-Speed 5567.95 samples/sec Loss 0.3503 LearningRate 0.0030 Epoch: 18 Global Step: 76900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:48:06,391-Speed 5444.01 samples/sec Loss 0.3559 LearningRate 0.0030 Epoch: 18 Global Step: 76910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:48:13,916-Speed 5444.49 samples/sec Loss 0.3528 LearningRate 0.0030 Epoch: 18 Global Step: 76920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:48:21,834-Speed 5173.04 samples/sec Loss 0.3480 LearningRate 0.0030 Epoch: 18 Global Step: 76930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:48:29,836-Speed 5119.38 samples/sec Loss 0.3480 LearningRate 0.0030 Epoch: 18 Global Step: 76940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:48:37,833-Speed 5122.95 samples/sec Loss 0.3490 LearningRate 0.0030 Epoch: 18 Global Step: 76950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:48:45,815-Speed 5132.16 samples/sec Loss 0.3540 LearningRate 0.0030 Epoch: 18 Global Step: 76960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:48:53,839-Speed 5105.52 samples/sec Loss 0.3560 LearningRate 0.0030 Epoch: 18 Global Step: 76970 Fp16 Grad Scale: 131072 Required: 
2 hours Training: 2022-01-17 15:49:01,887-Speed 5090.17 samples/sec Loss 0.3643 LearningRate 0.0030 Epoch: 18 Global Step: 76980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:49:09,917-Speed 5100.88 samples/sec Loss 0.3568 LearningRate 0.0030 Epoch: 18 Global Step: 76990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:49:18,103-Speed 5004.49 samples/sec Loss 0.3474 LearningRate 0.0029 Epoch: 18 Global Step: 77000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:49:26,110-Speed 5116.21 samples/sec Loss 0.3534 LearningRate 0.0029 Epoch: 18 Global Step: 77010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:49:34,294-Speed 5005.45 samples/sec Loss 0.3664 LearningRate 0.0029 Epoch: 18 Global Step: 77020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:49:42,625-Speed 4917.66 samples/sec Loss 0.3573 LearningRate 0.0029 Epoch: 18 Global Step: 77030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 15:49:50,576-Speed 5152.84 samples/sec Loss 0.3480 LearningRate 0.0029 Epoch: 18 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:49:58,013-Speed 5508.74 samples/sec Loss 0.3507 LearningRate 0.0029 Epoch: 18 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:50:05,306-Speed 5617.08 samples/sec Loss 0.3549 LearningRate 0.0029 Epoch: 18 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:50:12,827-Speed 5446.23 samples/sec Loss 0.3477 LearningRate 0.0029 Epoch: 18 Global Step: 77070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:50:20,181-Speed 5570.85 samples/sec Loss 0.3557 LearningRate 0.0029 Epoch: 18 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:50:27,640-Speed 5492.14 samples/sec Loss 0.3560 LearningRate 0.0029 Epoch: 18 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 15:50:35,086-Speed 
5502.13 samples/sec Loss 0.3505 LearningRate 0.0029 Epoch: 18 Global Step: 77100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:50:42,478-Speed 5541.28 samples/sec Loss 0.3492 LearningRate 0.0028 Epoch: 18 Global Step: 77110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:50:49,888-Speed 5529.03 samples/sec Loss 0.3552 LearningRate 0.0028 Epoch: 18 Global Step: 77120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:50:57,783-Speed 5188.51 samples/sec Loss 0.3540 LearningRate 0.0028 Epoch: 18 Global Step: 77130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:51:05,851-Speed 5078.06 samples/sec Loss 0.3507 LearningRate 0.0028 Epoch: 18 Global Step: 77140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:51:13,591-Speed 5292.36 samples/sec Loss 0.3502 LearningRate 0.0028 Epoch: 18 Global Step: 77150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:51:21,009-Speed 5522.28 samples/sec Loss 0.3448 LearningRate 0.0028 Epoch: 18 Global Step: 77160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:51:28,456-Speed 5500.78 samples/sec Loss 0.3466 LearningRate 0.0028 Epoch: 18 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:51:35,896-Speed 5506.40 samples/sec Loss 0.3496 LearningRate 0.0028 Epoch: 18 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:51:43,260-Speed 5562.99 samples/sec Loss 0.3462 LearningRate 0.0028 Epoch: 18 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:51:50,597-Speed 5583.40 samples/sec Loss 0.3503 LearningRate 0.0028 Epoch: 18 Global Step: 77200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:51:57,918-Speed 5595.86 samples/sec Loss 0.3559 LearningRate 0.0028 Epoch: 18 Global Step: 77210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:52:05,285-Speed 5560.42 samples/sec Loss 0.3515 LearningRate 0.0027 
Epoch: 18 Global Step: 77220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:52:12,764-Speed 5477.28 samples/sec Loss 0.3382 LearningRate 0.0027 Epoch: 18 Global Step: 77230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:52:20,191-Speed 5516.10 samples/sec Loss 0.3498 LearningRate 0.0027 Epoch: 18 Global Step: 77240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:52:27,605-Speed 5531.32 samples/sec Loss 0.3625 LearningRate 0.0027 Epoch: 18 Global Step: 77250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:52:34,949-Speed 5578.16 samples/sec Loss 0.3450 LearningRate 0.0027 Epoch: 18 Global Step: 77260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:52:42,295-Speed 5576.52 samples/sec Loss 0.3579 LearningRate 0.0027 Epoch: 18 Global Step: 77270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:52:49,710-Speed 5525.15 samples/sec Loss 0.3455 LearningRate 0.0027 Epoch: 18 Global Step: 77280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:52:57,222-Speed 5453.67 samples/sec Loss 0.3547 LearningRate 0.0027 Epoch: 18 Global Step: 77290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:53:04,602-Speed 5550.82 samples/sec Loss 0.3558 LearningRate 0.0027 Epoch: 18 Global Step: 77300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:53:11,940-Speed 5582.75 samples/sec Loss 0.3428 LearningRate 0.0027 Epoch: 18 Global Step: 77310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:53:19,594-Speed 5352.35 samples/sec Loss 0.3554 LearningRate 0.0027 Epoch: 18 Global Step: 77320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:53:27,035-Speed 5505.33 samples/sec Loss 0.3568 LearningRate 0.0026 Epoch: 18 Global Step: 77330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:53:34,408-Speed 5556.71 samples/sec Loss 0.3500 LearningRate 0.0026 Epoch: 18 Global Step: 77340 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-17 15:53:41,918-Speed 5455.04 samples/sec Loss 0.3456 LearningRate 0.0026 Epoch: 18 Global Step: 77350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:53:49,458-Speed 5433.05 samples/sec Loss 0.3480 LearningRate 0.0026 Epoch: 18 Global Step: 77360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:53:56,845-Speed 5545.54 samples/sec Loss 0.3503 LearningRate 0.0026 Epoch: 18 Global Step: 77370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:54:04,227-Speed 5549.42 samples/sec Loss 0.3471 LearningRate 0.0026 Epoch: 18 Global Step: 77380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:54:11,677-Speed 5498.35 samples/sec Loss 0.3495 LearningRate 0.0026 Epoch: 18 Global Step: 77390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:54:19,091-Speed 5526.06 samples/sec Loss 0.3551 LearningRate 0.0026 Epoch: 18 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:54:26,527-Speed 5508.56 samples/sec Loss 0.3398 LearningRate 0.0026 Epoch: 18 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:54:33,875-Speed 5575.70 samples/sec Loss 0.3470 LearningRate 0.0026 Epoch: 18 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:54:41,339-Speed 5488.69 samples/sec Loss 0.3587 LearningRate 0.0026 Epoch: 18 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:54:48,684-Speed 5577.05 samples/sec Loss 0.3427 LearningRate 0.0026 Epoch: 18 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:54:56,170-Speed 5472.59 samples/sec Loss 0.3499 LearningRate 0.0025 Epoch: 18 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:55:03,571-Speed 5535.88 samples/sec Loss 0.3442 LearningRate 0.0025 Epoch: 18 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 
15:55:11,046-Speed 5480.43 samples/sec Loss 0.3368 LearningRate 0.0025 Epoch: 18 Global Step: 77470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:55:18,374-Speed 5589.65 samples/sec Loss 0.3529 LearningRate 0.0025 Epoch: 18 Global Step: 77480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:55:25,685-Speed 5603.90 samples/sec Loss 0.3361 LearningRate 0.0025 Epoch: 18 Global Step: 77490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:55:33,012-Speed 5590.97 samples/sec Loss 0.3487 LearningRate 0.0025 Epoch: 18 Global Step: 77500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:55:40,437-Speed 5517.57 samples/sec Loss 0.3656 LearningRate 0.0025 Epoch: 18 Global Step: 77510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:55:47,782-Speed 5577.62 samples/sec Loss 0.3608 LearningRate 0.0025 Epoch: 18 Global Step: 77520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:55:55,149-Speed 5560.39 samples/sec Loss 0.3427 LearningRate 0.0025 Epoch: 18 Global Step: 77530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:56:02,529-Speed 5550.80 samples/sec Loss 0.3478 LearningRate 0.0025 Epoch: 18 Global Step: 77540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:56:09,883-Speed 5571.25 samples/sec Loss 0.3525 LearningRate 0.0025 Epoch: 18 Global Step: 77550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:56:17,325-Speed 5504.61 samples/sec Loss 0.3557 LearningRate 0.0025 Epoch: 18 Global Step: 77560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:56:24,580-Speed 5646.09 samples/sec Loss 0.3387 LearningRate 0.0024 Epoch: 18 Global Step: 77570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:56:31,986-Speed 5532.16 samples/sec Loss 0.3379 LearningRate 0.0024 Epoch: 18 Global Step: 77580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:56:39,604-Speed 5377.52 samples/sec Loss 
0.3439 LearningRate 0.0024 Epoch: 18 Global Step: 77590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:56:47,100-Speed 5464.93 samples/sec Loss 0.3452 LearningRate 0.0024 Epoch: 18 Global Step: 77600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:56:54,481-Speed 5550.39 samples/sec Loss 0.3504 LearningRate 0.0024 Epoch: 18 Global Step: 77610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:57:02,100-Speed 5376.35 samples/sec Loss 0.3552 LearningRate 0.0024 Epoch: 18 Global Step: 77620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:57:09,677-Speed 5406.55 samples/sec Loss 0.3678 LearningRate 0.0024 Epoch: 18 Global Step: 77630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:57:17,137-Speed 5492.02 samples/sec Loss 0.3499 LearningRate 0.0024 Epoch: 18 Global Step: 77640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:57:24,785-Speed 5356.05 samples/sec Loss 0.3482 LearningRate 0.0024 Epoch: 18 Global Step: 77650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:57:32,192-Speed 5531.03 samples/sec Loss 0.3612 LearningRate 0.0024 Epoch: 18 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:57:39,532-Speed 5580.90 samples/sec Loss 0.3499 LearningRate 0.0024 Epoch: 18 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:57:46,893-Speed 5565.24 samples/sec Loss 0.3448 LearningRate 0.0024 Epoch: 18 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:57:54,208-Speed 5600.05 samples/sec Loss 0.3661 LearningRate 0.0023 Epoch: 18 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:58:01,574-Speed 5561.78 samples/sec Loss 0.3509 LearningRate 0.0023 Epoch: 18 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:58:08,938-Speed 5563.25 samples/sec Loss 0.3455 LearningRate 0.0023 Epoch: 18 Global 
Step: 77710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:58:16,368-Speed 5513.62 samples/sec Loss 0.3481 LearningRate 0.0023 Epoch: 18 Global Step: 77720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:58:23,747-Speed 5551.70 samples/sec Loss 0.3429 LearningRate 0.0023 Epoch: 18 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:58:31,047-Speed 5612.04 samples/sec Loss 0.3494 LearningRate 0.0023 Epoch: 18 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:58:38,311-Speed 5639.80 samples/sec Loss 0.3474 LearningRate 0.0023 Epoch: 18 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:58:45,651-Speed 5580.82 samples/sec Loss 0.3528 LearningRate 0.0023 Epoch: 18 Global Step: 77760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 15:58:52,975-Speed 5593.90 samples/sec Loss 0.3577 LearningRate 0.0023 Epoch: 18 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:59:00,291-Speed 5598.73 samples/sec Loss 0.3432 LearningRate 0.0023 Epoch: 18 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:59:07,656-Speed 5562.80 samples/sec Loss 0.3464 LearningRate 0.0023 Epoch: 18 Global Step: 77790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:59:14,941-Speed 5623.14 samples/sec Loss 0.3429 LearningRate 0.0023 Epoch: 18 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:59:22,189-Speed 5651.93 samples/sec Loss 0.3413 LearningRate 0.0022 Epoch: 18 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:59:29,577-Speed 5545.48 samples/sec Loss 0.3230 LearningRate 0.0022 Epoch: 18 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:59:37,017-Speed 5505.53 samples/sec Loss 0.3511 LearningRate 0.0022 Epoch: 18 Global Step: 77830 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-01-17 15:59:44,369-Speed 5572.69 samples/sec Loss 0.3570 LearningRate 0.0022 Epoch: 18 Global Step: 77840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:59:51,737-Speed 5559.82 samples/sec Loss 0.3565 LearningRate 0.0022 Epoch: 18 Global Step: 77850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 15:59:59,046-Speed 5604.95 samples/sec Loss 0.3448 LearningRate 0.0022 Epoch: 18 Global Step: 77860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:00:06,390-Speed 5577.89 samples/sec Loss 0.3348 LearningRate 0.0022 Epoch: 18 Global Step: 77870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:00:13,887-Speed 5464.26 samples/sec Loss 0.3659 LearningRate 0.0022 Epoch: 18 Global Step: 77880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:00:21,221-Speed 5585.80 samples/sec Loss 0.3400 LearningRate 0.0022 Epoch: 18 Global Step: 77890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:00:28,675-Speed 5495.66 samples/sec Loss 0.3391 LearningRate 0.0022 Epoch: 18 Global Step: 77900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:00:35,987-Speed 5602.97 samples/sec Loss 0.3431 LearningRate 0.0022 Epoch: 18 Global Step: 77910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:00:43,309-Speed 5594.55 samples/sec Loss 0.3356 LearningRate 0.0022 Epoch: 18 Global Step: 77920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:00:50,701-Speed 5541.60 samples/sec Loss 0.3536 LearningRate 0.0022 Epoch: 18 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:00:58,092-Speed 5543.22 samples/sec Loss 0.3493 LearningRate 0.0021 Epoch: 18 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:01:05,434-Speed 5579.36 samples/sec Loss 0.3462 LearningRate 0.0021 Epoch: 18 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:01:12,790-Speed 5568.68 
samples/sec Loss 0.3402 LearningRate 0.0021 Epoch: 18 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:01:20,197-Speed 5531.24 samples/sec Loss 0.3413 LearningRate 0.0021 Epoch: 18 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:01:27,907-Speed 5312.90 samples/sec Loss 0.3455 LearningRate 0.0021 Epoch: 18 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:01:35,531-Speed 5373.12 samples/sec Loss 0.3389 LearningRate 0.0021 Epoch: 18 Global Step: 77990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:01:42,990-Speed 5492.27 samples/sec Loss 0.3412 LearningRate 0.0021 Epoch: 18 Global Step: 78000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:01:50,352-Speed 5564.58 samples/sec Loss 0.3411 LearningRate 0.0021 Epoch: 18 Global Step: 78010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:01:57,751-Speed 5536.20 samples/sec Loss 0.3357 LearningRate 0.0021 Epoch: 18 Global Step: 78020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:02:05,411-Speed 5348.33 samples/sec Loss 0.3393 LearningRate 0.0021 Epoch: 18 Global Step: 78030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:02:12,974-Speed 5416.60 samples/sec Loss 0.3485 LearningRate 0.0021 Epoch: 18 Global Step: 78040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:02:20,637-Speed 5346.25 samples/sec Loss 0.3426 LearningRate 0.0021 Epoch: 18 Global Step: 78050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:02:28,715-Speed 5071.42 samples/sec Loss 0.3328 LearningRate 0.0021 Epoch: 18 Global Step: 78060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:02:36,605-Speed 5191.54 samples/sec Loss 0.3525 LearningRate 0.0020 Epoch: 18 Global Step: 78070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:02:44,066-Speed 5490.91 samples/sec Loss 0.3489 LearningRate 0.0020 Epoch: 18 
Global Step: 78080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:02:51,745-Speed 5334.91 samples/sec Loss 0.3430 LearningRate 0.0020 Epoch: 18 Global Step: 78090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:02:59,070-Speed 5593.19 samples/sec Loss 0.3428 LearningRate 0.0020 Epoch: 18 Global Step: 78100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:03:06,902-Speed 5230.35 samples/sec Loss 0.3420 LearningRate 0.0020 Epoch: 18 Global Step: 78110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:03:14,622-Speed 5306.70 samples/sec Loss 0.3498 LearningRate 0.0020 Epoch: 18 Global Step: 78120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:03:22,002-Speed 5550.78 samples/sec Loss 0.3482 LearningRate 0.0020 Epoch: 18 Global Step: 78130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:03:29,338-Speed 5584.87 samples/sec Loss 0.3333 LearningRate 0.0020 Epoch: 18 Global Step: 78140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-17 16:03:36,676-Speed 5582.59 samples/sec Loss 0.3339 LearningRate 0.0020 Epoch: 18 Global Step: 78150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:03:44,072-Speed 5539.04 samples/sec Loss 0.3382 LearningRate 0.0020 Epoch: 18 Global Step: 78160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:03:51,442-Speed 5558.33 samples/sec Loss 0.3527 LearningRate 0.0020 Epoch: 18 Global Step: 78170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:03:58,782-Speed 5581.72 samples/sec Loss 0.3445 LearningRate 0.0020 Epoch: 18 Global Step: 78180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:04:06,127-Speed 5577.40 samples/sec Loss 0.3456 LearningRate 0.0020 Epoch: 18 Global Step: 78190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:04:13,466-Speed 5581.94 samples/sec Loss 0.3375 LearningRate 0.0019 Epoch: 18 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-01-17 16:04:20,820-Speed 5570.69 samples/sec Loss 0.3353 LearningRate 0.0019 Epoch: 18 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:04:28,198-Speed 5552.15 samples/sec Loss 0.3473 LearningRate 0.0019 Epoch: 18 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:04:35,695-Speed 5465.81 samples/sec Loss 0.3466 LearningRate 0.0019 Epoch: 18 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:04:43,090-Speed 5539.30 samples/sec Loss 0.3445 LearningRate 0.0019 Epoch: 18 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:04:50,409-Speed 5597.50 samples/sec Loss 0.3427 LearningRate 0.0019 Epoch: 18 Global Step: 78250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:04:57,812-Speed 5533.79 samples/sec Loss 0.3385 LearningRate 0.0019 Epoch: 18 Global Step: 78260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:05:05,221-Speed 5528.98 samples/sec Loss 0.3388 LearningRate 0.0019 Epoch: 18 Global Step: 78270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:05:12,614-Speed 5540.93 samples/sec Loss 0.3259 LearningRate 0.0019 Epoch: 18 Global Step: 78280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:05:19,950-Speed 5584.57 samples/sec Loss 0.3345 LearningRate 0.0019 Epoch: 18 Global Step: 78290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:05:27,264-Speed 5600.98 samples/sec Loss 0.3472 LearningRate 0.0019 Epoch: 18 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:05:34,637-Speed 5555.95 samples/sec Loss 0.3368 LearningRate 0.0019 Epoch: 18 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:05:41,990-Speed 5572.21 samples/sec Loss 0.3392 LearningRate 0.0019 Epoch: 18 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:05:49,274-Speed 
5624.10 samples/sec Loss 0.3396 LearningRate 0.0019 Epoch: 18 Global Step: 78330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:05:56,643-Speed 5558.89 samples/sec Loss 0.3432 LearningRate 0.0018 Epoch: 18 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:06:03,896-Speed 5648.42 samples/sec Loss 0.3382 LearningRate 0.0018 Epoch: 18 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:06:11,281-Speed 5547.38 samples/sec Loss 0.3408 LearningRate 0.0018 Epoch: 18 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:06:18,581-Speed 5611.27 samples/sec Loss 0.3299 LearningRate 0.0018 Epoch: 18 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:06:25,934-Speed 5571.26 samples/sec Loss 0.3435 LearningRate 0.0018 Epoch: 18 Global Step: 78380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:06:33,240-Speed 5607.11 samples/sec Loss 0.3456 LearningRate 0.0018 Epoch: 18 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:06:40,676-Speed 5509.11 samples/sec Loss 0.3383 LearningRate 0.0018 Epoch: 18 Global Step: 78400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:06:48,069-Speed 5541.68 samples/sec Loss 0.3398 LearningRate 0.0018 Epoch: 18 Global Step: 78410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:06:55,421-Speed 5571.90 samples/sec Loss 0.3408 LearningRate 0.0018 Epoch: 18 Global Step: 78420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:07:02,913-Speed 5467.36 samples/sec Loss 0.3516 LearningRate 0.0018 Epoch: 18 Global Step: 78430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:07:10,222-Speed 5605.00 samples/sec Loss 0.3340 LearningRate 0.0018 Epoch: 18 Global Step: 78440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:07:17,527-Speed 5607.89 samples/sec Loss 0.3463 LearningRate 0.0018 
Epoch: 18 Global Step: 78450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:07:24,865-Speed 5582.84 samples/sec Loss 0.3419 LearningRate 0.0018 Epoch: 18 Global Step: 78460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:07:32,163-Speed 5613.18 samples/sec Loss 0.3275 LearningRate 0.0018 Epoch: 18 Global Step: 78470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:07:39,667-Speed 5460.30 samples/sec Loss 0.3340 LearningRate 0.0017 Epoch: 18 Global Step: 78480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:07:47,177-Speed 5454.53 samples/sec Loss 0.3473 LearningRate 0.0017 Epoch: 18 Global Step: 78490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:07:54,883-Speed 5316.39 samples/sec Loss 0.3356 LearningRate 0.0017 Epoch: 18 Global Step: 78500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:08:02,222-Speed 5581.26 samples/sec Loss 0.3470 LearningRate 0.0017 Epoch: 18 Global Step: 78510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:08:09,597-Speed 5554.77 samples/sec Loss 0.3331 LearningRate 0.0017 Epoch: 18 Global Step: 78520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:08:17,424-Speed 5234.59 samples/sec Loss 0.3527 LearningRate 0.0017 Epoch: 18 Global Step: 78530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:08:24,777-Speed 5571.29 samples/sec Loss 0.3440 LearningRate 0.0017 Epoch: 18 Global Step: 78540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:08:32,150-Speed 5555.98 samples/sec Loss 0.3378 LearningRate 0.0017 Epoch: 18 Global Step: 78550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:08:39,675-Speed 5444.28 samples/sec Loss 0.3425 LearningRate 0.0017 Epoch: 18 Global Step: 78560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:08:47,656-Speed 5132.87 samples/sec Loss 0.3392 LearningRate 0.0017 Epoch: 18 Global Step: 78570 Fp16 Grad Scale: 
131072 Required: 1 hours Training: 2022-01-17 16:08:55,205-Speed 5426.01 samples/sec Loss 0.3431 LearningRate 0.0017 Epoch: 18 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:09:02,650-Speed 5502.65 samples/sec Loss 0.3404 LearningRate 0.0017 Epoch: 18 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:09:10,009-Speed 5566.69 samples/sec Loss 0.3289 LearningRate 0.0017 Epoch: 18 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:09:17,392-Speed 5548.76 samples/sec Loss 0.3391 LearningRate 0.0017 Epoch: 18 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:09:24,765-Speed 5556.51 samples/sec Loss 0.3409 LearningRate 0.0016 Epoch: 18 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:09:32,034-Speed 5635.59 samples/sec Loss 0.3459 LearningRate 0.0016 Epoch: 18 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:09:39,358-Speed 5593.63 samples/sec Loss 0.3239 LearningRate 0.0016 Epoch: 18 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:09:46,792-Speed 5510.26 samples/sec Loss 0.3317 LearningRate 0.0016 Epoch: 18 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:09:54,157-Speed 5562.58 samples/sec Loss 0.3337 LearningRate 0.0016 Epoch: 18 Global Step: 78660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:10:01,533-Speed 5553.61 samples/sec Loss 0.3410 LearningRate 0.0016 Epoch: 18 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:10:09,172-Speed 5363.07 samples/sec Loss 0.3379 LearningRate 0.0016 Epoch: 18 Global Step: 78680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:10:16,843-Speed 5340.20 samples/sec Loss 0.3362 LearningRate 0.0016 Epoch: 18 Global Step: 78690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 
16:10:24,215-Speed 5556.85 samples/sec Loss 0.3419 LearningRate 0.0016 Epoch: 18 Global Step: 78700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:10:31,552-Speed 5583.60 samples/sec Loss 0.3485 LearningRate 0.0016 Epoch: 18 Global Step: 78710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:10:38,851-Speed 5612.42 samples/sec Loss 0.3295 LearningRate 0.0016 Epoch: 18 Global Step: 78720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:10:46,320-Speed 5484.60 samples/sec Loss 0.3396 LearningRate 0.0016 Epoch: 18 Global Step: 78730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:10:53,682-Speed 5564.76 samples/sec Loss 0.3330 LearningRate 0.0016 Epoch: 18 Global Step: 78740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:11:01,170-Speed 5470.82 samples/sec Loss 0.3350 LearningRate 0.0016 Epoch: 18 Global Step: 78750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:11:08,522-Speed 5571.97 samples/sec Loss 0.3324 LearningRate 0.0016 Epoch: 18 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:11:15,849-Speed 5591.52 samples/sec Loss 0.3429 LearningRate 0.0015 Epoch: 18 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:11:23,154-Speed 5607.34 samples/sec Loss 0.3273 LearningRate 0.0015 Epoch: 18 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:11:30,460-Speed 5607.61 samples/sec Loss 0.3304 LearningRate 0.0015 Epoch: 18 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:11:37,941-Speed 5475.97 samples/sec Loss 0.3264 LearningRate 0.0015 Epoch: 18 Global Step: 78800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:11:45,417-Speed 5479.26 samples/sec Loss 0.3409 LearningRate 0.0015 Epoch: 18 Global Step: 78810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:11:52,777-Speed 5566.04 samples/sec Loss 
0.3337 LearningRate 0.0015 Epoch: 18 Global Step: 78820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:12:00,120-Speed 5579.19 samples/sec Loss 0.3317 LearningRate 0.0015 Epoch: 18 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:12:07,452-Speed 5587.28 samples/sec Loss 0.3431 LearningRate 0.0015 Epoch: 18 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:12:14,735-Speed 5625.01 samples/sec Loss 0.3353 LearningRate 0.0015 Epoch: 18 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:12:22,294-Speed 5419.43 samples/sec Loss 0.3379 LearningRate 0.0015 Epoch: 18 Global Step: 78860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:12:29,741-Speed 5501.10 samples/sec Loss 0.3314 LearningRate 0.0015 Epoch: 18 Global Step: 78870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:12:37,162-Speed 5520.52 samples/sec Loss 0.3392 LearningRate 0.0015 Epoch: 18 Global Step: 78880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:12:44,671-Speed 5455.64 samples/sec Loss 0.3329 LearningRate 0.0015 Epoch: 18 Global Step: 78890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:12:52,445-Speed 5269.36 samples/sec Loss 0.3359 LearningRate 0.0015 Epoch: 18 Global Step: 78900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:13:00,495-Speed 5088.59 samples/sec Loss 0.3369 LearningRate 0.0015 Epoch: 18 Global Step: 78910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:13:08,201-Speed 5316.55 samples/sec Loss 0.3241 LearningRate 0.0014 Epoch: 18 Global Step: 78920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:13:15,571-Speed 5558.82 samples/sec Loss 0.3292 LearningRate 0.0014 Epoch: 18 Global Step: 78930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:13:22,968-Speed 5537.88 samples/sec Loss 0.3341 LearningRate 0.0014 Epoch: 18 Global 
Step: 78940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:13:30,366-Speed 5537.44 samples/sec Loss 0.3313 LearningRate 0.0014 Epoch: 18 Global Step: 78950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:13:37,715-Speed 5574.90 samples/sec Loss 0.3252 LearningRate 0.0014 Epoch: 18 Global Step: 78960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:13:45,070-Speed 5569.31 samples/sec Loss 0.3324 LearningRate 0.0014 Epoch: 18 Global Step: 78970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:13:52,538-Speed 5486.81 samples/sec Loss 0.3268 LearningRate 0.0014 Epoch: 18 Global Step: 78980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:14:00,123-Speed 5401.42 samples/sec Loss 0.3366 LearningRate 0.0014 Epoch: 18 Global Step: 78990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:14:07,551-Speed 5514.78 samples/sec Loss 0.3210 LearningRate 0.0014 Epoch: 18 Global Step: 79000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:14:15,111-Speed 5419.00 samples/sec Loss 0.3445 LearningRate 0.0014 Epoch: 18 Global Step: 79010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:14:22,556-Speed 5502.03 samples/sec Loss 0.3278 LearningRate 0.0014 Epoch: 18 Global Step: 79020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:14:30,039-Speed 5474.70 samples/sec Loss 0.3309 LearningRate 0.0014 Epoch: 18 Global Step: 79030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:14:37,358-Speed 5596.99 samples/sec Loss 0.3474 LearningRate 0.0014 Epoch: 18 Global Step: 79040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:14:44,811-Speed 5496.78 samples/sec Loss 0.3221 LearningRate 0.0014 Epoch: 18 Global Step: 79050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:14:52,127-Speed 5599.16 samples/sec Loss 0.3405 LearningRate 0.0014 Epoch: 18 Global Step: 79060 Fp16 Grad Scale: 131072 
Required: 1 hours Training: 2022-01-17 16:14:59,595-Speed 5486.02 samples/sec Loss 0.3264 LearningRate 0.0014 Epoch: 18 Global Step: 79070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:15:06,975-Speed 5550.53 samples/sec Loss 0.3325 LearningRate 0.0013 Epoch: 18 Global Step: 79080 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:15:14,400-Speed 5517.13 samples/sec Loss 0.3372 LearningRate 0.0013 Epoch: 18 Global Step: 79090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:15:21,818-Speed 5523.04 samples/sec Loss 0.3279 LearningRate 0.0013 Epoch: 18 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:15:29,501-Speed 5331.75 samples/sec Loss 0.3300 LearningRate 0.0013 Epoch: 18 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:15:37,104-Speed 5388.01 samples/sec Loss 0.3335 LearningRate 0.0013 Epoch: 18 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:15:44,549-Speed 5502.45 samples/sec Loss 0.3448 LearningRate 0.0013 Epoch: 18 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:15:52,034-Speed 5473.00 samples/sec Loss 0.3366 LearningRate 0.0013 Epoch: 18 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:15:59,439-Speed 5532.21 samples/sec Loss 0.3443 LearningRate 0.0013 Epoch: 18 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:16:06,928-Speed 5470.83 samples/sec Loss 0.3267 LearningRate 0.0013 Epoch: 18 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:16:14,541-Speed 5380.75 samples/sec Loss 0.3177 LearningRate 0.0013 Epoch: 18 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:16:22,187-Speed 5358.08 samples/sec Loss 0.3207 LearningRate 0.0013 Epoch: 18 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 
16:16:29,712-Speed 5443.17 samples/sec Loss 0.3288 LearningRate 0.0013 Epoch: 18 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:16:37,100-Speed 5544.94 samples/sec Loss 0.3324 LearningRate 0.0013 Epoch: 18 Global Step: 79200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:16:44,393-Speed 5617.58 samples/sec Loss 0.3428 LearningRate 0.0013 Epoch: 18 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:16:51,712-Speed 5596.85 samples/sec Loss 0.3394 LearningRate 0.0013 Epoch: 18 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:16:59,229-Speed 5450.38 samples/sec Loss 0.3249 LearningRate 0.0013 Epoch: 18 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:17:06,623-Speed 5540.35 samples/sec Loss 0.3228 LearningRate 0.0013 Epoch: 18 Global Step: 79240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:17:14,024-Speed 5534.97 samples/sec Loss 0.3288 LearningRate 0.0012 Epoch: 18 Global Step: 79250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:17:21,735-Speed 5312.73 samples/sec Loss 0.3310 LearningRate 0.0012 Epoch: 18 Global Step: 79260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:18:14,503-Speed 776.25 samples/sec Loss 0.3032 LearningRate 0.0012 Epoch: 19 Global Step: 79270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:18:21,863-Speed 5566.05 samples/sec Loss 0.2299 LearningRate 0.0012 Epoch: 19 Global Step: 79280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:18:29,366-Speed 5460.01 samples/sec Loss 0.2127 LearningRate 0.0012 Epoch: 19 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:18:36,813-Speed 5501.72 samples/sec Loss 0.2148 LearningRate 0.0012 Epoch: 19 Global Step: 79300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:18:44,166-Speed 5571.51 samples/sec Loss 0.2243 
LearningRate 0.0012 Epoch: 19 Global Step: 79310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:18:51,761-Speed 5393.33 samples/sec Loss 0.2130 LearningRate 0.0012 Epoch: 19 Global Step: 79320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:18:59,136-Speed 5555.35 samples/sec Loss 0.2181 LearningRate 0.0012 Epoch: 19 Global Step: 79330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:19:06,570-Speed 5510.58 samples/sec Loss 0.2244 LearningRate 0.0012 Epoch: 19 Global Step: 79340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:19:13,906-Speed 5584.06 samples/sec Loss 0.2240 LearningRate 0.0012 Epoch: 19 Global Step: 79350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:19:21,239-Speed 5586.91 samples/sec Loss 0.2109 LearningRate 0.0012 Epoch: 19 Global Step: 79360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:19:28,577-Speed 5582.79 samples/sec Loss 0.2274 LearningRate 0.0012 Epoch: 19 Global Step: 79370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:19:35,918-Speed 5579.93 samples/sec Loss 0.2312 LearningRate 0.0012 Epoch: 19 Global Step: 79380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:19:43,304-Speed 5546.92 samples/sec Loss 0.2176 LearningRate 0.0012 Epoch: 19 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:19:50,903-Speed 5391.20 samples/sec Loss 0.2217 LearningRate 0.0012 Epoch: 19 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:19:58,259-Speed 5568.68 samples/sec Loss 0.2082 LearningRate 0.0012 Epoch: 19 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:20:05,793-Speed 5437.29 samples/sec Loss 0.2157 LearningRate 0.0011 Epoch: 19 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:20:13,380-Speed 5399.83 samples/sec Loss 0.2228 LearningRate 0.0011 Epoch: 19 Global Step: 
79430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:20:21,181-Speed 5251.32 samples/sec Loss 0.2219 LearningRate 0.0011 Epoch: 19 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:20:29,214-Speed 5100.38 samples/sec Loss 0.2253 LearningRate 0.0011 Epoch: 19 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:20:37,028-Speed 5242.61 samples/sec Loss 0.2174 LearningRate 0.0011 Epoch: 19 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:20:44,744-Speed 5309.47 samples/sec Loss 0.2139 LearningRate 0.0011 Epoch: 19 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:20:52,182-Speed 5507.94 samples/sec Loss 0.2129 LearningRate 0.0011 Epoch: 19 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:20:59,476-Speed 5616.53 samples/sec Loss 0.2221 LearningRate 0.0011 Epoch: 19 Global Step: 79490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:21:06,825-Speed 5574.22 samples/sec Loss 0.2168 LearningRate 0.0011 Epoch: 19 Global Step: 79500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:21:14,220-Speed 5540.33 samples/sec Loss 0.2180 LearningRate 0.0011 Epoch: 19 Global Step: 79510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:21:21,533-Speed 5602.19 samples/sec Loss 0.2116 LearningRate 0.0011 Epoch: 19 Global Step: 79520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:21:28,956-Speed 5518.43 samples/sec Loss 0.2156 LearningRate 0.0011 Epoch: 19 Global Step: 79530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:21:36,331-Speed 5555.06 samples/sec Loss 0.2131 LearningRate 0.0011 Epoch: 19 Global Step: 79540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:21:43,915-Speed 5401.27 samples/sec Loss 0.2162 LearningRate 0.0011 Epoch: 19 Global Step: 79550 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-01-17 16:21:51,450-Speed 5436.80 samples/sec Loss 0.2147 LearningRate 0.0011 Epoch: 19 Global Step: 79560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:21:58,996-Speed 5429.01 samples/sec Loss 0.2173 LearningRate 0.0011 Epoch: 19 Global Step: 79570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:22:06,427-Speed 5512.80 samples/sec Loss 0.2251 LearningRate 0.0011 Epoch: 19 Global Step: 79580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:22:13,825-Speed 5538.00 samples/sec Loss 0.2119 LearningRate 0.0011 Epoch: 19 Global Step: 79590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:22:21,175-Speed 5573.70 samples/sec Loss 0.2137 LearningRate 0.0010 Epoch: 19 Global Step: 79600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:22:28,607-Speed 5511.77 samples/sec Loss 0.2130 LearningRate 0.0010 Epoch: 19 Global Step: 79610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:22:36,051-Speed 5503.51 samples/sec Loss 0.2201 LearningRate 0.0010 Epoch: 19 Global Step: 79620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:22:43,599-Speed 5427.59 samples/sec Loss 0.2170 LearningRate 0.0010 Epoch: 19 Global Step: 79630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:22:50,997-Speed 5537.26 samples/sec Loss 0.2183 LearningRate 0.0010 Epoch: 19 Global Step: 79640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:22:58,340-Speed 5579.35 samples/sec Loss 0.2168 LearningRate 0.0010 Epoch: 19 Global Step: 79650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:23:05,836-Speed 5465.60 samples/sec Loss 0.2215 LearningRate 0.0010 Epoch: 19 Global Step: 79660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:23:13,421-Speed 5400.55 samples/sec Loss 0.2221 LearningRate 0.0010 Epoch: 19 Global Step: 79670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:23:20,780-Speed 5566.75 
samples/sec Loss 0.2193 LearningRate 0.0010 Epoch: 19 Global Step: 79680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:23:28,398-Speed 5377.74 samples/sec Loss 0.2199 LearningRate 0.0010 Epoch: 19 Global Step: 79690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:23:36,045-Speed 5357.30 samples/sec Loss 0.2142 LearningRate 0.0010 Epoch: 19 Global Step: 79700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:23:43,526-Speed 5476.05 samples/sec Loss 0.2269 LearningRate 0.0010 Epoch: 19 Global Step: 79710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:23:50,940-Speed 5525.08 samples/sec Loss 0.2172 LearningRate 0.0010 Epoch: 19 Global Step: 79720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:23:58,231-Speed 5619.04 samples/sec Loss 0.2162 LearningRate 0.0010 Epoch: 19 Global Step: 79730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:24:05,557-Speed 5592.20 samples/sec Loss 0.2255 LearningRate 0.0010 Epoch: 19 Global Step: 79740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:24:12,984-Speed 5515.40 samples/sec Loss 0.2263 LearningRate 0.0010 Epoch: 19 Global Step: 79750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:24:20,366-Speed 5549.68 samples/sec Loss 0.2215 LearningRate 0.0010 Epoch: 19 Global Step: 79760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:24:27,923-Speed 5420.86 samples/sec Loss 0.2082 LearningRate 0.0010 Epoch: 19 Global Step: 79770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:24:35,275-Speed 5571.55 samples/sec Loss 0.2146 LearningRate 0.0010 Epoch: 19 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:24:42,692-Speed 5523.85 samples/sec Loss 0.2247 LearningRate 0.0009 Epoch: 19 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:24:49,995-Speed 5609.67 samples/sec Loss 0.2137 LearningRate 0.0009 
Epoch: 19 Global Step: 79800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:24:57,307-Speed 5602.87 samples/sec Loss 0.2196 LearningRate 0.0009 Epoch: 19 Global Step: 79810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:25:04,611-Speed 5608.52 samples/sec Loss 0.2251 LearningRate 0.0009 Epoch: 19 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:25:11,966-Speed 5569.95 samples/sec Loss 0.2244 LearningRate 0.0009 Epoch: 19 Global Step: 79830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:25:19,326-Speed 5565.56 samples/sec Loss 0.2130 LearningRate 0.0009 Epoch: 19 Global Step: 79840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:25:26,569-Speed 5656.22 samples/sec Loss 0.2145 LearningRate 0.0009 Epoch: 19 Global Step: 79850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:25:33,818-Speed 5650.96 samples/sec Loss 0.2207 LearningRate 0.0009 Epoch: 19 Global Step: 79860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:25:41,138-Speed 5597.09 samples/sec Loss 0.2178 LearningRate 0.0009 Epoch: 19 Global Step: 79870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:25:48,527-Speed 5544.08 samples/sec Loss 0.2152 LearningRate 0.0009 Epoch: 19 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:25:55,875-Speed 5575.71 samples/sec Loss 0.2138 LearningRate 0.0009 Epoch: 19 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:26:03,213-Speed 5582.11 samples/sec Loss 0.2141 LearningRate 0.0009 Epoch: 19 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:26:10,601-Speed 5544.82 samples/sec Loss 0.2253 LearningRate 0.0009 Epoch: 19 Global Step: 79910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:26:18,105-Speed 5459.41 samples/sec Loss 0.2187 LearningRate 0.0009 Epoch: 19 Global Step: 79920 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-17 16:26:25,509-Speed 5532.92 samples/sec Loss 0.2063 LearningRate 0.0009 Epoch: 19 Global Step: 79930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:26:32,941-Speed 5512.60 samples/sec Loss 0.2142 LearningRate 0.0009 Epoch: 19 Global Step: 79940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:26:40,204-Speed 5640.60 samples/sec Loss 0.2176 LearningRate 0.0009 Epoch: 19 Global Step: 79950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:26:47,482-Speed 5627.84 samples/sec Loss 0.2176 LearningRate 0.0009 Epoch: 19 Global Step: 79960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:26:54,766-Speed 5624.60 samples/sec Loss 0.2113 LearningRate 0.0009 Epoch: 19 Global Step: 79970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:27:02,038-Speed 5633.83 samples/sec Loss 0.2231 LearningRate 0.0008 Epoch: 19 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:27:09,392-Speed 5569.60 samples/sec Loss 0.2119 LearningRate 0.0008 Epoch: 19 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:27:16,957-Speed 5415.74 samples/sec Loss 0.2178 LearningRate 0.0008 Epoch: 19 Global Step: 80000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:28:03,522-[lfw][80000]XNorm: 21.787054 Training: 2022-01-17 16:28:03,523-[lfw][80000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-01-17 16:28:03,523-[lfw][80000]Accuracy-Highest: 0.99833 Training: 2022-01-17 16:28:57,659-[cfp_fp][80000]XNorm: 21.603065 Training: 2022-01-17 16:28:57,659-[cfp_fp][80000]Accuracy-Flip: 0.99300+-0.00358 Training: 2022-01-17 16:28:57,660-[cfp_fp][80000]Accuracy-Highest: 0.99329 Training: 2022-01-17 16:29:44,183-[agedb_30][80000]XNorm: 22.661619 Training: 2022-01-17 16:29:44,184-[agedb_30][80000]Accuracy-Flip: 0.98467+-0.00690 Training: 2022-01-17 16:29:44,184-[agedb_30][80000]Accuracy-Highest: 0.98467 Training: 
2022-01-17 16:29:51,457-Speed 265.11 samples/sec Loss 0.2218 LearningRate 0.0008 Epoch: 19 Global Step: 80010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:29:58,779-Speed 5594.67 samples/sec Loss 0.2082 LearningRate 0.0008 Epoch: 19 Global Step: 80020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:30:06,160-Speed 5550.38 samples/sec Loss 0.2203 LearningRate 0.0008 Epoch: 19 Global Step: 80030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:30:13,446-Speed 5622.93 samples/sec Loss 0.2287 LearningRate 0.0008 Epoch: 19 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:30:20,796-Speed 5573.91 samples/sec Loss 0.2235 LearningRate 0.0008 Epoch: 19 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:30:28,169-Speed 5556.04 samples/sec Loss 0.2232 LearningRate 0.0008 Epoch: 19 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:30:35,569-Speed 5536.41 samples/sec Loss 0.2107 LearningRate 0.0008 Epoch: 19 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:30:42,870-Speed 5610.60 samples/sec Loss 0.2160 LearningRate 0.0008 Epoch: 19 Global Step: 80080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:30:50,169-Speed 5612.34 samples/sec Loss 0.2156 LearningRate 0.0008 Epoch: 19 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:30:57,590-Speed 5520.83 samples/sec Loss 0.2111 LearningRate 0.0008 Epoch: 19 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:31:04,974-Speed 5547.66 samples/sec Loss 0.2187 LearningRate 0.0008 Epoch: 19 Global Step: 80110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:31:12,377-Speed 5533.51 samples/sec Loss 0.2223 LearningRate 0.0008 Epoch: 19 Global Step: 80120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:31:19,766-Speed 5544.50 samples/sec 
Loss 0.2131 LearningRate 0.0008 Epoch: 19 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:31:27,278-Speed 5453.60 samples/sec Loss 0.2115 LearningRate 0.0008 Epoch: 19 Global Step: 80140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:31:34,697-Speed 5521.44 samples/sec Loss 0.2229 LearningRate 0.0008 Epoch: 19 Global Step: 80150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:31:42,144-Speed 5501.42 samples/sec Loss 0.2165 LearningRate 0.0008 Epoch: 19 Global Step: 80160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:31:49,429-Speed 5623.29 samples/sec Loss 0.2163 LearningRate 0.0008 Epoch: 19 Global Step: 80170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:31:56,731-Speed 5610.01 samples/sec Loss 0.2167 LearningRate 0.0008 Epoch: 19 Global Step: 80180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:32:04,041-Speed 5605.18 samples/sec Loss 0.2174 LearningRate 0.0007 Epoch: 19 Global Step: 80190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:32:11,426-Speed 5547.06 samples/sec Loss 0.2155 LearningRate 0.0007 Epoch: 19 Global Step: 80200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:32:18,808-Speed 5549.62 samples/sec Loss 0.2169 LearningRate 0.0007 Epoch: 19 Global Step: 80210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:32:26,200-Speed 5542.83 samples/sec Loss 0.2248 LearningRate 0.0007 Epoch: 19 Global Step: 80220 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:32:33,593-Speed 5540.36 samples/sec Loss 0.2232 LearningRate 0.0007 Epoch: 19 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:32:40,990-Speed 5538.49 samples/sec Loss 0.2177 LearningRate 0.0007 Epoch: 19 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:32:48,355-Speed 5562.00 samples/sec Loss 0.2157 LearningRate 0.0007 Epoch: 19 
Global Step: 80250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:32:55,895-Speed 5433.36 samples/sec Loss 0.2209 LearningRate 0.0007 Epoch: 19 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:33:03,367-Speed 5482.23 samples/sec Loss 0.2163 LearningRate 0.0007 Epoch: 19 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:33:10,829-Speed 5490.09 samples/sec Loss 0.2148 LearningRate 0.0007 Epoch: 19 Global Step: 80280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:33:18,240-Speed 5527.27 samples/sec Loss 0.2180 LearningRate 0.0007 Epoch: 19 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:33:25,605-Speed 5563.10 samples/sec Loss 0.2172 LearningRate 0.0007 Epoch: 19 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:33:32,957-Speed 5572.46 samples/sec Loss 0.2238 LearningRate 0.0007 Epoch: 19 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:33:40,500-Speed 5430.99 samples/sec Loss 0.2092 LearningRate 0.0007 Epoch: 19 Global Step: 80320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:33:48,098-Speed 5391.66 samples/sec Loss 0.2203 LearningRate 0.0007 Epoch: 19 Global Step: 80330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:33:55,448-Speed 5573.65 samples/sec Loss 0.2096 LearningRate 0.0007 Epoch: 19 Global Step: 80340 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 16:34:02,792-Speed 5577.80 samples/sec Loss 0.2184 LearningRate 0.0007 Epoch: 19 Global Step: 80350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:34:10,071-Speed 5628.56 samples/sec Loss 0.2159 LearningRate 0.0007 Epoch: 19 Global Step: 80360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:34:17,399-Speed 5590.31 samples/sec Loss 0.2125 LearningRate 0.0007 Epoch: 19 Global Step: 80370 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-17 16:34:24,705-Speed 5607.11 samples/sec Loss 0.2116 LearningRate 0.0007 Epoch: 19 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:34:32,063-Speed 5567.63 samples/sec Loss 0.2246 LearningRate 0.0007 Epoch: 19 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:34:39,429-Speed 5561.51 samples/sec Loss 0.2096 LearningRate 0.0007 Epoch: 19 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:34:47,097-Speed 5342.25 samples/sec Loss 0.2156 LearningRate 0.0007 Epoch: 19 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:34:54,774-Speed 5336.47 samples/sec Loss 0.2039 LearningRate 0.0006 Epoch: 19 Global Step: 80420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:35:02,147-Speed 5556.03 samples/sec Loss 0.2258 LearningRate 0.0006 Epoch: 19 Global Step: 80430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:35:09,592-Speed 5502.10 samples/sec Loss 0.2150 LearningRate 0.0006 Epoch: 19 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:35:17,030-Speed 5507.52 samples/sec Loss 0.2153 LearningRate 0.0006 Epoch: 19 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:35:24,422-Speed 5542.28 samples/sec Loss 0.2181 LearningRate 0.0006 Epoch: 19 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:35:31,863-Speed 5505.55 samples/sec Loss 0.2170 LearningRate 0.0006 Epoch: 19 Global Step: 80470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:35:39,230-Speed 5560.57 samples/sec Loss 0.2083 LearningRate 0.0006 Epoch: 19 Global Step: 80480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:35:46,885-Speed 5351.33 samples/sec Loss 0.2228 LearningRate 0.0006 Epoch: 19 Global Step: 80490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 
16:35:54,253-Speed 5560.93 samples/sec Loss 0.2131 LearningRate 0.0006 Epoch: 19 Global Step: 80500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:36:01,571-Speed 5598.00 samples/sec Loss 0.2159 LearningRate 0.0006 Epoch: 19 Global Step: 80510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:36:08,893-Speed 5595.41 samples/sec Loss 0.2105 LearningRate 0.0006 Epoch: 19 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:36:16,579-Speed 5329.84 samples/sec Loss 0.2146 LearningRate 0.0006 Epoch: 19 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:36:24,211-Speed 5367.22 samples/sec Loss 0.2111 LearningRate 0.0006 Epoch: 19 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:36:31,952-Speed 5292.52 samples/sec Loss 0.2278 LearningRate 0.0006 Epoch: 19 Global Step: 80550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:36:39,704-Speed 5283.88 samples/sec Loss 0.2209 LearningRate 0.0006 Epoch: 19 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:36:47,445-Speed 5292.73 samples/sec Loss 0.2200 LearningRate 0.0006 Epoch: 19 Global Step: 80570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:36:55,151-Speed 5315.36 samples/sec Loss 0.2127 LearningRate 0.0006 Epoch: 19 Global Step: 80580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:37:03,145-Speed 5124.79 samples/sec Loss 0.2137 LearningRate 0.0006 Epoch: 19 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:37:10,844-Speed 5320.79 samples/sec Loss 0.2220 LearningRate 0.0006 Epoch: 19 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:37:18,693-Speed 5218.98 samples/sec Loss 0.2131 LearningRate 0.0006 Epoch: 19 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:37:26,591-Speed 5187.28 samples/sec Loss 0.2153 
LearningRate 0.0006 Epoch: 19 Global Step: 80620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:37:33,964-Speed 5556.16 samples/sec Loss 0.2174 LearningRate 0.0006 Epoch: 19 Global Step: 80630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:37:41,287-Speed 5593.46 samples/sec Loss 0.2200 LearningRate 0.0006 Epoch: 19 Global Step: 80640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:37:48,552-Speed 5639.25 samples/sec Loss 0.2261 LearningRate 0.0006 Epoch: 19 Global Step: 80650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:37:55,896-Speed 5578.85 samples/sec Loss 0.2205 LearningRate 0.0005 Epoch: 19 Global Step: 80660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:38:03,344-Speed 5500.08 samples/sec Loss 0.2219 LearningRate 0.0005 Epoch: 19 Global Step: 80670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:38:10,896-Speed 5425.07 samples/sec Loss 0.2294 LearningRate 0.0005 Epoch: 19 Global Step: 80680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:38:18,292-Speed 5539.05 samples/sec Loss 0.2130 LearningRate 0.0005 Epoch: 19 Global Step: 80690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:38:25,786-Speed 5466.55 samples/sec Loss 0.2144 LearningRate 0.0005 Epoch: 19 Global Step: 80700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:38:33,125-Speed 5581.39 samples/sec Loss 0.2172 LearningRate 0.0005 Epoch: 19 Global Step: 80710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:38:40,490-Speed 5562.62 samples/sec Loss 0.2047 LearningRate 0.0005 Epoch: 19 Global Step: 80720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:38:47,843-Speed 5571.85 samples/sec Loss 0.2255 LearningRate 0.0005 Epoch: 19 Global Step: 80730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:38:55,184-Speed 5580.42 samples/sec Loss 0.2206 LearningRate 0.0005 Epoch: 19 Global Step: 
80740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:39:02,582-Speed 5537.71 samples/sec Loss 0.2112 LearningRate 0.0005 Epoch: 19 Global Step: 80750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:39:09,964-Speed 5549.09 samples/sec Loss 0.2119 LearningRate 0.0005 Epoch: 19 Global Step: 80760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:39:17,670-Speed 5316.37 samples/sec Loss 0.2261 LearningRate 0.0005 Epoch: 19 Global Step: 80770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:39:25,196-Speed 5443.56 samples/sec Loss 0.2185 LearningRate 0.0005 Epoch: 19 Global Step: 80780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:39:32,549-Speed 5570.87 samples/sec Loss 0.2211 LearningRate 0.0005 Epoch: 19 Global Step: 80790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:39:39,876-Speed 5591.36 samples/sec Loss 0.2196 LearningRate 0.0005 Epoch: 19 Global Step: 80800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:39:47,346-Speed 5484.49 samples/sec Loss 0.2155 LearningRate 0.0005 Epoch: 19 Global Step: 80810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:39:54,677-Speed 5587.41 samples/sec Loss 0.2081 LearningRate 0.0005 Epoch: 19 Global Step: 80820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:40:02,054-Speed 5553.42 samples/sec Loss 0.2181 LearningRate 0.0005 Epoch: 19 Global Step: 80830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:40:09,453-Speed 5536.99 samples/sec Loss 0.2084 LearningRate 0.0005 Epoch: 19 Global Step: 80840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:40:16,878-Speed 5517.50 samples/sec Loss 0.2144 LearningRate 0.0005 Epoch: 19 Global Step: 80850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:40:24,225-Speed 5576.26 samples/sec Loss 0.2184 LearningRate 0.0005 Epoch: 19 Global Step: 80860 Fp16 Grad Scale: 131072 Required: 1 
hours Training: 2022-01-17 16:40:31,601-Speed 5554.15 samples/sec Loss 0.2104 LearningRate 0.0005 Epoch: 19 Global Step: 80870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:40:39,154-Speed 5423.68 samples/sec Loss 0.2160 LearningRate 0.0005 Epoch: 19 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:40:46,450-Speed 5615.31 samples/sec Loss 0.2079 LearningRate 0.0005 Epoch: 19 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:40:53,830-Speed 5551.49 samples/sec Loss 0.2179 LearningRate 0.0005 Epoch: 19 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:41:01,282-Speed 5497.11 samples/sec Loss 0.2189 LearningRate 0.0005 Epoch: 19 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:41:08,746-Speed 5488.47 samples/sec Loss 0.2180 LearningRate 0.0005 Epoch: 19 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:41:16,181-Speed 5509.81 samples/sec Loss 0.2062 LearningRate 0.0004 Epoch: 19 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:41:23,760-Speed 5404.93 samples/sec Loss 0.2187 LearningRate 0.0004 Epoch: 19 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:41:31,078-Speed 5598.01 samples/sec Loss 0.2190 LearningRate 0.0004 Epoch: 19 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:41:38,406-Speed 5591.06 samples/sec Loss 0.2092 LearningRate 0.0004 Epoch: 19 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:41:45,852-Speed 5501.81 samples/sec Loss 0.2159 LearningRate 0.0004 Epoch: 19 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:41:53,323-Speed 5483.07 samples/sec Loss 0.2078 LearningRate 0.0004 Epoch: 19 Global Step: 80980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:00,698-Speed 5555.34 
samples/sec Loss 0.2142 LearningRate 0.0004 Epoch: 19 Global Step: 80990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:08,003-Speed 5607.28 samples/sec Loss 0.2159 LearningRate 0.0004 Epoch: 19 Global Step: 81000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:15,351-Speed 5575.02 samples/sec Loss 0.2121 LearningRate 0.0004 Epoch: 19 Global Step: 81010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:22,858-Speed 5457.28 samples/sec Loss 0.2144 LearningRate 0.0004 Epoch: 19 Global Step: 81020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:30,196-Speed 5582.97 samples/sec Loss 0.2200 LearningRate 0.0004 Epoch: 19 Global Step: 81030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:37,614-Speed 5522.63 samples/sec Loss 0.2194 LearningRate 0.0004 Epoch: 19 Global Step: 81040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:44,950-Speed 5583.43 samples/sec Loss 0.2137 LearningRate 0.0004 Epoch: 19 Global Step: 81050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:52,327-Speed 5553.72 samples/sec Loss 0.2088 LearningRate 0.0004 Epoch: 19 Global Step: 81060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:42:59,728-Speed 5535.12 samples/sec Loss 0.2173 LearningRate 0.0004 Epoch: 19 Global Step: 81070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:43:07,186-Speed 5492.65 samples/sec Loss 0.2155 LearningRate 0.0004 Epoch: 19 Global Step: 81080 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 16:43:14,543-Speed 5568.63 samples/sec Loss 0.2161 LearningRate 0.0004 Epoch: 19 Global Step: 81090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:43:22,034-Speed 5468.58 samples/sec Loss 0.2150 LearningRate 0.0004 Epoch: 19 Global Step: 81100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:43:29,364-Speed 5588.56 samples/sec Loss 0.2164 LearningRate 0.0004 
Epoch: 19 Global Step: 81110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:43:36,746-Speed 5549.47 samples/sec Loss 0.2135 LearningRate 0.0004 Epoch: 19 Global Step: 81120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:43:44,181-Speed 5510.11 samples/sec Loss 0.2160 LearningRate 0.0004 Epoch: 19 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:43:51,473-Speed 5617.77 samples/sec Loss 0.2246 LearningRate 0.0004 Epoch: 19 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:43:58,837-Speed 5563.38 samples/sec Loss 0.2114 LearningRate 0.0004 Epoch: 19 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:44:06,313-Speed 5479.95 samples/sec Loss 0.2183 LearningRate 0.0004 Epoch: 19 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:44:13,620-Speed 5606.15 samples/sec Loss 0.2142 LearningRate 0.0004 Epoch: 19 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:44:20,978-Speed 5567.45 samples/sec Loss 0.2158 LearningRate 0.0004 Epoch: 19 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:44:28,440-Speed 5490.43 samples/sec Loss 0.2184 LearningRate 0.0004 Epoch: 19 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:44:35,759-Speed 5597.39 samples/sec Loss 0.2217 LearningRate 0.0004 Epoch: 19 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:44:43,103-Speed 5577.96 samples/sec Loss 0.2069 LearningRate 0.0004 Epoch: 19 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:44:50,615-Speed 5453.23 samples/sec Loss 0.2299 LearningRate 0.0003 Epoch: 19 Global Step: 81220 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:44:57,997-Speed 5549.71 samples/sec Loss 0.2281 LearningRate 0.0003 Epoch: 19 Global Step: 81230 Fp16 Grad Scale: 131072 
Required: 1 hours Training: 2022-01-17 16:45:05,493-Speed 5464.85 samples/sec Loss 0.2145 LearningRate 0.0003 Epoch: 19 Global Step: 81240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:45:13,105-Speed 5382.20 samples/sec Loss 0.2161 LearningRate 0.0003 Epoch: 19 Global Step: 81250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:45:20,672-Speed 5413.88 samples/sec Loss 0.2127 LearningRate 0.0003 Epoch: 19 Global Step: 81260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:45:28,116-Speed 5503.30 samples/sec Loss 0.2206 LearningRate 0.0003 Epoch: 19 Global Step: 81270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 16:45:35,831-Speed 5309.70 samples/sec Loss 0.2176 LearningRate 0.0003 Epoch: 19 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:45:43,368-Speed 5435.04 samples/sec Loss 0.2191 LearningRate 0.0003 Epoch: 19 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:45:50,680-Speed 5602.35 samples/sec Loss 0.2183 LearningRate 0.0003 Epoch: 19 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:45:58,063-Speed 5548.84 samples/sec Loss 0.2141 LearningRate 0.0003 Epoch: 19 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 16:46:05,520-Speed 5493.83 samples/sec Loss 0.2171 LearningRate 0.0003 Epoch: 19 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:46:12,988-Speed 5485.38 samples/sec Loss 0.2183 LearningRate 0.0003 Epoch: 19 Global Step: 81330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:46:20,335-Speed 5575.83 samples/sec Loss 0.2225 LearningRate 0.0003 Epoch: 19 Global Step: 81340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:46:27,757-Speed 5519.62 samples/sec Loss 0.2139 LearningRate 0.0003 Epoch: 19 Global Step: 81350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 
16:46:35,103-Speed 5577.03 samples/sec Loss 0.2108 LearningRate 0.0003 Epoch: 19 Global Step: 81360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:46:42,415-Speed 5601.86 samples/sec Loss 0.2169 LearningRate 0.0003 Epoch: 19 Global Step: 81370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:46:49,790-Speed 5554.69 samples/sec Loss 0.2179 LearningRate 0.0003 Epoch: 19 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:46:57,198-Speed 5531.46 samples/sec Loss 0.2208 LearningRate 0.0003 Epoch: 19 Global Step: 81390 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:47:04,543-Speed 5577.27 samples/sec Loss 0.2213 LearningRate 0.0003 Epoch: 19 Global Step: 81400 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:47:11,897-Speed 5570.43 samples/sec Loss 0.2151 LearningRate 0.0003 Epoch: 19 Global Step: 81410 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:47:19,289-Speed 5541.45 samples/sec Loss 0.2121 LearningRate 0.0003 Epoch: 19 Global Step: 81420 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:47:26,648-Speed 5567.07 samples/sec Loss 0.2177 LearningRate 0.0003 Epoch: 19 Global Step: 81430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:47:34,114-Speed 5486.90 samples/sec Loss 0.2206 LearningRate 0.0003 Epoch: 19 Global Step: 81440 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:47:41,491-Speed 5553.14 samples/sec Loss 0.2188 LearningRate 0.0003 Epoch: 19 Global Step: 81450 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:47:48,835-Speed 5577.76 samples/sec Loss 0.2170 LearningRate 0.0003 Epoch: 19 Global Step: 81460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:47:56,096-Speed 5642.05 samples/sec Loss 0.2117 LearningRate 0.0003 Epoch: 19 Global Step: 81470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:48:03,390-Speed 5616.52 samples/sec Loss 
0.2054 LearningRate 0.0003 Epoch: 19 Global Step: 81480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:48:10,847-Speed 5493.54 samples/sec Loss 0.2153 LearningRate 0.0003 Epoch: 19 Global Step: 81490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:48:18,147-Speed 5611.98 samples/sec Loss 0.2180 LearningRate 0.0003 Epoch: 19 Global Step: 81500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:48:25,408-Speed 5641.72 samples/sec Loss 0.2207 LearningRate 0.0003 Epoch: 19 Global Step: 81510 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:48:32,676-Speed 5637.24 samples/sec Loss 0.2232 LearningRate 0.0003 Epoch: 19 Global Step: 81520 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:48:39,976-Speed 5611.22 samples/sec Loss 0.2154 LearningRate 0.0003 Epoch: 19 Global Step: 81530 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:48:47,322-Speed 5577.04 samples/sec Loss 0.2127 LearningRate 0.0003 Epoch: 19 Global Step: 81540 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:48:54,815-Speed 5467.17 samples/sec Loss 0.2175 LearningRate 0.0003 Epoch: 19 Global Step: 81550 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:49:02,286-Speed 5483.10 samples/sec Loss 0.2167 LearningRate 0.0003 Epoch: 19 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:49:09,575-Speed 5620.74 samples/sec Loss 0.2204 LearningRate 0.0002 Epoch: 19 Global Step: 81570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:49:16,861-Speed 5622.48 samples/sec Loss 0.2247 LearningRate 0.0002 Epoch: 19 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:49:24,180-Speed 5596.74 samples/sec Loss 0.2126 LearningRate 0.0002 Epoch: 19 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:49:31,511-Speed 5588.14 samples/sec Loss 0.2163 LearningRate 0.0002 Epoch: 19 Global 
Step: 81600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:49:38,832-Speed 5596.16 samples/sec Loss 0.2023 LearningRate 0.0002 Epoch: 19 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:49:46,165-Speed 5586.02 samples/sec Loss 0.2141 LearningRate 0.0002 Epoch: 19 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:49:53,505-Speed 5581.31 samples/sec Loss 0.2116 LearningRate 0.0002 Epoch: 19 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:50:00,839-Speed 5585.53 samples/sec Loss 0.2106 LearningRate 0.0002 Epoch: 19 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:50:08,243-Speed 5533.37 samples/sec Loss 0.2186 LearningRate 0.0002 Epoch: 19 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:50:16,435-Speed 5000.50 samples/sec Loss 0.2138 LearningRate 0.0002 Epoch: 19 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:50:24,475-Speed 5095.15 samples/sec Loss 0.2076 LearningRate 0.0002 Epoch: 19 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:50:32,502-Speed 5103.87 samples/sec Loss 0.2205 LearningRate 0.0002 Epoch: 19 Global Step: 81680 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:50:39,834-Speed 5587.11 samples/sec Loss 0.2145 LearningRate 0.0002 Epoch: 19 Global Step: 81690 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:50:47,272-Speed 5507.78 samples/sec Loss 0.2149 LearningRate 0.0002 Epoch: 19 Global Step: 81700 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:50:54,828-Speed 5421.28 samples/sec Loss 0.2182 LearningRate 0.0002 Epoch: 19 Global Step: 81710 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:51:02,179-Speed 5572.85 samples/sec Loss 0.2133 LearningRate 0.0002 Epoch: 19 Global Step: 81720 Fp16 Grad Scale: 131072 Required: 0 
hours Training: 2022-01-17 16:51:09,551-Speed 5557.01 samples/sec Loss 0.2175 LearningRate 0.0002 Epoch: 19 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:51:16,884-Speed 5586.65 samples/sec Loss 0.2222 LearningRate 0.0002 Epoch: 19 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:51:24,297-Speed 5526.69 samples/sec Loss 0.2114 LearningRate 0.0002 Epoch: 19 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:51:31,781-Speed 5473.14 samples/sec Loss 0.2089 LearningRate 0.0002 Epoch: 19 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:51:39,155-Speed 5555.97 samples/sec Loss 0.2177 LearningRate 0.0002 Epoch: 19 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:51:46,533-Speed 5552.52 samples/sec Loss 0.2317 LearningRate 0.0002 Epoch: 19 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:51:53,866-Speed 5586.34 samples/sec Loss 0.2247 LearningRate 0.0002 Epoch: 19 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:52:01,230-Speed 5562.91 samples/sec Loss 0.2213 LearningRate 0.0002 Epoch: 19 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:52:08,609-Speed 5551.18 samples/sec Loss 0.2169 LearningRate 0.0002 Epoch: 19 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:52:16,456-Speed 5220.52 samples/sec Loss 0.2167 LearningRate 0.0002 Epoch: 19 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:52:23,797-Speed 5580.94 samples/sec Loss 0.2043 LearningRate 0.0002 Epoch: 19 Global Step: 81830 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:52:31,087-Speed 5618.89 samples/sec Loss 0.2160 LearningRate 0.0002 Epoch: 19 Global Step: 81840 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:52:38,544-Speed 5493.63 
samples/sec Loss 0.2184 LearningRate 0.0002 Epoch: 19 Global Step: 81850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:52:45,915-Speed 5558.31 samples/sec Loss 0.2182 LearningRate 0.0002 Epoch: 19 Global Step: 81860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:52:53,255-Speed 5580.81 samples/sec Loss 0.2183 LearningRate 0.0002 Epoch: 19 Global Step: 81870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:00,546-Speed 5618.95 samples/sec Loss 0.2203 LearningRate 0.0002 Epoch: 19 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:07,989-Speed 5504.31 samples/sec Loss 0.2166 LearningRate 0.0002 Epoch: 19 Global Step: 81890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:15,375-Speed 5546.91 samples/sec Loss 0.2115 LearningRate 0.0002 Epoch: 19 Global Step: 81900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:22,819-Speed 5503.29 samples/sec Loss 0.2121 LearningRate 0.0002 Epoch: 19 Global Step: 81910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:30,202-Speed 5548.15 samples/sec Loss 0.2155 LearningRate 0.0002 Epoch: 19 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:37,555-Speed 5571.38 samples/sec Loss 0.2236 LearningRate 0.0002 Epoch: 19 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:44,926-Speed 5557.74 samples/sec Loss 0.2195 LearningRate 0.0002 Epoch: 19 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:52,349-Speed 5518.68 samples/sec Loss 0.2174 LearningRate 0.0002 Epoch: 19 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:53:59,730-Speed 5550.40 samples/sec Loss 0.2172 LearningRate 0.0002 Epoch: 19 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:54:07,221-Speed 5468.18 samples/sec Loss 0.2120 LearningRate 0.0002 Epoch: 19 
Global Step: 81970 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:54:14,668-Speed 5501.02 samples/sec Loss 0.2128 LearningRate 0.0002 Epoch: 19 Global Step: 81980 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:54:22,036-Speed 5560.08 samples/sec Loss 0.2194 LearningRate 0.0001 Epoch: 19 Global Step: 81990 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:54:29,625-Speed 5397.80 samples/sec Loss 0.2191 LearningRate 0.0001 Epoch: 19 Global Step: 82000 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:54:37,148-Speed 5445.07 samples/sec Loss 0.2195 LearningRate 0.0001 Epoch: 19 Global Step: 82010 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:54:44,738-Speed 5397.72 samples/sec Loss 0.2191 LearningRate 0.0001 Epoch: 19 Global Step: 82020 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:54:52,265-Speed 5442.74 samples/sec Loss 0.2175 LearningRate 0.0001 Epoch: 19 Global Step: 82030 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:54:59,646-Speed 5549.95 samples/sec Loss 0.2098 LearningRate 0.0001 Epoch: 19 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:55:07,123-Speed 5478.75 samples/sec Loss 0.2036 LearningRate 0.0001 Epoch: 19 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:55:14,608-Speed 5473.32 samples/sec Loss 0.2085 LearningRate 0.0001 Epoch: 19 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:55:22,081-Speed 5481.83 samples/sec Loss 0.2159 LearningRate 0.0001 Epoch: 19 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:55:29,626-Speed 5429.26 samples/sec Loss 0.2199 LearningRate 0.0001 Epoch: 19 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:55:37,200-Speed 5409.12 samples/sec Loss 0.2165 LearningRate 0.0001 Epoch: 19 Global Step: 82090 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-01-17 16:55:44,672-Speed 5482.50 samples/sec Loss 0.2066 LearningRate 0.0001 Epoch: 19 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:55:52,097-Speed 5517.71 samples/sec Loss 0.2054 LearningRate 0.0001 Epoch: 19 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:55:59,642-Speed 5430.45 samples/sec Loss 0.2110 LearningRate 0.0001 Epoch: 19 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:56:07,132-Speed 5469.18 samples/sec Loss 0.2232 LearningRate 0.0001 Epoch: 19 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:56:14,582-Speed 5499.44 samples/sec Loss 0.2127 LearningRate 0.0001 Epoch: 19 Global Step: 82140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:56:21,983-Speed 5534.77 samples/sec Loss 0.2105 LearningRate 0.0001 Epoch: 19 Global Step: 82150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:56:29,384-Speed 5535.35 samples/sec Loss 0.2133 LearningRate 0.0001 Epoch: 19 Global Step: 82160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:56:36,771-Speed 5545.61 samples/sec Loss 0.2238 LearningRate 0.0001 Epoch: 19 Global Step: 82170 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:56:44,211-Speed 5506.20 samples/sec Loss 0.2113 LearningRate 0.0001 Epoch: 19 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:56:51,587-Speed 5554.20 samples/sec Loss 0.2119 LearningRate 0.0001 Epoch: 19 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:56:59,117-Speed 5440.56 samples/sec Loss 0.2103 LearningRate 0.0001 Epoch: 19 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:57:06,684-Speed 5413.73 samples/sec Loss 0.2255 LearningRate 0.0001 Epoch: 19 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 
16:57:14,048-Speed 5563.27 samples/sec Loss 0.2086 LearningRate 0.0001 Epoch: 19 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:57:21,442-Speed 5539.77 samples/sec Loss 0.2233 LearningRate 0.0001 Epoch: 19 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:57:28,834-Speed 5542.24 samples/sec Loss 0.2163 LearningRate 0.0001 Epoch: 19 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:58:05,209-Speed 5574.02 samples/sec Loss 0.2172 LearningRate 0.0001 Epoch: 19 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:58:12,804-Speed 5393.48 samples/sec Loss 0.2168 LearningRate 0.0001 Epoch: 19 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:58:20,171-Speed 5561.18 samples/sec Loss 0.2178 LearningRate 0.0001 Epoch: 19 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:58:27,580-Speed 5531.84 samples/sec Loss 0.2180 LearningRate 0.0001 Epoch: 19 Global Step: 82280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:58:35,068-Speed 5470.51 samples/sec Loss 0.2142 LearningRate 0.0001 Epoch: 19 Global Step: 82290 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:58:42,692-Speed 5373.42 samples/sec Loss 0.2191 LearningRate 0.0001 Epoch: 19 Global Step: 82300 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:58:50,101-Speed 5529.27 samples/sec Loss 0.2179 LearningRate 0.0001 Epoch: 19 Global Step: 82310 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:58:57,557-Speed 5493.86 samples/sec Loss 0.2153 LearningRate 0.0001 Epoch: 19 Global Step: 82320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:59:05,073-Speed 5450.79 samples/sec Loss 0.2177 LearningRate 0.0001 Epoch: 19 Global Step: 82330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:59:12,501-Speed 5514.98 samples/sec Loss 
0.2150 LearningRate 0.0001 Epoch: 19 Global Step: 82340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:59:19,962-Speed 5490.49 samples/sec Loss 0.2195 LearningRate 0.0001 Epoch: 19 Global Step: 82350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 16:59:27,321-Speed 5567.01 samples/sec Loss 0.2172 LearningRate 0.0001 Epoch: 19 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:59:34,704-Speed 5548.33 samples/sec Loss 0.2233 LearningRate 0.0001 Epoch: 19 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:59:42,135-Speed 5512.26 samples/sec Loss 0.2076 LearningRate 0.0001 Epoch: 19 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:59:49,656-Speed 5447.53 samples/sec Loss 0.2205 LearningRate 0.0001 Epoch: 19 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 16:59:57,005-Speed 5574.05 samples/sec Loss 0.2040 LearningRate 0.0001 Epoch: 19 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:00:04,789-Speed 5262.93 samples/sec Loss 0.2099 LearningRate 0.0001 Epoch: 19 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:00:12,215-Speed 5516.15 samples/sec Loss 0.2187 LearningRate 0.0001 Epoch: 19 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:00:19,883-Speed 5342.27 samples/sec Loss 0.2117 LearningRate 0.0001 Epoch: 19 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:00:27,299-Speed 5524.77 samples/sec Loss 0.2123 LearningRate 0.0001 Epoch: 19 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:00:34,794-Speed 5465.36 samples/sec Loss 0.2136 LearningRate 0.0001 Epoch: 19 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:00:42,172-Speed 5552.14 samples/sec Loss 0.2196 LearningRate 0.0001 Epoch: 19 Global Step: 
82460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:00:49,606-Speed 5510.88 samples/sec Loss 0.2118 LearningRate 0.0001 Epoch: 19 Global Step: 82470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:00:57,230-Speed 5373.51 samples/sec Loss 0.2191 LearningRate 0.0001 Epoch: 19 Global Step: 82480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:01:04,877-Speed 5356.81 samples/sec Loss 0.2114 LearningRate 0.0001 Epoch: 19 Global Step: 82490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:01:12,449-Speed 5410.54 samples/sec Loss 0.2141 LearningRate 0.0001 Epoch: 19 Global Step: 82500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:01:19,814-Speed 5562.43 samples/sec Loss 0.2145 LearningRate 0.0001 Epoch: 19 Global Step: 82510 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:01:27,162-Speed 5574.78 samples/sec Loss 0.2233 LearningRate 0.0001 Epoch: 19 Global Step: 82520 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:01:34,677-Speed 5451.51 samples/sec Loss 0.2157 LearningRate 0.0001 Epoch: 19 Global Step: 82530 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:01:42,027-Speed 5573.68 samples/sec Loss 0.2073 LearningRate 0.0001 Epoch: 19 Global Step: 82540 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:01:49,430-Speed 5533.08 samples/sec Loss 0.2152 LearningRate 0.0001 Epoch: 19 Global Step: 82550 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:01:57,143-Speed 5312.07 samples/sec Loss 0.2096 LearningRate 0.0001 Epoch: 19 Global Step: 82560 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 17:02:04,758-Speed 5379.66 samples/sec Loss 0.2163 LearningRate 0.0001 Epoch: 19 Global Step: 82570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:02:12,421-Speed 5345.23 samples/sec Loss 0.2078 LearningRate 0.0001 Epoch: 19 Global Step: 82580 Fp16 Grad Scale: 131072 Required: 0 
hours Training: 2022-01-17 17:02:20,093-Speed 5340.07 samples/sec Loss 0.2088 LearningRate 0.0001 Epoch: 19 Global Step: 82590 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:02:27,730-Speed 5363.73 samples/sec Loss 0.2232 LearningRate 0.0001 Epoch: 19 Global Step: 82600 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:02:35,145-Speed 5524.99 samples/sec Loss 0.2112 LearningRate 0.0000 Epoch: 19 Global Step: 82610 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:02:42,637-Speed 5468.09 samples/sec Loss 0.2154 LearningRate 0.0000 Epoch: 19 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:02:50,093-Speed 5494.50 samples/sec Loss 0.2143 LearningRate 0.0000 Epoch: 19 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:02:57,489-Speed 5539.24 samples/sec Loss 0.2102 LearningRate 0.0000 Epoch: 19 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:03:04,985-Speed 5464.85 samples/sec Loss 0.2128 LearningRate 0.0000 Epoch: 19 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:03:12,372-Speed 5546.05 samples/sec Loss 0.2076 LearningRate 0.0000 Epoch: 19 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:03:19,781-Speed 5529.09 samples/sec Loss 0.2198 LearningRate 0.0000 Epoch: 19 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:03:27,774-Speed 5125.22 samples/sec Loss 0.2100 LearningRate 0.0000 Epoch: 19 Global Step: 82680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:03:35,760-Speed 5129.72 samples/sec Loss 0.2172 LearningRate 0.0000 Epoch: 19 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:03:43,687-Speed 5167.85 samples/sec Loss 0.2088 LearningRate 0.0000 Epoch: 19 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:03:51,944-Speed 
4961.37 samples/sec Loss 0.2153 LearningRate 0.0000 Epoch: 19 Global Step: 82710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:03:59,916-Speed 5138.40 samples/sec Loss 0.2193 LearningRate 0.0000 Epoch: 19 Global Step: 82720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:04:07,917-Speed 5120.20 samples/sec Loss 0.2212 LearningRate 0.0000 Epoch: 19 Global Step: 82730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:04:15,932-Speed 5111.46 samples/sec Loss 0.2174 LearningRate 0.0000 Epoch: 19 Global Step: 82740 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:04:23,869-Speed 5161.30 samples/sec Loss 0.2160 LearningRate 0.0000 Epoch: 19 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:04:31,313-Speed 5502.62 samples/sec Loss 0.2154 LearningRate 0.0000 Epoch: 19 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:04:38,705-Speed 5542.92 samples/sec Loss 0.2150 LearningRate 0.0000 Epoch: 19 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:04:45,997-Speed 5617.27 samples/sec Loss 0.2117 LearningRate 0.0000 Epoch: 19 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:04:53,474-Speed 5479.49 samples/sec Loss 0.2215 LearningRate 0.0000 Epoch: 19 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:05:00,971-Speed 5464.00 samples/sec Loss 0.2156 LearningRate 0.0000 Epoch: 19 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:05:08,363-Speed 5541.94 samples/sec Loss 0.2171 LearningRate 0.0000 Epoch: 19 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:05:15,788-Speed 5517.46 samples/sec Loss 0.2200 LearningRate 0.0000 Epoch: 19 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:05:23,298-Speed 5454.48 samples/sec Loss 0.2164 LearningRate 0.0000 
Epoch: 19 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:05:30,844-Speed 5428.92 samples/sec Loss 0.2106 LearningRate 0.0000 Epoch: 19 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:05:38,323-Speed 5477.64 samples/sec Loss 0.2154 LearningRate 0.0000 Epoch: 19 Global Step: 82850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:05:45,804-Speed 5476.30 samples/sec Loss 0.2219 LearningRate 0.0000 Epoch: 19 Global Step: 82860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:05:53,418-Speed 5380.54 samples/sec Loss 0.2056 LearningRate 0.0000 Epoch: 19 Global Step: 82870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:06:00,845-Speed 5515.47 samples/sec Loss 0.2181 LearningRate 0.0000 Epoch: 19 Global Step: 82880 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:06:08,339-Speed 5466.53 samples/sec Loss 0.2129 LearningRate 0.0000 Epoch: 19 Global Step: 82890 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:06:15,694-Speed 5569.91 samples/sec Loss 0.2108 LearningRate 0.0000 Epoch: 19 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:06:23,112-Speed 5522.46 samples/sec Loss 0.2201 LearningRate 0.0000 Epoch: 19 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:06:30,483-Speed 5557.69 samples/sec Loss 0.2124 LearningRate 0.0000 Epoch: 19 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:06:37,880-Speed 5538.20 samples/sec Loss 0.2182 LearningRate 0.0000 Epoch: 19 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:06:45,255-Speed 5554.89 samples/sec Loss 0.2195 LearningRate 0.0000 Epoch: 19 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:06:52,671-Speed 5523.84 samples/sec Loss 0.2152 LearningRate 0.0000 Epoch: 19 Global Step: 82950 Fp16 Grad Scale: 
65536 Required: 0 hours Training: 2022-01-17 17:07:00,054-Speed 5548.67 samples/sec Loss 0.2141 LearningRate 0.0000 Epoch: 19 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:07:07,509-Speed 5495.88 samples/sec Loss 0.2152 LearningRate 0.0000 Epoch: 19 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:07:14,922-Speed 5526.11 samples/sec Loss 0.2086 LearningRate 0.0000 Epoch: 19 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:07:22,243-Speed 5596.19 samples/sec Loss 0.2090 LearningRate 0.0000 Epoch: 19 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:07:29,666-Speed 5518.76 samples/sec Loss 0.2150 LearningRate 0.0000 Epoch: 19 Global Step: 83000 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:07:37,283-Speed 5377.94 samples/sec Loss 0.2150 LearningRate 0.0000 Epoch: 19 Global Step: 83010 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:07:44,828-Speed 5429.55 samples/sec Loss 0.2114 LearningRate 0.0000 Epoch: 19 Global Step: 83020 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:07:52,287-Speed 5491.88 samples/sec Loss 0.2106 LearningRate 0.0000 Epoch: 19 Global Step: 83030 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:07:59,657-Speed 5558.77 samples/sec Loss 0.2223 LearningRate 0.0000 Epoch: 19 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:08:07,157-Speed 5462.50 samples/sec Loss 0.2094 LearningRate 0.0000 Epoch: 19 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:08:14,549-Speed 5541.06 samples/sec Loss 0.2157 LearningRate 0.0000 Epoch: 19 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:08:21,995-Speed 5502.28 samples/sec Loss 0.2119 LearningRate 0.0000 Epoch: 19 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 
17:08:29,358-Speed 5563.61 samples/sec Loss 0.2216 LearningRate 0.0000 Epoch: 19 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:08:36,764-Speed 5531.31 samples/sec Loss 0.2222 LearningRate 0.0000 Epoch: 19 Global Step: 83090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:08:44,136-Speed 5556.59 samples/sec Loss 0.2191 LearningRate 0.0000 Epoch: 19 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:08:51,562-Speed 5516.74 samples/sec Loss 0.2107 LearningRate 0.0000 Epoch: 19 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:08:58,939-Speed 5552.95 samples/sec Loss 0.2060 LearningRate 0.0000 Epoch: 19 Global Step: 83120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:09:06,342-Speed 5534.20 samples/sec Loss 0.2055 LearningRate 0.0000 Epoch: 19 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:09:13,839-Speed 5464.36 samples/sec Loss 0.2085 LearningRate 0.0000 Epoch: 19 Global Step: 83140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:09:21,155-Speed 5599.54 samples/sec Loss 0.2157 LearningRate 0.0000 Epoch: 19 Global Step: 83150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:09:28,521-Speed 5561.78 samples/sec Loss 0.2193 LearningRate 0.0000 Epoch: 19 Global Step: 83160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:09:35,909-Speed 5544.60 samples/sec Loss 0.2107 LearningRate 0.0000 Epoch: 19 Global Step: 83170 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:09:43,235-Speed 5592.14 samples/sec Loss 0.2134 LearningRate 0.0000 Epoch: 19 Global Step: 83180 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:09:50,467-Speed 5664.30 samples/sec Loss 0.2235 LearningRate 0.0000 Epoch: 19 Global Step: 83190 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:09:58,127-Speed 5348.29 samples/sec Loss 
0.2102 LearningRate 0.0000 Epoch: 19 Global Step: 83200 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:10:05,442-Speed 5601.38 samples/sec Loss 0.2127 LearningRate 0.0000 Epoch: 19 Global Step: 83210 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:10:12,791-Speed 5574.54 samples/sec Loss 0.2156 LearningRate 0.0000 Epoch: 19 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:10:20,184-Speed 5541.54 samples/sec Loss 0.2156 LearningRate 0.0000 Epoch: 19 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:10:27,743-Speed 5418.86 samples/sec Loss 0.2199 LearningRate 0.0000 Epoch: 19 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:10:35,297-Speed 5423.45 samples/sec Loss 0.2213 LearningRate 0.0000 Epoch: 19 Global Step: 83250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:10:42,794-Speed 5464.38 samples/sec Loss 0.2179 LearningRate 0.0000 Epoch: 19 Global Step: 83260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:10:50,447-Speed 5353.07 samples/sec Loss 0.2187 LearningRate 0.0000 Epoch: 19 Global Step: 83270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:10:57,851-Speed 5532.82 samples/sec Loss 0.2174 LearningRate 0.0000 Epoch: 19 Global Step: 83280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:11:05,192-Speed 5579.85 samples/sec Loss 0.2235 LearningRate 0.0000 Epoch: 19 Global Step: 83290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:11:12,540-Speed 5575.07 samples/sec Loss 0.2188 LearningRate 0.0000 Epoch: 19 Global Step: 83300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:11:19,904-Speed 5563.69 samples/sec Loss 0.2153 LearningRate 0.0000 Epoch: 19 Global Step: 83310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 17:11:27,242-Speed 5582.40 samples/sec Loss 0.2221 LearningRate 0.0000 Epoch: 19 Global Step: 
83320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:11:34,593-Speed 5573.12 samples/sec Loss 0.2102 LearningRate 0.0000 Epoch: 19 Global Step: 83330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:11:41,920-Speed 5590.95 samples/sec Loss 0.2175 LearningRate 0.0000 Epoch: 19 Global Step: 83340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:11:49,274-Speed 5570.56 samples/sec Loss 0.2082 LearningRate 0.0000 Epoch: 19 Global Step: 83350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:11:56,625-Speed 5573.22 samples/sec Loss 0.2083 LearningRate 0.0000 Epoch: 19 Global Step: 83360 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:12:04,017-Speed 5541.60 samples/sec Loss 0.2240 LearningRate 0.0000 Epoch: 19 Global Step: 83370 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:12:11,351-Speed 5585.88 samples/sec Loss 0.2154 LearningRate 0.0000 Epoch: 19 Global Step: 83380 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:12:18,682-Speed 5587.89 samples/sec Loss 0.2206 LearningRate 0.0000 Epoch: 19 Global Step: 83390 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:12:26,101-Speed 5522.05 samples/sec Loss 0.2159 LearningRate 0.0000 Epoch: 19 Global Step: 83400 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:12:33,460-Speed 5566.30 samples/sec Loss 0.2149 LearningRate 0.0000 Epoch: 19 Global Step: 83410 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:12:40,773-Speed 5602.55 samples/sec Loss 0.2192 LearningRate 0.0000 Epoch: 19 Global Step: 83420 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 17:12:48,461-Speed 5328.78 samples/sec Loss 0.2033 LearningRate 0.0000 Epoch: 19 Global Step: 83430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 17:12:55,740-Speed 5628.22 samples/sec Loss 0.2206 LearningRate 0.0000 Epoch: 19 Global Step: 83440 Fp16 Grad Scale: 131072 Required: -0 
hours