Training: 2022-01-16 21:35:19,582-rank_id: 0
Training: 2022-01-16 21:35:47,776-: loss cosface
Training: 2022-01-16 21:35:47,777-: network mbf
Training: 2022-01-16 21:35:47,779-: resume False
Training: 2022-01-16 21:35:47,780-: output work_dirs/glint360k_mobileface_lr02_bs4k
Training: 2022-01-16 21:35:47,780-: embedding_size 512
Training: 2022-01-16 21:35:47,780-: sample_rate 1.0
Training: 2022-01-16 21:35:47,780-: fp16 True
Training: 2022-01-16 21:35:47,781-: momentum 0.9
Training: 2022-01-16 21:35:47,781-: weight_decay 0.0001
Training: 2022-01-16 21:35:47,781-: batch_size 512
Training: 2022-01-16 21:35:47,782-: lr 0.4
Training: 2022-01-16 21:35:47,782-: dali False
Training: 2022-01-16 21:35:47,782-: verbose 5000
Training: 2022-01-16 21:35:47,782-: frequent 10
Training: 2022-01-16 21:35:47,783-: score None
Training: 2022-01-16 21:35:47,784-: rec /train_tmp/glint360k
Training: 2022-01-16 21:35:47,784-: num_classes 360232
Training: 2022-01-16 21:35:47,784-: num_image 17091657
Training: 2022-01-16 21:35:47,784-: num_epoch 20
Training: 2022-01-16 21:35:47,784-: warmup_epoch 2
Training: 2022-01-16 21:35:47,784-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-16 21:35:47,787-: warmup_step 8344
Training: 2022-01-16 21:35:47,787-: total_step 83440
Training: 2022-01-16 21:37:19,598-Reducer buckets have been rebuilt in this iteration.
Training: 2022-01-16 21:37:26,401-Speed 11800.04 samples/sec Loss 42.3526 LearningRate 0.0010 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-01-16 21:37:29,796-Speed 12067.57 samples/sec Loss 42.3427 LearningRate 0.0014 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 8192 Required: 31 hours
Training: 2022-01-16 21:37:33,355-Speed 11517.53 samples/sec Loss 42.3639 LearningRate 0.0019 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 8192 Required: 25 hours
Training: 2022-01-16 21:37:38,435-Speed 8066.77 samples/sec Loss 42.3336 LearningRate 0.0024 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 8192 Required: 23 hours
Training: 2022-01-16 21:37:42,453-Speed 10196.88 samples/sec Loss 42.3310 LearningRate 0.0029 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 8192 Required: 20 hours
Training: 2022-01-16 21:37:46,423-Speed 10328.91 samples/sec Loss 42.3667 LearningRate 0.0034 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 8192 Required: 19 hours
Training: 2022-01-16 21:37:50,298-Speed 10575.69 samples/sec Loss 42.3169 LearningRate 0.0038 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 8192 Required: 18 hours
Training: 2022-01-16 21:37:54,138-Speed 10669.58 samples/sec Loss 42.3110 LearningRate 0.0043 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 8192 Required: 17 hours
Training: 2022-01-16 21:37:58,165-Speed 10175.00 samples/sec Loss 42.2937 LearningRate 0.0048 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 8192 Required: 16 hours
Training: 2022-01-16 21:38:01,974-Speed 10756.47 samples/sec Loss 42.2813 LearningRate 0.0053 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 16384 Required: 15 hours
Training: 2022-01-16 21:38:06,088-Speed 9960.28 samples/sec Loss 42.2621 LearningRate 0.0058 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 16384 Required: 15 hours
Training: 2022-01-16 21:38:11,105-Speed 8166.29 samples/sec Loss 42.2495 LearningRate 0.0062 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 16384 Required: 15 hours
Training: 2022-01-16 21:38:15,076-Speed 10318.64 samples/sec Loss 42.2128 LearningRate 0.0067 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-01-16 21:38:19,094-Speed 10200.21 samples/sec Loss 42.1193 LearningRate 0.0072 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-01-16 21:38:23,013-Speed 10456.19 samples/sec Loss 42.0281 LearningRate 0.0077 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-01-16 21:38:28,651-Speed 7266.83 samples/sec Loss 41.8726 LearningRate 0.0081 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 16384 Required: 14 hours
Training: 2022-01-16 21:38:32,072-Speed 11979.13 samples/sec Loss 41.6260 LearningRate 0.0086 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-01-16 21:38:35,495-Speed 11970.37 samples/sec Loss 41.3147 LearningRate 0.0091 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-01-16 21:38:39,013-Speed 11647.68 samples/sec Loss 40.9236 LearningRate 0.0096 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 16384 Required: 13 hours
Training: 2022-01-16 21:38:42,993-Speed 10293.77 samples/sec Loss 40.5337 LearningRate 0.0101 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-01-16 21:38:47,135-Speed 9892.64 samples/sec Loss 40.1499 LearningRate 0.0105 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:38:51,014-Speed 10564.72 samples/sec Loss 39.8074 LearningRate 0.0110 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:38:55,147-Speed 9913.39 samples/sec Loss 39.5633 LearningRate 0.0115 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:38:59,072-Speed 10436.25 samples/sec Loss 39.2883 LearningRate 0.0120 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:39:03,302-Speed 9686.10 samples/sec Loss 39.1116 LearningRate 0.0125 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:39:07,242-Speed 10399.49 samples/sec Loss 38.9349 LearningRate 0.0129 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:39:11,684-Speed 9223.33 samples/sec Loss 38.7797 LearningRate 0.0134 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:39:15,606-Speed 10446.80 samples/sec Loss 38.6355 LearningRate 0.0139 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:39:19,591-Speed 10281.26 samples/sec Loss 38.5212 LearningRate 0.0144 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 32768 Required: 12 hours
Training: 2022-01-16 21:39:23,674-Speed 10033.22 samples/sec Loss 38.4132 LearningRate 0.0149 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-01-16 21:39:27,448-Speed 10857.80 samples/sec Loss 38.3286 LearningRate 0.0153 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:39:31,429-Speed 10293.31 samples/sec Loss 38.1847 LearningRate 0.0158 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:39:35,229-Speed 10780.01 samples/sec Loss 38.1547 LearningRate 0.0163 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:39:38,953-Speed 11003.16 samples/sec Loss 38.0911 LearningRate 0.0168 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:39:42,964-Speed 10214.47 samples/sec Loss 38.0091 LearningRate 0.0173 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:39:46,956-Speed 10264.10 samples/sec Loss 37.9577 LearningRate 0.0177 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:39:50,708-Speed 10924.84 samples/sec Loss 37.9245 LearningRate 0.0182 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:39:54,374-Speed 11176.27 samples/sec Loss 37.8578 LearningRate 0.0187 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:39:58,009-Speed 11272.45 samples/sec Loss 37.7956 LearningRate 0.0192 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 65536 Required: 11 hours
Training: 2022-01-16 21:40:01,644-Speed 11272.64 samples/sec Loss 37.7728 LearningRate 0.0197 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-16 21:40:05,497-Speed 10633.04 samples/sec Loss 37.7356 LearningRate 0.0201 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-16 21:40:09,387-Speed 10534.54 samples/sec Loss 37.7035 LearningRate 0.0206 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-16 21:40:13,198-Speed 10749.68 samples/sec Loss 37.6341 LearningRate 0.0211 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-16 21:40:16,999-Speed 10782.24 samples/sec Loss 37.6272 LearningRate 0.0216 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-16 21:40:20,581-Speed 11441.30 samples/sec Loss 37.6124 LearningRate 0.0221 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-16 21:40:24,252-Speed 11160.21 samples/sec Loss 37.5767 LearningRate 0.0225 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 131072 Required: 11 hours
Training: 2022-01-16 21:40:27,760-Speed 11681.76 samples/sec Loss 37.5603 LearningRate 0.0230 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:40:31,246-Speed 11751.01 samples/sec Loss 37.5227 LearningRate 0.0235 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:40:34,787-Speed 11572.93 samples/sec Loss 37.5093 LearningRate 0.0240 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:40:38,365-Speed 11450.48 samples/sec Loss 37.4705 LearningRate 0.0244 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:40:41,818-Speed 11865.84 samples/sec Loss 37.4687 LearningRate 0.0249 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:40:45,370-Speed 11534.80 samples/sec Loss 37.4226 LearningRate 0.0254 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:40:49,123-Speed 10916.75 samples/sec Loss 37.4053 LearningRate 0.0259 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:40:52,590-Speed 11821.06 samples/sec Loss 37.3706 LearningRate 0.0264 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:40:56,241-Speed 11223.67 samples/sec Loss 37.3317 LearningRate 0.0268 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:40:59,681-Speed 11911.32 samples/sec Loss 37.2831 LearningRate 0.0273 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:03,265-Speed 11430.84 samples/sec Loss 37.2582 LearningRate 0.0278 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:06,712-Speed 11890.31 samples/sec Loss 37.2426 LearningRate 0.0283 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:10,411-Speed 11075.45 samples/sec Loss 37.2139 LearningRate 0.0288 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:13,961-Speed 11543.65 samples/sec Loss 37.1659 LearningRate 0.0292 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:17,630-Speed 11167.90 samples/sec Loss 37.1159 LearningRate 0.0297 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:22,035-Speed 9299.80 samples/sec Loss 37.0935 LearningRate 0.0302 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:25,876-Speed 10667.75 samples/sec Loss 37.0618 LearningRate 0.0307 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:29,352-Speed 11789.74 samples/sec Loss 37.0005 LearningRate 0.0312 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:32,893-Speed 11572.03 samples/sec Loss 36.9523 LearningRate 0.0316 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-01-16 21:41:36,787-Speed 10519.96 samples/sec Loss 36.9092 LearningRate 0.0321 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:41:40,528-Speed 10952.20 samples/sec Loss 36.8882 LearningRate 0.0326 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:41:44,182-Speed 11212.79 samples/sec Loss 36.7997 LearningRate 0.0331 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:41:47,716-Speed 11595.00 samples/sec Loss 36.7810 LearningRate 0.0336 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:41:51,324-Speed 11356.56 samples/sec Loss 36.7211 LearningRate 0.0340 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:41:55,028-Speed 11063.58 samples/sec Loss 36.6647 LearningRate 0.0345 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:41:58,599-Speed 11471.93 samples/sec Loss 36.5856 LearningRate 0.0350 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:02,041-Speed 11904.13 samples/sec Loss 36.5114 LearningRate 0.0355 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:05,663-Speed 11312.53 samples/sec Loss 36.4823 LearningRate 0.0360 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:09,141-Speed 11780.79 samples/sec Loss 36.4136 LearningRate 0.0364 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:12,814-Speed 11155.79 samples/sec Loss 36.3338 LearningRate 0.0369 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:16,300-Speed 11753.02 samples/sec Loss 36.2989 LearningRate 0.0374 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:20,072-Speed 10862.48 samples/sec Loss 36.2280 LearningRate 0.0379 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:24,112-Speed 10140.78 samples/sec Loss 36.2118 LearningRate 0.0384 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:27,979-Speed 10595.51 samples/sec Loss 36.1183 LearningRate 0.0388 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:31,684-Speed 11060.83 samples/sec Loss 36.0451 LearningRate 0.0393 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:35,175-Speed 11735.10 samples/sec Loss 36.0207 LearningRate 0.0398 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:38,638-Speed 11834.02 samples/sec Loss 35.9091 LearningRate 0.0403 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:42,383-Speed 10939.50 samples/sec Loss 35.8554 LearningRate 0.0407 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:45,975-Speed 11405.61 samples/sec Loss 35.7314 LearningRate 0.0412 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:49,625-Speed 11227.30 samples/sec Loss 35.6863 LearningRate 0.0417 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 262144 Required: 10 hours
Training: 2022-01-16 21:42:53,291-Speed 11174.20 samples/sec Loss 35.6032 LearningRate 0.0422 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:42:56,998-Speed 11055.33 samples/sec Loss 35.5083 LearningRate 0.0427 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:00,677-Speed 11137.46 samples/sec Loss 35.4657 LearningRate 0.0431 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:04,342-Speed 11179.09 samples/sec Loss 35.3861 LearningRate 0.0436 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:08,012-Speed 11162.83 samples/sec Loss 35.2714 LearningRate 0.0441 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:11,796-Speed 10828.44 samples/sec Loss 35.2519 LearningRate 0.0446 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:15,393-Speed 11392.27 samples/sec Loss 35.1559 LearningRate 0.0451 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:18,819-Speed 11959.56 samples/sec Loss 35.0886 LearningRate 0.0455 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:22,355-Speed 11586.30 samples/sec Loss 34.9938 LearningRate 0.0460 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:25,953-Speed 11388.30 samples/sec Loss 34.9081 LearningRate 0.0465 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:29,551-Speed 11389.69 samples/sec Loss 34.8380 LearningRate 0.0470 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:33,756-Speed 9743.05 samples/sec Loss 34.6791 LearningRate 0.0475 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:37,304-Speed 11546.09 samples/sec Loss 34.6311 LearningRate 0.0479 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:40,814-Speed 11677.45 samples/sec Loss 34.5333 LearningRate 0.0484 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:44,369-Speed 11525.96 samples/sec Loss 34.3680 LearningRate 0.0489 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:48,239-Speed 10584.93 samples/sec Loss 34.3101 LearningRate 0.0494 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:51,998-Speed 10900.32 samples/sec Loss 34.2277 LearningRate 0.0499 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:55,435-Speed 11919.48 samples/sec Loss 34.1633 LearningRate 0.0503 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:43:58,928-Speed 11731.57 samples/sec Loss 34.0869 LearningRate 0.0508 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:02,535-Speed 11360.05 samples/sec Loss 33.8950 LearningRate 0.0513 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:06,847-Speed 9500.92 samples/sec Loss 33.9023 LearningRate 0.0518 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:10,403-Speed 11523.26 samples/sec Loss 33.7472 LearningRate 0.0523 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:14,150-Speed 10933.02 samples/sec Loss 33.5811 LearningRate 0.0527 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:17,648-Speed 11716.15 samples/sec Loss 33.5295 LearningRate 0.0532 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:21,210-Speed 11504.71 samples/sec Loss 33.4311 LearningRate 0.0537 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:24,740-Speed 11606.47 samples/sec Loss 33.3662 LearningRate 0.0542 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:28,339-Speed 11384.78 samples/sec Loss 33.2153 LearningRate 0.0547 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:44:32,017-Speed 11139.44 samples/sec Loss 33.1396 LearningRate 0.0551 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:44:36,007-Speed 10269.05 samples/sec Loss 32.9439 LearningRate 0.0556 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:44:39,585-Speed 11451.88 samples/sec Loss 32.8668 LearningRate 0.0561 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:44:43,311-Speed 10996.89 samples/sec Loss 32.7955 LearningRate 0.0566 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:44:46,883-Speed 11470.81 samples/sec Loss 32.6983 LearningRate 0.0570 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:44:50,847-Speed 10336.72 samples/sec Loss 32.5425 LearningRate 0.0575 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:44:54,395-Speed 11548.00 samples/sec Loss 32.4675 LearningRate 0.0580 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:44:58,181-Speed 10821.07 samples/sec Loss 32.3796 LearningRate 0.0585 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:01,810-Speed 11291.62 samples/sec Loss 32.2582 LearningRate 0.0590 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:05,799-Speed 10273.93 samples/sec Loss 32.1070 LearningRate 0.0594 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:09,376-Speed 11452.42 samples/sec Loss 31.9863 LearningRate 0.0599 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:45:13,089-Speed 11035.45 samples/sec Loss 31.8893 LearningRate 0.0604 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:45:16,603-Speed 11659.35 samples/sec Loss 31.7492 LearningRate 0.0609 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:45:20,332-Speed 10986.78 samples/sec Loss 31.6883 LearningRate 0.0614 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:23,922-Speed 11415.88 samples/sec Loss 31.5829 LearningRate 0.0618 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:27,605-Speed 11125.51 samples/sec Loss 31.4634 LearningRate 0.0623 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:31,543-Speed 10402.15 samples/sec Loss 31.3163 LearningRate 0.0628 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:35,282-Speed 10958.92 samples/sec Loss 31.2083 LearningRate 0.0633 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:38,729-Speed 11886.74 samples/sec Loss 31.0825 LearningRate 0.0638 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:42,563-Speed 10687.53 samples/sec Loss 30.9665 LearningRate 0.0642 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:46,131-Speed 11486.35 samples/sec Loss 30.8049 LearningRate 0.0647 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:49,570-Speed 11911.18 samples/sec Loss 30.7580 LearningRate 0.0652 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:53,026-Speed 11855.50 samples/sec Loss 30.6305 LearningRate 0.0657 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:45:56,644-Speed 11326.39 samples/sec Loss 30.5151 LearningRate 0.0662 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:00,206-Speed 11500.42 samples/sec Loss 30.3418 LearningRate 0.0666 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:03,989-Speed 10832.25 samples/sec Loss 30.2802 LearningRate 0.0671 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:07,605-Speed 11335.73 samples/sec Loss 30.0933 LearningRate 0.0676 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:11,413-Speed 10758.33 samples/sec Loss 30.0576 LearningRate 0.0681 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:14,876-Speed 11832.61 samples/sec Loss 29.8600 LearningRate 0.0686 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:18,333-Speed 11850.05 samples/sec Loss 29.8121 LearningRate 0.0690 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:21,909-Speed 11461.24 samples/sec Loss 29.6853 LearningRate 0.0695 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:25,767-Speed 10620.09 samples/sec Loss 29.5770 LearningRate 0.0700 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:30,064-Speed 9534.57 samples/sec Loss 29.4410 LearningRate 0.0705 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:33,959-Speed 10518.47 samples/sec Loss 29.3863 LearningRate 0.0709 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:37,868-Speed 10481.10 samples/sec Loss 29.1999 LearningRate 0.0714 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:41,464-Speed 11395.05 samples/sec Loss 29.0606 LearningRate 0.0719 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:44,940-Speed 11787.24 samples/sec Loss 28.9691 LearningRate 0.0724 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:48,610-Speed 11162.81 samples/sec Loss 28.8105 LearningRate 0.0729 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:52,071-Speed 11841.65 samples/sec Loss 28.7130 LearningRate 0.0733 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:55,713-Speed 11248.20 samples/sec Loss 28.5915 LearningRate 0.0738 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:46:59,306-Speed 11404.90 samples/sec Loss 28.4914 LearningRate 0.0743 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:47:02,900-Speed 11401.20 samples/sec Loss 28.4236 LearningRate 0.0748 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:47:06,528-Speed 11294.41 samples/sec Loss 28.1891 LearningRate 0.0753 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:47:10,314-Speed 10821.37 samples/sec Loss 28.0495 LearningRate 0.0757 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:47:13,912-Speed 11386.98 samples/sec Loss 28.0215 LearningRate 0.0762 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:47:17,649-Speed 10964.87 samples/sec Loss 27.8659 LearningRate 0.0767 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:47:21,178-Speed 11611.69 samples/sec Loss 27.7854 LearningRate 0.0772 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:47:24,752-Speed 11463.69 samples/sec Loss 27.6744 LearningRate 0.0777 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:47:28,877-Speed 9932.53 samples/sec Loss 27.5920 LearningRate 0.0781 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:47:32,671-Speed 10800.55 samples/sec Loss 27.3985 LearningRate 0.0786 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:47:36,313-Speed 11251.44 samples/sec Loss 27.2644 LearningRate 0.0791 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:47:39,892-Speed 11447.88 samples/sec Loss 27.1807 LearningRate 0.0796 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:47:43,401-Speed 11677.29 samples/sec Loss 27.1413 LearningRate 0.0801 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:47:46,979-Speed 11453.14 samples/sec Loss 27.0085 LearningRate 0.0805 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:47:50,600-Speed 11314.19 samples/sec Loss 26.8350 LearningRate 0.0810 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:47:54,182-Speed 11440.67 samples/sec Loss 26.7171 LearningRate 0.0815 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:47:58,047-Speed 10599.57 samples/sec Loss 26.5958 LearningRate 0.0820 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:01,582-Speed 11593.13 samples/sec Loss 26.4823 LearningRate 0.0825 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:05,282-Speed 11075.83 samples/sec Loss 26.3976 LearningRate 0.0829 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:48:08,806-Speed 11624.29 samples/sec Loss 26.2138 LearningRate 0.0834 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:48:12,695-Speed 10537.42 samples/sec Loss 26.2475 LearningRate 0.0839 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:48:16,346-Speed 11221.77 samples/sec Loss 26.0889 LearningRate 0.0844 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:48:19,748-Speed 12042.05 samples/sec Loss 26.0007 LearningRate 0.0849 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:23,335-Speed 11423.75 samples/sec Loss 25.8391 LearningRate 0.0853 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:27,045-Speed 11046.34 samples/sec Loss 25.7382 LearningRate 0.0858 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:30,531-Speed 11752.70 samples/sec Loss 25.5678 LearningRate 0.0863 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:34,148-Speed 11329.22 samples/sec Loss 25.5318 LearningRate 0.0868 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:37,881-Speed 10974.14 samples/sec Loss 25.3771 LearningRate 0.0872 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:41,443-Speed 11504.35 samples/sec Loss 25.2610 LearningRate 0.0877 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:44,901-Speed 11850.03 samples/sec Loss 25.1745 LearningRate 0.0882 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:48,659-Speed 10903.14 samples/sec Loss 24.9896 LearningRate 0.0887 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:52,392-Speed 10974.11 samples/sec Loss 24.9556 LearningRate 0.0892 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:48:56,559-Speed 9832.68 samples/sec Loss 24.7946 LearningRate 0.0896 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:49:00,140-Speed 11443.06 samples/sec Loss 24.6956 LearningRate 0.0901 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:03,667-Speed 11618.84 samples/sec Loss 24.6839 LearningRate 0.0906 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:07,397-Speed 10984.29 samples/sec Loss 24.5547 LearningRate 0.0911 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:10,971-Speed 11464.82 samples/sec Loss 24.3205 LearningRate 0.0916 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:14,405-Speed 11931.64 samples/sec Loss 24.3098 LearningRate 0.0920 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:18,038-Speed 11280.58 samples/sec Loss 24.0780 LearningRate 0.0925 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:21,700-Speed 11187.23 samples/sec Loss 24.1427 LearningRate 0.0930 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:25,173-Speed 11798.45 samples/sec Loss 23.9191 LearningRate 0.0935 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:28,706-Speed 11597.07 samples/sec Loss 23.8516 LearningRate 0.0940 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:32,171-Speed 11830.99 samples/sec Loss 23.8094 LearningRate 0.0944 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:35,883-Speed 11039.87 samples/sec Loss 23.5968 LearningRate 0.0949 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:49:39,491-Speed 11355.87 samples/sec Loss 23.4728 LearningRate 0.0954 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:43,284-Speed 10802.98 samples/sec Loss 23.4163 LearningRate 0.0959 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:46,815-Speed 11603.91 samples/sec Loss 23.2996 LearningRate 0.0964 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:50,386-Speed 11472.80 samples/sec Loss 23.2414 LearningRate 0.0968 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:55,031-Speed 8819.80 samples/sec Loss 23.1124 LearningRate 0.0973 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:49:58,576-Speed 11557.74 samples/sec Loss 23.0064 LearningRate 0.0978 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:50:02,132-Speed 11523.32 samples/sec Loss 23.0090 LearningRate 0.0983 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:50:06,285-Speed 9865.70 samples/sec Loss 22.9111 LearningRate 0.0988 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:50:09,718-Speed 11936.75 samples/sec Loss 22.8334 LearningRate 0.0992 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:50:13,196-Speed 11783.02 samples/sec Loss 22.6111 LearningRate 0.0997 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 9 hours
Training: 2022-01-16 21:50:16,969-Speed 10859.53 samples/sec Loss 22.4912 LearningRate 0.1002 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:20,355-Speed 12101.14 samples/sec Loss 22.4693 LearningRate 0.1007 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:23,776-Speed 11974.16 samples/sec Loss 22.3609 LearningRate 0.1012 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:27,232-Speed 11855.91 samples/sec Loss 22.3199 LearningRate 0.1016 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:31,023-Speed 10807.04 samples/sec Loss 22.2078 LearningRate 0.1021 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:34,610-Speed 11425.70 samples/sec Loss 22.0648 LearningRate 0.1026 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:38,309-Speed 11076.90 samples/sec Loss 21.9396 LearningRate 0.1031 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:41,741-Speed 11937.41 samples/sec Loss 21.8251 LearningRate 0.1035 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:45,218-Speed 11783.37 samples/sec Loss 21.7891 LearningRate 0.1040 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:48,996-Speed 10847.80 samples/sec Loss 21.6891 LearningRate 0.1045 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:53,717-Speed 8677.99 samples/sec Loss 21.5512 LearningRate 0.1050 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:50:57,335-Speed 11327.56 samples/sec Loss 21.5262 LearningRate 0.1055 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:51:00,977-Speed 11249.11 samples/sec Loss 21.4408 LearningRate 0.1059 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 262144 Required: 9 hours
Training: 2022-01-16 21:51:04,747-Speed 10869.63 samples/sec Loss 21.3360 LearningRate 0.1064 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 262144 Required:
9 hours Training: 2022-01-16 21:51:08,616-Speed 10589.94 samples/sec Loss 21.2206 LearningRate 0.1069 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:12,051-Speed 11928.48 samples/sec Loss 21.2713 LearningRate 0.1074 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:15,610-Speed 11512.75 samples/sec Loss 21.1252 LearningRate 0.1079 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:19,081-Speed 11803.20 samples/sec Loss 21.0767 LearningRate 0.1083 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:23,013-Speed 10420.12 samples/sec Loss 20.9535 LearningRate 0.1088 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:26,487-Speed 11795.80 samples/sec Loss 20.9344 LearningRate 0.1093 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:30,154-Speed 11173.93 samples/sec Loss 20.8220 LearningRate 0.1098 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:33,838-Speed 11121.10 samples/sec Loss 20.7208 LearningRate 0.1103 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:37,400-Speed 11501.37 samples/sec Loss 20.6016 LearningRate 0.1107 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:41,568-Speed 9830.74 samples/sec Loss 20.6299 LearningRate 0.1112 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:45,282-Speed 11032.63 samples/sec Loss 20.5897 LearningRate 0.1117 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:48,952-Speed 11166.03 samples/sec Loss 20.4563 LearningRate 0.1122 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 
21:51:52,616-Speed 11181.17 samples/sec Loss 20.3643 LearningRate 0.1127 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:51:56,904-Speed 9555.99 samples/sec Loss 20.1409 LearningRate 0.1131 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:00,564-Speed 11194.79 samples/sec Loss 20.1297 LearningRate 0.1136 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:04,266-Speed 11067.69 samples/sec Loss 20.0840 LearningRate 0.1141 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:07,646-Speed 12122.46 samples/sec Loss 20.0530 LearningRate 0.1146 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:11,062-Speed 11995.79 samples/sec Loss 19.9232 LearningRate 0.1151 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:14,613-Speed 11537.75 samples/sec Loss 19.9146 LearningRate 0.1155 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:18,067-Speed 11862.73 samples/sec Loss 19.7975 LearningRate 0.1160 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:21,441-Speed 12144.67 samples/sec Loss 19.7352 LearningRate 0.1165 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:24,880-Speed 11911.62 samples/sec Loss 19.6854 LearningRate 0.1170 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:28,316-Speed 11928.69 samples/sec Loss 19.5690 LearningRate 0.1174 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:31,769-Speed 11863.68 samples/sec Loss 19.4717 LearningRate 0.1179 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:52:35,362-Speed 11403.44 samples/sec Loss 
19.4301 LearningRate 0.1184 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:52:39,025-Speed 11185.98 samples/sec Loss 19.4296 LearningRate 0.1189 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:52:42,472-Speed 11886.08 samples/sec Loss 19.2574 LearningRate 0.1194 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:52:45,938-Speed 11822.74 samples/sec Loss 19.1953 LearningRate 0.1198 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:52:49,728-Speed 10811.84 samples/sec Loss 19.1533 LearningRate 0.1203 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:52:53,536-Speed 10759.38 samples/sec Loss 19.0609 LearningRate 0.1208 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:52:57,392-Speed 10624.22 samples/sec Loss 19.0250 LearningRate 0.1213 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:53:00,913-Speed 11636.03 samples/sec Loss 19.0224 LearningRate 0.1218 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:53:04,584-Speed 11160.84 samples/sec Loss 18.9215 LearningRate 0.1222 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:53:08,132-Speed 11550.92 samples/sec Loss 18.8533 LearningRate 0.1227 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 21:53:12,045-Speed 10470.24 samples/sec Loss 18.7685 LearningRate 0.1232 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:15,873-Speed 10701.91 samples/sec Loss 18.7435 LearningRate 0.1237 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:19,790-Speed 10464.16 samples/sec Loss 18.6499 LearningRate 0.1242 Epoch: 0 Global 
Step: 2590 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:23,574-Speed 10825.68 samples/sec Loss 18.6078 LearningRate 0.1246 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:27,322-Speed 10934.61 samples/sec Loss 18.4294 LearningRate 0.1251 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:31,035-Speed 11034.01 samples/sec Loss 18.3806 LearningRate 0.1256 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:34,740-Speed 11058.11 samples/sec Loss 18.4165 LearningRate 0.1261 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:38,302-Speed 11503.71 samples/sec Loss 18.3653 LearningRate 0.1266 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:41,985-Speed 11131.62 samples/sec Loss 18.3275 LearningRate 0.1270 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:45,456-Speed 11806.23 samples/sec Loss 18.1427 LearningRate 0.1275 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:48,892-Speed 11924.27 samples/sec Loss 18.1810 LearningRate 0.1280 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:52,482-Speed 11411.37 samples/sec Loss 18.0548 LearningRate 0.1285 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:53:55,857-Speed 12139.80 samples/sec Loss 18.0337 LearningRate 0.1290 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:00,236-Speed 9357.58 samples/sec Loss 18.0339 LearningRate 0.1294 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:03,908-Speed 11158.55 samples/sec Loss 17.9259 LearningRate 0.1299 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 262144 Required: 
9 hours Training: 2022-01-16 21:54:07,493-Speed 11429.07 samples/sec Loss 17.8716 LearningRate 0.1304 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:10,998-Speed 11688.02 samples/sec Loss 17.7888 LearningRate 0.1309 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:14,659-Speed 11192.84 samples/sec Loss 17.7777 LearningRate 0.1314 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:18,267-Speed 11354.53 samples/sec Loss 17.6906 LearningRate 0.1318 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:22,120-Speed 10633.50 samples/sec Loss 17.7667 LearningRate 0.1323 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:25,941-Speed 10724.24 samples/sec Loss 17.6648 LearningRate 0.1328 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:29,513-Speed 11468.67 samples/sec Loss 17.5021 LearningRate 0.1333 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:33,117-Speed 11371.17 samples/sec Loss 17.5286 LearningRate 0.1337 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:36,964-Speed 10648.71 samples/sec Loss 17.4829 LearningRate 0.1342 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:40,521-Speed 11520.33 samples/sec Loss 17.4353 LearningRate 0.1347 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:44,533-Speed 10212.73 samples/sec Loss 17.3202 LearningRate 0.1352 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:48,078-Speed 11556.35 samples/sec Loss 17.3400 LearningRate 0.1357 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 
21:54:51,535-Speed 11854.00 samples/sec Loss 17.2323 LearningRate 0.1361 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:55,169-Speed 11274.07 samples/sec Loss 17.2184 LearningRate 0.1366 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:54:58,654-Speed 11842.17 samples/sec Loss 17.1964 LearningRate 0.1371 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:02,856-Speed 9751.07 samples/sec Loss 17.1209 LearningRate 0.1376 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 21:55:06,301-Speed 11894.00 samples/sec Loss 17.1422 LearningRate 0.1381 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 21:55:10,249-Speed 10377.13 samples/sec Loss 16.9807 LearningRate 0.1385 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:13,699-Speed 11876.36 samples/sec Loss 17.0301 LearningRate 0.1390 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:17,159-Speed 11840.53 samples/sec Loss 16.9020 LearningRate 0.1395 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:20,854-Speed 11088.86 samples/sec Loss 16.9289 LearningRate 0.1400 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:24,720-Speed 10599.78 samples/sec Loss 16.8738 LearningRate 0.1405 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:28,558-Speed 10673.92 samples/sec Loss 16.8823 LearningRate 0.1409 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:31,998-Speed 11913.58 samples/sec Loss 16.8408 LearningRate 0.1414 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:35,529-Speed 11603.41 samples/sec Loss 
16.6585 LearningRate 0.1419 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:39,236-Speed 11049.96 samples/sec Loss 16.6915 LearningRate 0.1424 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:42,790-Speed 11529.76 samples/sec Loss 16.5763 LearningRate 0.1429 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:46,763-Speed 10314.82 samples/sec Loss 16.6411 LearningRate 0.1433 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:50,406-Speed 11245.81 samples/sec Loss 16.5517 LearningRate 0.1438 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 21:55:53,863-Speed 11850.92 samples/sec Loss 16.5305 LearningRate 0.1443 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:55:57,289-Speed 11959.15 samples/sec Loss 16.4616 LearningRate 0.1448 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:00,774-Speed 11758.72 samples/sec Loss 16.4637 LearningRate 0.1453 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:05,673-Speed 8363.52 samples/sec Loss 16.3009 LearningRate 0.1457 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:09,285-Speed 11343.22 samples/sec Loss 16.3622 LearningRate 0.1462 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:12,780-Speed 11724.52 samples/sec Loss 16.3207 LearningRate 0.1467 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:16,308-Speed 11614.68 samples/sec Loss 16.3032 LearningRate 0.1472 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:19,998-Speed 11102.68 samples/sec Loss 16.2289 LearningRate 0.1477 Epoch: 0 Global 
Step: 3080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:23,563-Speed 11492.87 samples/sec Loss 16.1925 LearningRate 0.1481 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 21:56:27,189-Speed 11298.89 samples/sec Loss 16.1152 LearningRate 0.1486 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:30,667-Speed 11782.70 samples/sec Loss 16.1275 LearningRate 0.1491 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:34,588-Speed 10447.75 samples/sec Loss 16.1162 LearningRate 0.1496 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:38,125-Speed 11583.63 samples/sec Loss 16.0331 LearningRate 0.1500 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:41,765-Speed 11257.90 samples/sec Loss 15.9818 LearningRate 0.1505 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:45,490-Speed 11000.45 samples/sec Loss 15.9504 LearningRate 0.1510 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:48,989-Speed 11711.84 samples/sec Loss 15.9721 LearningRate 0.1515 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:52,401-Speed 12005.28 samples/sec Loss 15.9905 LearningRate 0.1520 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:56,231-Speed 10698.69 samples/sec Loss 15.8727 LearningRate 0.1524 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:56:59,946-Speed 11030.29 samples/sec Loss 15.8064 LearningRate 0.1529 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:03,896-Speed 10371.84 samples/sec Loss 15.8514 LearningRate 0.1534 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 524288 
Required: 8 hours Training: 2022-01-16 21:57:07,988-Speed 10012.68 samples/sec Loss 15.7928 LearningRate 0.1539 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 21:57:11,453-Speed 11826.82 samples/sec Loss 15.7704 LearningRate 0.1544 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:15,475-Speed 10187.21 samples/sec Loss 15.6698 LearningRate 0.1548 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:19,422-Speed 10380.70 samples/sec Loss 15.6920 LearningRate 0.1553 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:23,313-Speed 10530.01 samples/sec Loss 15.6475 LearningRate 0.1558 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:27,251-Speed 10404.32 samples/sec Loss 15.5491 LearningRate 0.1563 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:30,751-Speed 11705.06 samples/sec Loss 15.5482 LearningRate 0.1568 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:34,707-Speed 10355.85 samples/sec Loss 15.4161 LearningRate 0.1572 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:38,460-Speed 10919.42 samples/sec Loss 15.4097 LearningRate 0.1577 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:42,151-Speed 11102.78 samples/sec Loss 15.4201 LearningRate 0.1582 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:46,211-Speed 10104.98 samples/sec Loss 15.4490 LearningRate 0.1587 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:57:49,735-Speed 11627.04 samples/sec Loss 15.4073 LearningRate 0.1592 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 
21:57:53,332-Speed 11391.07 samples/sec Loss 15.3902 LearningRate 0.1596 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 21:57:56,848-Speed 11652.35 samples/sec Loss 15.2875 LearningRate 0.1601 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:00,408-Speed 11510.68 samples/sec Loss 15.2426 LearningRate 0.1606 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:03,975-Speed 11489.93 samples/sec Loss 15.1720 LearningRate 0.1611 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:07,469-Speed 11728.03 samples/sec Loss 15.1550 LearningRate 0.1616 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:11,373-Speed 10494.41 samples/sec Loss 15.0932 LearningRate 0.1620 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:14,846-Speed 11799.26 samples/sec Loss 15.1672 LearningRate 0.1625 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:19,145-Speed 9529.99 samples/sec Loss 15.2173 LearningRate 0.1630 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:22,837-Speed 11097.39 samples/sec Loss 15.0609 LearningRate 0.1635 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:26,285-Speed 11884.64 samples/sec Loss 15.0431 LearningRate 0.1640 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:29,844-Speed 11512.23 samples/sec Loss 15.1235 LearningRate 0.1644 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:33,213-Speed 12159.78 samples/sec Loss 15.0214 LearningRate 0.1649 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 21:58:36,787-Speed 11463.93 samples/sec Loss 
14.9797 LearningRate 0.1654 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:40,356-Speed 11483.65 samples/sec Loss 14.9778 LearningRate 0.1659 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:43,850-Speed 11725.68 samples/sec Loss 14.9119 LearningRate 0.1663 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:47,484-Speed 11275.40 samples/sec Loss 14.9685 LearningRate 0.1668 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:51,018-Speed 11594.26 samples/sec Loss 14.9001 LearningRate 0.1673 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:54,671-Speed 11216.62 samples/sec Loss 14.7943 LearningRate 0.1678 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:58:58,478-Speed 10762.88 samples/sec Loss 14.7676 LearningRate 0.1683 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:02,106-Speed 11291.43 samples/sec Loss 14.7332 LearningRate 0.1687 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:05,692-Speed 11439.14 samples/sec Loss 14.8226 LearningRate 0.1692 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:09,586-Speed 10522.28 samples/sec Loss 14.7853 LearningRate 0.1697 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:13,228-Speed 11249.79 samples/sec Loss 14.7891 LearningRate 0.1702 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:16,719-Speed 11734.48 samples/sec Loss 14.6969 LearningRate 0.1707 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:20,570-Speed 10641.13 samples/sec Loss 14.6116 LearningRate 0.1711 Epoch: 0 Global 
Step: 3570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:24,034-Speed 11826.53 samples/sec Loss 14.6296 LearningRate 0.1716 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:27,576-Speed 11569.40 samples/sec Loss 14.6283 LearningRate 0.1721 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:31,285-Speed 11046.75 samples/sec Loss 14.6339 LearningRate 0.1726 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:35,335-Speed 10116.63 samples/sec Loss 14.5660 LearningRate 0.1731 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:38,806-Speed 11803.59 samples/sec Loss 14.5879 LearningRate 0.1735 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:42,435-Speed 11291.73 samples/sec Loss 14.5755 LearningRate 0.1740 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:45,941-Speed 11685.63 samples/sec Loss 14.5156 LearningRate 0.1745 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:49,723-Speed 10835.76 samples/sec Loss 14.4412 LearningRate 0.1750 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 21:59:53,191-Speed 11813.62 samples/sec Loss 14.3924 LearningRate 0.1755 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 21:59:56,625-Speed 11940.73 samples/sec Loss 14.3660 LearningRate 0.1759 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:00,092-Speed 11818.69 samples/sec Loss 14.4460 LearningRate 0.1764 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:03,748-Speed 11207.68 samples/sec Loss 14.3845 LearningRate 0.1769 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:00:07,256-Speed 11678.07 samples/sec Loss 14.3606 LearningRate 0.1774 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:11,036-Speed 10838.71 samples/sec Loss 14.3182 LearningRate 0.1779 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:14,764-Speed 10990.54 samples/sec Loss 14.2959 LearningRate 0.1783 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:19,378-Speed 8880.92 samples/sec Loss 14.3603 LearningRate 0.1788 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:23,192-Speed 10743.35 samples/sec Loss 14.2379 LearningRate 0.1793 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:26,757-Speed 11491.35 samples/sec Loss 14.1720 LearningRate 0.1798 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:30,359-Speed 11373.98 samples/sec Loss 14.2620 LearningRate 0.1802 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:34,225-Speed 10600.24 samples/sec Loss 14.2221 LearningRate 0.1807 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:37,861-Speed 11268.66 samples/sec Loss 14.1258 LearningRate 0.1812 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:41,489-Speed 11294.32 samples/sec Loss 14.1479 LearningRate 0.1817 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:45,481-Speed 10264.32 samples/sec Loss 14.0170 LearningRate 0.1822 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:49,212-Speed 10980.27 samples/sec Loss 14.0951 LearningRate 0.1826 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:00:53,175-Speed 10337.71 samples/sec Loss 14.0039 LearningRate 0.1831 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:00:57,114-Speed 10400.72 samples/sec Loss 14.0569 LearningRate 0.1836 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:00,556-Speed 11908.77 samples/sec Loss 14.0392 LearningRate 0.1841 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:03,968-Speed 12009.94 samples/sec Loss 13.9454 LearningRate 0.1846 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:07,577-Speed 11355.87 samples/sec Loss 13.9335 LearningRate 0.1850 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:01:11,218-Speed 11253.78 samples/sec Loss 13.9882 LearningRate 0.1855 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:01:14,982-Speed 10884.56 samples/sec Loss 13.8502 LearningRate 0.1860 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:18,570-Speed 11419.95 samples/sec Loss 13.8308 LearningRate 0.1865 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:22,233-Speed 11186.17 samples/sec Loss 13.8917 LearningRate 0.1870 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:27,060-Speed 8489.12 samples/sec Loss 13.8573 LearningRate 0.1874 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:30,618-Speed 11516.08 samples/sec Loss 13.8853 LearningRate 0.1879 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:34,136-Speed 11646.97 samples/sec Loss 13.7871 LearningRate 0.1884 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:37,802-Speed 11175.81 samples/sec Loss 
13.7902 LearningRate 0.1889 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:41,414-Speed 11345.02 samples/sec Loss 13.7757 LearningRate 0.1894 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:44,895-Speed 11770.90 samples/sec Loss 13.7343 LearningRate 0.1898 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:48,500-Speed 11365.49 samples/sec Loss 13.6751 LearningRate 0.1903 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:01:52,162-Speed 11189.44 samples/sec Loss 13.7063 LearningRate 0.1908 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:01:55,596-Speed 11929.90 samples/sec Loss 13.6697 LearningRate 0.1913 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:01:59,226-Speed 11287.24 samples/sec Loss 13.6585 LearningRate 0.1918 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:02,930-Speed 11062.23 samples/sec Loss 13.6314 LearningRate 0.1922 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:06,373-Speed 11900.70 samples/sec Loss 13.6734 LearningRate 0.1927 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:09,819-Speed 11892.26 samples/sec Loss 13.6237 LearningRate 0.1932 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:13,472-Speed 11215.03 samples/sec Loss 13.6847 LearningRate 0.1937 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:17,102-Speed 11284.76 samples/sec Loss 13.5883 LearningRate 0.1942 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:20,489-Speed 12101.17 samples/sec Loss 13.5595 LearningRate 0.1946 Epoch: 0 Global 
Step: 4060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:24,072-Speed 11432.38 samples/sec Loss 13.6155 LearningRate 0.1951 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:28,297-Speed 9698.29 samples/sec Loss 13.5233 LearningRate 0.1956 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:32,020-Speed 11006.42 samples/sec Loss 13.4963 LearningRate 0.1961 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:35,686-Speed 11175.47 samples/sec Loss 13.4453 LearningRate 0.1965 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:02:39,128-Speed 11907.06 samples/sec Loss 13.4856 LearningRate 0.1970 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:42,758-Speed 11285.73 samples/sec Loss 13.4743 LearningRate 0.1975 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:46,556-Speed 10787.53 samples/sec Loss 13.4227 LearningRate 0.1980 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:50,334-Speed 10843.57 samples/sec Loss 13.4446 LearningRate 0.1985 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:54,107-Speed 10859.34 samples/sec Loss 13.3872 LearningRate 0.1989 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:02:57,758-Speed 11224.37 samples/sec Loss 13.3828 LearningRate 0.1994 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:03:01,923-Speed 9836.21 samples/sec Loss 13.3528 LearningRate 0.1999 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:03:44,591-Speed 960.01 samples/sec Loss 12.6209 LearningRate 0.2004 Epoch: 1 Global Step: 4180 Fp16 Grad Scale: 262144 Required: 9 
hours Training: 2022-01-16 22:03:48,266-Speed 11150.93 samples/sec Loss 12.3951 LearningRate 0.2009 Epoch: 1 Global Step: 4190 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:03:52,903-Speed 8835.67 samples/sec Loss 12.3900 LearningRate 0.2013 Epoch: 1 Global Step: 4200 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:03:56,663-Speed 10899.72 samples/sec Loss 12.3884 LearningRate 0.2018 Epoch: 1 Global Step: 4210 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:04:00,424-Speed 10892.98 samples/sec Loss 12.4746 LearningRate 0.2023 Epoch: 1 Global Step: 4220 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:04:04,542-Speed 9948.10 samples/sec Loss 12.4176 LearningRate 0.2028 Epoch: 1 Global Step: 4230 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:04:08,182-Speed 11259.52 samples/sec Loss 12.4475 LearningRate 0.2033 Epoch: 1 Global Step: 4240 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:04:11,970-Speed 10817.32 samples/sec Loss 12.4541 LearningRate 0.2037 Epoch: 1 Global Step: 4250 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:04:15,387-Speed 11990.29 samples/sec Loss 12.4073 LearningRate 0.2042 Epoch: 1 Global Step: 4260 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:04:19,301-Speed 10468.43 samples/sec Loss 12.4048 LearningRate 0.2047 Epoch: 1 Global Step: 4270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:04:22,857-Speed 11521.34 samples/sec Loss 12.4624 LearningRate 0.2052 Epoch: 1 Global Step: 4280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:04:26,544-Speed 11113.98 samples/sec Loss 12.4624 LearningRate 0.2057 Epoch: 1 Global Step: 4290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:04:30,876-Speed 9457.48 samples/sec Loss 12.4679 LearningRate 0.2061 Epoch: 1 Global Step: 4300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:04:34,361-Speed 
11755.55 samples/sec Loss 12.4460 LearningRate 0.2066 Epoch: 1 Global Step: 4310 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:04:38,121-Speed 10898.10 samples/sec Loss 12.4603 LearningRate 0.2071 Epoch: 1 Global Step: 4320 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:04:41,604-Speed 11765.21 samples/sec Loss 12.4989 LearningRate 0.2076 Epoch: 1 Global Step: 4330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:04:45,073-Speed 11813.30 samples/sec Loss 12.3982 LearningRate 0.2081 Epoch: 1 Global Step: 4340 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:04:48,791-Speed 11018.97 samples/sec Loss 12.4511 LearningRate 0.2085 Epoch: 1 Global Step: 4350 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:04:52,407-Speed 11331.36 samples/sec Loss 12.4082 LearningRate 0.2090 Epoch: 1 Global Step: 4360 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:04:55,949-Speed 11568.93 samples/sec Loss 12.4909 LearningRate 0.2095 Epoch: 1 Global Step: 4370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:04:59,609-Speed 11194.80 samples/sec Loss 12.3902 LearningRate 0.2100 Epoch: 1 Global Step: 4380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:03,159-Speed 11544.53 samples/sec Loss 12.4819 LearningRate 0.2105 Epoch: 1 Global Step: 4390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:06,747-Speed 11417.87 samples/sec Loss 12.4520 LearningRate 0.2109 Epoch: 1 Global Step: 4400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:10,394-Speed 11234.73 samples/sec Loss 12.4913 LearningRate 0.2114 Epoch: 1 Global Step: 4410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:14,094-Speed 11075.87 samples/sec Loss 12.4621 LearningRate 0.2119 Epoch: 1 Global Step: 4420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:17,816-Speed 11011.45 samples/sec Loss 12.3871 
LearningRate 0.2124 Epoch: 1 Global Step: 4430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:21,344-Speed 11612.49 samples/sec Loss 12.4404 LearningRate 0.2128 Epoch: 1 Global Step: 4440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:25,023-Speed 11140.09 samples/sec Loss 12.4168 LearningRate 0.2133 Epoch: 1 Global Step: 4450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:28,469-Speed 11888.83 samples/sec Loss 12.5047 LearningRate 0.2138 Epoch: 1 Global Step: 4460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:32,146-Speed 11143.96 samples/sec Loss 12.4481 LearningRate 0.2143 Epoch: 1 Global Step: 4470 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:05:35,803-Speed 11203.81 samples/sec Loss 12.3958 LearningRate 0.2148 Epoch: 1 Global Step: 4480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:39,757-Speed 10362.49 samples/sec Loss 12.3574 LearningRate 0.2152 Epoch: 1 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:43,639-Speed 10553.98 samples/sec Loss 12.3668 LearningRate 0.2157 Epoch: 1 Global Step: 4500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:47,208-Speed 11478.36 samples/sec Loss 12.4622 LearningRate 0.2162 Epoch: 1 Global Step: 4510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:51,016-Speed 10760.62 samples/sec Loss 12.4042 LearningRate 0.2167 Epoch: 1 Global Step: 4520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:54,953-Speed 10407.62 samples/sec Loss 12.3744 LearningRate 0.2172 Epoch: 1 Global Step: 4530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:05:58,712-Speed 10898.65 samples/sec Loss 12.3800 LearningRate 0.2176 Epoch: 1 Global Step: 4540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:02,542-Speed 10699.42 samples/sec Loss 12.3880 LearningRate 0.2181 Epoch: 1 Global Step: 
4550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:06,107-Speed 11490.76 samples/sec Loss 12.3571 LearningRate 0.2186 Epoch: 1 Global Step: 4560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:09,973-Speed 10599.54 samples/sec Loss 12.4160 LearningRate 0.2191 Epoch: 1 Global Step: 4570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:13,486-Speed 11664.22 samples/sec Loss 12.3675 LearningRate 0.2196 Epoch: 1 Global Step: 4580 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:06:17,366-Speed 10561.28 samples/sec Loss 12.3726 LearningRate 0.2200 Epoch: 1 Global Step: 4590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:21,532-Speed 9833.78 samples/sec Loss 12.3441 LearningRate 0.2205 Epoch: 1 Global Step: 4600 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:25,150-Speed 11325.34 samples/sec Loss 12.2711 LearningRate 0.2210 Epoch: 1 Global Step: 4610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:28,746-Speed 11394.69 samples/sec Loss 12.3476 LearningRate 0.2215 Epoch: 1 Global Step: 4620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:32,481-Speed 10970.52 samples/sec Loss 12.3445 LearningRate 0.2220 Epoch: 1 Global Step: 4630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:36,075-Speed 11399.44 samples/sec Loss 12.3612 LearningRate 0.2224 Epoch: 1 Global Step: 4640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:39,714-Speed 11259.01 samples/sec Loss 12.3481 LearningRate 0.2229 Epoch: 1 Global Step: 4650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:43,357-Speed 11245.48 samples/sec Loss 12.3678 LearningRate 0.2234 Epoch: 1 Global Step: 4660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:46,830-Speed 11799.87 samples/sec Loss 12.3371 LearningRate 0.2239 Epoch: 1 Global Step: 4670 Fp16 Grad Scale: 262144 Required: 8 
hours Training: 2022-01-16 22:06:50,351-Speed 11636.82 samples/sec Loss 12.3084 LearningRate 0.2244 Epoch: 1 Global Step: 4680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:06:54,202-Speed 10641.43 samples/sec Loss 12.2410 LearningRate 0.2248 Epoch: 1 Global Step: 4690 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:06:57,664-Speed 11836.24 samples/sec Loss 12.3155 LearningRate 0.2253 Epoch: 1 Global Step: 4700 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:07:01,517-Speed 10633.21 samples/sec Loss 12.2962 LearningRate 0.2258 Epoch: 1 Global Step: 4710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:05,156-Speed 11261.93 samples/sec Loss 12.3402 LearningRate 0.2263 Epoch: 1 Global Step: 4720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:08,687-Speed 11648.90 samples/sec Loss 12.3421 LearningRate 0.2267 Epoch: 1 Global Step: 4730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:12,968-Speed 9572.46 samples/sec Loss 12.2762 LearningRate 0.2272 Epoch: 1 Global Step: 4740 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:16,431-Speed 11830.15 samples/sec Loss 12.3072 LearningRate 0.2277 Epoch: 1 Global Step: 4750 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:19,893-Speed 11836.98 samples/sec Loss 12.2855 LearningRate 0.2282 Epoch: 1 Global Step: 4760 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:23,297-Speed 12035.12 samples/sec Loss 12.2149 LearningRate 0.2287 Epoch: 1 Global Step: 4770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:26,983-Speed 11116.87 samples/sec Loss 12.2374 LearningRate 0.2291 Epoch: 1 Global Step: 4780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:30,516-Speed 11596.65 samples/sec Loss 12.2686 LearningRate 0.2296 Epoch: 1 Global Step: 4790 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:07:34,548-Speed 10163.50 samples/sec Loss 12.2656 LearningRate 0.2301 Epoch: 1 Global Step: 4800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:38,215-Speed 11174.05 samples/sec Loss 12.3117 LearningRate 0.2306 Epoch: 1 Global Step: 4810 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:07:41,722-Speed 11681.52 samples/sec Loss 12.2719 LearningRate 0.2311 Epoch: 1 Global Step: 4820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:45,471-Speed 10929.31 samples/sec Loss 12.2078 LearningRate 0.2315 Epoch: 1 Global Step: 4830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:49,540-Speed 10069.45 samples/sec Loss 12.2624 LearningRate 0.2320 Epoch: 1 Global Step: 4840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:53,664-Speed 9933.63 samples/sec Loss 12.2387 LearningRate 0.2325 Epoch: 1 Global Step: 4850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:07:57,425-Speed 10895.77 samples/sec Loss 12.1942 LearningRate 0.2330 Epoch: 1 Global Step: 4860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:00,948-Speed 11629.56 samples/sec Loss 12.2029 LearningRate 0.2335 Epoch: 1 Global Step: 4870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:04,980-Speed 10161.47 samples/sec Loss 12.2242 LearningRate 0.2339 Epoch: 1 Global Step: 4880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:08,517-Speed 11586.73 samples/sec Loss 12.1620 LearningRate 0.2344 Epoch: 1 Global Step: 4890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:12,091-Speed 11464.21 samples/sec Loss 12.2245 LearningRate 0.2349 Epoch: 1 Global Step: 4900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:16,451-Speed 9397.52 samples/sec Loss 12.1759 LearningRate 0.2354 Epoch: 1 Global Step: 4910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:20,012-Speed 11507.90 samples/sec Loss 
12.1471 LearningRate 0.2359 Epoch: 1 Global Step: 4920 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:08:23,469-Speed 11851.81 samples/sec Loss 12.1871 LearningRate 0.2363 Epoch: 1 Global Step: 4930 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:08:27,149-Speed 11133.08 samples/sec Loss 12.1360 LearningRate 0.2368 Epoch: 1 Global Step: 4940 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:08:30,949-Speed 10782.65 samples/sec Loss 12.1740 LearningRate 0.2373 Epoch: 1 Global Step: 4950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:34,611-Speed 11190.49 samples/sec Loss 12.2240 LearningRate 0.2378 Epoch: 1 Global Step: 4960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:38,083-Speed 11802.30 samples/sec Loss 12.1478 LearningRate 0.2383 Epoch: 1 Global Step: 4970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:41,816-Speed 10975.05 samples/sec Loss 12.1347 LearningRate 0.2387 Epoch: 1 Global Step: 4980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:45,613-Speed 10790.05 samples/sec Loss 12.0933 LearningRate 0.2392 Epoch: 1 Global Step: 4990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:08:49,151-Speed 11581.91 samples/sec Loss 12.0744 LearningRate 0.2397 Epoch: 1 Global Step: 5000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:09:12,445-[lfw][5000]XNorm: 14.774082 Training: 2022-01-16 22:09:12,446-[lfw][5000]Accuracy-Flip: 0.99267+-0.00374 Training: 2022-01-16 22:09:12,446-[lfw][5000]Accuracy-Highest: 0.99267 Training: 2022-01-16 22:09:37,107-[cfp_fp][5000]XNorm: 12.635146 Training: 2022-01-16 22:09:37,108-[cfp_fp][5000]Accuracy-Flip: 0.91643+-0.01147 Training: 2022-01-16 22:09:37,110-[cfp_fp][5000]Accuracy-Highest: 0.91643 Training: 2022-01-16 22:09:58,185-[agedb_30][5000]XNorm: 14.352199 Training: 2022-01-16 22:09:58,186-[agedb_30][5000]Accuracy-Flip: 0.92617+-0.01747 Training: 2022-01-16 
22:09:58,186-[agedb_30][5000]Accuracy-Highest: 0.92617 Training: 2022-01-16 22:10:01,581-Speed 565.52 samples/sec Loss 12.0653 LearningRate 0.2402 Epoch: 1 Global Step: 5010 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:04,935-Speed 12217.32 samples/sec Loss 12.0794 LearningRate 0.2407 Epoch: 1 Global Step: 5020 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:08,412-Speed 11784.36 samples/sec Loss 12.0835 LearningRate 0.2411 Epoch: 1 Global Step: 5030 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:12,351-Speed 10400.24 samples/sec Loss 12.0854 LearningRate 0.2416 Epoch: 1 Global Step: 5040 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:15,957-Speed 11364.69 samples/sec Loss 12.1373 LearningRate 0.2421 Epoch: 1 Global Step: 5050 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:10:19,706-Speed 10928.51 samples/sec Loss 12.0755 LearningRate 0.2426 Epoch: 1 Global Step: 5060 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:10:23,626-Speed 10453.40 samples/sec Loss 12.0865 LearningRate 0.2430 Epoch: 1 Global Step: 5070 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:10:27,451-Speed 10709.86 samples/sec Loss 12.0858 LearningRate 0.2435 Epoch: 1 Global Step: 5080 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:30,924-Speed 11800.07 samples/sec Loss 12.0915 LearningRate 0.2440 Epoch: 1 Global Step: 5090 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:34,391-Speed 11815.58 samples/sec Loss 12.0720 LearningRate 0.2445 Epoch: 1 Global Step: 5100 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:37,855-Speed 11830.36 samples/sec Loss 12.0557 LearningRate 0.2450 Epoch: 1 Global Step: 5110 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:41,286-Speed 11940.16 samples/sec Loss 12.1275 LearningRate 0.2454 Epoch: 1 Global Step: 5120 Fp16 Grad Scale: 262144 
Required: 9 hours Training: 2022-01-16 22:10:44,775-Speed 11746.45 samples/sec Loss 12.0489 LearningRate 0.2459 Epoch: 1 Global Step: 5130 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:48,353-Speed 11451.61 samples/sec Loss 12.0187 LearningRate 0.2464 Epoch: 1 Global Step: 5140 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:52,165-Speed 10748.10 samples/sec Loss 12.0896 LearningRate 0.2469 Epoch: 1 Global Step: 5150 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:55,982-Speed 10731.93 samples/sec Loss 12.0337 LearningRate 0.2474 Epoch: 1 Global Step: 5160 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:10:59,793-Speed 10751.12 samples/sec Loss 12.0006 LearningRate 0.2478 Epoch: 1 Global Step: 5170 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:04,189-Speed 9320.82 samples/sec Loss 11.9534 LearningRate 0.2483 Epoch: 1 Global Step: 5180 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:11:07,787-Speed 11387.73 samples/sec Loss 12.0367 LearningRate 0.2488 Epoch: 1 Global Step: 5190 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:11,389-Speed 11375.81 samples/sec Loss 11.9924 LearningRate 0.2493 Epoch: 1 Global Step: 5200 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:15,003-Speed 11337.16 samples/sec Loss 12.0121 LearningRate 0.2498 Epoch: 1 Global Step: 5210 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:18,602-Speed 11382.60 samples/sec Loss 12.0352 LearningRate 0.2502 Epoch: 1 Global Step: 5220 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:22,523-Speed 10451.20 samples/sec Loss 11.9325 LearningRate 0.2507 Epoch: 1 Global Step: 5230 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:26,073-Speed 11543.08 samples/sec Loss 12.0179 LearningRate 0.2512 Epoch: 1 Global Step: 5240 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 
22:11:29,536-Speed 11831.11 samples/sec Loss 11.9287 LearningRate 0.2517 Epoch: 1 Global Step: 5250 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:33,423-Speed 10542.76 samples/sec Loss 11.9649 LearningRate 0.2522 Epoch: 1 Global Step: 5260 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:37,003-Speed 11443.60 samples/sec Loss 11.8881 LearningRate 0.2526 Epoch: 1 Global Step: 5270 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:40,864-Speed 10612.85 samples/sec Loss 11.8760 LearningRate 0.2531 Epoch: 1 Global Step: 5280 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:44,628-Speed 10885.85 samples/sec Loss 11.8287 LearningRate 0.2536 Epoch: 1 Global Step: 5290 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:48,460-Speed 10690.18 samples/sec Loss 11.9066 LearningRate 0.2541 Epoch: 1 Global Step: 5300 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:52,045-Speed 11431.41 samples/sec Loss 11.9466 LearningRate 0.2546 Epoch: 1 Global Step: 5310 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:55,570-Speed 11621.14 samples/sec Loss 11.9407 LearningRate 0.2550 Epoch: 1 Global Step: 5320 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:11:59,039-Speed 11812.42 samples/sec Loss 11.8945 LearningRate 0.2555 Epoch: 1 Global Step: 5330 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:02,448-Speed 12022.03 samples/sec Loss 11.8877 LearningRate 0.2560 Epoch: 1 Global Step: 5340 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:06,800-Speed 9412.74 samples/sec Loss 11.8752 LearningRate 0.2565 Epoch: 1 Global Step: 5350 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:10,372-Speed 11471.96 samples/sec Loss 11.8222 LearningRate 0.2570 Epoch: 1 Global Step: 5360 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:13,896-Speed 11628.48 samples/sec Loss 
11.8745 LearningRate 0.2574 Epoch: 1 Global Step: 5370 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:17,843-Speed 10381.25 samples/sec Loss 11.7931 LearningRate 0.2579 Epoch: 1 Global Step: 5380 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:22,171-Speed 9468.42 samples/sec Loss 11.8519 LearningRate 0.2584 Epoch: 1 Global Step: 5390 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:12:26,213-Speed 10136.20 samples/sec Loss 11.8392 LearningRate 0.2589 Epoch: 1 Global Step: 5400 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:29,933-Speed 11014.18 samples/sec Loss 11.8884 LearningRate 0.2593 Epoch: 1 Global Step: 5410 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:34,014-Speed 10039.94 samples/sec Loss 11.7747 LearningRate 0.2598 Epoch: 1 Global Step: 5420 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:37,685-Speed 11159.62 samples/sec Loss 11.8493 LearningRate 0.2603 Epoch: 1 Global Step: 5430 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:41,321-Speed 11269.40 samples/sec Loss 11.8374 LearningRate 0.2608 Epoch: 1 Global Step: 5440 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:44,774-Speed 11867.63 samples/sec Loss 11.8707 LearningRate 0.2613 Epoch: 1 Global Step: 5450 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:48,718-Speed 10389.84 samples/sec Loss 11.8357 LearningRate 0.2617 Epoch: 1 Global Step: 5460 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:52,289-Speed 11472.58 samples/sec Loss 11.7753 LearningRate 0.2622 Epoch: 1 Global Step: 5470 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:56,062-Speed 10861.62 samples/sec Loss 11.8659 LearningRate 0.2627 Epoch: 1 Global Step: 5480 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:12:59,936-Speed 10575.52 samples/sec Loss 11.7875 LearningRate 0.2632 Epoch: 1 Global 
Step: 5490 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:03,367-Speed 11942.13 samples/sec Loss 11.7969 LearningRate 0.2637 Epoch: 1 Global Step: 5500 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:13:06,898-Speed 11602.54 samples/sec Loss 11.8049 LearningRate 0.2641 Epoch: 1 Global Step: 5510 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:13:10,842-Speed 10389.77 samples/sec Loss 11.8033 LearningRate 0.2646 Epoch: 1 Global Step: 5520 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:14,277-Speed 11927.85 samples/sec Loss 11.8722 LearningRate 0.2651 Epoch: 1 Global Step: 5530 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:17,776-Speed 11710.83 samples/sec Loss 11.7648 LearningRate 0.2656 Epoch: 1 Global Step: 5540 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:21,638-Speed 10608.18 samples/sec Loss 11.6876 LearningRate 0.2661 Epoch: 1 Global Step: 5550 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:25,626-Speed 10273.49 samples/sec Loss 11.7662 LearningRate 0.2665 Epoch: 1 Global Step: 5560 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:29,125-Speed 11710.69 samples/sec Loss 11.6476 LearningRate 0.2670 Epoch: 1 Global Step: 5570 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:32,710-Speed 11428.23 samples/sec Loss 11.7213 LearningRate 0.2675 Epoch: 1 Global Step: 5580 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:36,333-Speed 11311.47 samples/sec Loss 11.7067 LearningRate 0.2680 Epoch: 1 Global Step: 5590 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:39,973-Speed 11255.89 samples/sec Loss 11.6873 LearningRate 0.2685 Epoch: 1 Global Step: 5600 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:43,514-Speed 11572.18 samples/sec Loss 11.6736 LearningRate 0.2689 Epoch: 1 Global Step: 5610 Fp16 Grad Scale: 262144 
Required: 9 hours Training: 2022-01-16 22:13:46,991-Speed 11784.14 samples/sec Loss 11.7654 LearningRate 0.2694 Epoch: 1 Global Step: 5620 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:50,816-Speed 10711.16 samples/sec Loss 11.7474 LearningRate 0.2699 Epoch: 1 Global Step: 5630 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:54,308-Speed 11732.79 samples/sec Loss 11.7150 LearningRate 0.2704 Epoch: 1 Global Step: 5640 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:13:58,295-Speed 10276.43 samples/sec Loss 11.6844 LearningRate 0.2709 Epoch: 1 Global Step: 5650 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:01,745-Speed 11882.30 samples/sec Loss 11.7364 LearningRate 0.2713 Epoch: 1 Global Step: 5660 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:05,265-Speed 11640.50 samples/sec Loss 11.6828 LearningRate 0.2718 Epoch: 1 Global Step: 5670 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:09,721-Speed 9193.90 samples/sec Loss 11.6090 LearningRate 0.2723 Epoch: 1 Global Step: 5680 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:13,432-Speed 11041.20 samples/sec Loss 11.6313 LearningRate 0.2728 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:17,217-Speed 10824.90 samples/sec Loss 11.6149 LearningRate 0.2733 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:21,070-Speed 10637.02 samples/sec Loss 11.6851 LearningRate 0.2737 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:24,606-Speed 11588.98 samples/sec Loss 11.6336 LearningRate 0.2742 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 524288 Required: 9 hours Training: 2022-01-16 22:14:28,094-Speed 11747.21 samples/sec Loss 11.6616 LearningRate 0.2747 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 
22:14:31,702-Speed 11356.70 samples/sec Loss 11.6132 LearningRate 0.2752 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:35,263-Speed 11506.99 samples/sec Loss 11.5869 LearningRate 0.2756 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:38,742-Speed 11774.35 samples/sec Loss 11.6889 LearningRate 0.2761 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 22:14:42,267-Speed 11623.71 samples/sec Loss 11.7056 LearningRate 0.2766 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:14:46,035-Speed 10875.66 samples/sec Loss 11.6018 LearningRate 0.2771 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:14:49,528-Speed 11730.76 samples/sec Loss 11.6754 LearningRate 0.2776 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:14:53,082-Speed 11528.31 samples/sec Loss 11.5537 LearningRate 0.2780 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:14:56,916-Speed 10685.97 samples/sec Loss 11.5324 LearningRate 0.2785 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:00,695-Speed 10843.09 samples/sec Loss 11.5681 LearningRate 0.2790 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:04,519-Speed 10714.95 samples/sec Loss 11.6223 LearningRate 0.2795 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:08,298-Speed 10842.30 samples/sec Loss 11.6370 LearningRate 0.2800 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:12,093-Speed 10796.50 samples/sec Loss 11.6375 LearningRate 0.2804 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:15,684-Speed 11409.62 samples/sec 
Loss 11.5392 LearningRate 0.2809 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:19,250-Speed 11490.35 samples/sec Loss 11.5803 LearningRate 0.2814 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:22,811-Speed 11505.69 samples/sec Loss 11.5806 LearningRate 0.2819 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:27,128-Speed 9492.30 samples/sec Loss 11.5413 LearningRate 0.2824 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:30,739-Speed 11346.27 samples/sec Loss 11.5665 LearningRate 0.2828 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:34,366-Speed 11295.69 samples/sec Loss 11.5554 LearningRate 0.2833 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:37,903-Speed 11585.09 samples/sec Loss 11.4226 LearningRate 0.2838 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:15:41,482-Speed 11451.44 samples/sec Loss 11.5162 LearningRate 0.2843 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:15:45,514-Speed 10161.63 samples/sec Loss 11.5200 LearningRate 0.2848 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:15:49,009-Speed 11723.72 samples/sec Loss 11.5506 LearningRate 0.2852 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:15:52,566-Speed 11518.15 samples/sec Loss 11.4910 LearningRate 0.2857 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:15:56,012-Speed 11891.22 samples/sec Loss 11.5042 LearningRate 0.2862 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:15:59,631-Speed 11320.61 samples/sec Loss 11.5249 LearningRate 0.2867 Epoch: 1 
Global Step: 5980 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:16:03,244-Speed 11340.31 samples/sec Loss 11.5247 LearningRate 0.2872 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:16:06,949-Speed 11058.98 samples/sec Loss 11.4949 LearningRate 0.2876 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:16:10,484-Speed 11593.49 samples/sec Loss 11.4835 LearningRate 0.2881 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:16:13,943-Speed 11845.35 samples/sec Loss 11.4617 LearningRate 0.2886 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:16:17,841-Speed 10510.77 samples/sec Loss 11.4365 LearningRate 0.2891 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:21,463-Speed 11312.86 samples/sec Loss 11.4709 LearningRate 0.2895 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:25,790-Speed 9470.43 samples/sec Loss 11.4305 LearningRate 0.2900 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:29,400-Speed 11348.45 samples/sec Loss 11.3896 LearningRate 0.2905 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:32,883-Speed 11766.26 samples/sec Loss 11.4860 LearningRate 0.2910 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:36,494-Speed 11345.83 samples/sec Loss 11.4490 LearningRate 0.2915 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:40,049-Speed 11525.69 samples/sec Loss 11.4785 LearningRate 0.2919 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:43,830-Speed 10836.53 samples/sec Loss 11.3965 LearningRate 0.2924 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:16:47,448-Speed 11326.48 samples/sec Loss 11.4020 LearningRate 0.2929 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:51,287-Speed 10673.68 samples/sec Loss 11.4148 LearningRate 0.2934 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:16:54,790-Speed 11695.75 samples/sec Loss 11.3999 LearningRate 0.2939 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:16:58,384-Speed 11400.90 samples/sec Loss 11.4709 LearningRate 0.2943 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:17:01,912-Speed 11611.69 samples/sec Loss 11.3876 LearningRate 0.2948 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:17:05,437-Speed 11626.09 samples/sec Loss 11.3703 LearningRate 0.2953 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:08,974-Speed 11583.13 samples/sec Loss 11.4372 LearningRate 0.2958 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:13,132-Speed 9853.77 samples/sec Loss 11.3861 LearningRate 0.2963 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:16,774-Speed 11248.18 samples/sec Loss 11.4028 LearningRate 0.2967 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:20,470-Speed 11088.08 samples/sec Loss 11.2985 LearningRate 0.2972 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:24,164-Speed 11092.92 samples/sec Loss 11.3793 LearningRate 0.2977 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:28,598-Speed 9240.72 samples/sec Loss 11.3385 LearningRate 0.2982 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:17:32,103-Speed 11688.24 samples/sec Loss 11.3943 LearningRate 0.2987 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:36,155-Speed 10112.41 samples/sec Loss 11.3324 LearningRate 0.2991 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:39,638-Speed 11767.87 samples/sec Loss 11.3639 LearningRate 0.2996 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:43,331-Speed 11095.88 samples/sec Loss 11.3267 LearningRate 0.3001 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:17:47,279-Speed 10377.38 samples/sec Loss 11.3460 LearningRate 0.3006 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:17:50,733-Speed 11862.76 samples/sec Loss 11.3012 LearningRate 0.3011 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:17:54,640-Speed 10486.77 samples/sec Loss 11.2965 LearningRate 0.3015 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:17:58,178-Speed 11580.57 samples/sec Loss 11.2342 LearningRate 0.3020 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:01,664-Speed 11753.56 samples/sec Loss 11.2895 LearningRate 0.3025 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:05,415-Speed 10924.38 samples/sec Loss 11.2728 LearningRate 0.3030 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:09,124-Speed 11045.20 samples/sec Loss 11.3049 LearningRate 0.3035 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:12,920-Speed 10795.63 samples/sec Loss 11.3717 LearningRate 0.3039 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:16,662-Speed 10950.92 samples/sec 
Loss 11.2869 LearningRate 0.3044 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:20,244-Speed 11435.70 samples/sec Loss 11.3114 LearningRate 0.3049 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:24,173-Speed 10430.49 samples/sec Loss 11.3347 LearningRate 0.3054 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:27,615-Speed 11903.84 samples/sec Loss 11.2281 LearningRate 0.3058 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:32,839-Speed 7841.16 samples/sec Loss 11.3022 LearningRate 0.3063 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:18:36,554-Speed 11031.21 samples/sec Loss 11.3080 LearningRate 0.3068 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:40,000-Speed 11889.17 samples/sec Loss 11.2880 LearningRate 0.3073 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:43,443-Speed 11907.13 samples/sec Loss 11.2875 LearningRate 0.3078 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:46,866-Speed 11970.90 samples/sec Loss 11.2906 LearningRate 0.3082 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:50,357-Speed 11735.68 samples/sec Loss 11.2587 LearningRate 0.3087 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:53,992-Speed 11272.33 samples/sec Loss 11.2286 LearningRate 0.3092 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:18:57,678-Speed 11117.58 samples/sec Loss 11.1943 LearningRate 0.3097 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:01,727-Speed 10117.33 samples/sec Loss 11.1589 LearningRate 0.3102 Epoch: 1 
Global Step: 6470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:05,484-Speed 10907.73 samples/sec Loss 11.2043 LearningRate 0.3106 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:09,310-Speed 10710.27 samples/sec Loss 11.1835 LearningRate 0.3111 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:13,184-Speed 10576.32 samples/sec Loss 11.2342 LearningRate 0.3116 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:16,886-Speed 11067.29 samples/sec Loss 11.1453 LearningRate 0.3121 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:20,391-Speed 11688.17 samples/sec Loss 11.2459 LearningRate 0.3126 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:24,033-Speed 11252.78 samples/sec Loss 11.2040 LearningRate 0.3130 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:27,550-Speed 11647.89 samples/sec Loss 11.2637 LearningRate 0.3135 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:31,020-Speed 11808.41 samples/sec Loss 11.2019 LearningRate 0.3140 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:35,397-Speed 9359.37 samples/sec Loss 11.1644 LearningRate 0.3145 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:39,308-Speed 10477.90 samples/sec Loss 11.2178 LearningRate 0.3150 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:42,899-Speed 11408.81 samples/sec Loss 11.1727 LearningRate 0.3154 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:19:46,770-Speed 10586.30 samples/sec Loss 11.0926 LearningRate 0.3159 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:19:50,651-Speed 10554.75 samples/sec Loss 11.1704 LearningRate 0.3164 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:19:54,288-Speed 11267.37 samples/sec Loss 11.1731 LearningRate 0.3169 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:19:58,123-Speed 10682.77 samples/sec Loss 11.1979 LearningRate 0.3174 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:01,592-Speed 11813.94 samples/sec Loss 11.2368 LearningRate 0.3178 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:05,181-Speed 11416.13 samples/sec Loss 11.1170 LearningRate 0.3183 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:08,702-Speed 11636.11 samples/sec Loss 11.1856 LearningRate 0.3188 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:12,238-Speed 11585.82 samples/sec Loss 11.0950 LearningRate 0.3193 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:15,804-Speed 11491.22 samples/sec Loss 11.1506 LearningRate 0.3198 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:19,564-Speed 10897.21 samples/sec Loss 11.1020 LearningRate 0.3202 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:23,127-Speed 11498.76 samples/sec Loss 11.1001 LearningRate 0.3207 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:26,759-Speed 11282.28 samples/sec Loss 11.1786 LearningRate 0.3212 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:20:30,271-Speed 11710.78 samples/sec Loss 11.1019 LearningRate 0.3217 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:20:33,930-Speed 11198.09 samples/sec Loss 11.1361 LearningRate 0.3221 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:20:39,041-Speed 8015.69 samples/sec Loss 11.0418 LearningRate 0.3226 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:20:42,542-Speed 11703.69 samples/sec Loss 11.0763 LearningRate 0.3231 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:20:46,110-Speed 11483.19 samples/sec Loss 11.1203 LearningRate 0.3236 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:20:49,981-Speed 10584.92 samples/sec Loss 11.0442 LearningRate 0.3241 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:20:53,903-Speed 10446.09 samples/sec Loss 11.0655 LearningRate 0.3245 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:20:57,337-Speed 11934.17 samples/sec Loss 11.0822 LearningRate 0.3250 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:01,060-Speed 11003.38 samples/sec Loss 11.0615 LearningRate 0.3255 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:04,970-Speed 10479.28 samples/sec Loss 11.1347 LearningRate 0.3260 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:09,089-Speed 9947.51 samples/sec Loss 11.0096 LearningRate 0.3265 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:12,671-Speed 11438.10 samples/sec Loss 11.0527 LearningRate 0.3269 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:16,145-Speed 11795.43 samples/sec Loss 11.0509 LearningRate 0.3274 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:19,833-Speed 11110.74 samples/sec Loss 
11.0421 LearningRate 0.3279 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:23,362-Speed 11609.81 samples/sec Loss 10.9776 LearningRate 0.3284 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:27,043-Speed 11132.67 samples/sec Loss 11.0151 LearningRate 0.3289 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:30,724-Speed 11130.81 samples/sec Loss 11.0189 LearningRate 0.3293 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:34,406-Speed 11129.12 samples/sec Loss 11.0651 LearningRate 0.3298 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:37,889-Speed 11762.43 samples/sec Loss 11.0453 LearningRate 0.3303 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:41,879-Speed 10269.23 samples/sec Loss 11.0290 LearningRate 0.3308 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:45,594-Speed 11029.93 samples/sec Loss 10.9453 LearningRate 0.3313 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:49,229-Speed 11271.30 samples/sec Loss 10.9402 LearningRate 0.3317 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:53,181-Speed 10368.07 samples/sec Loss 10.8953 LearningRate 0.3322 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:21:56,769-Speed 11421.44 samples/sec Loss 10.9415 LearningRate 0.3327 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:00,571-Speed 10774.54 samples/sec Loss 10.9676 LearningRate 0.3332 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:04,523-Speed 10367.52 samples/sec Loss 10.8957 LearningRate 0.3337 Epoch: 1 Global 
Step: 6960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:08,096-Speed 11469.93 samples/sec Loss 10.9527 LearningRate 0.3341 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:11,770-Speed 11150.91 samples/sec Loss 10.9036 LearningRate 0.3346 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:15,414-Speed 11245.87 samples/sec Loss 10.9434 LearningRate 0.3351 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:19,129-Speed 11026.47 samples/sec Loss 10.9157 LearningRate 0.3356 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:22,810-Speed 11131.11 samples/sec Loss 10.9383 LearningRate 0.3360 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:26,331-Speed 11637.72 samples/sec Loss 10.8634 LearningRate 0.3365 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:30,177-Speed 10652.75 samples/sec Loss 10.9443 LearningRate 0.3370 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:33,895-Speed 11021.86 samples/sec Loss 10.8752 LearningRate 0.3375 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:37,542-Speed 11233.69 samples/sec Loss 10.8648 LearningRate 0.3380 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:41,188-Speed 11237.52 samples/sec Loss 10.9134 LearningRate 0.3384 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:45,590-Speed 9308.09 samples/sec Loss 10.8936 LearningRate 0.3389 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:22:49,489-Speed 10506.97 samples/sec Loss 10.9723 LearningRate 0.3394 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 524288 Required: 
8 hours Training: 2022-01-16 22:22:53,103-Speed 11338.13 samples/sec Loss 10.8694 LearningRate 0.3399 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:22:56,542-Speed 11915.60 samples/sec Loss 10.8752 LearningRate 0.3404 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:23:00,363-Speed 10723.30 samples/sec Loss 10.9217 LearningRate 0.3408 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:23:03,963-Speed 11379.95 samples/sec Loss 10.8677 LearningRate 0.3413 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:07,488-Speed 11624.60 samples/sec Loss 10.8382 LearningRate 0.3418 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:10,938-Speed 11875.05 samples/sec Loss 10.8874 LearningRate 0.3423 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:14,486-Speed 11549.61 samples/sec Loss 10.8043 LearningRate 0.3428 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:18,077-Speed 11409.72 samples/sec Loss 10.9196 LearningRate 0.3432 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:21,581-Speed 11693.98 samples/sec Loss 10.8322 LearningRate 0.3437 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:25,233-Speed 11218.61 samples/sec Loss 10.8434 LearningRate 0.3442 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:28,732-Speed 11710.82 samples/sec Loss 10.8667 LearningRate 0.3447 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:32,190-Speed 11848.86 samples/sec Loss 10.8386 LearningRate 0.3452 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:23:35,824-Speed 11276.49 samples/sec Loss 10.7964 LearningRate 0.3456 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:39,647-Speed 10717.58 samples/sec Loss 10.8818 LearningRate 0.3461 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:23:43,391-Speed 10942.83 samples/sec Loss 10.8690 LearningRate 0.3466 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:23:47,990-Speed 8909.26 samples/sec Loss 10.8466 LearningRate 0.3471 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:23:51,846-Speed 10626.92 samples/sec Loss 10.7612 LearningRate 0.3476 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:23:55,613-Speed 10874.59 samples/sec Loss 10.8430 LearningRate 0.3480 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:23:59,474-Speed 10611.79 samples/sec Loss 10.8283 LearningRate 0.3485 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:02,896-Speed 11973.57 samples/sec Loss 10.7450 LearningRate 0.3490 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:06,504-Speed 11358.99 samples/sec Loss 10.7728 LearningRate 0.3495 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:10,332-Speed 10704.27 samples/sec Loss 10.7783 LearningRate 0.3500 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:13,833-Speed 11704.79 samples/sec Loss 10.7620 LearningRate 0.3504 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:17,357-Speed 11626.72 samples/sec Loss 10.8514 LearningRate 0.3509 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:20,968-Speed 11345.82 samples/sec Loss 
10.8687 LearningRate 0.3514 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:24,613-Speed 11242.72 samples/sec Loss 10.8102 LearningRate 0.3519 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:28,338-Speed 11001.95 samples/sec Loss 10.7870 LearningRate 0.3523 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:32,128-Speed 10810.45 samples/sec Loss 10.7438 LearningRate 0.3528 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:24:35,754-Speed 11298.61 samples/sec Loss 10.8692 LearningRate 0.3533 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:24:39,773-Speed 10195.21 samples/sec Loss 10.8085 LearningRate 0.3538 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:24:43,390-Speed 11326.31 samples/sec Loss 10.8211 LearningRate 0.3543 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:24:47,266-Speed 10573.97 samples/sec Loss 10.7820 LearningRate 0.3547 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:50,742-Speed 11787.44 samples/sec Loss 10.8573 LearningRate 0.3552 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:54,652-Speed 10478.11 samples/sec Loss 10.6961 LearningRate 0.3557 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:24:58,196-Speed 11563.43 samples/sec Loss 10.7655 LearningRate 0.3562 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:01,883-Speed 11110.84 samples/sec Loss 10.6785 LearningRate 0.3567 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:05,962-Speed 10047.64 samples/sec Loss 10.7763 LearningRate 0.3571 Epoch: 1 Global 
Step: 7450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:09,732-Speed 10866.72 samples/sec Loss 10.7082 LearningRate 0.3576 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:14,361-Speed 8849.85 samples/sec Loss 10.8240 LearningRate 0.3581 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:18,096-Speed 10972.19 samples/sec Loss 10.7012 LearningRate 0.3586 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:21,939-Speed 10659.50 samples/sec Loss 10.7161 LearningRate 0.3591 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:25,473-Speed 11595.49 samples/sec Loss 10.8076 LearningRate 0.3595 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:25:28,995-Speed 11635.06 samples/sec Loss 10.7680 LearningRate 0.3600 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:32,460-Speed 11823.31 samples/sec Loss 10.7809 LearningRate 0.3605 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:35,892-Speed 11937.30 samples/sec Loss 10.6983 LearningRate 0.3610 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:39,344-Speed 11870.83 samples/sec Loss 10.7265 LearningRate 0.3615 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:43,264-Speed 10453.69 samples/sec Loss 10.6780 LearningRate 0.3619 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:47,018-Speed 10915.78 samples/sec Loss 10.7533 LearningRate 0.3624 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:50,649-Speed 11284.30 samples/sec Loss 10.7078 LearningRate 0.3629 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 262144 Required: 
8 hours Training: 2022-01-16 22:25:54,074-Speed 11961.21 samples/sec Loss 10.7308 LearningRate 0.3634 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:25:58,064-Speed 10267.31 samples/sec Loss 10.7279 LearningRate 0.3639 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:01,564-Speed 11706.41 samples/sec Loss 10.6570 LearningRate 0.3643 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:05,396-Speed 10692.71 samples/sec Loss 10.6717 LearningRate 0.3648 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:26:09,092-Speed 11088.37 samples/sec Loss 10.7484 LearningRate 0.3653 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:13,109-Speed 10200.54 samples/sec Loss 10.6512 LearningRate 0.3658 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:17,312-Speed 9746.25 samples/sec Loss 10.6272 LearningRate 0.3663 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:20,842-Speed 11607.80 samples/sec Loss 10.6440 LearningRate 0.3667 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:24,987-Speed 9884.86 samples/sec Loss 10.5967 LearningRate 0.3672 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:28,759-Speed 10861.43 samples/sec Loss 10.6995 LearningRate 0.3677 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:32,240-Speed 11771.05 samples/sec Loss 10.6503 LearningRate 0.3682 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:35,913-Speed 11154.67 samples/sec Loss 10.7355 LearningRate 0.3686 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:26:39,592-Speed 11137.46 samples/sec Loss 10.7304 LearningRate 0.3691 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:43,207-Speed 11335.48 samples/sec Loss 10.6261 LearningRate 0.3696 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:26:46,901-Speed 11094.83 samples/sec Loss 10.6585 LearningRate 0.3701 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:26:50,412-Speed 11672.50 samples/sec Loss 10.6695 LearningRate 0.3706 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:26:53,994-Speed 11438.57 samples/sec Loss 10.6061 LearningRate 0.3710 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:26:57,627-Speed 11278.90 samples/sec Loss 10.5956 LearningRate 0.3715 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:27:01,551-Speed 10439.96 samples/sec Loss 10.6960 LearningRate 0.3720 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:27:05,040-Speed 11744.82 samples/sec Loss 10.6227 LearningRate 0.3725 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:27:08,641-Speed 11379.11 samples/sec Loss 10.5965 LearningRate 0.3730 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:12,411-Speed 10868.26 samples/sec Loss 10.5804 LearningRate 0.3734 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:16,127-Speed 11026.53 samples/sec Loss 10.6378 LearningRate 0.3739 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:19,752-Speed 11302.62 samples/sec Loss 10.5773 LearningRate 0.3744 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:24,226-Speed 9158.97 samples/sec Loss 
10.7017 LearningRate 0.3749 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:27,804-Speed 11450.80 samples/sec Loss 10.6367 LearningRate 0.3754 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:31,613-Speed 10754.97 samples/sec Loss 10.6244 LearningRate 0.3758 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:35,172-Speed 11514.46 samples/sec Loss 10.6411 LearningRate 0.3763 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:38,858-Speed 11113.92 samples/sec Loss 10.6447 LearningRate 0.3768 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:42,758-Speed 10507.05 samples/sec Loss 10.6183 LearningRate 0.3773 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:46,344-Speed 11427.85 samples/sec Loss 10.6511 LearningRate 0.3778 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:27:49,912-Speed 11481.24 samples/sec Loss 10.5595 LearningRate 0.3782 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:27:53,541-Speed 11289.96 samples/sec Loss 10.6194 LearningRate 0.3787 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:27:56,980-Speed 11915.92 samples/sec Loss 10.5896 LearningRate 0.3792 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:00,426-Speed 11891.99 samples/sec Loss 10.5976 LearningRate 0.3797 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:03,902-Speed 11785.51 samples/sec Loss 10.5952 LearningRate 0.3802 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:07,935-Speed 10160.96 samples/sec Loss 10.6475 LearningRate 0.3806 Epoch: 1 Global 
Step: 7940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:11,716-Speed 10836.14 samples/sec Loss 10.5915 LearningRate 0.3811 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:15,863-Speed 9878.18 samples/sec Loss 10.5980 LearningRate 0.3816 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:19,545-Speed 11129.21 samples/sec Loss 10.6211 LearningRate 0.3821 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:23,149-Speed 11367.29 samples/sec Loss 10.5835 LearningRate 0.3826 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:26,959-Speed 10754.21 samples/sec Loss 10.6255 LearningRate 0.3830 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:30,541-Speed 11441.09 samples/sec Loss 10.5423 LearningRate 0.3835 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:28:34,270-Speed 10986.57 samples/sec Loss 10.5916 LearningRate 0.3840 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:28:38,084-Speed 10742.10 samples/sec Loss 10.6272 LearningRate 0.3845 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:41,820-Speed 10967.38 samples/sec Loss 10.6358 LearningRate 0.3849 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:45,371-Speed 11540.00 samples/sec Loss 10.6071 LearningRate 0.3854 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:49,290-Speed 10454.75 samples/sec Loss 10.4984 LearningRate 0.3859 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:28:52,928-Speed 11261.70 samples/sec Loss 10.5935 LearningRate 0.3864 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 262144 Required: 
8 hours Training: 2022-01-16 22:28:56,895-Speed 10326.48 samples/sec Loss 10.5648 LearningRate 0.3869 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:00,380-Speed 11761.81 samples/sec Loss 10.5511 LearningRate 0.3873 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:03,951-Speed 11472.60 samples/sec Loss 10.5010 LearningRate 0.3878 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:07,597-Speed 11238.92 samples/sec Loss 10.5309 LearningRate 0.3883 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:11,553-Speed 10356.41 samples/sec Loss 10.5235 LearningRate 0.3888 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:15,487-Speed 10413.51 samples/sec Loss 10.5553 LearningRate 0.3893 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:29:19,018-Speed 11605.38 samples/sec Loss 10.5323 LearningRate 0.3897 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:29:22,556-Speed 11580.03 samples/sec Loss 10.4918 LearningRate 0.3902 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:29:26,270-Speed 11032.81 samples/sec Loss 10.5567 LearningRate 0.3907 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:30,620-Speed 9417.71 samples/sec Loss 10.4824 LearningRate 0.3912 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:34,172-Speed 11535.90 samples/sec Loss 10.5364 LearningRate 0.3917 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:37,861-Speed 11107.78 samples/sec Loss 10.5911 LearningRate 0.3921 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:29:41,573-Speed 11037.98 samples/sec Loss 10.4543 LearningRate 0.3926 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:45,368-Speed 10796.46 samples/sec Loss 10.4987 LearningRate 0.3931 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:48,872-Speed 11692.48 samples/sec Loss 10.4821 LearningRate 0.3936 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:52,708-Speed 10680.99 samples/sec Loss 10.4576 LearningRate 0.3941 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:56,325-Speed 11326.96 samples/sec Loss 10.4486 LearningRate 0.3945 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:29:59,898-Speed 11467.51 samples/sec Loss 10.4159 LearningRate 0.3950 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:03,431-Speed 11597.81 samples/sec Loss 10.5344 LearningRate 0.3955 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:30:06,988-Speed 11520.03 samples/sec Loss 10.4568 LearningRate 0.3960 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:10,503-Speed 11656.96 samples/sec Loss 10.4702 LearningRate 0.3965 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:14,296-Speed 10800.87 samples/sec Loss 10.4777 LearningRate 0.3969 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:17,862-Speed 11491.06 samples/sec Loss 10.4738 LearningRate 0.3974 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:21,270-Speed 12022.47 samples/sec Loss 10.7837 LearningRate 0.3979 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:24,720-Speed 11878.89 samples/sec 
Loss 10.6044 LearningRate 0.3984 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:28,216-Speed 11717.80 samples/sec Loss 10.5050 LearningRate 0.3988 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:31,677-Speed 11840.49 samples/sec Loss 10.5197 LearningRate 0.3993 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:30:36,372-Speed 8725.26 samples/sec Loss 10.5059 LearningRate 0.3998 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:31:08,537-Speed 1273.51 samples/sec Loss 10.0542 LearningRate 0.3999 Epoch: 2 Global Step: 8350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:31:12,303-Speed 10880.91 samples/sec Loss 9.6655 LearningRate 0.3998 Epoch: 2 Global Step: 8360 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:16,843-Speed 9023.70 samples/sec Loss 9.6641 LearningRate 0.3997 Epoch: 2 Global Step: 8370 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:20,605-Speed 10891.78 samples/sec Loss 9.7492 LearningRate 0.3996 Epoch: 2 Global Step: 8380 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:24,555-Speed 10380.81 samples/sec Loss 9.7020 LearningRate 0.3995 Epoch: 2 Global Step: 8390 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:29,301-Speed 8632.51 samples/sec Loss 9.7208 LearningRate 0.3994 Epoch: 2 Global Step: 8400 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:33,103-Speed 10777.37 samples/sec Loss 9.7875 LearningRate 0.3993 Epoch: 2 Global Step: 8410 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:36,613-Speed 11671.86 samples/sec Loss 9.6642 LearningRate 0.3992 Epoch: 2 Global Step: 8420 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:40,200-Speed 11424.48 samples/sec Loss 9.6972 LearningRate 0.3991 Epoch: 2 Global Step: 
8430 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:43,768-Speed 11482.80 samples/sec Loss 9.7238 LearningRate 0.3990 Epoch: 2 Global Step: 8440 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:47,219-Speed 11874.84 samples/sec Loss 9.7562 LearningRate 0.3989 Epoch: 2 Global Step: 8450 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:31:51,021-Speed 10776.65 samples/sec Loss 9.7863 LearningRate 0.3988 Epoch: 2 Global Step: 8460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:31:54,924-Speed 10496.49 samples/sec Loss 9.7901 LearningRate 0.3987 Epoch: 2 Global Step: 8470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:31:58,964-Speed 10141.78 samples/sec Loss 9.7594 LearningRate 0.3986 Epoch: 2 Global Step: 8480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:02,378-Speed 12002.89 samples/sec Loss 9.8149 LearningRate 0.3984 Epoch: 2 Global Step: 8490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:06,334-Speed 10356.66 samples/sec Loss 9.7736 LearningRate 0.3983 Epoch: 2 Global Step: 8500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:10,077-Speed 10945.59 samples/sec Loss 9.8242 LearningRate 0.3982 Epoch: 2 Global Step: 8510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:13,691-Speed 11340.10 samples/sec Loss 9.7921 LearningRate 0.3981 Epoch: 2 Global Step: 8520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:17,133-Speed 11906.75 samples/sec Loss 9.7915 LearningRate 0.3980 Epoch: 2 Global Step: 8530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:20,603-Speed 11805.72 samples/sec Loss 9.7371 LearningRate 0.3979 Epoch: 2 Global Step: 8540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:24,322-Speed 11018.32 samples/sec Loss 9.7540 LearningRate 0.3978 Epoch: 2 Global Step: 8550 Fp16 Grad Scale: 262144 Required: 8 hours 
Training: 2022-01-16 22:32:27,719-Speed 12061.18 samples/sec Loss 9.8183 LearningRate 0.3977 Epoch: 2 Global Step: 8560 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:32:31,715-Speed 10254.07 samples/sec Loss 9.8474 LearningRate 0.3976 Epoch: 2 Global Step: 8570 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:32:35,116-Speed 12047.05 samples/sec Loss 9.7920 LearningRate 0.3975 Epoch: 2 Global Step: 8580 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:32:38,672-Speed 11522.98 samples/sec Loss 9.7828 LearningRate 0.3974 Epoch: 2 Global Step: 8590 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:32:42,545-Speed 10578.91 samples/sec Loss 9.8983 LearningRate 0.3973 Epoch: 2 Global Step: 8600 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:32:46,473-Speed 10431.78 samples/sec Loss 9.9260 LearningRate 0.3972 Epoch: 2 Global Step: 8610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:49,990-Speed 11648.46 samples/sec Loss 9.9097 LearningRate 0.3971 Epoch: 2 Global Step: 8620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:53,649-Speed 11199.00 samples/sec Loss 9.8948 LearningRate 0.3970 Epoch: 2 Global Step: 8630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:32:57,309-Speed 11196.74 samples/sec Loss 9.8811 LearningRate 0.3969 Epoch: 2 Global Step: 8640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:01,074-Speed 10881.18 samples/sec Loss 9.8565 LearningRate 0.3967 Epoch: 2 Global Step: 8650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:04,547-Speed 11798.27 samples/sec Loss 9.8497 LearningRate 0.3966 Epoch: 2 Global Step: 8660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:08,020-Speed 11795.05 samples/sec Loss 9.8617 LearningRate 0.3965 Epoch: 2 Global Step: 8670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:11,443-Speed 11974.22 
samples/sec Loss 9.8663 LearningRate 0.3964 Epoch: 2 Global Step: 8680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:14,928-Speed 11755.62 samples/sec Loss 9.9273 LearningRate 0.3963 Epoch: 2 Global Step: 8690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:18,399-Speed 11803.16 samples/sec Loss 9.8402 LearningRate 0.3962 Epoch: 2 Global Step: 8700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:21,930-Speed 11603.94 samples/sec Loss 9.8413 LearningRate 0.3961 Epoch: 2 Global Step: 8710 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:33:25,699-Speed 10883.31 samples/sec Loss 9.9284 LearningRate 0.3960 Epoch: 2 Global Step: 8720 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:33:29,137-Speed 11917.45 samples/sec Loss 9.8148 LearningRate 0.3959 Epoch: 2 Global Step: 8730 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:33:32,514-Speed 12136.73 samples/sec Loss 9.8392 LearningRate 0.3958 Epoch: 2 Global Step: 8740 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:33:36,056-Speed 11565.29 samples/sec Loss 9.9143 LearningRate 0.3957 Epoch: 2 Global Step: 8750 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:33:39,734-Speed 11142.43 samples/sec Loss 9.8704 LearningRate 0.3956 Epoch: 2 Global Step: 8760 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:33:43,150-Speed 11992.86 samples/sec Loss 9.9185 LearningRate 0.3955 Epoch: 2 Global Step: 8770 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:33:46,753-Speed 11371.49 samples/sec Loss 9.9296 LearningRate 0.3954 Epoch: 2 Global Step: 8780 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:33:50,345-Speed 11408.32 samples/sec Loss 9.9246 LearningRate 0.3953 Epoch: 2 Global Step: 8790 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:54,161-Speed 10737.60 samples/sec Loss 9.8417 LearningRate 0.3952 Epoch: 2 
Global Step: 8800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:33:57,803-Speed 11249.50 samples/sec Loss 9.8381 LearningRate 0.3951 Epoch: 2 Global Step: 8810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:02,160-Speed 9402.91 samples/sec Loss 9.8139 LearningRate 0.3949 Epoch: 2 Global Step: 8820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:05,690-Speed 11606.17 samples/sec Loss 9.8858 LearningRate 0.3948 Epoch: 2 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:09,136-Speed 11893.99 samples/sec Loss 9.9432 LearningRate 0.3947 Epoch: 2 Global Step: 8840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:12,678-Speed 11567.22 samples/sec Loss 9.9428 LearningRate 0.3946 Epoch: 2 Global Step: 8850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:16,097-Speed 11985.42 samples/sec Loss 9.9171 LearningRate 0.3945 Epoch: 2 Global Step: 8860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:19,591-Speed 11727.33 samples/sec Loss 9.9458 LearningRate 0.3944 Epoch: 2 Global Step: 8870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:23,066-Speed 11791.91 samples/sec Loss 10.0196 LearningRate 0.3943 Epoch: 2 Global Step: 8880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:26,691-Speed 11304.66 samples/sec Loss 9.9356 LearningRate 0.3942 Epoch: 2 Global Step: 8890 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:34:30,466-Speed 10852.95 samples/sec Loss 9.9923 LearningRate 0.3941 Epoch: 2 Global Step: 8900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:34,301-Speed 10682.94 samples/sec Loss 9.8759 LearningRate 0.3940 Epoch: 2 Global Step: 8910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:37,791-Speed 11740.87 samples/sec Loss 9.9117 LearningRate 0.3939 Epoch: 2 Global Step: 8920 Fp16 Grad Scale: 262144 Required: 8 
hours Training: 2022-01-16 22:34:41,528-Speed 10963.34 samples/sec Loss 9.8780 LearningRate 0.3938 Epoch: 2 Global Step: 8930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:45,558-Speed 10166.78 samples/sec Loss 10.0033 LearningRate 0.3937 Epoch: 2 Global Step: 8940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:49,115-Speed 11520.19 samples/sec Loss 9.9078 LearningRate 0.3936 Epoch: 2 Global Step: 8950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:52,628-Speed 11661.68 samples/sec Loss 9.8771 LearningRate 0.3935 Epoch: 2 Global Step: 8960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:34:56,195-Speed 11486.47 samples/sec Loss 9.9096 LearningRate 0.3934 Epoch: 2 Global Step: 8970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:00,131-Speed 10410.85 samples/sec Loss 9.8948 LearningRate 0.3933 Epoch: 2 Global Step: 8980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:03,609-Speed 11779.50 samples/sec Loss 9.9303 LearningRate 0.3931 Epoch: 2 Global Step: 8990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:07,039-Speed 11945.22 samples/sec Loss 10.0299 LearningRate 0.3930 Epoch: 2 Global Step: 9000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:10,479-Speed 11912.38 samples/sec Loss 9.9340 LearningRate 0.3929 Epoch: 2 Global Step: 9010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:14,574-Speed 10006.24 samples/sec Loss 9.8766 LearningRate 0.3928 Epoch: 2 Global Step: 9020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:18,171-Speed 11390.19 samples/sec Loss 9.8992 LearningRate 0.3927 Epoch: 2 Global Step: 9030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:21,626-Speed 11856.92 samples/sec Loss 9.9023 LearningRate 0.3926 Epoch: 2 Global Step: 9040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:25,118-Speed 
11735.05 samples/sec Loss 9.8910 LearningRate 0.3925 Epoch: 2 Global Step: 9050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:28,596-Speed 11780.79 samples/sec Loss 9.8839 LearningRate 0.3924 Epoch: 2 Global Step: 9060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:32,045-Speed 11880.02 samples/sec Loss 9.9288 LearningRate 0.3923 Epoch: 2 Global Step: 9070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:35:36,081-Speed 10151.67 samples/sec Loss 10.0293 LearningRate 0.3922 Epoch: 2 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:35:39,554-Speed 11798.96 samples/sec Loss 9.8904 LearningRate 0.3921 Epoch: 2 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:35:44,104-Speed 9004.57 samples/sec Loss 9.8444 LearningRate 0.3920 Epoch: 2 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:35:47,590-Speed 11761.26 samples/sec Loss 9.8502 LearningRate 0.3919 Epoch: 2 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:35:51,138-Speed 11548.60 samples/sec Loss 9.8948 LearningRate 0.3918 Epoch: 2 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:35:54,711-Speed 11467.65 samples/sec Loss 9.7779 LearningRate 0.3917 Epoch: 2 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:35:58,196-Speed 11755.47 samples/sec Loss 9.8407 LearningRate 0.3916 Epoch: 2 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:36:02,435-Speed 9666.63 samples/sec Loss 9.8630 LearningRate 0.3915 Epoch: 2 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:36:05,896-Speed 11837.33 samples/sec Loss 9.9636 LearningRate 0.3914 Epoch: 2 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:36:09,341-Speed 11896.53 samples/sec Loss 9.8811 LearningRate 0.3912 
Epoch: 2 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:36:12,810-Speed 11812.63 samples/sec Loss 9.9206 LearningRate 0.3911 Epoch: 2 Global Step: 9180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:16,683-Speed 10579.13 samples/sec Loss 9.8536 LearningRate 0.3910 Epoch: 2 Global Step: 9190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:20,207-Speed 11625.65 samples/sec Loss 9.8740 LearningRate 0.3909 Epoch: 2 Global Step: 9200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:24,010-Speed 10776.71 samples/sec Loss 9.8511 LearningRate 0.3908 Epoch: 2 Global Step: 9210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:27,455-Speed 11924.18 samples/sec Loss 9.8735 LearningRate 0.3907 Epoch: 2 Global Step: 9220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:30,910-Speed 11858.88 samples/sec Loss 9.8750 LearningRate 0.3906 Epoch: 2 Global Step: 9230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:34,550-Speed 11257.64 samples/sec Loss 9.8782 LearningRate 0.3905 Epoch: 2 Global Step: 9240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:38,075-Speed 11624.56 samples/sec Loss 9.7635 LearningRate 0.3904 Epoch: 2 Global Step: 9250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:41,554-Speed 11794.81 samples/sec Loss 9.8507 LearningRate 0.3903 Epoch: 2 Global Step: 9260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:45,331-Speed 10847.56 samples/sec Loss 9.8381 LearningRate 0.3902 Epoch: 2 Global Step: 9270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:48,966-Speed 11271.74 samples/sec Loss 9.7574 LearningRate 0.3901 Epoch: 2 Global Step: 9280 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:36:52,853-Speed 10539.23 samples/sec Loss 9.8651 LearningRate 0.3900 Epoch: 2 Global Step: 9290 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:36:56,507-Speed 11216.50 samples/sec Loss 9.8902 LearningRate 0.3899 Epoch: 2 Global Step: 9300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:36:59,953-Speed 11888.53 samples/sec Loss 9.7823 LearningRate 0.3898 Epoch: 2 Global Step: 9310 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:03,474-Speed 11638.77 samples/sec Loss 9.8215 LearningRate 0.3897 Epoch: 2 Global Step: 9320 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:07,040-Speed 11491.05 samples/sec Loss 9.7800 LearningRate 0.3896 Epoch: 2 Global Step: 9330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:10,646-Speed 11363.29 samples/sec Loss 9.7987 LearningRate 0.3895 Epoch: 2 Global Step: 9340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:14,244-Speed 11386.88 samples/sec Loss 9.8270 LearningRate 0.3894 Epoch: 2 Global Step: 9350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:18,906-Speed 8789.21 samples/sec Loss 9.7906 LearningRate 0.3892 Epoch: 2 Global Step: 9360 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:22,348-Speed 11902.87 samples/sec Loss 9.8764 LearningRate 0.3891 Epoch: 2 Global Step: 9370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:25,862-Speed 11660.00 samples/sec Loss 9.8216 LearningRate 0.3890 Epoch: 2 Global Step: 9380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:29,607-Speed 10940.05 samples/sec Loss 9.7990 LearningRate 0.3889 Epoch: 2 Global Step: 9390 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:37:33,178-Speed 11476.18 samples/sec Loss 9.7626 LearningRate 0.3888 Epoch: 2 Global Step: 9400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:36,993-Speed 10740.13 samples/sec Loss 9.7857 LearningRate 0.3887 Epoch: 2 Global Step: 9410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:37:40,466-Speed 11798.22 samples/sec Loss 9.7542 LearningRate 0.3886 Epoch: 2 Global Step: 9420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:43,980-Speed 11659.24 samples/sec Loss 9.7623 LearningRate 0.3885 Epoch: 2 Global Step: 9430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:47,577-Speed 11391.20 samples/sec Loss 9.7591 LearningRate 0.3884 Epoch: 2 Global Step: 9440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:51,337-Speed 10898.16 samples/sec Loss 9.8438 LearningRate 0.3883 Epoch: 2 Global Step: 9450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:54,746-Speed 12019.86 samples/sec Loss 9.7163 LearningRate 0.3882 Epoch: 2 Global Step: 9460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:37:58,553-Speed 10761.34 samples/sec Loss 9.7357 LearningRate 0.3881 Epoch: 2 Global Step: 9470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:02,349-Speed 10793.64 samples/sec Loss 9.7603 LearningRate 0.3880 Epoch: 2 Global Step: 9480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:06,038-Speed 11105.96 samples/sec Loss 9.7719 LearningRate 0.3879 Epoch: 2 Global Step: 9490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:09,731-Speed 11097.23 samples/sec Loss 9.6798 LearningRate 0.3878 Epoch: 2 Global Step: 9500 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:38:13,540-Speed 10754.18 samples/sec Loss 9.7480 LearningRate 0.3877 Epoch: 2 Global Step: 9510 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:38:16,956-Speed 11994.88 samples/sec Loss 9.7436 LearningRate 0.3876 Epoch: 2 Global Step: 9520 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:38:20,459-Speed 11696.19 samples/sec Loss 9.8413 LearningRate 0.3875 Epoch: 2 Global Step: 9530 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:38:24,065-Speed 11363.47 samples/sec Loss 9.7373 
LearningRate 0.3874 Epoch: 2 Global Step: 9540 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:38:27,617-Speed 11534.88 samples/sec Loss 9.7656 LearningRate 0.3873 Epoch: 2 Global Step: 9550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:31,318-Speed 11071.87 samples/sec Loss 9.7440 LearningRate 0.3872 Epoch: 2 Global Step: 9560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:35,707-Speed 9335.50 samples/sec Loss 9.7528 LearningRate 0.3870 Epoch: 2 Global Step: 9570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:39,467-Speed 10896.45 samples/sec Loss 9.7618 LearningRate 0.3869 Epoch: 2 Global Step: 9580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:43,604-Speed 9904.44 samples/sec Loss 9.7211 LearningRate 0.3868 Epoch: 2 Global Step: 9590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:47,352-Speed 10931.70 samples/sec Loss 9.7908 LearningRate 0.3867 Epoch: 2 Global Step: 9600 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:50,972-Speed 11319.26 samples/sec Loss 9.7594 LearningRate 0.3866 Epoch: 2 Global Step: 9610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:54,728-Speed 10906.35 samples/sec Loss 9.8828 LearningRate 0.3865 Epoch: 2 Global Step: 9620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:38:58,413-Speed 11121.10 samples/sec Loss 9.8193 LearningRate 0.3864 Epoch: 2 Global Step: 9630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:01,842-Speed 11948.15 samples/sec Loss 9.7977 LearningRate 0.3863 Epoch: 2 Global Step: 9640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:05,387-Speed 11556.92 samples/sec Loss 9.6490 LearningRate 0.3862 Epoch: 2 Global Step: 9650 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:39:09,091-Speed 11063.73 samples/sec Loss 9.7349 LearningRate 0.3861 Epoch: 2 Global Step: 9660 Fp16 Grad 
Scale: 524288 Required: 8 hours Training: 2022-01-16 22:39:12,784-Speed 11094.81 samples/sec Loss 9.7872 LearningRate 0.3860 Epoch: 2 Global Step: 9670 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:39:16,565-Speed 10835.92 samples/sec Loss 9.7781 LearningRate 0.3859 Epoch: 2 Global Step: 9680 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:39:20,138-Speed 11469.53 samples/sec Loss 9.8594 LearningRate 0.3858 Epoch: 2 Global Step: 9690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:23,739-Speed 11376.39 samples/sec Loss 9.7287 LearningRate 0.3857 Epoch: 2 Global Step: 9700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:27,222-Speed 11771.94 samples/sec Loss 9.9262 LearningRate 0.3856 Epoch: 2 Global Step: 9710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:30,667-Speed 11892.39 samples/sec Loss 9.8014 LearningRate 0.3855 Epoch: 2 Global Step: 9720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:34,179-Speed 11666.17 samples/sec Loss 9.6328 LearningRate 0.3854 Epoch: 2 Global Step: 9730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:38,811-Speed 8844.92 samples/sec Loss 9.7270 LearningRate 0.3853 Epoch: 2 Global Step: 9740 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:42,450-Speed 11260.20 samples/sec Loss 9.7256 LearningRate 0.3852 Epoch: 2 Global Step: 9750 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:46,162-Speed 11037.32 samples/sec Loss 9.7621 LearningRate 0.3851 Epoch: 2 Global Step: 9760 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:49,825-Speed 11187.47 samples/sec Loss 9.7697 LearningRate 0.3850 Epoch: 2 Global Step: 9770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:39:53,364-Speed 11575.81 samples/sec Loss 9.7279 LearningRate 0.3848 Epoch: 2 Global Step: 9780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:39:56,934-Speed 11477.36 samples/sec Loss 9.6211 LearningRate 0.3847 Epoch: 2 Global Step: 9790 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:40:00,427-Speed 11731.99 samples/sec Loss 9.6923 LearningRate 0.3846 Epoch: 2 Global Step: 9800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:03,827-Speed 12053.06 samples/sec Loss 9.6942 LearningRate 0.3845 Epoch: 2 Global Step: 9810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:07,304-Speed 11782.13 samples/sec Loss 9.7098 LearningRate 0.3844 Epoch: 2 Global Step: 9820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:10,719-Speed 12000.02 samples/sec Loss 9.6945 LearningRate 0.3843 Epoch: 2 Global Step: 9830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:14,163-Speed 11896.35 samples/sec Loss 9.6980 LearningRate 0.3842 Epoch: 2 Global Step: 9840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:17,712-Speed 11543.59 samples/sec Loss 9.6530 LearningRate 0.3841 Epoch: 2 Global Step: 9850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:21,583-Speed 10584.33 samples/sec Loss 9.6607 LearningRate 0.3840 Epoch: 2 Global Step: 9860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:25,259-Speed 11147.85 samples/sec Loss 9.6503 LearningRate 0.3839 Epoch: 2 Global Step: 9870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:28,951-Speed 11100.45 samples/sec Loss 9.6314 LearningRate 0.3838 Epoch: 2 Global Step: 9880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:32,688-Speed 10961.86 samples/sec Loss 9.6228 LearningRate 0.3837 Epoch: 2 Global Step: 9890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:36,294-Speed 11364.60 samples/sec Loss 9.5413 LearningRate 0.3836 Epoch: 2 Global Step: 9900 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:40:40,586-Speed 9546.59 samples/sec Loss 9.6277 
LearningRate 0.3835 Epoch: 2 Global Step: 9910 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:40:44,029-Speed 11900.42 samples/sec Loss 9.5586 LearningRate 0.3834 Epoch: 2 Global Step: 9920 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:40:47,734-Speed 11058.67 samples/sec Loss 9.6275 LearningRate 0.3833 Epoch: 2 Global Step: 9930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:51,562-Speed 10703.76 samples/sec Loss 9.5750 LearningRate 0.3832 Epoch: 2 Global Step: 9940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:55,319-Speed 10904.75 samples/sec Loss 9.5925 LearningRate 0.3831 Epoch: 2 Global Step: 9950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:40:59,124-Speed 10770.62 samples/sec Loss 9.5799 LearningRate 0.3830 Epoch: 2 Global Step: 9960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:41:02,964-Speed 10670.43 samples/sec Loss 9.6170 LearningRate 0.3829 Epoch: 2 Global Step: 9970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:41:06,586-Speed 11313.68 samples/sec Loss 9.6646 LearningRate 0.3828 Epoch: 2 Global Step: 9980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:41:10,329-Speed 10945.37 samples/sec Loss 9.6152 LearningRate 0.3827 Epoch: 2 Global Step: 9990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:41:13,797-Speed 11814.14 samples/sec Loss 9.6258 LearningRate 0.3826 Epoch: 2 Global Step: 10000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:41:34,956-[lfw][10000]XNorm: 14.073455 Training: 2022-01-16 22:41:34,957-[lfw][10000]Accuracy-Flip: 0.99433+-0.00281 Training: 2022-01-16 22:41:34,957-[lfw][10000]Accuracy-Highest: 0.99433 Training: 2022-01-16 22:41:59,360-[cfp_fp][10000]XNorm: 11.762516 Training: 2022-01-16 22:41:59,361-[cfp_fp][10000]Accuracy-Flip: 0.94157+-0.01243 Training: 2022-01-16 22:41:59,363-[cfp_fp][10000]Accuracy-Highest: 0.94157 Training: 
2022-01-16 22:42:20,423-[agedb_30][10000]XNorm: 13.492690 Training: 2022-01-16 22:42:20,423-[agedb_30][10000]Accuracy-Flip: 0.94900+-0.01265 Training: 2022-01-16 22:42:20,423-[agedb_30][10000]Accuracy-Highest: 0.94900 Training: 2022-01-16 22:42:23,792-Speed 585.20 samples/sec Loss 9.5627 LearningRate 0.3824 Epoch: 2 Global Step: 10010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:27,160-Speed 12166.74 samples/sec Loss 9.6550 LearningRate 0.3823 Epoch: 2 Global Step: 10020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:30,556-Speed 12066.37 samples/sec Loss 9.5855 LearningRate 0.3822 Epoch: 2 Global Step: 10030 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:42:33,942-Speed 12101.14 samples/sec Loss 9.5602 LearningRate 0.3821 Epoch: 2 Global Step: 10040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:37,327-Speed 12107.09 samples/sec Loss 9.5452 LearningRate 0.3820 Epoch: 2 Global Step: 10050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:40,699-Speed 12148.06 samples/sec Loss 9.6546 LearningRate 0.3819 Epoch: 2 Global Step: 10060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:44,083-Speed 12109.47 samples/sec Loss 9.5726 LearningRate 0.3818 Epoch: 2 Global Step: 10070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:47,538-Speed 11858.02 samples/sec Loss 9.5462 LearningRate 0.3817 Epoch: 2 Global Step: 10080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:51,308-Speed 10870.25 samples/sec Loss 9.6158 LearningRate 0.3816 Epoch: 2 Global Step: 10090 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:54,946-Speed 11263.15 samples/sec Loss 9.6590 LearningRate 0.3815 Epoch: 2 Global Step: 10100 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:42:58,597-Speed 11223.52 samples/sec Loss 9.5623 LearningRate 0.3814 Epoch: 2 Global Step: 10110 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:43:02,647-Speed 10114.76 samples/sec Loss 9.5910 LearningRate 0.3813 Epoch: 2 Global Step: 10120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:43:06,131-Speed 11762.06 samples/sec Loss 9.5779 LearningRate 0.3812 Epoch: 2 Global Step: 10130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:43:09,870-Speed 10975.12 samples/sec Loss 9.5710 LearningRate 0.3811 Epoch: 2 Global Step: 10140 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:43:13,550-Speed 11132.40 samples/sec Loss 9.6100 LearningRate 0.3810 Epoch: 2 Global Step: 10150 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:43:17,051-Speed 11703.59 samples/sec Loss 9.6518 LearningRate 0.3809 Epoch: 2 Global Step: 10160 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:43:20,892-Speed 10667.19 samples/sec Loss 9.5736 LearningRate 0.3808 Epoch: 2 Global Step: 10170 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:43:24,410-Speed 11647.68 samples/sec Loss 9.5830 LearningRate 0.3807 Epoch: 2 Global Step: 10180 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:43:27,932-Speed 11632.78 samples/sec Loss 9.5803 LearningRate 0.3806 Epoch: 2 Global Step: 10190 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:43:31,783-Speed 10639.63 samples/sec Loss 9.4436 LearningRate 0.3805 Epoch: 2 Global Step: 10200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:43:35,320-Speed 11584.05 samples/sec Loss 9.5894 LearningRate 0.3804 Epoch: 2 Global Step: 10210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:43:38,745-Speed 11963.97 samples/sec Loss 9.5025 LearningRate 0.3803 Epoch: 2 Global Step: 10220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:43:42,222-Speed 11786.96 samples/sec Loss 9.5879 LearningRate 0.3802 Epoch: 2 Global Step: 10230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:43:45,936-Speed 11030.38 samples/sec Loss 9.4989 LearningRate 0.3801 Epoch: 2 Global Step: 10240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:43:49,723-Speed 10821.54 samples/sec Loss 9.5416 LearningRate 0.3800 Epoch: 2 Global Step: 10250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:43:53,568-Speed 10655.85 samples/sec Loss 9.5553 LearningRate 0.3798 Epoch: 2 Global Step: 10260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:43:57,306-Speed 10960.08 samples/sec Loss 9.5035 LearningRate 0.3797 Epoch: 2 Global Step: 10270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:00,806-Speed 11708.15 samples/sec Loss 9.5378 LearningRate 0.3796 Epoch: 2 Global Step: 10280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:04,323-Speed 11647.93 samples/sec Loss 9.5100 LearningRate 0.3795 Epoch: 2 Global Step: 10290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:07,954-Speed 11285.13 samples/sec Loss 9.5497 LearningRate 0.3794 Epoch: 2 Global Step: 10300 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:44:11,953-Speed 10245.71 samples/sec Loss 9.5220 LearningRate 0.3793 Epoch: 2 Global Step: 10310 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:44:15,496-Speed 11564.71 samples/sec Loss 9.4937 LearningRate 0.3792 Epoch: 2 Global Step: 10320 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:44:18,955-Speed 11846.02 samples/sec Loss 9.4696 LearningRate 0.3791 Epoch: 2 Global Step: 10330 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:44:22,454-Speed 11709.22 samples/sec Loss 9.5246 LearningRate 0.3790 Epoch: 2 Global Step: 10340 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:44:25,844-Speed 12086.64 samples/sec Loss 9.5267 LearningRate 0.3789 Epoch: 2 Global Step: 10350 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:44:29,445-Speed 11378.95 samples/sec 
Loss 9.4659 LearningRate 0.3788 Epoch: 2 Global Step: 10360 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:32,920-Speed 11788.68 samples/sec Loss 9.4814 LearningRate 0.3787 Epoch: 2 Global Step: 10370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:36,473-Speed 11531.27 samples/sec Loss 9.5379 LearningRate 0.3786 Epoch: 2 Global Step: 10380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:39,978-Speed 11690.22 samples/sec Loss 9.4760 LearningRate 0.3785 Epoch: 2 Global Step: 10390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:43,678-Speed 11073.96 samples/sec Loss 9.5424 LearningRate 0.3784 Epoch: 2 Global Step: 10400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:47,543-Speed 10602.66 samples/sec Loss 9.4648 LearningRate 0.3783 Epoch: 2 Global Step: 10410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:51,341-Speed 10789.25 samples/sec Loss 9.5616 LearningRate 0.3782 Epoch: 2 Global Step: 10420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:55,188-Speed 10650.12 samples/sec Loss 9.5098 LearningRate 0.3781 Epoch: 2 Global Step: 10430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:44:58,774-Speed 11427.15 samples/sec Loss 9.4677 LearningRate 0.3780 Epoch: 2 Global Step: 10440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:45:02,445-Speed 11159.14 samples/sec Loss 9.4301 LearningRate 0.3779 Epoch: 2 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:06,633-Speed 9784.24 samples/sec Loss 9.4253 LearningRate 0.3778 Epoch: 2 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:10,084-Speed 11873.00 samples/sec Loss 9.5103 LearningRate 0.3777 Epoch: 2 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:13,562-Speed 11779.17 samples/sec Loss 9.4980 LearningRate 0.3776 Epoch: 2 
Global Step: 10480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:16,930-Speed 12169.89 samples/sec Loss 9.4658 LearningRate 0.3775 Epoch: 2 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:20,350-Speed 11979.29 samples/sec Loss 9.4493 LearningRate 0.3774 Epoch: 2 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:23,819-Speed 11811.16 samples/sec Loss 9.4383 LearningRate 0.3773 Epoch: 2 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:27,349-Speed 11609.29 samples/sec Loss 9.4798 LearningRate 0.3772 Epoch: 2 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:30,786-Speed 11919.50 samples/sec Loss 9.4312 LearningRate 0.3771 Epoch: 2 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:34,373-Speed 11422.14 samples/sec Loss 9.4573 LearningRate 0.3769 Epoch: 2 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:45:37,968-Speed 11397.81 samples/sec Loss 9.4492 LearningRate 0.3768 Epoch: 2 Global Step: 10550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:45:41,643-Speed 11150.37 samples/sec Loss 9.4281 LearningRate 0.3767 Epoch: 2 Global Step: 10560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:45:45,143-Speed 11706.91 samples/sec Loss 9.3976 LearningRate 0.3766 Epoch: 2 Global Step: 10570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:45:48,726-Speed 11435.48 samples/sec Loss 9.4430 LearningRate 0.3765 Epoch: 2 Global Step: 10580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:45:52,372-Speed 11238.83 samples/sec Loss 9.4492 LearningRate 0.3764 Epoch: 2 Global Step: 10590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:45:56,202-Speed 10697.92 samples/sec Loss 9.4893 LearningRate 0.3763 Epoch: 2 Global Step: 10600 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:45:59,790-Speed 11417.42 samples/sec Loss 9.3985 LearningRate 0.3762 Epoch: 2 Global Step: 10610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:03,269-Speed 11780.57 samples/sec Loss 9.4092 LearningRate 0.3761 Epoch: 2 Global Step: 10620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:06,737-Speed 11816.55 samples/sec Loss 9.3777 LearningRate 0.3760 Epoch: 2 Global Step: 10630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:10,982-Speed 9650.78 samples/sec Loss 9.3828 LearningRate 0.3759 Epoch: 2 Global Step: 10640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:14,554-Speed 11472.43 samples/sec Loss 9.3680 LearningRate 0.3758 Epoch: 2 Global Step: 10650 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:46:18,202-Speed 11230.95 samples/sec Loss 9.4905 LearningRate 0.3757 Epoch: 2 Global Step: 10660 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:46:21,913-Speed 11040.88 samples/sec Loss 9.4158 LearningRate 0.3756 Epoch: 2 Global Step: 10670 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:46:25,337-Speed 11967.13 samples/sec Loss 9.3937 LearningRate 0.3755 Epoch: 2 Global Step: 10680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:28,801-Speed 11828.53 samples/sec Loss 9.3611 LearningRate 0.3754 Epoch: 2 Global Step: 10690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:32,279-Speed 11780.45 samples/sec Loss 9.3486 LearningRate 0.3753 Epoch: 2 Global Step: 10700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:35,951-Speed 11159.34 samples/sec Loss 9.4046 LearningRate 0.3752 Epoch: 2 Global Step: 10710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:39,569-Speed 11322.77 samples/sec Loss 9.5660 LearningRate 0.3751 Epoch: 2 Global Step: 10720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:46:43,137-Speed 11485.04 samples/sec Loss 9.5049 LearningRate 0.3750 Epoch: 2 Global Step: 10730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:46,956-Speed 10729.33 samples/sec Loss 9.4041 LearningRate 0.3749 Epoch: 2 Global Step: 10740 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:50,830-Speed 10576.38 samples/sec Loss 9.4160 LearningRate 0.3748 Epoch: 2 Global Step: 10750 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:46:54,577-Speed 10935.14 samples/sec Loss 9.3872 LearningRate 0.3747 Epoch: 2 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:46:58,162-Speed 11427.24 samples/sec Loss 9.3567 LearningRate 0.3746 Epoch: 2 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:02,000-Speed 10675.25 samples/sec Loss 9.3238 LearningRate 0.3745 Epoch: 2 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:05,617-Speed 11330.18 samples/sec Loss 9.3038 LearningRate 0.3744 Epoch: 2 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:09,189-Speed 11470.69 samples/sec Loss 9.4312 LearningRate 0.3743 Epoch: 2 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:13,697-Speed 9088.81 samples/sec Loss 9.3551 LearningRate 0.3742 Epoch: 2 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:17,124-Speed 11956.35 samples/sec Loss 9.3372 LearningRate 0.3741 Epoch: 2 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:20,530-Speed 12027.98 samples/sec Loss 9.3403 LearningRate 0.3740 Epoch: 2 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:24,062-Speed 11600.19 samples/sec Loss 9.4171 LearningRate 0.3739 Epoch: 2 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:27,487-Speed 11965.11 samples/sec Loss 
9.3279 LearningRate 0.3737 Epoch: 2 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:47:31,657-Speed 9825.96 samples/sec Loss 9.3187 LearningRate 0.3736 Epoch: 2 Global Step: 10860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:47:35,264-Speed 11359.70 samples/sec Loss 9.3390 LearningRate 0.3735 Epoch: 2 Global Step: 10870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:47:38,862-Speed 11385.92 samples/sec Loss 9.4109 LearningRate 0.3734 Epoch: 2 Global Step: 10880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:47:42,263-Speed 12047.96 samples/sec Loss 9.2985 LearningRate 0.3733 Epoch: 2 Global Step: 10890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:47:46,103-Speed 10670.59 samples/sec Loss 9.3512 LearningRate 0.3732 Epoch: 2 Global Step: 10900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:47:49,534-Speed 11941.98 samples/sec Loss 9.3594 LearningRate 0.3731 Epoch: 2 Global Step: 10910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:47:52,946-Speed 12007.10 samples/sec Loss 9.4286 LearningRate 0.3730 Epoch: 2 Global Step: 10920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:47:56,473-Speed 11616.96 samples/sec Loss 9.2704 LearningRate 0.3729 Epoch: 2 Global Step: 10930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:48:00,177-Speed 11063.55 samples/sec Loss 9.3027 LearningRate 0.3728 Epoch: 2 Global Step: 10940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:48:03,808-Speed 11284.51 samples/sec Loss 9.3454 LearningRate 0.3727 Epoch: 2 Global Step: 10950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:48:07,306-Speed 11711.72 samples/sec Loss 9.3292 LearningRate 0.3726 Epoch: 2 Global Step: 10960 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:11,029-Speed 11005.90 samples/sec Loss 9.2954 LearningRate 0.3725 Epoch: 2 Global 
Step: 10970 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:14,748-Speed 11019.01 samples/sec Loss 9.3172 LearningRate 0.3724 Epoch: 2 Global Step: 10980 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:18,586-Speed 10674.19 samples/sec Loss 9.3052 LearningRate 0.3723 Epoch: 2 Global Step: 10990 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:21,984-Speed 12058.73 samples/sec Loss 9.2236 LearningRate 0.3722 Epoch: 2 Global Step: 11000 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:25,431-Speed 11888.96 samples/sec Loss 9.3037 LearningRate 0.3721 Epoch: 2 Global Step: 11010 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:29,245-Speed 10741.61 samples/sec Loss 9.4028 LearningRate 0.3720 Epoch: 2 Global Step: 11020 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:32,833-Speed 11418.09 samples/sec Loss 9.3522 LearningRate 0.3719 Epoch: 2 Global Step: 11030 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:36,323-Speed 11741.54 samples/sec Loss 9.2640 LearningRate 0.3718 Epoch: 2 Global Step: 11040 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:48:39,760-Speed 11922.71 samples/sec Loss 9.2128 LearningRate 0.3717 Epoch: 2 Global Step: 11050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:48:43,507-Speed 10934.24 samples/sec Loss 9.2708 LearningRate 0.3716 Epoch: 2 Global Step: 11060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:48:47,116-Speed 11353.18 samples/sec Loss 9.3216 LearningRate 0.3715 Epoch: 2 Global Step: 11070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:48:50,828-Speed 11037.50 samples/sec Loss 9.3297 LearningRate 0.3714 Epoch: 2 Global Step: 11080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:48:54,519-Speed 11099.13 samples/sec Loss 9.3180 LearningRate 0.3713 Epoch: 2 Global Step: 11090 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:48:57,903-Speed 12111.42 samples/sec Loss 9.3049 LearningRate 0.3712 Epoch: 2 Global Step: 11100 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:01,320-Speed 11987.88 samples/sec Loss 9.3027 LearningRate 0.3711 Epoch: 2 Global Step: 11110 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:04,755-Speed 11929.23 samples/sec Loss 9.2524 LearningRate 0.3710 Epoch: 2 Global Step: 11120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:08,451-Speed 11087.01 samples/sec Loss 9.2471 LearningRate 0.3709 Epoch: 2 Global Step: 11130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:11,992-Speed 11569.09 samples/sec Loss 9.2404 LearningRate 0.3708 Epoch: 2 Global Step: 11140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:16,696-Speed 8711.08 samples/sec Loss 9.2504 LearningRate 0.3707 Epoch: 2 Global Step: 11150 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:49:20,354-Speed 11198.07 samples/sec Loss 9.2382 LearningRate 0.3706 Epoch: 2 Global Step: 11160 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:49:23,854-Speed 11705.94 samples/sec Loss 9.2394 LearningRate 0.3705 Epoch: 2 Global Step: 11170 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:49:27,368-Speed 11661.80 samples/sec Loss 9.2434 LearningRate 0.3704 Epoch: 2 Global Step: 11180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:30,740-Speed 12149.48 samples/sec Loss 9.1898 LearningRate 0.3703 Epoch: 2 Global Step: 11190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:34,211-Speed 11806.93 samples/sec Loss 9.2077 LearningRate 0.3702 Epoch: 2 Global Step: 11200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:37,805-Speed 11398.89 samples/sec Loss 9.2459 LearningRate 0.3701 Epoch: 2 Global Step: 11210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:49:41,554-Speed 10928.90 samples/sec Loss 9.2814 LearningRate 0.3699 Epoch: 2 Global Step: 11220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:45,162-Speed 11355.98 samples/sec Loss 9.1652 LearningRate 0.3698 Epoch: 2 Global Step: 11230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:48,635-Speed 11800.34 samples/sec Loss 9.2803 LearningRate 0.3697 Epoch: 2 Global Step: 11240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:52,159-Speed 11624.86 samples/sec Loss 9.2691 LearningRate 0.3696 Epoch: 2 Global Step: 11250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:55,639-Speed 11775.35 samples/sec Loss 9.1669 LearningRate 0.3695 Epoch: 2 Global Step: 11260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:49:59,500-Speed 10611.56 samples/sec Loss 9.1785 LearningRate 0.3694 Epoch: 2 Global Step: 11270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:50:03,034-Speed 11593.95 samples/sec Loss 9.2315 LearningRate 0.3693 Epoch: 2 Global Step: 11280 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:50:06,495-Speed 11838.15 samples/sec Loss 9.1560 LearningRate 0.3692 Epoch: 2 Global Step: 11290 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:50:10,083-Speed 11420.97 samples/sec Loss 9.2360 LearningRate 0.3691 Epoch: 2 Global Step: 11300 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:50:13,693-Speed 11350.93 samples/sec Loss 9.2301 LearningRate 0.3690 Epoch: 2 Global Step: 11310 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:50:17,903-Speed 9731.20 samples/sec Loss 9.2102 LearningRate 0.3689 Epoch: 2 Global Step: 11320 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:50:21,799-Speed 10515.41 samples/sec Loss 9.1605 LearningRate 0.3688 Epoch: 2 Global Step: 11330 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:50:25,593-Speed 10800.60 samples/sec Loss 
9.1897 LearningRate 0.3687 Epoch: 2 Global Step: 11340 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:50:29,117-Speed 11625.61 samples/sec Loss 9.2572 LearningRate 0.3686 Epoch: 2 Global Step: 11350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:50:32,637-Speed 11640.01 samples/sec Loss 9.1778 LearningRate 0.3685 Epoch: 2 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:50:36,142-Speed 11691.12 samples/sec Loss 9.1779 LearningRate 0.3684 Epoch: 2 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:50:39,691-Speed 11543.29 samples/sec Loss 9.1922 LearningRate 0.3683 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:50:43,461-Speed 10870.01 samples/sec Loss 9.3726 LearningRate 0.3682 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:50:46,988-Speed 11617.21 samples/sec Loss 9.2030 LearningRate 0.3681 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:50:50,480-Speed 11733.68 samples/sec Loss 9.1913 LearningRate 0.3680 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:50:53,897-Speed 11994.12 samples/sec Loss 9.1743 LearningRate 0.3679 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:50:57,484-Speed 11423.26 samples/sec Loss 9.1728 LearningRate 0.3678 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:51:01,016-Speed 11599.52 samples/sec Loss 9.1392 LearningRate 0.3677 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:51:04,397-Speed 12118.28 samples/sec Loss 9.1040 LearningRate 0.3676 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:07,877-Speed 11777.90 samples/sec Loss 9.1546 LearningRate 0.3675 Epoch: 2 Global 
Step: 11460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:11,401-Speed 11628.20 samples/sec Loss 9.1505 LearningRate 0.3674 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:14,843-Speed 11905.09 samples/sec Loss 9.1639 LearningRate 0.3673 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:19,262-Speed 9269.99 samples/sec Loss 9.1975 LearningRate 0.3672 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:22,718-Speed 11856.73 samples/sec Loss 9.1841 LearningRate 0.3671 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:26,469-Speed 10925.13 samples/sec Loss 9.1130 LearningRate 0.3670 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:30,213-Speed 10944.05 samples/sec Loss 9.1385 LearningRate 0.3669 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:33,769-Speed 11522.84 samples/sec Loss 9.1017 LearningRate 0.3668 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:37,433-Speed 11180.59 samples/sec Loss 9.1308 LearningRate 0.3667 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:40,924-Speed 11736.66 samples/sec Loss 9.1837 LearningRate 0.3666 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:44,509-Speed 11430.10 samples/sec Loss 9.1732 LearningRate 0.3665 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:48,142-Speed 11277.40 samples/sec Loss 9.1291 LearningRate 0.3664 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:51,567-Speed 11963.74 samples/sec Loss 9.1710 LearningRate 0.3663 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:51:55,542-Speed 10307.78 samples/sec Loss 9.0457 LearningRate 0.3662 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:51:59,128-Speed 11424.47 samples/sec Loss 9.1318 LearningRate 0.3661 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:02,811-Speed 11125.58 samples/sec Loss 9.0777 LearningRate 0.3660 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:06,278-Speed 11820.33 samples/sec Loss 9.0624 LearningRate 0.3659 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:09,984-Speed 11054.05 samples/sec Loss 9.0896 LearningRate 0.3658 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:13,681-Speed 11082.59 samples/sec Loss 9.1644 LearningRate 0.3657 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:17,330-Speed 11228.15 samples/sec Loss 9.0373 LearningRate 0.3656 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:52:21,772-Speed 9224.01 samples/sec Loss 9.0538 LearningRate 0.3655 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:25,251-Speed 11782.16 samples/sec Loss 9.0561 LearningRate 0.3654 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:28,889-Speed 11261.21 samples/sec Loss 9.1386 LearningRate 0.3653 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:32,673-Speed 10827.36 samples/sec Loss 9.1050 LearningRate 0.3651 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:36,395-Speed 11007.43 samples/sec Loss 9.1430 LearningRate 0.3650 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:52:40,002-Speed 11361.19 samples/sec Loss 9.0659 LearningRate 0.3649 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:43,578-Speed 11457.96 samples/sec Loss 9.0041 LearningRate 0.3648 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:47,372-Speed 10800.29 samples/sec Loss 9.0854 LearningRate 0.3647 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:50,807-Speed 11930.69 samples/sec Loss 9.0649 LearningRate 0.3646 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:54,621-Speed 10741.82 samples/sec Loss 9.0551 LearningRate 0.3645 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:52:58,574-Speed 10365.08 samples/sec Loss 9.1550 LearningRate 0.3644 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:53:02,339-Speed 10883.46 samples/sec Loss 9.0780 LearningRate 0.3643 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:53:05,998-Speed 11197.57 samples/sec Loss 9.1168 LearningRate 0.3642 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:53:09,492-Speed 11726.76 samples/sec Loss 9.0543 LearningRate 0.3641 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:13,276-Speed 10828.35 samples/sec Loss 9.0610 LearningRate 0.3640 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:16,708-Speed 11941.24 samples/sec Loss 9.1534 LearningRate 0.3639 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:20,359-Speed 11219.96 samples/sec Loss 9.1177 LearningRate 0.3638 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:23,832-Speed 11798.88 samples/sec 
Loss 9.0514 LearningRate 0.3637 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:27,390-Speed 11516.90 samples/sec Loss 9.0927 LearningRate 0.3636 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:31,175-Speed 10826.12 samples/sec Loss 9.1161 LearningRate 0.3635 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:34,949-Speed 10855.77 samples/sec Loss 9.1558 LearningRate 0.3634 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:38,681-Speed 10982.30 samples/sec Loss 9.1027 LearningRate 0.3633 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:42,527-Speed 10653.54 samples/sec Loss 9.0570 LearningRate 0.3632 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:53:46,066-Speed 11576.09 samples/sec Loss 9.0079 LearningRate 0.3631 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:53:50,393-Speed 9470.01 samples/sec Loss 9.0877 LearningRate 0.3630 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:53:53,951-Speed 11513.21 samples/sec Loss 9.1717 LearningRate 0.3629 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:53:57,392-Speed 11907.78 samples/sec Loss 9.0562 LearningRate 0.3628 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:01,022-Speed 11287.62 samples/sec Loss 9.0727 LearningRate 0.3627 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:04,683-Speed 11191.61 samples/sec Loss 9.0819 LearningRate 0.3626 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:08,189-Speed 11689.48 samples/sec Loss 9.1632 LearningRate 0.3625 Epoch: 2 
Global Step: 11950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:11,958-Speed 10871.43 samples/sec Loss 9.0988 LearningRate 0.3624 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:15,386-Speed 11951.19 samples/sec Loss 9.0821 LearningRate 0.3623 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:19,045-Speed 11198.64 samples/sec Loss 9.0045 LearningRate 0.3622 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:22,714-Speed 11167.67 samples/sec Loss 9.0108 LearningRate 0.3621 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:54:26,393-Speed 11137.55 samples/sec Loss 8.9554 LearningRate 0.3620 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:54:29,999-Speed 11361.16 samples/sec Loss 9.0151 LearningRate 0.3619 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:33,512-Speed 11661.98 samples/sec Loss 9.0076 LearningRate 0.3618 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:37,255-Speed 10947.49 samples/sec Loss 8.9991 LearningRate 0.3617 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:40,745-Speed 11740.84 samples/sec Loss 9.0264 LearningRate 0.3616 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:44,708-Speed 10340.21 samples/sec Loss 8.9886 LearningRate 0.3615 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:48,640-Speed 10421.26 samples/sec Loss 8.9761 LearningRate 0.3614 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:54:53,194-Speed 8995.60 samples/sec Loss 9.0396 LearningRate 0.3613 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 22:54:56,648-Speed 11863.36 samples/sec Loss 9.0252 LearningRate 0.3612 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:00,247-Speed 11385.07 samples/sec Loss 9.0084 LearningRate 0.3611 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:04,065-Speed 10733.26 samples/sec Loss 9.0010 LearningRate 0.3610 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:07,436-Speed 12154.25 samples/sec Loss 8.9762 LearningRate 0.3609 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:10,817-Speed 12118.69 samples/sec Loss 8.9745 LearningRate 0.3608 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:14,301-Speed 11760.11 samples/sec Loss 9.0168 LearningRate 0.3607 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:17,964-Speed 11183.60 samples/sec Loss 9.0285 LearningRate 0.3606 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:21,470-Speed 11690.98 samples/sec Loss 9.0938 LearningRate 0.3605 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:24,857-Speed 12096.72 samples/sec Loss 9.1006 LearningRate 0.3604 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:28,264-Speed 12024.34 samples/sec Loss 8.9861 LearningRate 0.3603 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:55:31,646-Speed 12117.04 samples/sec Loss 8.9665 LearningRate 0.3602 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:55:35,209-Speed 11497.35 samples/sec Loss 8.9048 LearningRate 0.3601 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 
22:55:38,597-Speed 12096.85 samples/sec Loss 8.9870 LearningRate 0.3600 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:55:42,027-Speed 11946.43 samples/sec Loss 8.9731 LearningRate 0.3599 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:55:45,979-Speed 10366.55 samples/sec Loss 8.9800 LearningRate 0.3598 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:55:49,508-Speed 11613.24 samples/sec Loss 8.9066 LearningRate 0.3597 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:55:53,006-Speed 11712.96 samples/sec Loss 8.9259 LearningRate 0.3596 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:55:57,853-Speed 8453.82 samples/sec Loss 8.9742 LearningRate 0.3595 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:01,589-Speed 10969.07 samples/sec Loss 8.9050 LearningRate 0.3594 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:05,199-Speed 11346.62 samples/sec Loss 8.9269 LearningRate 0.3593 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:08,776-Speed 11454.80 samples/sec Loss 8.9610 LearningRate 0.3592 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:56:12,270-Speed 11728.34 samples/sec Loss 9.0007 LearningRate 0.3591 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:15,725-Speed 11859.96 samples/sec Loss 8.9818 LearningRate 0.3590 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:19,201-Speed 11788.39 samples/sec Loss 8.8821 LearningRate 0.3589 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:23,061-Speed 10614.17 samples/sec Loss 
8.9225 LearningRate 0.3588 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:26,553-Speed 11734.34 samples/sec Loss 8.9341 LearningRate 0.3587 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:29,967-Speed 12000.77 samples/sec Loss 8.9615 LearningRate 0.3586 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:33,858-Speed 10530.65 samples/sec Loss 8.9842 LearningRate 0.3585 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:37,660-Speed 10774.58 samples/sec Loss 8.9622 LearningRate 0.3584 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:41,528-Speed 10593.89 samples/sec Loss 8.9190 LearningRate 0.3583 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:45,063-Speed 11595.64 samples/sec Loss 8.9479 LearningRate 0.3582 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 22:56:48,571-Speed 11679.86 samples/sec Loss 8.9815 LearningRate 0.3581 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:56:52,053-Speed 11766.66 samples/sec Loss 8.9576 LearningRate 0.3580 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:56:55,760-Speed 11055.03 samples/sec Loss 8.9263 LearningRate 0.3579 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:00,109-Speed 9419.35 samples/sec Loss 8.9594 LearningRate 0.3578 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:03,573-Speed 11828.42 samples/sec Loss 8.9647 LearningRate 0.3577 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:07,168-Speed 11398.38 samples/sec Loss 8.9729 LearningRate 0.3576 Epoch: 2 Global 
Step: 12440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:11,082-Speed 10467.86 samples/sec Loss 8.9687 LearningRate 0.3575 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:14,662-Speed 11446.52 samples/sec Loss 8.9451 LearningRate 0.3574 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:18,509-Speed 10651.27 samples/sec Loss 8.8894 LearningRate 0.3573 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:22,177-Speed 11169.41 samples/sec Loss 8.8701 LearningRate 0.3572 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:25,807-Speed 11335.98 samples/sec Loss 8.8702 LearningRate 0.3571 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:57:29,593-Speed 10822.24 samples/sec Loss 8.9381 LearningRate 0.3570 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:57:33,581-Speed 10273.29 samples/sec Loss 8.9763 LearningRate 0.3569 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:05,670-Speed 1276.50 samples/sec Loss 8.5585 LearningRate 0.3567 Epoch: 3 Global Step: 12520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:09,139-Speed 11812.08 samples/sec Loss 8.0581 LearningRate 0.3566 Epoch: 3 Global Step: 12530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:13,588-Speed 9210.73 samples/sec Loss 8.0624 LearningRate 0.3565 Epoch: 3 Global Step: 12540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:17,176-Speed 11417.38 samples/sec Loss 8.0462 LearningRate 0.3564 Epoch: 3 Global Step: 12550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:21,546-Speed 9375.68 samples/sec Loss 8.0946 LearningRate 0.3563 Epoch: 3 Global Step: 12560 Fp16 Grad Scale: 262144 Required: 
8 hours Training: 2022-01-16 22:58:25,474-Speed 10432.16 samples/sec Loss 8.0810 LearningRate 0.3562 Epoch: 3 Global Step: 12570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:29,089-Speed 11333.75 samples/sec Loss 8.0873 LearningRate 0.3561 Epoch: 3 Global Step: 12580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:33,071-Speed 10290.01 samples/sec Loss 8.1667 LearningRate 0.3560 Epoch: 3 Global Step: 12590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:36,809-Speed 10960.63 samples/sec Loss 8.1564 LearningRate 0.3559 Epoch: 3 Global Step: 12600 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:58:40,635-Speed 10710.46 samples/sec Loss 8.1295 LearningRate 0.3558 Epoch: 3 Global Step: 12610 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:58:44,259-Speed 11307.27 samples/sec Loss 8.1148 LearningRate 0.3557 Epoch: 3 Global Step: 12620 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:58:48,818-Speed 8986.07 samples/sec Loss 8.2109 LearningRate 0.3556 Epoch: 3 Global Step: 12630 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:58:52,413-Speed 11397.95 samples/sec Loss 8.1778 LearningRate 0.3555 Epoch: 3 Global Step: 12640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:55,974-Speed 11506.86 samples/sec Loss 8.2032 LearningRate 0.3554 Epoch: 3 Global Step: 12650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:58:59,520-Speed 11558.15 samples/sec Loss 8.2180 LearningRate 0.3553 Epoch: 3 Global Step: 12660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:59:02,952-Speed 11937.82 samples/sec Loss 8.1319 LearningRate 0.3552 Epoch: 3 Global Step: 12670 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:59:06,606-Speed 11213.93 samples/sec Loss 8.2206 LearningRate 0.3551 Epoch: 3 Global Step: 12680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 
22:59:10,652-Speed 10125.00 samples/sec Loss 8.2269 LearningRate 0.3550 Epoch: 3 Global Step: 12690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:59:14,241-Speed 11415.72 samples/sec Loss 8.2469 LearningRate 0.3549 Epoch: 3 Global Step: 12700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:59:17,840-Speed 11385.98 samples/sec Loss 8.2747 LearningRate 0.3548 Epoch: 3 Global Step: 12710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:59:21,442-Speed 11376.96 samples/sec Loss 8.2049 LearningRate 0.3547 Epoch: 3 Global Step: 12720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:59:24,805-Speed 12181.22 samples/sec Loss 8.2857 LearningRate 0.3546 Epoch: 3 Global Step: 12730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 22:59:28,310-Speed 11691.88 samples/sec Loss 8.3422 LearningRate 0.3545 Epoch: 3 Global Step: 12740 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:59:32,089-Speed 10841.79 samples/sec Loss 8.2799 LearningRate 0.3544 Epoch: 3 Global Step: 12750 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:59:35,733-Speed 11243.90 samples/sec Loss 8.2908 LearningRate 0.3543 Epoch: 3 Global Step: 12760 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:59:39,550-Speed 10736.31 samples/sec Loss 8.3282 LearningRate 0.3542 Epoch: 3 Global Step: 12770 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:59:43,713-Speed 9841.90 samples/sec Loss 8.3651 LearningRate 0.3541 Epoch: 3 Global Step: 12780 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:59:47,422-Speed 11046.38 samples/sec Loss 8.3322 LearningRate 0.3540 Epoch: 3 Global Step: 12790 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:59:51,245-Speed 10716.42 samples/sec Loss 8.3111 LearningRate 0.3539 Epoch: 3 Global Step: 12800 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:59:55,179-Speed 10414.30 samples/sec Loss 
8.3436 LearningRate 0.3538 Epoch: 3 Global Step: 12810 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 22:59:59,146-Speed 10327.67 samples/sec Loss 8.4048 LearningRate 0.3537 Epoch: 3 Global Step: 12820 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:02,660-Speed 11661.83 samples/sec Loss 8.3161 LearningRate 0.3536 Epoch: 3 Global Step: 12830 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:06,331-Speed 11159.00 samples/sec Loss 8.4037 LearningRate 0.3535 Epoch: 3 Global Step: 12840 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:10,023-Speed 11097.63 samples/sec Loss 8.3887 LearningRate 0.3534 Epoch: 3 Global Step: 12850 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:13,578-Speed 11526.08 samples/sec Loss 8.3708 LearningRate 0.3533 Epoch: 3 Global Step: 12860 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:17,213-Speed 11271.26 samples/sec Loss 8.4270 LearningRate 0.3532 Epoch: 3 Global Step: 12870 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:20,762-Speed 11545.18 samples/sec Loss 8.3936 LearningRate 0.3531 Epoch: 3 Global Step: 12880 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:24,420-Speed 11200.81 samples/sec Loss 8.3999 LearningRate 0.3530 Epoch: 3 Global Step: 12890 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:28,058-Speed 11262.91 samples/sec Loss 8.4485 LearningRate 0.3529 Epoch: 3 Global Step: 12900 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:31,773-Speed 11031.46 samples/sec Loss 8.4210 LearningRate 0.3528 Epoch: 3 Global Step: 12910 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:36,371-Speed 8910.24 samples/sec Loss 8.3673 LearningRate 0.3527 Epoch: 3 Global Step: 12920 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:40,165-Speed 10797.99 samples/sec Loss 8.4165 LearningRate 0.3526 Epoch: 3 Global 
Step: 12930 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:00:43,877-Speed 11037.32 samples/sec Loss 8.4112 LearningRate 0.3525 Epoch: 3 Global Step: 12940 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:00:47,451-Speed 11468.95 samples/sec Loss 8.4399 LearningRate 0.3524 Epoch: 3 Global Step: 12950 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:00:50,937-Speed 11756.04 samples/sec Loss 8.4433 LearningRate 0.3523 Epoch: 3 Global Step: 12960 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:00:54,315-Speed 12130.02 samples/sec Loss 8.4612 LearningRate 0.3522 Epoch: 3 Global Step: 12970 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:00:57,716-Speed 12046.87 samples/sec Loss 8.4272 LearningRate 0.3521 Epoch: 3 Global Step: 12980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:01,112-Speed 12064.47 samples/sec Loss 8.4314 LearningRate 0.3520 Epoch: 3 Global Step: 12990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:04,554-Speed 11905.72 samples/sec Loss 8.4390 LearningRate 0.3519 Epoch: 3 Global Step: 13000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:07,971-Speed 11992.44 samples/sec Loss 8.4741 LearningRate 0.3518 Epoch: 3 Global Step: 13010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:11,391-Speed 11980.14 samples/sec Loss 8.4596 LearningRate 0.3517 Epoch: 3 Global Step: 13020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:15,052-Speed 11193.40 samples/sec Loss 8.4697 LearningRate 0.3516 Epoch: 3 Global Step: 13030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:18,528-Speed 11786.48 samples/sec Loss 8.4756 LearningRate 0.3515 Epoch: 3 Global Step: 13040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:22,311-Speed 10829.32 samples/sec Loss 8.4625 LearningRate 0.3514 Epoch: 3 Global Step: 13050 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-01-16 23:01:26,003-Speed 11098.65 samples/sec Loss 8.4704 LearningRate 0.3513 Epoch: 3 Global Step: 13060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:29,419-Speed 11993.11 samples/sec Loss 8.4205 LearningRate 0.3512 Epoch: 3 Global Step: 13070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:32,907-Speed 11751.65 samples/sec Loss 8.4670 LearningRate 0.3511 Epoch: 3 Global Step: 13080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:36,345-Speed 11917.12 samples/sec Loss 8.4635 LearningRate 0.3510 Epoch: 3 Global Step: 13090 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:40,087-Speed 10948.69 samples/sec Loss 8.4606 LearningRate 0.3509 Epoch: 3 Global Step: 13100 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:01:44,198-Speed 9966.31 samples/sec Loss 8.4435 LearningRate 0.3508 Epoch: 3 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:01:48,028-Speed 10698.33 samples/sec Loss 8.4657 LearningRate 0.3507 Epoch: 3 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:01:51,592-Speed 11496.58 samples/sec Loss 8.4934 LearningRate 0.3506 Epoch: 3 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:01:55,278-Speed 11116.56 samples/sec Loss 8.4574 LearningRate 0.3505 Epoch: 3 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:01:58,939-Speed 11191.60 samples/sec Loss 8.4732 LearningRate 0.3504 Epoch: 3 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:02:02,378-Speed 11916.22 samples/sec Loss 8.5368 LearningRate 0.3503 Epoch: 3 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:02:05,833-Speed 11856.60 samples/sec Loss 8.5356 LearningRate 0.3502 Epoch: 3 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 
23:02:09,348-Speed 11655.72 samples/sec Loss 8.5311 LearningRate 0.3501 Epoch: 3 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:02:13,208-Speed 10616.08 samples/sec Loss 8.5219 LearningRate 0.3500 Epoch: 3 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:02:16,668-Speed 11843.04 samples/sec Loss 8.4879 LearningRate 0.3499 Epoch: 3 Global Step: 13200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:02:20,141-Speed 11798.37 samples/sec Loss 8.5485 LearningRate 0.3498 Epoch: 3 Global Step: 13210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:23,697-Speed 11519.74 samples/sec Loss 8.4693 LearningRate 0.3497 Epoch: 3 Global Step: 13220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:27,440-Speed 10947.96 samples/sec Loss 8.5154 LearningRate 0.3496 Epoch: 3 Global Step: 13230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:30,896-Speed 11856.77 samples/sec Loss 8.5236 LearningRate 0.3495 Epoch: 3 Global Step: 13240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:34,605-Speed 11047.41 samples/sec Loss 8.4822 LearningRate 0.3494 Epoch: 3 Global Step: 13250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:38,140-Speed 11591.12 samples/sec Loss 8.4745 LearningRate 0.3493 Epoch: 3 Global Step: 13260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:41,815-Speed 11149.16 samples/sec Loss 8.4369 LearningRate 0.3492 Epoch: 3 Global Step: 13270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:45,387-Speed 11469.63 samples/sec Loss 8.4824 LearningRate 0.3491 Epoch: 3 Global Step: 13280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:48,831-Speed 11897.31 samples/sec Loss 8.5531 LearningRate 0.3490 Epoch: 3 Global Step: 13290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:52,535-Speed 11061.31 samples/sec 
Loss 8.4972 LearningRate 0.3489 Epoch: 3 Global Step: 13300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:02:55,960-Speed 11965.03 samples/sec Loss 8.4494 LearningRate 0.3488 Epoch: 3 Global Step: 13310 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:02:59,584-Speed 11306.40 samples/sec Loss 8.5074 LearningRate 0.3487 Epoch: 3 Global Step: 13320 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:03:03,210-Speed 11299.02 samples/sec Loss 8.5529 LearningRate 0.3486 Epoch: 3 Global Step: 13330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:06,986-Speed 10851.01 samples/sec Loss 8.5229 LearningRate 0.3485 Epoch: 3 Global Step: 13340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:10,563-Speed 11453.75 samples/sec Loss 8.5873 LearningRate 0.3484 Epoch: 3 Global Step: 13350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:14,220-Speed 11206.36 samples/sec Loss 8.5327 LearningRate 0.3483 Epoch: 3 Global Step: 13360 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:17,831-Speed 11346.47 samples/sec Loss 8.5382 LearningRate 0.3482 Epoch: 3 Global Step: 13370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:21,470-Speed 11258.99 samples/sec Loss 8.5633 LearningRate 0.3482 Epoch: 3 Global Step: 13380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:24,991-Speed 11637.55 samples/sec Loss 8.4647 LearningRate 0.3481 Epoch: 3 Global Step: 13390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:28,445-Speed 11862.12 samples/sec Loss 8.5487 LearningRate 0.3480 Epoch: 3 Global Step: 13400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:32,134-Speed 11107.23 samples/sec Loss 8.5715 LearningRate 0.3479 Epoch: 3 Global Step: 13410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:35,874-Speed 10957.32 samples/sec Loss 8.4997 LearningRate 0.3478 Epoch: 3 
Global Step: 13420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:39,667-Speed 10799.93 samples/sec Loss 8.5204 LearningRate 0.3477 Epoch: 3 Global Step: 13430 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:43,452-Speed 10826.75 samples/sec Loss 8.4874 LearningRate 0.3476 Epoch: 3 Global Step: 13440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:03:47,224-Speed 10861.37 samples/sec Loss 8.4637 LearningRate 0.3475 Epoch: 3 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:03:51,023-Speed 10786.04 samples/sec Loss 8.5102 LearningRate 0.3474 Epoch: 3 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:03:54,720-Speed 11081.98 samples/sec Loss 8.4912 LearningRate 0.3473 Epoch: 3 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:03:58,556-Speed 10682.30 samples/sec Loss 8.5100 LearningRate 0.3472 Epoch: 3 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:04:02,746-Speed 9778.56 samples/sec Loss 8.4627 LearningRate 0.3471 Epoch: 3 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:04:06,283-Speed 11584.06 samples/sec Loss 8.4426 LearningRate 0.3470 Epoch: 3 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:04:10,160-Speed 10568.87 samples/sec Loss 8.5328 LearningRate 0.3469 Epoch: 3 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:04:14,446-Speed 9558.76 samples/sec Loss 8.4931 LearningRate 0.3468 Epoch: 3 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:04:17,969-Speed 11629.80 samples/sec Loss 8.5505 LearningRate 0.3467 Epoch: 3 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 23:04:21,750-Speed 10835.80 samples/sec Loss 8.5616 LearningRate 0.3466 Epoch: 3 Global Step: 13540 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-16 23:04:25,417-Speed 11174.08 samples/sec Loss 8.5121 LearningRate 0.3465 Epoch: 3 Global Step: 13550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:29,039-Speed 11311.55 samples/sec Loss 8.5708 LearningRate 0.3464 Epoch: 3 Global Step: 13560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:32,490-Speed 11871.58 samples/sec Loss 8.5013 LearningRate 0.3463 Epoch: 3 Global Step: 13570 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:36,265-Speed 10853.15 samples/sec Loss 8.5625 LearningRate 0.3462 Epoch: 3 Global Step: 13580 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:39,982-Speed 11085.46 samples/sec Loss 8.5176 LearningRate 0.3461 Epoch: 3 Global Step: 13590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:43,462-Speed 11775.17 samples/sec Loss 8.5330 LearningRate 0.3460 Epoch: 3 Global Step: 13600 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:46,979-Speed 11651.36 samples/sec Loss 8.5433 LearningRate 0.3459 Epoch: 3 Global Step: 13610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:50,624-Speed 11241.97 samples/sec Loss 8.5299 LearningRate 0.3458 Epoch: 3 Global Step: 13620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:54,589-Speed 10333.76 samples/sec Loss 8.5623 LearningRate 0.3457 Epoch: 3 Global Step: 13630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:04:58,433-Speed 10658.44 samples/sec Loss 8.5543 LearningRate 0.3456 Epoch: 3 Global Step: 13640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:02,010-Speed 11456.47 samples/sec Loss 8.5628 LearningRate 0.3455 Epoch: 3 Global Step: 13650 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:05:05,783-Speed 10859.61 samples/sec Loss 8.5439 LearningRate 0.3454 Epoch: 3 Global Step: 13660 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 
23:05:09,580-Speed 10791.35 samples/sec Loss 8.5107 LearningRate 0.3453 Epoch: 3 Global Step: 13670 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:05:13,195-Speed 11332.63 samples/sec Loss 8.5281 LearningRate 0.3452 Epoch: 3 Global Step: 13680 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:16,965-Speed 10867.63 samples/sec Loss 8.5284 LearningRate 0.3451 Epoch: 3 Global Step: 13690 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:20,475-Speed 11672.98 samples/sec Loss 8.5267 LearningRate 0.3450 Epoch: 3 Global Step: 13700 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:24,351-Speed 10570.13 samples/sec Loss 8.5010 LearningRate 0.3449 Epoch: 3 Global Step: 13710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:28,595-Speed 9658.23 samples/sec Loss 8.5330 LearningRate 0.3448 Epoch: 3 Global Step: 13720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:32,292-Speed 11081.02 samples/sec Loss 8.4620 LearningRate 0.3447 Epoch: 3 Global Step: 13730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:36,457-Speed 9837.56 samples/sec Loss 8.5390 LearningRate 0.3446 Epoch: 3 Global Step: 13740 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:39,972-Speed 11653.59 samples/sec Loss 8.5829 LearningRate 0.3445 Epoch: 3 Global Step: 13750 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:43,561-Speed 11422.83 samples/sec Loss 8.4282 LearningRate 0.3444 Epoch: 3 Global Step: 13760 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:47,327-Speed 10878.82 samples/sec Loss 8.5195 LearningRate 0.3443 Epoch: 3 Global Step: 13770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 23:05:50,931-Speed 11367.58 samples/sec Loss 8.6338 LearningRate 0.3442 Epoch: 3 Global Step: 13780 Fp16 Grad Scale: 524288 Required: 8 hours Training: 2022-01-16 23:05:54,619-Speed 11109.57 samples/sec Loss 
8.5124 LearningRate 0.3441 Epoch: 3 Global Step: 13790 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:05:58,130-Speed 11669.31 samples/sec Loss 8.5826 LearningRate 0.3440 Epoch: 3 Global Step: 13800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:01,697-Speed 11488.25 samples/sec Loss 8.5604 LearningRate 0.3439 Epoch: 3 Global Step: 13810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:05,410-Speed 11033.92 samples/sec Loss 8.5261 LearningRate 0.3438 Epoch: 3 Global Step: 13820 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:08,897-Speed 11751.16 samples/sec Loss 8.5882 LearningRate 0.3437 Epoch: 3 Global Step: 13830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:12,448-Speed 11539.48 samples/sec Loss 8.5616 LearningRate 0.3436 Epoch: 3 Global Step: 13840 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:15,969-Speed 11637.26 samples/sec Loss 8.4503 LearningRate 0.3435 Epoch: 3 Global Step: 13850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:19,483-Speed 11658.91 samples/sec Loss 8.5103 LearningRate 0.3434 Epoch: 3 Global Step: 13860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:23,093-Speed 11351.24 samples/sec Loss 8.5719 LearningRate 0.3433 Epoch: 3 Global Step: 13870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:26,898-Speed 10768.02 samples/sec Loss 8.4860 LearningRate 0.3432 Epoch: 3 Global Step: 13880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:30,595-Speed 11081.47 samples/sec Loss 8.5711 LearningRate 0.3431 Epoch: 3 Global Step: 13890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:34,059-Speed 11829.71 samples/sec Loss 8.5730 LearningRate 0.3430 Epoch: 3 Global Step: 13900 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:06:38,316-Speed 9623.72 samples/sec Loss 8.7212 LearningRate 0.3429 Epoch: 3 Global 
Step: 13910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:41,857-Speed 11573.89 samples/sec Loss 8.5080 LearningRate 0.3428 Epoch: 3 Global Step: 13920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:45,803-Speed 10382.78 samples/sec Loss 8.5183 LearningRate 0.3427 Epoch: 3 Global Step: 13930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:49,477-Speed 11154.19 samples/sec Loss 8.5242 LearningRate 0.3426 Epoch: 3 Global Step: 13940 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:52,992-Speed 11656.21 samples/sec Loss 8.5098 LearningRate 0.3425 Epoch: 3 Global Step: 13950 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:06:56,816-Speed 10715.07 samples/sec Loss 8.5367 LearningRate 0.3424 Epoch: 3 Global Step: 13960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:00,636-Speed 10725.77 samples/sec Loss 8.5290 LearningRate 0.3423 Epoch: 3 Global Step: 13970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:04,293-Speed 11204.91 samples/sec Loss 8.4335 LearningRate 0.3422 Epoch: 3 Global Step: 13980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:07,856-Speed 11501.71 samples/sec Loss 8.4973 LearningRate 0.3421 Epoch: 3 Global Step: 13990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:11,698-Speed 10663.79 samples/sec Loss 8.5291 LearningRate 0.3420 Epoch: 3 Global Step: 14000 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:15,487-Speed 10813.24 samples/sec Loss 8.5760 LearningRate 0.3419 Epoch: 3 Global Step: 14010 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:19,048-Speed 11504.57 samples/sec Loss 8.5145 LearningRate 0.3418 Epoch: 3 Global Step: 14020 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:22,555-Speed 11683.93 samples/sec Loss 8.5695 LearningRate 0.3417 Epoch: 3 Global Step: 14030 Fp16 Grad Scale: 262144 
Required: 7 hours Training: 2022-01-16 23:07:26,571-Speed 10202.52 samples/sec Loss 8.4742 LearningRate 0.3416 Epoch: 3 Global Step: 14040 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:30,292-Speed 11012.86 samples/sec Loss 8.4878 LearningRate 0.3415 Epoch: 3 Global Step: 14050 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:33,764-Speed 11801.80 samples/sec Loss 8.4910 LearningRate 0.3414 Epoch: 3 Global Step: 14060 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:37,621-Speed 10620.50 samples/sec Loss 8.4450 LearningRate 0.3413 Epoch: 3 Global Step: 14070 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:41,228-Speed 11360.52 samples/sec Loss 8.5245 LearningRate 0.3412 Epoch: 3 Global Step: 14080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:45,104-Speed 10570.53 samples/sec Loss 8.4965 LearningRate 0.3411 Epoch: 3 Global Step: 14090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:48,584-Speed 11775.55 samples/sec Loss 8.5879 LearningRate 0.3410 Epoch: 3 Global Step: 14100 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:07:52,207-Speed 11306.33 samples/sec Loss 8.5270 LearningRate 0.3409 Epoch: 3 Global Step: 14110 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:07:55,689-Speed 11768.76 samples/sec Loss 8.5334 LearningRate 0.3408 Epoch: 3 Global Step: 14120 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:07:59,858-Speed 9827.87 samples/sec Loss 8.4463 LearningRate 0.3407 Epoch: 3 Global Step: 14130 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:08:03,509-Speed 11225.16 samples/sec Loss 8.5260 LearningRate 0.3406 Epoch: 3 Global Step: 14140 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:07,022-Speed 11660.94 samples/sec Loss 8.4482 LearningRate 0.3405 Epoch: 3 Global Step: 14150 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:08:10,742-Speed 11015.53 samples/sec Loss 8.4936 LearningRate 0.3404 Epoch: 3 Global Step: 14160 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:14,466-Speed 11001.62 samples/sec Loss 8.4519 LearningRate 0.3403 Epoch: 3 Global Step: 14170 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:18,114-Speed 11231.77 samples/sec Loss 8.4972 LearningRate 0.3402 Epoch: 3 Global Step: 14180 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:22,036-Speed 10446.81 samples/sec Loss 8.5228 LearningRate 0.3401 Epoch: 3 Global Step: 14190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:25,669-Speed 11276.49 samples/sec Loss 8.5181 LearningRate 0.3400 Epoch: 3 Global Step: 14200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:29,467-Speed 10787.34 samples/sec Loss 8.5178 LearningRate 0.3399 Epoch: 3 Global Step: 14210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:33,467-Speed 10244.76 samples/sec Loss 8.5238 LearningRate 0.3399 Epoch: 3 Global Step: 14220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:36,986-Speed 11642.90 samples/sec Loss 8.5721 LearningRate 0.3398 Epoch: 3 Global Step: 14230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:08:40,753-Speed 10875.30 samples/sec Loss 8.5821 LearningRate 0.3397 Epoch: 3 Global Step: 14240 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:08:45,071-Speed 9489.43 samples/sec Loss 8.4583 LearningRate 0.3396 Epoch: 3 Global Step: 14250 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:08:49,173-Speed 9987.14 samples/sec Loss 8.4563 LearningRate 0.3395 Epoch: 3 Global Step: 14260 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:08:52,852-Speed 11138.46 samples/sec Loss 8.5014 LearningRate 0.3394 Epoch: 3 Global Step: 14270 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:08:56,620-Speed 10871.93 samples/sec Loss 
8.5389 LearningRate 0.3393 Epoch: 3 Global Step: 14280 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:09:00,126-Speed 11687.30 samples/sec Loss 8.5147 LearningRate 0.3392 Epoch: 3 Global Step: 14290 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:09:04,273-Speed 9880.03 samples/sec Loss 8.5050 LearningRate 0.3391 Epoch: 3 Global Step: 14300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:07,873-Speed 11382.01 samples/sec Loss 8.4527 LearningRate 0.3390 Epoch: 3 Global Step: 14310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:11,632-Speed 10902.02 samples/sec Loss 8.4582 LearningRate 0.3389 Epoch: 3 Global Step: 14320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:15,426-Speed 10797.55 samples/sec Loss 8.5238 LearningRate 0.3388 Epoch: 3 Global Step: 14330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:18,904-Speed 11782.32 samples/sec Loss 8.4739 LearningRate 0.3387 Epoch: 3 Global Step: 14340 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:22,517-Speed 11340.98 samples/sec Loss 8.4878 LearningRate 0.3386 Epoch: 3 Global Step: 14350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:26,221-Speed 11062.43 samples/sec Loss 8.4800 LearningRate 0.3385 Epoch: 3 Global Step: 14360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:30,047-Speed 10708.38 samples/sec Loss 8.4687 LearningRate 0.3384 Epoch: 3 Global Step: 14370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:33,839-Speed 10806.99 samples/sec Loss 8.4182 LearningRate 0.3383 Epoch: 3 Global Step: 14380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:37,863-Speed 10182.88 samples/sec Loss 8.4773 LearningRate 0.3382 Epoch: 3 Global Step: 14390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:41,711-Speed 10645.45 samples/sec Loss 8.5141 LearningRate 0.3381 Epoch: 3 Global 
Step: 14400 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:09:45,200-Speed 11746.09 samples/sec Loss 8.4848 LearningRate 0.3380 Epoch: 3 Global Step: 14410 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:48,661-Speed 11836.89 samples/sec Loss 8.4798 LearningRate 0.3379 Epoch: 3 Global Step: 14420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:52,170-Speed 11678.37 samples/sec Loss 8.4879 LearningRate 0.3378 Epoch: 3 Global Step: 14430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:55,678-Speed 11679.23 samples/sec Loss 8.4914 LearningRate 0.3377 Epoch: 3 Global Step: 14440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:09:59,648-Speed 10321.02 samples/sec Loss 8.4261 LearningRate 0.3376 Epoch: 3 Global Step: 14450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:03,231-Speed 11435.84 samples/sec Loss 8.4737 LearningRate 0.3375 Epoch: 3 Global Step: 14460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:07,341-Speed 9968.35 samples/sec Loss 8.4565 LearningRate 0.3374 Epoch: 3 Global Step: 14470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:11,148-Speed 10762.44 samples/sec Loss 8.5199 LearningRate 0.3373 Epoch: 3 Global Step: 14480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:14,687-Speed 11579.36 samples/sec Loss 8.4319 LearningRate 0.3372 Epoch: 3 Global Step: 14490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:18,202-Speed 11654.46 samples/sec Loss 8.4536 LearningRate 0.3371 Epoch: 3 Global Step: 14500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:21,960-Speed 10902.28 samples/sec Loss 8.4283 LearningRate 0.3370 Epoch: 3 Global Step: 14510 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:10:25,391-Speed 11944.89 samples/sec Loss 8.5546 LearningRate 0.3369 Epoch: 3 Global Step: 14520 Fp16 Grad Scale: 524288 
Required: 7 hours Training: 2022-01-16 23:10:29,071-Speed 11136.02 samples/sec Loss 8.5284 LearningRate 0.3368 Epoch: 3 Global Step: 14530 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:10:32,695-Speed 11304.35 samples/sec Loss 8.4427 LearningRate 0.3367 Epoch: 3 Global Step: 14540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:36,229-Speed 11596.00 samples/sec Loss 8.4675 LearningRate 0.3366 Epoch: 3 Global Step: 14550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:39,869-Speed 11254.48 samples/sec Loss 8.4683 LearningRate 0.3365 Epoch: 3 Global Step: 14560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:43,496-Speed 11296.47 samples/sec Loss 8.4475 LearningRate 0.3364 Epoch: 3 Global Step: 14570 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:47,078-Speed 11439.56 samples/sec Loss 8.4654 LearningRate 0.3363 Epoch: 3 Global Step: 14580 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:50,553-Speed 11792.83 samples/sec Loss 8.4526 LearningRate 0.3362 Epoch: 3 Global Step: 14590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:54,050-Speed 11714.95 samples/sec Loss 8.4532 LearningRate 0.3361 Epoch: 3 Global Step: 14600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:10:57,723-Speed 11155.72 samples/sec Loss 8.4773 LearningRate 0.3360 Epoch: 3 Global Step: 14610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:01,289-Speed 11489.04 samples/sec Loss 8.3661 LearningRate 0.3359 Epoch: 3 Global Step: 14620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:04,915-Speed 11299.40 samples/sec Loss 8.4374 LearningRate 0.3358 Epoch: 3 Global Step: 14630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:09,634-Speed 8683.91 samples/sec Loss 8.4800 LearningRate 0.3357 Epoch: 3 Global Step: 14640 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 
23:11:13,281-Speed 11235.37 samples/sec Loss 8.4928 LearningRate 0.3356 Epoch: 3 Global Step: 14650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:16,830-Speed 11546.12 samples/sec Loss 8.4511 LearningRate 0.3355 Epoch: 3 Global Step: 14660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:20,430-Speed 11380.10 samples/sec Loss 8.4908 LearningRate 0.3354 Epoch: 3 Global Step: 14670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:23,957-Speed 11618.25 samples/sec Loss 8.4583 LearningRate 0.3353 Epoch: 3 Global Step: 14680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:27,619-Speed 11190.08 samples/sec Loss 8.4689 LearningRate 0.3353 Epoch: 3 Global Step: 14690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:31,703-Speed 10030.97 samples/sec Loss 8.4207 LearningRate 0.3352 Epoch: 3 Global Step: 14700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:35,891-Speed 9784.53 samples/sec Loss 8.3803 LearningRate 0.3351 Epoch: 3 Global Step: 14710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:39,478-Speed 11422.67 samples/sec Loss 8.5550 LearningRate 0.3350 Epoch: 3 Global Step: 14720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:43,000-Speed 11632.54 samples/sec Loss 8.4431 LearningRate 0.3349 Epoch: 3 Global Step: 14730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:46,524-Speed 11626.68 samples/sec Loss 8.4425 LearningRate 0.3348 Epoch: 3 Global Step: 14740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:11:50,203-Speed 11138.06 samples/sec Loss 8.4591 LearningRate 0.3347 Epoch: 3 Global Step: 14750 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:11:53,939-Speed 10967.39 samples/sec Loss 8.4670 LearningRate 0.3346 Epoch: 3 Global Step: 14760 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:11:57,630-Speed 11099.43 samples/sec Loss 
8.5004 LearningRate 0.3345 Epoch: 3 Global Step: 14770 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:01,292-Speed 11188.85 samples/sec Loss 8.4378 LearningRate 0.3344 Epoch: 3 Global Step: 14780 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:04,854-Speed 11504.35 samples/sec Loss 8.4049 LearningRate 0.3343 Epoch: 3 Global Step: 14790 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:08,939-Speed 10029.75 samples/sec Loss 8.4595 LearningRate 0.3342 Epoch: 3 Global Step: 14800 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:13,213-Speed 9585.75 samples/sec Loss 8.4405 LearningRate 0.3341 Epoch: 3 Global Step: 14810 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:16,928-Speed 11028.72 samples/sec Loss 8.4859 LearningRate 0.3340 Epoch: 3 Global Step: 14820 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:20,783-Speed 10629.65 samples/sec Loss 8.4203 LearningRate 0.3339 Epoch: 3 Global Step: 14830 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:24,540-Speed 10906.56 samples/sec Loss 8.3720 LearningRate 0.3338 Epoch: 3 Global Step: 14840 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:28,221-Speed 11131.60 samples/sec Loss 8.4666 LearningRate 0.3337 Epoch: 3 Global Step: 14850 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:32,002-Speed 10836.26 samples/sec Loss 8.4837 LearningRate 0.3336 Epoch: 3 Global Step: 14860 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:35,706-Speed 11063.59 samples/sec Loss 8.3569 LearningRate 0.3335 Epoch: 3 Global Step: 14870 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:12:39,629-Speed 10443.43 samples/sec Loss 8.5055 LearningRate 0.3334 Epoch: 3 Global Step: 14880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:12:43,365-Speed 10969.04 samples/sec Loss 8.3794 LearningRate 0.3333 Epoch: 3 Global 
Step: 14890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:12:46,899-Speed 11591.02 samples/sec Loss 8.4553 LearningRate 0.3332 Epoch: 3 Global Step: 14900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:12:50,387-Speed 11759.01 samples/sec Loss 8.3575 LearningRate 0.3331 Epoch: 3 Global Step: 14910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:12:54,435-Speed 10121.61 samples/sec Loss 8.4064 LearningRate 0.3330 Epoch: 3 Global Step: 14920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:12:58,487-Speed 10111.79 samples/sec Loss 8.3296 LearningRate 0.3329 Epoch: 3 Global Step: 14930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:13:01,985-Speed 11713.12 samples/sec Loss 8.3871 LearningRate 0.3328 Epoch: 3 Global Step: 14940 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:13:05,405-Speed 11982.30 samples/sec Loss 8.4581 LearningRate 0.3327 Epoch: 3 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:13:09,125-Speed 11013.24 samples/sec Loss 8.4703 LearningRate 0.3326 Epoch: 3 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:13:12,991-Speed 10597.37 samples/sec Loss 8.3881 LearningRate 0.3325 Epoch: 3 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:13:16,770-Speed 10843.90 samples/sec Loss 8.3916 LearningRate 0.3324 Epoch: 3 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:13:20,466-Speed 11086.26 samples/sec Loss 8.4836 LearningRate 0.3323 Epoch: 3 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:13:24,154-Speed 11109.49 samples/sec Loss 8.3837 LearningRate 0.3322 Epoch: 3 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:13:45,385-[lfw][15000]XNorm: 13.495657 Training: 2022-01-16 23:13:45,385-[lfw][15000]Accuracy-Flip: 0.99450+-0.00395 Training: 
2022-01-16 23:13:45,386-[lfw][15000]Accuracy-Highest: 0.99450 Training: 2022-01-16 23:14:09,896-[cfp_fp][15000]XNorm: 11.284248 Training: 2022-01-16 23:14:09,897-[cfp_fp][15000]Accuracy-Flip: 0.95086+-0.01430 Training: 2022-01-16 23:14:09,897-[cfp_fp][15000]Accuracy-Highest: 0.95086 Training: 2022-01-16 23:14:31,133-[agedb_30][15000]XNorm: 12.950405 Training: 2022-01-16 23:14:31,133-[agedb_30][15000]Accuracy-Flip: 0.95383+-0.01078 Training: 2022-01-16 23:14:31,134-[agedb_30][15000]Accuracy-Highest: 0.95383 Training: 2022-01-16 23:14:34,539-Speed 581.95 samples/sec Loss 8.3639 LearningRate 0.3321 Epoch: 3 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:14:37,906-Speed 12169.09 samples/sec Loss 8.4245 LearningRate 0.3320 Epoch: 3 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:14:42,136-Speed 9685.02 samples/sec Loss 8.3024 LearningRate 0.3319 Epoch: 3 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:14:45,691-Speed 11527.66 samples/sec Loss 8.3857 LearningRate 0.3318 Epoch: 3 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:14:49,689-Speed 10246.56 samples/sec Loss 8.3949 LearningRate 0.3318 Epoch: 3 Global Step: 15050 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:14:53,208-Speed 11644.91 samples/sec Loss 8.4441 LearningRate 0.3317 Epoch: 3 Global Step: 15060 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:14:56,754-Speed 11555.80 samples/sec Loss 8.4051 LearningRate 0.3316 Epoch: 3 Global Step: 15070 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:00,686-Speed 10419.84 samples/sec Loss 8.2900 LearningRate 0.3315 Epoch: 3 Global Step: 15080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:04,437-Speed 10923.07 samples/sec Loss 8.3897 LearningRate 0.3314 Epoch: 3 Global Step: 15090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:15:08,016-Speed 11447.65 samples/sec Loss 8.3849 LearningRate 0.3313 Epoch: 3 Global Step: 15100 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:11,646-Speed 11288.11 samples/sec Loss 8.3415 LearningRate 0.3312 Epoch: 3 Global Step: 15110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:15,364-Speed 11017.81 samples/sec Loss 8.3625 LearningRate 0.3311 Epoch: 3 Global Step: 15120 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:19,358-Speed 10260.38 samples/sec Loss 8.3760 LearningRate 0.3310 Epoch: 3 Global Step: 15130 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:23,009-Speed 11223.05 samples/sec Loss 8.3667 LearningRate 0.3309 Epoch: 3 Global Step: 15140 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:26,986-Speed 10302.39 samples/sec Loss 8.4535 LearningRate 0.3308 Epoch: 3 Global Step: 15150 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:15:30,635-Speed 11227.80 samples/sec Loss 8.3286 LearningRate 0.3307 Epoch: 3 Global Step: 15160 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:15:34,367-Speed 10979.19 samples/sec Loss 8.3430 LearningRate 0.3306 Epoch: 3 Global Step: 15170 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:38,030-Speed 11186.38 samples/sec Loss 8.3626 LearningRate 0.3305 Epoch: 3 Global Step: 15180 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:41,509-Speed 11777.09 samples/sec Loss 8.3067 LearningRate 0.3304 Epoch: 3 Global Step: 15190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:45,860-Speed 9414.80 samples/sec Loss 8.3160 LearningRate 0.3303 Epoch: 3 Global Step: 15200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:49,413-Speed 11533.99 samples/sec Loss 8.3308 LearningRate 0.3302 Epoch: 3 Global Step: 15210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:53,065-Speed 11221.26 samples/sec Loss 
8.3680 LearningRate 0.3301 Epoch: 3 Global Step: 15220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:15:56,636-Speed 11483.05 samples/sec Loss 8.3019 LearningRate 0.3300 Epoch: 3 Global Step: 15230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:16:00,514-Speed 10564.01 samples/sec Loss 8.3699 LearningRate 0.3299 Epoch: 3 Global Step: 15240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:16:04,101-Speed 11422.36 samples/sec Loss 8.4165 LearningRate 0.3298 Epoch: 3 Global Step: 15250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:16:07,878-Speed 10848.16 samples/sec Loss 8.3826 LearningRate 0.3297 Epoch: 3 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:11,612-Speed 10974.61 samples/sec Loss 8.3758 LearningRate 0.3296 Epoch: 3 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:15,059-Speed 11888.17 samples/sec Loss 8.3670 LearningRate 0.3295 Epoch: 3 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:18,651-Speed 11406.05 samples/sec Loss 8.3584 LearningRate 0.3294 Epoch: 3 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:22,169-Speed 11645.48 samples/sec Loss 8.2967 LearningRate 0.3293 Epoch: 3 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:25,819-Speed 11225.33 samples/sec Loss 8.3091 LearningRate 0.3292 Epoch: 3 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:29,595-Speed 10853.00 samples/sec Loss 8.2699 LearningRate 0.3291 Epoch: 3 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:33,411-Speed 10734.49 samples/sec Loss 8.4055 LearningRate 0.3290 Epoch: 3 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:36,985-Speed 11466.02 samples/sec Loss 8.3632 LearningRate 0.3289 Epoch: 3 Global 
Step: 15340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:40,964-Speed 10299.99 samples/sec Loss 8.2917 LearningRate 0.3288 Epoch: 3 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:16:44,896-Speed 10417.95 samples/sec Loss 8.3295 LearningRate 0.3287 Epoch: 3 Global Step: 15360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:16:49,370-Speed 9158.13 samples/sec Loss 8.3201 LearningRate 0.3287 Epoch: 3 Global Step: 15370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:16:53,057-Speed 11113.71 samples/sec Loss 8.2789 LearningRate 0.3286 Epoch: 3 Global Step: 15380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:16:56,523-Speed 11822.35 samples/sec Loss 8.3233 LearningRate 0.3285 Epoch: 3 Global Step: 15390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:00,155-Speed 11280.66 samples/sec Loss 8.3106 LearningRate 0.3284 Epoch: 3 Global Step: 15400 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:03,556-Speed 12047.53 samples/sec Loss 8.4047 LearningRate 0.3283 Epoch: 3 Global Step: 15410 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:07,105-Speed 11544.37 samples/sec Loss 8.3896 LearningRate 0.3282 Epoch: 3 Global Step: 15420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:10,646-Speed 11570.48 samples/sec Loss 8.3495 LearningRate 0.3281 Epoch: 3 Global Step: 15430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:14,348-Speed 11067.86 samples/sec Loss 8.3853 LearningRate 0.3280 Epoch: 3 Global Step: 15440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:18,125-Speed 10849.32 samples/sec Loss 8.3342 LearningRate 0.3279 Epoch: 3 Global Step: 15450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:21,780-Speed 11208.28 samples/sec Loss 8.3318 LearningRate 0.3278 Epoch: 3 Global Step: 15460 Fp16 Grad Scale: 262144 
Required: 7 hours Training: 2022-01-16 23:17:25,864-Speed 10034.59 samples/sec Loss 8.3471 LearningRate 0.3277 Epoch: 3 Global Step: 15470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:29,514-Speed 11224.27 samples/sec Loss 8.3172 LearningRate 0.3276 Epoch: 3 Global Step: 15480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:33,386-Speed 10581.88 samples/sec Loss 8.3528 LearningRate 0.3275 Epoch: 3 Global Step: 15490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:37,138-Speed 10918.74 samples/sec Loss 8.2923 LearningRate 0.3274 Epoch: 3 Global Step: 15500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:40,795-Speed 11203.67 samples/sec Loss 8.3161 LearningRate 0.3273 Epoch: 3 Global Step: 15510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:44,727-Speed 10422.37 samples/sec Loss 8.2896 LearningRate 0.3272 Epoch: 3 Global Step: 15520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:48,542-Speed 10739.10 samples/sec Loss 8.3007 LearningRate 0.3271 Epoch: 3 Global Step: 15530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:52,975-Speed 9242.39 samples/sec Loss 8.3136 LearningRate 0.3270 Epoch: 3 Global Step: 15540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:17:56,499-Speed 11627.22 samples/sec Loss 8.3038 LearningRate 0.3269 Epoch: 3 Global Step: 15550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:00,175-Speed 11146.59 samples/sec Loss 8.3124 LearningRate 0.3268 Epoch: 3 Global Step: 15560 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:18:03,891-Speed 11025.05 samples/sec Loss 8.3282 LearningRate 0.3267 Epoch: 3 Global Step: 15570 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:18:07,370-Speed 11782.90 samples/sec Loss 8.3024 LearningRate 0.3266 Epoch: 3 Global Step: 15580 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 
23:18:10,988-Speed 11326.67 samples/sec Loss 8.3718 LearningRate 0.3265 Epoch: 3 Global Step: 15590 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:18:14,444-Speed 11854.02 samples/sec Loss 8.3020 LearningRate 0.3264 Epoch: 3 Global Step: 15600 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:18:17,997-Speed 11532.42 samples/sec Loss 8.2769 LearningRate 0.3263 Epoch: 3 Global Step: 15610 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:18:21,731-Speed 10974.57 samples/sec Loss 8.3080 LearningRate 0.3262 Epoch: 3 Global Step: 15620 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:18:25,889-Speed 9855.21 samples/sec Loss 8.2942 LearningRate 0.3261 Epoch: 3 Global Step: 15630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:29,290-Speed 12049.60 samples/sec Loss 8.3022 LearningRate 0.3261 Epoch: 3 Global Step: 15640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:32,935-Speed 11239.93 samples/sec Loss 8.2205 LearningRate 0.3260 Epoch: 3 Global Step: 15650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:36,330-Speed 12070.57 samples/sec Loss 8.3319 LearningRate 0.3259 Epoch: 3 Global Step: 15660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:40,172-Speed 10664.32 samples/sec Loss 8.3084 LearningRate 0.3258 Epoch: 3 Global Step: 15670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:43,748-Speed 11477.85 samples/sec Loss 8.3397 LearningRate 0.3257 Epoch: 3 Global Step: 15680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:47,598-Speed 10642.39 samples/sec Loss 8.2933 LearningRate 0.3256 Epoch: 3 Global Step: 15690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:51,201-Speed 11369.29 samples/sec Loss 8.3171 LearningRate 0.3255 Epoch: 3 Global Step: 15700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:55,648-Speed 9213.28 samples/sec Loss 
8.3176 LearningRate 0.3254 Epoch: 3 Global Step: 15710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:18:59,120-Speed 11802.09 samples/sec Loss 8.2779 LearningRate 0.3253 Epoch: 3 Global Step: 15720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:02,587-Speed 11816.70 samples/sec Loss 8.2202 LearningRate 0.3252 Epoch: 3 Global Step: 15730 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:19:06,024-Speed 11922.88 samples/sec Loss 8.2671 LearningRate 0.3251 Epoch: 3 Global Step: 15740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:09,686-Speed 11192.54 samples/sec Loss 8.2644 LearningRate 0.3250 Epoch: 3 Global Step: 15750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:13,394-Speed 11050.46 samples/sec Loss 8.2229 LearningRate 0.3249 Epoch: 3 Global Step: 15760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:16,880-Speed 11752.19 samples/sec Loss 8.2270 LearningRate 0.3248 Epoch: 3 Global Step: 15770 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:20,845-Speed 10333.38 samples/sec Loss 8.3138 LearningRate 0.3247 Epoch: 3 Global Step: 15780 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:24,593-Speed 10934.29 samples/sec Loss 8.2967 LearningRate 0.3246 Epoch: 3 Global Step: 15790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:28,093-Speed 11709.05 samples/sec Loss 8.2061 LearningRate 0.3245 Epoch: 3 Global Step: 15800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:32,107-Speed 10206.30 samples/sec Loss 8.2647 LearningRate 0.3244 Epoch: 3 Global Step: 15810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:35,955-Speed 10646.85 samples/sec Loss 8.2298 LearningRate 0.3243 Epoch: 3 Global Step: 15820 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:39,669-Speed 11034.34 samples/sec Loss 8.2955 LearningRate 0.3242 Epoch: 3 Global 
Step: 15830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:19:43,470-Speed 10779.17 samples/sec Loss 8.2926 LearningRate 0.3241 Epoch: 3 Global Step: 15840 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:19:47,142-Speed 11159.04 samples/sec Loss 8.2938 LearningRate 0.3240 Epoch: 3 Global Step: 15850 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:19:50,717-Speed 11461.94 samples/sec Loss 8.2991 LearningRate 0.3239 Epoch: 3 Global Step: 15860 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:19:54,371-Speed 11211.11 samples/sec Loss 8.2515 LearningRate 0.3238 Epoch: 3 Global Step: 15870 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:19:58,951-Speed 8946.15 samples/sec Loss 8.2725 LearningRate 0.3237 Epoch: 3 Global Step: 15880 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:20:02,405-Speed 11864.75 samples/sec Loss 8.3176 LearningRate 0.3237 Epoch: 3 Global Step: 15890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:20:06,086-Speed 11128.59 samples/sec Loss 8.3222 LearningRate 0.3236 Epoch: 3 Global Step: 15900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:20:09,553-Speed 11819.77 samples/sec Loss 8.2653 LearningRate 0.3235 Epoch: 3 Global Step: 15910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:20:13,252-Speed 11076.77 samples/sec Loss 8.2707 LearningRate 0.3234 Epoch: 3 Global Step: 15920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:20:16,913-Speed 11190.24 samples/sec Loss 8.2567 LearningRate 0.3233 Epoch: 3 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:20,599-Speed 11116.99 samples/sec Loss 8.2467 LearningRate 0.3232 Epoch: 3 Global Step: 15940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:24,136-Speed 11583.07 samples/sec Loss 8.2423 LearningRate 0.3231 Epoch: 3 Global Step: 15950 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-16 23:20:27,655-Speed 11642.66 samples/sec Loss 8.2257 LearningRate 0.3230 Epoch: 3 Global Step: 15960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:31,219-Speed 11501.75 samples/sec Loss 8.2107 LearningRate 0.3229 Epoch: 3 Global Step: 15970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:34,764-Speed 11556.21 samples/sec Loss 8.2068 LearningRate 0.3228 Epoch: 3 Global Step: 15980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:38,673-Speed 10483.31 samples/sec Loss 8.1849 LearningRate 0.3227 Epoch: 3 Global Step: 15990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:42,242-Speed 11479.20 samples/sec Loss 8.2126 LearningRate 0.3226 Epoch: 3 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:46,113-Speed 10586.03 samples/sec Loss 8.1969 LearningRate 0.3225 Epoch: 3 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:49,768-Speed 11208.79 samples/sec Loss 8.1970 LearningRate 0.3224 Epoch: 3 Global Step: 16020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:20:53,393-Speed 11303.07 samples/sec Loss 8.2139 LearningRate 0.3223 Epoch: 3 Global Step: 16030 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:20:57,005-Speed 11342.79 samples/sec Loss 8.2102 LearningRate 0.3222 Epoch: 3 Global Step: 16040 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:21:01,239-Speed 9676.91 samples/sec Loss 8.1898 LearningRate 0.3221 Epoch: 3 Global Step: 16050 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:21:04,778-Speed 11577.53 samples/sec Loss 8.1794 LearningRate 0.3220 Epoch: 3 Global Step: 16060 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:21:08,433-Speed 11210.60 samples/sec Loss 8.3617 LearningRate 0.3219 Epoch: 3 Global Step: 16070 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:21:12,027-Speed 11401.41 samples/sec Loss 8.2363 LearningRate 0.3218 Epoch: 3 Global Step: 16080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:21:15,777-Speed 10923.52 samples/sec Loss 8.1875 LearningRate 0.3217 Epoch: 3 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:19,329-Speed 11536.43 samples/sec Loss 8.2327 LearningRate 0.3216 Epoch: 3 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:23,188-Speed 10619.29 samples/sec Loss 8.2666 LearningRate 0.3215 Epoch: 3 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:26,914-Speed 10994.15 samples/sec Loss 8.2851 LearningRate 0.3215 Epoch: 3 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:30,440-Speed 11633.91 samples/sec Loss 8.2341 LearningRate 0.3214 Epoch: 3 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:34,309-Speed 10590.33 samples/sec Loss 8.1842 LearningRate 0.3213 Epoch: 3 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:38,036-Speed 10993.72 samples/sec Loss 8.2519 LearningRate 0.3212 Epoch: 3 Global Step: 16150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:41,659-Speed 11313.53 samples/sec Loss 8.1740 LearningRate 0.3211 Epoch: 3 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:45,346-Speed 11115.93 samples/sec Loss 8.2468 LearningRate 0.3210 Epoch: 3 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:49,098-Speed 10917.86 samples/sec Loss 8.2251 LearningRate 0.3209 Epoch: 3 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:21:52,697-Speed 11386.65 samples/sec Loss 8.2798 LearningRate 0.3208 Epoch: 3 Global Step: 16190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:21:56,274-Speed 11451.23 samples/sec 
Loss 8.2376 LearningRate 0.3207 Epoch: 3 Global Step: 16200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:22:00,118-Speed 10659.14 samples/sec Loss 8.2208 LearningRate 0.3206 Epoch: 3 Global Step: 16210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:22:04,190-Speed 10062.19 samples/sec Loss 8.2188 LearningRate 0.3205 Epoch: 3 Global Step: 16220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:22:08,435-Speed 9653.17 samples/sec Loss 8.1797 LearningRate 0.3204 Epoch: 3 Global Step: 16230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:22:12,122-Speed 11110.29 samples/sec Loss 8.2759 LearningRate 0.3203 Epoch: 3 Global Step: 16240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:22:15,583-Speed 11840.19 samples/sec Loss 8.2142 LearningRate 0.3202 Epoch: 3 Global Step: 16250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:22:19,176-Speed 11402.18 samples/sec Loss 8.2608 LearningRate 0.3201 Epoch: 3 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:22,886-Speed 11045.39 samples/sec Loss 8.1281 LearningRate 0.3200 Epoch: 3 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:26,433-Speed 11552.70 samples/sec Loss 8.1320 LearningRate 0.3199 Epoch: 3 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:30,429-Speed 10253.88 samples/sec Loss 8.2206 LearningRate 0.3198 Epoch: 3 Global Step: 16290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:34,089-Speed 11193.97 samples/sec Loss 8.1845 LearningRate 0.3197 Epoch: 3 Global Step: 16300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:37,632-Speed 11564.28 samples/sec Loss 8.1288 LearningRate 0.3196 Epoch: 3 Global Step: 16310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:41,404-Speed 10862.70 samples/sec Loss 8.1623 LearningRate 0.3195 Epoch: 3 
Global Step: 16320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:45,157-Speed 10916.60 samples/sec Loss 8.1474 LearningRate 0.3194 Epoch: 3 Global Step: 16330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:49,133-Speed 10304.84 samples/sec Loss 8.1996 LearningRate 0.3194 Epoch: 3 Global Step: 16340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:53,078-Speed 10385.15 samples/sec Loss 8.1592 LearningRate 0.3193 Epoch: 3 Global Step: 16350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:22:57,072-Speed 10258.55 samples/sec Loss 8.1942 LearningRate 0.3192 Epoch: 3 Global Step: 16360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:00,648-Speed 11457.05 samples/sec Loss 8.2549 LearningRate 0.3191 Epoch: 3 Global Step: 16370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:04,122-Speed 11798.00 samples/sec Loss 8.0975 LearningRate 0.3190 Epoch: 3 Global Step: 16380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:07,665-Speed 11564.63 samples/sec Loss 8.0785 LearningRate 0.3189 Epoch: 3 Global Step: 16390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:12,050-Speed 9342.19 samples/sec Loss 8.1385 LearningRate 0.3188 Epoch: 3 Global Step: 16400 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:15,972-Speed 10445.50 samples/sec Loss 8.1310 LearningRate 0.3187 Epoch: 3 Global Step: 16410 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:19,701-Speed 10989.11 samples/sec Loss 8.2378 LearningRate 0.3186 Epoch: 3 Global Step: 16420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:23,191-Speed 11739.15 samples/sec Loss 8.2706 LearningRate 0.3185 Epoch: 3 Global Step: 16430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:26,932-Speed 10954.40 samples/sec Loss 8.1987 LearningRate 0.3184 Epoch: 3 Global Step: 16440 Fp16 Grad Scale: 262144 
Required: 7 hours Training: 2022-01-16 23:23:30,609-Speed 11143.03 samples/sec Loss 8.1246 LearningRate 0.3183 Epoch: 3 Global Step: 16450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:34,110-Speed 11700.85 samples/sec Loss 8.2073 LearningRate 0.3182 Epoch: 3 Global Step: 16460 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:23:37,547-Speed 11922.14 samples/sec Loss 8.2076 LearningRate 0.3181 Epoch: 3 Global Step: 16470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:40,925-Speed 12129.18 samples/sec Loss 8.1424 LearningRate 0.3180 Epoch: 3 Global Step: 16480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:44,488-Speed 11501.65 samples/sec Loss 8.1395 LearningRate 0.3179 Epoch: 3 Global Step: 16490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:48,127-Speed 11262.45 samples/sec Loss 8.1547 LearningRate 0.3178 Epoch: 3 Global Step: 16500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:51,756-Speed 11289.99 samples/sec Loss 8.1692 LearningRate 0.3177 Epoch: 3 Global Step: 16510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:55,550-Speed 10797.67 samples/sec Loss 8.1887 LearningRate 0.3176 Epoch: 3 Global Step: 16520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:23:59,318-Speed 10874.25 samples/sec Loss 8.1742 LearningRate 0.3175 Epoch: 3 Global Step: 16530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:03,078-Speed 10897.66 samples/sec Loss 8.1066 LearningRate 0.3175 Epoch: 3 Global Step: 16540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:06,665-Speed 11421.63 samples/sec Loss 8.1221 LearningRate 0.3174 Epoch: 3 Global Step: 16550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:10,435-Speed 10867.30 samples/sec Loss 8.1214 LearningRate 0.3173 Epoch: 3 Global Step: 16560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:24:14,162-Speed 10995.49 samples/sec Loss 8.1121 LearningRate 0.3172 Epoch: 3 Global Step: 16570 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:24:17,875-Speed 11031.84 samples/sec Loss 8.1469 LearningRate 0.3171 Epoch: 3 Global Step: 16580 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:21,463-Speed 11422.11 samples/sec Loss 8.0974 LearningRate 0.3170 Epoch: 3 Global Step: 16590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:25,614-Speed 9871.00 samples/sec Loss 8.1167 LearningRate 0.3169 Epoch: 3 Global Step: 16600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:29,194-Speed 11444.51 samples/sec Loss 8.1563 LearningRate 0.3168 Epoch: 3 Global Step: 16610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:32,818-Speed 11306.12 samples/sec Loss 8.1728 LearningRate 0.3167 Epoch: 3 Global Step: 16620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:36,544-Speed 10995.94 samples/sec Loss 8.1260 LearningRate 0.3166 Epoch: 3 Global Step: 16630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:40,087-Speed 11567.48 samples/sec Loss 8.0699 LearningRate 0.3165 Epoch: 3 Global Step: 16640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:43,722-Speed 11269.27 samples/sec Loss 8.1464 LearningRate 0.3164 Epoch: 3 Global Step: 16650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:47,314-Speed 11408.86 samples/sec Loss 8.1672 LearningRate 0.3163 Epoch: 3 Global Step: 16660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:50,970-Speed 11208.23 samples/sec Loss 8.0863 LearningRate 0.3162 Epoch: 3 Global Step: 16670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:24:55,000-Speed 10166.03 samples/sec Loss 8.1638 LearningRate 0.3161 Epoch: 3 Global Step: 16680 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:25:34,740-Speed 1030.75 samples/sec Loss 
7.9808 LearningRate 0.3160 Epoch: 4 Global Step: 16690 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:25:38,553-Speed 10747.16 samples/sec Loss 7.3619 LearningRate 0.3159 Epoch: 4 Global Step: 16700 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:25:42,791-Speed 9669.01 samples/sec Loss 7.3014 LearningRate 0.3158 Epoch: 4 Global Step: 16710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:25:46,415-Speed 11307.69 samples/sec Loss 7.3442 LearningRate 0.3157 Epoch: 4 Global Step: 16720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:25:50,283-Speed 10592.85 samples/sec Loss 7.2796 LearningRate 0.3157 Epoch: 4 Global Step: 16730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:25:54,412-Speed 9921.80 samples/sec Loss 7.3749 LearningRate 0.3156 Epoch: 4 Global Step: 16740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:25:58,019-Speed 11360.16 samples/sec Loss 7.3777 LearningRate 0.3155 Epoch: 4 Global Step: 16750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:01,866-Speed 10652.59 samples/sec Loss 7.3697 LearningRate 0.3154 Epoch: 4 Global Step: 16760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:05,731-Speed 10600.08 samples/sec Loss 7.3780 LearningRate 0.3153 Epoch: 4 Global Step: 16770 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:09,419-Speed 11108.66 samples/sec Loss 7.3730 LearningRate 0.3152 Epoch: 4 Global Step: 16780 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:12,961-Speed 11567.10 samples/sec Loss 7.4452 LearningRate 0.3151 Epoch: 4 Global Step: 16790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:17,939-Speed 8232.29 samples/sec Loss 7.3871 LearningRate 0.3150 Epoch: 4 Global Step: 16800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:21,933-Speed 10257.60 samples/sec Loss 7.4713 LearningRate 0.3149 Epoch: 4 Global 
Step: 16810 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:26:25,573-Speed 11255.98 samples/sec Loss 7.4349 LearningRate 0.3148 Epoch: 4 Global Step: 16820 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:29,097-Speed 11627.66 samples/sec Loss 7.4540 LearningRate 0.3147 Epoch: 4 Global Step: 16830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:32,520-Speed 11971.46 samples/sec Loss 7.4379 LearningRate 0.3146 Epoch: 4 Global Step: 16840 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:35,968-Speed 11883.26 samples/sec Loss 7.4442 LearningRate 0.3145 Epoch: 4 Global Step: 16850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:39,398-Speed 11945.88 samples/sec Loss 7.4687 LearningRate 0.3144 Epoch: 4 Global Step: 16860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:43,047-Speed 11227.49 samples/sec Loss 7.4937 LearningRate 0.3143 Epoch: 4 Global Step: 16870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:46,500-Speed 11868.15 samples/sec Loss 7.5241 LearningRate 0.3142 Epoch: 4 Global Step: 16880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:50,245-Speed 10939.98 samples/sec Loss 7.4563 LearningRate 0.3141 Epoch: 4 Global Step: 16890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:54,066-Speed 10724.36 samples/sec Loss 7.4758 LearningRate 0.3140 Epoch: 4 Global Step: 16900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:26:57,480-Speed 12005.01 samples/sec Loss 7.5808 LearningRate 0.3140 Epoch: 4 Global Step: 16910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:27:00,918-Speed 11916.64 samples/sec Loss 7.5039 LearningRate 0.3139 Epoch: 4 Global Step: 16920 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:27:04,417-Speed 11712.68 samples/sec Loss 7.4702 LearningRate 0.3138 Epoch: 4 Global Step: 16930 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-16 23:27:07,983-Speed 11487.07 samples/sec Loss 7.5506 LearningRate 0.3137 Epoch: 4 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:11,591-Speed 11356.56 samples/sec Loss 7.5603 LearningRate 0.3136 Epoch: 4 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:15,454-Speed 10607.07 samples/sec Loss 7.5237 LearningRate 0.3135 Epoch: 4 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:18,992-Speed 11580.76 samples/sec Loss 7.6020 LearningRate 0.3134 Epoch: 4 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:22,474-Speed 11769.82 samples/sec Loss 7.5519 LearningRate 0.3133 Epoch: 4 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:26,195-Speed 11008.82 samples/sec Loss 7.5115 LearningRate 0.3132 Epoch: 4 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:29,947-Speed 10921.61 samples/sec Loss 7.6011 LearningRate 0.3131 Epoch: 4 Global Step: 17000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:33,691-Speed 10946.40 samples/sec Loss 7.6790 LearningRate 0.3130 Epoch: 4 Global Step: 17010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:37,211-Speed 11639.58 samples/sec Loss 7.5931 LearningRate 0.3129 Epoch: 4 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:27:41,040-Speed 10700.54 samples/sec Loss 7.6318 LearningRate 0.3128 Epoch: 4 Global Step: 17030 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:27:44,689-Speed 11229.01 samples/sec Loss 7.6001 LearningRate 0.3127 Epoch: 4 Global Step: 17040 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:27:48,397-Speed 11049.89 samples/sec Loss 7.6253 LearningRate 0.3126 Epoch: 4 Global Step: 17050 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:27:52,175-Speed 10845.65 samples/sec Loss 7.6158 LearningRate 0.3125 Epoch: 4 Global Step: 17060 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:27:56,199-Speed 10180.96 samples/sec Loss 7.6052 LearningRate 0.3124 Epoch: 4 Global Step: 17070 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:27:59,764-Speed 11494.82 samples/sec Loss 7.6215 LearningRate 0.3123 Epoch: 4 Global Step: 17080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:03,329-Speed 11490.57 samples/sec Loss 7.6453 LearningRate 0.3123 Epoch: 4 Global Step: 17090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:06,995-Speed 11178.39 samples/sec Loss 7.6718 LearningRate 0.3122 Epoch: 4 Global Step: 17100 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:11,103-Speed 9974.16 samples/sec Loss 7.6604 LearningRate 0.3121 Epoch: 4 Global Step: 17110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:14,738-Speed 11269.79 samples/sec Loss 7.6519 LearningRate 0.3120 Epoch: 4 Global Step: 17120 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:18,276-Speed 11583.20 samples/sec Loss 7.6841 LearningRate 0.3119 Epoch: 4 Global Step: 17130 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:28:21,939-Speed 11190.44 samples/sec Loss 7.5950 LearningRate 0.3118 Epoch: 4 Global Step: 17140 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:28:25,541-Speed 11377.24 samples/sec Loss 7.7076 LearningRate 0.3117 Epoch: 4 Global Step: 17150 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:28:29,334-Speed 10801.07 samples/sec Loss 7.6574 LearningRate 0.3116 Epoch: 4 Global Step: 17160 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:28:32,961-Speed 11296.85 samples/sec Loss 7.6852 LearningRate 0.3115 Epoch: 4 Global Step: 17170 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:28:36,737-Speed 10851.62 samples/sec Loss 
7.6412 LearningRate 0.3114 Epoch: 4 Global Step: 17180 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:40,452-Speed 11028.97 samples/sec Loss 7.7224 LearningRate 0.3113 Epoch: 4 Global Step: 17190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:43,997-Speed 11558.46 samples/sec Loss 7.6211 LearningRate 0.3112 Epoch: 4 Global Step: 17200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:47,702-Speed 11058.36 samples/sec Loss 7.7752 LearningRate 0.3111 Epoch: 4 Global Step: 17210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:51,276-Speed 11465.19 samples/sec Loss 7.6689 LearningRate 0.3110 Epoch: 4 Global Step: 17220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:54,885-Speed 11352.74 samples/sec Loss 7.7168 LearningRate 0.3109 Epoch: 4 Global Step: 17230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:28:58,595-Speed 11044.89 samples/sec Loss 7.7712 LearningRate 0.3108 Epoch: 4 Global Step: 17240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:02,201-Speed 11362.25 samples/sec Loss 7.7899 LearningRate 0.3108 Epoch: 4 Global Step: 17250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:05,870-Speed 11166.43 samples/sec Loss 7.7059 LearningRate 0.3107 Epoch: 4 Global Step: 17260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:09,291-Speed 11977.38 samples/sec Loss 7.7457 LearningRate 0.3106 Epoch: 4 Global Step: 17270 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:12,881-Speed 11414.26 samples/sec Loss 7.7301 LearningRate 0.3105 Epoch: 4 Global Step: 17280 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:29:16,395-Speed 11658.72 samples/sec Loss 7.7610 LearningRate 0.3104 Epoch: 4 Global Step: 17290 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:29:19,967-Speed 11469.69 samples/sec Loss 7.7153 LearningRate 0.3103 Epoch: 4 Global 
Step: 17300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:23,697-Speed 10985.47 samples/sec Loss 7.7024 LearningRate 0.3102 Epoch: 4 Global Step: 17310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:27,376-Speed 11138.39 samples/sec Loss 7.8028 LearningRate 0.3101 Epoch: 4 Global Step: 17320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:31,137-Speed 10893.47 samples/sec Loss 7.7974 LearningRate 0.3100 Epoch: 4 Global Step: 17330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:34,814-Speed 11144.13 samples/sec Loss 7.7876 LearningRate 0.3099 Epoch: 4 Global Step: 17340 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:29:38,311-Speed 11713.96 samples/sec Loss 7.7297 LearningRate 0.3098 Epoch: 4 Global Step: 17350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:29:42,064-Speed 10918.76 samples/sec Loss 7.8051 LearningRate 0.3097 Epoch: 4 Global Step: 17360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:29:45,501-Speed 11921.02 samples/sec Loss 7.7783 LearningRate 0.3096 Epoch: 4 Global Step: 17370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:29:49,261-Speed 10897.95 samples/sec Loss 7.7866 LearningRate 0.3095 Epoch: 4 Global Step: 17380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:29:52,916-Speed 11209.31 samples/sec Loss 7.7699 LearningRate 0.3094 Epoch: 4 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:29:56,416-Speed 11707.58 samples/sec Loss 7.7949 LearningRate 0.3093 Epoch: 4 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:29:59,910-Speed 11724.29 samples/sec Loss 7.7946 LearningRate 0.3092 Epoch: 4 Global Step: 17410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:30:03,798-Speed 10538.28 samples/sec Loss 7.7414 LearningRate 0.3092 Epoch: 4 Global Step: 17420 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-16 23:30:07,464-Speed 11178.58 samples/sec Loss 7.8152 LearningRate 0.3091 Epoch: 4 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:30:11,009-Speed 11556.15 samples/sec Loss 7.8222 LearningRate 0.3090 Epoch: 4 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:30:14,605-Speed 11395.43 samples/sec Loss 7.8678 LearningRate 0.3089 Epoch: 4 Global Step: 17450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:30:18,043-Speed 11917.80 samples/sec Loss 7.7925 LearningRate 0.3088 Epoch: 4 Global Step: 17460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:30:21,569-Speed 11619.53 samples/sec Loss 7.8211 LearningRate 0.3087 Epoch: 4 Global Step: 17470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:30:25,153-Speed 11432.63 samples/sec Loss 7.7885 LearningRate 0.3086 Epoch: 4 Global Step: 17480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:30:28,693-Speed 11576.26 samples/sec Loss 7.7648 LearningRate 0.3085 Epoch: 4 Global Step: 17490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:30:32,415-Speed 11006.27 samples/sec Loss 7.8025 LearningRate 0.3084 Epoch: 4 Global Step: 17500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:30:35,945-Speed 11606.51 samples/sec Loss 7.8218 LearningRate 0.3083 Epoch: 4 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:30:39,429-Speed 11761.10 samples/sec Loss 7.8178 LearningRate 0.3082 Epoch: 4 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:30:42,965-Speed 11588.22 samples/sec Loss 7.7859 LearningRate 0.3081 Epoch: 4 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:30:46,394-Speed 11948.32 samples/sec Loss 7.8498 LearningRate 0.3080 Epoch: 4 Global Step: 17540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 
23:30:50,140-Speed 10938.38 samples/sec Loss 7.7834 LearningRate 0.3079 Epoch: 4 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:30:53,906-Speed 10878.96 samples/sec Loss 7.7904 LearningRate 0.3078 Epoch: 4 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:30:57,919-Speed 10211.27 samples/sec Loss 7.8130 LearningRate 0.3078 Epoch: 4 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:31:01,531-Speed 11346.94 samples/sec Loss 7.8200 LearningRate 0.3077 Epoch: 4 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:31:04,947-Speed 11994.74 samples/sec Loss 7.8326 LearningRate 0.3076 Epoch: 4 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:31:08,435-Speed 11745.18 samples/sec Loss 7.7865 LearningRate 0.3075 Epoch: 4 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:31:11,996-Speed 11506.12 samples/sec Loss 7.8085 LearningRate 0.3074 Epoch: 4 Global Step: 17610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:15,584-Speed 11419.08 samples/sec Loss 7.7709 LearningRate 0.3073 Epoch: 4 Global Step: 17620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:19,313-Speed 10990.71 samples/sec Loss 7.7866 LearningRate 0.3072 Epoch: 4 Global Step: 17630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:22,863-Speed 11540.68 samples/sec Loss 7.8145 LearningRate 0.3071 Epoch: 4 Global Step: 17640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:26,706-Speed 10662.57 samples/sec Loss 7.7923 LearningRate 0.3070 Epoch: 4 Global Step: 17650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:30,429-Speed 11003.21 samples/sec Loss 7.8134 LearningRate 0.3069 Epoch: 4 Global Step: 17660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:34,423-Speed 10258.24 samples/sec 
Loss 7.8309 LearningRate 0.3068 Epoch: 4 Global Step: 17670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:37,916-Speed 11732.60 samples/sec Loss 7.7925 LearningRate 0.3067 Epoch: 4 Global Step: 17680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:41,377-Speed 11839.03 samples/sec Loss 7.8336 LearningRate 0.3066 Epoch: 4 Global Step: 17690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:44,813-Speed 11923.51 samples/sec Loss 7.8508 LearningRate 0.3065 Epoch: 4 Global Step: 17700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:48,676-Speed 10607.25 samples/sec Loss 7.8826 LearningRate 0.3064 Epoch: 4 Global Step: 17710 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:31:52,301-Speed 11302.73 samples/sec Loss 7.8413 LearningRate 0.3064 Epoch: 4 Global Step: 17720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:55,812-Speed 11672.92 samples/sec Loss 7.8193 LearningRate 0.3063 Epoch: 4 Global Step: 17730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:31:59,242-Speed 11946.67 samples/sec Loss 7.8696 LearningRate 0.3062 Epoch: 4 Global Step: 17740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:02,687-Speed 11894.24 samples/sec Loss 7.7996 LearningRate 0.3061 Epoch: 4 Global Step: 17750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:06,115-Speed 11952.92 samples/sec Loss 7.7634 LearningRate 0.3060 Epoch: 4 Global Step: 17760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:09,543-Speed 11949.42 samples/sec Loss 7.7875 LearningRate 0.3059 Epoch: 4 Global Step: 17770 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:13,172-Speed 11293.20 samples/sec Loss 7.8931 LearningRate 0.3058 Epoch: 4 Global Step: 17780 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:16,832-Speed 11194.88 samples/sec Loss 7.9298 LearningRate 0.3057 Epoch: 4 
Global Step: 17790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:20,390-Speed 11516.39 samples/sec Loss 7.8331 LearningRate 0.3056 Epoch: 4 Global Step: 17800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:23,911-Speed 11636.14 samples/sec Loss 7.7958 LearningRate 0.3055 Epoch: 4 Global Step: 17810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:27,442-Speed 11603.86 samples/sec Loss 7.8599 LearningRate 0.3054 Epoch: 4 Global Step: 17820 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:32:30,915-Speed 11796.41 samples/sec Loss 7.7469 LearningRate 0.3053 Epoch: 4 Global Step: 17830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:34,503-Speed 11421.50 samples/sec Loss 7.8180 LearningRate 0.3052 Epoch: 4 Global Step: 17840 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:38,062-Speed 11514.70 samples/sec Loss 7.8369 LearningRate 0.3051 Epoch: 4 Global Step: 17850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:41,813-Speed 10922.23 samples/sec Loss 7.8594 LearningRate 0.3050 Epoch: 4 Global Step: 17860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:45,593-Speed 10841.91 samples/sec Loss 7.8162 LearningRate 0.3050 Epoch: 4 Global Step: 17870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:49,054-Speed 11838.50 samples/sec Loss 7.8222 LearningRate 0.3049 Epoch: 4 Global Step: 17880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:52,498-Speed 11899.56 samples/sec Loss 7.8374 LearningRate 0.3048 Epoch: 4 Global Step: 17890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:56,388-Speed 10530.09 samples/sec Loss 7.7960 LearningRate 0.3047 Epoch: 4 Global Step: 17900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:32:59,753-Speed 12177.09 samples/sec Loss 7.8720 LearningRate 0.3046 Epoch: 4 Global Step: 17910 Fp16 Grad Scale: 262144 
Required: 7 hours Training: 2022-01-16 23:33:03,155-Speed 12046.27 samples/sec Loss 7.8936 LearningRate 0.3045 Epoch: 4 Global Step: 17920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:06,830-Speed 11150.15 samples/sec Loss 7.7674 LearningRate 0.3044 Epoch: 4 Global Step: 17930 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:33:10,549-Speed 11017.40 samples/sec Loss 7.7994 LearningRate 0.3043 Epoch: 4 Global Step: 17940 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:14,205-Speed 11206.70 samples/sec Loss 7.7942 LearningRate 0.3042 Epoch: 4 Global Step: 17950 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:17,615-Speed 12015.21 samples/sec Loss 7.8012 LearningRate 0.3041 Epoch: 4 Global Step: 17960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:21,551-Speed 10409.73 samples/sec Loss 7.8684 LearningRate 0.3040 Epoch: 4 Global Step: 17970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:25,047-Speed 11722.41 samples/sec Loss 7.9133 LearningRate 0.3039 Epoch: 4 Global Step: 17980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:28,635-Speed 11417.80 samples/sec Loss 7.7942 LearningRate 0.3038 Epoch: 4 Global Step: 17990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:32,648-Speed 10210.50 samples/sec Loss 7.9178 LearningRate 0.3037 Epoch: 4 Global Step: 18000 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:36,079-Speed 11943.89 samples/sec Loss 7.9279 LearningRate 0.3037 Epoch: 4 Global Step: 18010 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:39,526-Speed 11885.38 samples/sec Loss 7.8744 LearningRate 0.3036 Epoch: 4 Global Step: 18020 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:43,274-Speed 10933.22 samples/sec Loss 7.8883 LearningRate 0.3035 Epoch: 4 Global Step: 18030 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:33:47,128-Speed 10630.37 samples/sec Loss 7.8686 LearningRate 0.3034 Epoch: 4 Global Step: 18040 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:33:50,734-Speed 11364.65 samples/sec Loss 7.8865 LearningRate 0.3033 Epoch: 4 Global Step: 18050 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:54,159-Speed 11963.95 samples/sec Loss 7.8423 LearningRate 0.3032 Epoch: 4 Global Step: 18060 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:33:57,708-Speed 11545.00 samples/sec Loss 7.8917 LearningRate 0.3031 Epoch: 4 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:01,291-Speed 11435.35 samples/sec Loss 7.7672 LearningRate 0.3030 Epoch: 4 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:04,938-Speed 11233.04 samples/sec Loss 7.8181 LearningRate 0.3029 Epoch: 4 Global Step: 18090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:08,823-Speed 10547.77 samples/sec Loss 7.8423 LearningRate 0.3028 Epoch: 4 Global Step: 18100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:12,520-Speed 11083.23 samples/sec Loss 7.8506 LearningRate 0.3027 Epoch: 4 Global Step: 18110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:15,981-Speed 11839.04 samples/sec Loss 7.8569 LearningRate 0.3026 Epoch: 4 Global Step: 18120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:19,487-Speed 11686.74 samples/sec Loss 7.7990 LearningRate 0.3025 Epoch: 4 Global Step: 18130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:23,053-Speed 11487.85 samples/sec Loss 7.8445 LearningRate 0.3024 Epoch: 4 Global Step: 18140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:26,427-Speed 12146.60 samples/sec Loss 7.8335 LearningRate 0.3024 Epoch: 4 Global Step: 18150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:29,941-Speed 11661.64 samples/sec 
Loss 7.8570 LearningRate 0.3023 Epoch: 4 Global Step: 18160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:34:33,441-Speed 11705.88 samples/sec Loss 7.7861 LearningRate 0.3022 Epoch: 4 Global Step: 18170 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:34:36,898-Speed 11851.85 samples/sec Loss 7.8777 LearningRate 0.3021 Epoch: 4 Global Step: 18180 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:34:41,391-Speed 9119.86 samples/sec Loss 7.8950 LearningRate 0.3020 Epoch: 4 Global Step: 18190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:34:45,069-Speed 11140.55 samples/sec Loss 7.8147 LearningRate 0.3019 Epoch: 4 Global Step: 18200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:34:49,340-Speed 9593.58 samples/sec Loss 7.8467 LearningRate 0.3018 Epoch: 4 Global Step: 18210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:34:52,775-Speed 11931.90 samples/sec Loss 7.8269 LearningRate 0.3017 Epoch: 4 Global Step: 18220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:34:56,207-Speed 11938.65 samples/sec Loss 7.8992 LearningRate 0.3016 Epoch: 4 Global Step: 18230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:34:59,612-Speed 12031.94 samples/sec Loss 7.7933 LearningRate 0.3015 Epoch: 4 Global Step: 18240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:03,045-Speed 11935.08 samples/sec Loss 7.9004 LearningRate 0.3014 Epoch: 4 Global Step: 18250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:06,413-Speed 12164.75 samples/sec Loss 7.7916 LearningRate 0.3013 Epoch: 4 Global Step: 18260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:09,942-Speed 11611.22 samples/sec Loss 7.8065 LearningRate 0.3012 Epoch: 4 Global Step: 18270 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:35:13,462-Speed 11646.44 samples/sec Loss 7.8498 LearningRate 0.3012 Epoch: 4 
Global Step: 18280 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:17,467-Speed 10228.85 samples/sec Loss 7.8059 LearningRate 0.3011 Epoch: 4 Global Step: 18290 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:20,882-Speed 11999.84 samples/sec Loss 7.8790 LearningRate 0.3010 Epoch: 4 Global Step: 18300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:24,306-Speed 11965.95 samples/sec Loss 7.8782 LearningRate 0.3009 Epoch: 4 Global Step: 18310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:27,931-Speed 11303.86 samples/sec Loss 7.7894 LearningRate 0.3008 Epoch: 4 Global Step: 18320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:31,712-Speed 10837.89 samples/sec Loss 7.8868 LearningRate 0.3007 Epoch: 4 Global Step: 18330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:35,174-Speed 11837.18 samples/sec Loss 7.8695 LearningRate 0.3006 Epoch: 4 Global Step: 18340 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:38,646-Speed 11800.81 samples/sec Loss 7.7962 LearningRate 0.3005 Epoch: 4 Global Step: 18350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:42,290-Speed 11241.66 samples/sec Loss 7.8049 LearningRate 0.3004 Epoch: 4 Global Step: 18360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:46,223-Speed 10418.91 samples/sec Loss 7.8600 LearningRate 0.3003 Epoch: 4 Global Step: 18370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:50,041-Speed 10732.33 samples/sec Loss 7.8653 LearningRate 0.3002 Epoch: 4 Global Step: 18380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:53,472-Speed 11942.38 samples/sec Loss 7.7924 LearningRate 0.3001 Epoch: 4 Global Step: 18390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:35:56,916-Speed 11894.48 samples/sec Loss 7.8694 LearningRate 0.3000 Epoch: 4 Global Step: 18400 Fp16 Grad Scale: 262144 
Required: 7 hours Training: 2022-01-16 23:36:00,342-Speed 11960.50 samples/sec Loss 7.8738 LearningRate 0.3000 Epoch: 4 Global Step: 18410 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:03,835-Speed 11728.54 samples/sec Loss 7.8281 LearningRate 0.2999 Epoch: 4 Global Step: 18420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:07,296-Speed 11841.37 samples/sec Loss 7.8082 LearningRate 0.2998 Epoch: 4 Global Step: 18430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:11,144-Speed 10662.40 samples/sec Loss 7.8296 LearningRate 0.2997 Epoch: 4 Global Step: 18440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:14,783-Speed 11258.98 samples/sec Loss 7.8634 LearningRate 0.2996 Epoch: 4 Global Step: 18450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:18,238-Speed 11861.64 samples/sec Loss 7.8433 LearningRate 0.2995 Epoch: 4 Global Step: 18460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:21,882-Speed 11242.28 samples/sec Loss 7.8672 LearningRate 0.2994 Epoch: 4 Global Step: 18470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:25,338-Speed 11854.48 samples/sec Loss 7.8247 LearningRate 0.2993 Epoch: 4 Global Step: 18480 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:36:28,855-Speed 11650.99 samples/sec Loss 7.8345 LearningRate 0.2992 Epoch: 4 Global Step: 18490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:32,364-Speed 11677.11 samples/sec Loss 7.7654 LearningRate 0.2991 Epoch: 4 Global Step: 18500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:36,262-Speed 10512.00 samples/sec Loss 7.8302 LearningRate 0.2990 Epoch: 4 Global Step: 18510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:40,204-Speed 10425.41 samples/sec Loss 7.7939 LearningRate 0.2989 Epoch: 4 Global Step: 18520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:36:43,600-Speed 12063.50 samples/sec Loss 7.8675 LearningRate 0.2988 Epoch: 4 Global Step: 18530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:47,021-Speed 11976.32 samples/sec Loss 7.8032 LearningRate 0.2988 Epoch: 4 Global Step: 18540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:50,512-Speed 11738.99 samples/sec Loss 7.8643 LearningRate 0.2987 Epoch: 4 Global Step: 18550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:53,984-Speed 11800.25 samples/sec Loss 7.8688 LearningRate 0.2986 Epoch: 4 Global Step: 18560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:36:57,502-Speed 11648.71 samples/sec Loss 7.7964 LearningRate 0.2985 Epoch: 4 Global Step: 18570 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:01,170-Speed 11169.46 samples/sec Loss 7.8413 LearningRate 0.2984 Epoch: 4 Global Step: 18580 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:04,766-Speed 11395.83 samples/sec Loss 7.8585 LearningRate 0.2983 Epoch: 4 Global Step: 18590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:08,631-Speed 10601.98 samples/sec Loss 7.8408 LearningRate 0.2982 Epoch: 4 Global Step: 18600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:12,225-Speed 11397.78 samples/sec Loss 7.7842 LearningRate 0.2981 Epoch: 4 Global Step: 18610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:15,597-Speed 12151.34 samples/sec Loss 7.8145 LearningRate 0.2980 Epoch: 4 Global Step: 18620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:19,133-Speed 11587.37 samples/sec Loss 7.8258 LearningRate 0.2979 Epoch: 4 Global Step: 18630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:22,620-Speed 11752.27 samples/sec Loss 7.7971 LearningRate 0.2978 Epoch: 4 Global Step: 18640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:26,089-Speed 11812.23 samples/sec 
Loss 7.8025 LearningRate 0.2977 Epoch: 4 Global Step: 18650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:29,592-Speed 11699.20 samples/sec Loss 7.8400 LearningRate 0.2977 Epoch: 4 Global Step: 18660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:33,057-Speed 11824.22 samples/sec Loss 7.8630 LearningRate 0.2976 Epoch: 4 Global Step: 18670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:36,731-Speed 11150.25 samples/sec Loss 7.7669 LearningRate 0.2975 Epoch: 4 Global Step: 18680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:40,156-Speed 11963.22 samples/sec Loss 7.8328 LearningRate 0.2974 Epoch: 4 Global Step: 18690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:43,597-Speed 11907.48 samples/sec Loss 7.7809 LearningRate 0.2973 Epoch: 4 Global Step: 18700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:47,812-Speed 9720.93 samples/sec Loss 7.8238 LearningRate 0.2972 Epoch: 4 Global Step: 18710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:51,249-Speed 11921.80 samples/sec Loss 7.8325 LearningRate 0.2971 Epoch: 4 Global Step: 18720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:54,805-Speed 11524.06 samples/sec Loss 7.8665 LearningRate 0.2970 Epoch: 4 Global Step: 18730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:37:58,522-Speed 11022.91 samples/sec Loss 7.7856 LearningRate 0.2969 Epoch: 4 Global Step: 18740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:02,309-Speed 10818.31 samples/sec Loss 7.8384 LearningRate 0.2968 Epoch: 4 Global Step: 18750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:05,760-Speed 11872.30 samples/sec Loss 7.7784 LearningRate 0.2967 Epoch: 4 Global Step: 18760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:09,390-Speed 11287.75 samples/sec Loss 7.8073 LearningRate 0.2966 Epoch: 4 
Global Step: 18770 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:12,982-Speed 11405.90 samples/sec Loss 7.8111 LearningRate 0.2965 Epoch: 4 Global Step: 18780 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:16,391-Speed 12020.00 samples/sec Loss 7.8252 LearningRate 0.2965 Epoch: 4 Global Step: 18790 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:38:19,896-Speed 11691.41 samples/sec Loss 7.7920 LearningRate 0.2964 Epoch: 4 Global Step: 18800 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:38:23,583-Speed 11113.24 samples/sec Loss 7.8391 LearningRate 0.2963 Epoch: 4 Global Step: 18810 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:38:27,325-Speed 10951.54 samples/sec Loss 7.8030 LearningRate 0.2962 Epoch: 4 Global Step: 18820 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:38:30,799-Speed 11797.33 samples/sec Loss 7.8307 LearningRate 0.2961 Epoch: 4 Global Step: 18830 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:38:34,396-Speed 11390.06 samples/sec Loss 7.7928 LearningRate 0.2960 Epoch: 4 Global Step: 18840 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:38:38,480-Speed 10033.05 samples/sec Loss 7.7994 LearningRate 0.2959 Epoch: 4 Global Step: 18850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:41,924-Speed 11898.70 samples/sec Loss 7.8433 LearningRate 0.2958 Epoch: 4 Global Step: 18860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:45,426-Speed 11700.16 samples/sec Loss 7.8257 LearningRate 0.2957 Epoch: 4 Global Step: 18870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:49,722-Speed 9537.62 samples/sec Loss 7.8406 LearningRate 0.2956 Epoch: 4 Global Step: 18880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:38:53,133-Speed 12010.28 samples/sec Loss 7.8020 LearningRate 0.2955 Epoch: 4 Global Step: 18890 Fp16 Grad Scale: 262144 
Required: 7 hours Training: 2022-01-16 23:38:56,600-Speed 11820.01 samples/sec Loss 7.8140 LearningRate 0.2955 Epoch: 4 Global Step: 18900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:39:00,192-Speed 11409.81 samples/sec Loss 7.7742 LearningRate 0.2954 Epoch: 4 Global Step: 18910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:39:03,948-Speed 10907.12 samples/sec Loss 7.7764 LearningRate 0.2953 Epoch: 4 Global Step: 18920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:39:07,353-Speed 12036.74 samples/sec Loss 7.7869 LearningRate 0.2952 Epoch: 4 Global Step: 18930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:39:10,735-Speed 12116.49 samples/sec Loss 7.7369 LearningRate 0.2951 Epoch: 4 Global Step: 18940 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:39:14,128-Speed 12073.26 samples/sec Loss 7.8129 LearningRate 0.2950 Epoch: 4 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:17,584-Speed 11856.98 samples/sec Loss 7.7964 LearningRate 0.2949 Epoch: 4 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:21,179-Speed 11397.13 samples/sec Loss 7.8276 LearningRate 0.2948 Epoch: 4 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:24,601-Speed 11974.87 samples/sec Loss 7.7624 LearningRate 0.2947 Epoch: 4 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:28,163-Speed 11501.90 samples/sec Loss 7.7958 LearningRate 0.2946 Epoch: 4 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:31,956-Speed 10803.70 samples/sec Loss 7.7795 LearningRate 0.2945 Epoch: 4 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:35,413-Speed 11853.42 samples/sec Loss 7.8065 LearningRate 0.2944 Epoch: 4 Global Step: 19010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 
23:39:39,333-Speed 10452.88 samples/sec Loss 7.7467 LearningRate 0.2944 Epoch: 4 Global Step: 19020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:43,248-Speed 10473.32 samples/sec Loss 7.7817 LearningRate 0.2943 Epoch: 4 Global Step: 19030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:46,926-Speed 11139.09 samples/sec Loss 7.8093 LearningRate 0.2942 Epoch: 4 Global Step: 19040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:39:50,456-Speed 11606.33 samples/sec Loss 7.7728 LearningRate 0.2941 Epoch: 4 Global Step: 19050 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:39:54,095-Speed 11262.31 samples/sec Loss 7.7877 LearningRate 0.2940 Epoch: 4 Global Step: 19060 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:39:57,627-Speed 11600.45 samples/sec Loss 7.7805 LearningRate 0.2939 Epoch: 4 Global Step: 19070 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:40:01,263-Speed 11267.41 samples/sec Loss 7.7079 LearningRate 0.2938 Epoch: 4 Global Step: 19080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:40:04,700-Speed 11922.51 samples/sec Loss 7.7291 LearningRate 0.2937 Epoch: 4 Global Step: 19090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:40:08,634-Speed 10413.17 samples/sec Loss 7.7839 LearningRate 0.2936 Epoch: 4 Global Step: 19100 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:40:12,022-Speed 12093.99 samples/sec Loss 7.7406 LearningRate 0.2935 Epoch: 4 Global Step: 19110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:40:15,473-Speed 11881.76 samples/sec Loss 7.8417 LearningRate 0.2934 Epoch: 4 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:19,031-Speed 11518.53 samples/sec Loss 7.8178 LearningRate 0.2933 Epoch: 4 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:22,716-Speed 11119.44 samples/sec 
Loss 7.7807 LearningRate 0.2933 Epoch: 4 Global Step: 19140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:26,588-Speed 10582.69 samples/sec Loss 7.7748 LearningRate 0.2932 Epoch: 4 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:30,009-Speed 11974.55 samples/sec Loss 7.7620 LearningRate 0.2931 Epoch: 4 Global Step: 19160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:33,392-Speed 12113.33 samples/sec Loss 7.7491 LearningRate 0.2930 Epoch: 4 Global Step: 19170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:36,911-Speed 11643.68 samples/sec Loss 7.7940 LearningRate 0.2929 Epoch: 4 Global Step: 19180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:40,953-Speed 10137.49 samples/sec Loss 7.7459 LearningRate 0.2928 Epoch: 4 Global Step: 19190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:44,843-Speed 10532.78 samples/sec Loss 7.8325 LearningRate 0.2927 Epoch: 4 Global Step: 19200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:48,280-Speed 11920.83 samples/sec Loss 7.7423 LearningRate 0.2926 Epoch: 4 Global Step: 19210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:40:51,699-Speed 11986.84 samples/sec Loss 7.7875 LearningRate 0.2925 Epoch: 4 Global Step: 19220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:40:55,134-Speed 11927.66 samples/sec Loss 7.7665 LearningRate 0.2924 Epoch: 4 Global Step: 19230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:40:58,764-Speed 11285.78 samples/sec Loss 7.7726 LearningRate 0.2923 Epoch: 4 Global Step: 19240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:02,378-Speed 11339.51 samples/sec Loss 7.8108 LearningRate 0.2923 Epoch: 4 Global Step: 19250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:05,864-Speed 11754.61 samples/sec Loss 7.8077 LearningRate 0.2922 Epoch: 4 
Global Step: 19260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:09,416-Speed 11536.56 samples/sec Loss 7.7764 LearningRate 0.2921 Epoch: 4 Global Step: 19270 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:12,806-Speed 12089.24 samples/sec Loss 7.7376 LearningRate 0.2920 Epoch: 4 Global Step: 19280 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:16,236-Speed 11943.49 samples/sec Loss 7.7770 LearningRate 0.2919 Epoch: 4 Global Step: 19290 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:20,239-Speed 10236.41 samples/sec Loss 7.7917 LearningRate 0.2918 Epoch: 4 Global Step: 19300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:23,737-Speed 11714.93 samples/sec Loss 7.8030 LearningRate 0.2917 Epoch: 4 Global Step: 19310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:27,954-Speed 9714.75 samples/sec Loss 7.8012 LearningRate 0.2916 Epoch: 4 Global Step: 19320 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:41:31,465-Speed 11671.51 samples/sec Loss 7.8397 LearningRate 0.2915 Epoch: 4 Global Step: 19330 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:41:34,919-Speed 11860.79 samples/sec Loss 7.7862 LearningRate 0.2914 Epoch: 4 Global Step: 19340 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:38,412-Speed 11730.97 samples/sec Loss 7.7623 LearningRate 0.2913 Epoch: 4 Global Step: 19350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:42,728-Speed 9494.45 samples/sec Loss 7.7950 LearningRate 0.2913 Epoch: 4 Global Step: 19360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:46,418-Speed 11105.22 samples/sec Loss 7.8102 LearningRate 0.2912 Epoch: 4 Global Step: 19370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:50,252-Speed 10687.67 samples/sec Loss 7.8330 LearningRate 0.2911 Epoch: 4 Global Step: 19380 Fp16 Grad Scale: 262144 
Required: 7 hours Training: 2022-01-16 23:41:53,893-Speed 11252.66 samples/sec Loss 7.8025 LearningRate 0.2910 Epoch: 4 Global Step: 19390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:41:57,736-Speed 10661.12 samples/sec Loss 7.7679 LearningRate 0.2909 Epoch: 4 Global Step: 19400 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:01,432-Speed 11085.00 samples/sec Loss 7.7534 LearningRate 0.2908 Epoch: 4 Global Step: 19410 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:04,893-Speed 11838.75 samples/sec Loss 7.7014 LearningRate 0.2907 Epoch: 4 Global Step: 19420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:08,570-Speed 11143.73 samples/sec Loss 7.7544 LearningRate 0.2906 Epoch: 4 Global Step: 19430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:12,086-Speed 11652.23 samples/sec Loss 7.7592 LearningRate 0.2905 Epoch: 4 Global Step: 19440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:15,917-Speed 10695.95 samples/sec Loss 7.7534 LearningRate 0.2904 Epoch: 4 Global Step: 19450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:19,582-Speed 11180.13 samples/sec Loss 7.7308 LearningRate 0.2903 Epoch: 4 Global Step: 19460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:23,422-Speed 10668.60 samples/sec Loss 7.6581 LearningRate 0.2903 Epoch: 4 Global Step: 19470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:26,940-Speed 11646.32 samples/sec Loss 7.7228 LearningRate 0.2902 Epoch: 4 Global Step: 19480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:30,411-Speed 11807.54 samples/sec Loss 7.7592 LearningRate 0.2901 Epoch: 4 Global Step: 19490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:33,995-Speed 11444.11 samples/sec Loss 7.7761 LearningRate 0.2900 Epoch: 4 Global Step: 19500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:42:37,620-Speed 11303.71 samples/sec Loss 7.7636 LearningRate 0.2899 Epoch: 4 Global Step: 19510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:41,065-Speed 11893.05 samples/sec Loss 7.7252 LearningRate 0.2898 Epoch: 4 Global Step: 19520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:45,281-Speed 9718.32 samples/sec Loss 7.8964 LearningRate 0.2897 Epoch: 4 Global Step: 19530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:48,788-Speed 11681.99 samples/sec Loss 7.7525 LearningRate 0.2896 Epoch: 4 Global Step: 19540 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:42:52,442-Speed 11213.77 samples/sec Loss 7.7673 LearningRate 0.2895 Epoch: 4 Global Step: 19550 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:42:56,394-Speed 10368.21 samples/sec Loss 7.7157 LearningRate 0.2894 Epoch: 4 Global Step: 19560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:42:59,778-Speed 12109.01 samples/sec Loss 7.7281 LearningRate 0.2893 Epoch: 4 Global Step: 19570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:03,151-Speed 12146.47 samples/sec Loss 7.7297 LearningRate 0.2893 Epoch: 4 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:06,890-Speed 10959.86 samples/sec Loss 7.7519 LearningRate 0.2892 Epoch: 4 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:10,488-Speed 11390.31 samples/sec Loss 7.7150 LearningRate 0.2891 Epoch: 4 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:14,143-Speed 11209.44 samples/sec Loss 7.7085 LearningRate 0.2890 Epoch: 4 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:17,622-Speed 11777.38 samples/sec Loss 7.7130 LearningRate 0.2889 Epoch: 4 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:21,087-Speed 11825.71 samples/sec Loss 
7.7517 LearningRate 0.2888 Epoch: 4 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:24,759-Speed 11157.80 samples/sec Loss 7.7053 LearningRate 0.2887 Epoch: 4 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:28,498-Speed 10957.71 samples/sec Loss 7.7659 LearningRate 0.2886 Epoch: 4 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:32,387-Speed 10534.36 samples/sec Loss 7.8097 LearningRate 0.2885 Epoch: 4 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:36,017-Speed 11289.20 samples/sec Loss 7.7232 LearningRate 0.2884 Epoch: 4 Global Step: 19670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:43:39,468-Speed 11871.69 samples/sec Loss 7.7436 LearningRate 0.2884 Epoch: 4 Global Step: 19680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:43:43,299-Speed 10695.85 samples/sec Loss 7.7099 LearningRate 0.2883 Epoch: 4 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:46,998-Speed 11075.16 samples/sec Loss 7.7467 LearningRate 0.2882 Epoch: 4 Global Step: 19700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:51,007-Speed 10221.52 samples/sec Loss 7.7072 LearningRate 0.2881 Epoch: 4 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:54,869-Speed 10606.91 samples/sec Loss 7.7652 LearningRate 0.2880 Epoch: 4 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:43:58,471-Speed 11376.44 samples/sec Loss 7.7752 LearningRate 0.2879 Epoch: 4 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:01,925-Speed 11861.22 samples/sec Loss 7.7095 LearningRate 0.2878 Epoch: 4 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:05,417-Speed 11735.71 samples/sec Loss 7.7529 LearningRate 0.2877 Epoch: 4 Global 
Step: 19750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:09,064-Speed 11235.59 samples/sec Loss 7.7153 LearningRate 0.2876 Epoch: 4 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:12,595-Speed 11602.11 samples/sec Loss 7.7377 LearningRate 0.2875 Epoch: 4 Global Step: 19770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:16,350-Speed 10912.98 samples/sec Loss 7.7078 LearningRate 0.2874 Epoch: 4 Global Step: 19780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:19,896-Speed 11554.23 samples/sec Loss 7.7268 LearningRate 0.2874 Epoch: 4 Global Step: 19790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:44:23,343-Speed 11885.77 samples/sec Loss 7.6777 LearningRate 0.2873 Epoch: 4 Global Step: 19800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:44:26,824-Speed 11770.52 samples/sec Loss 7.6936 LearningRate 0.2872 Epoch: 4 Global Step: 19810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:44:30,456-Speed 11283.33 samples/sec Loss 7.7156 LearningRate 0.2871 Epoch: 4 Global Step: 19820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:34,654-Speed 9758.91 samples/sec Loss 7.7180 LearningRate 0.2870 Epoch: 4 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:38,327-Speed 11154.73 samples/sec Loss 7.6992 LearningRate 0.2869 Epoch: 4 Global Step: 19840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:42,059-Speed 10978.46 samples/sec Loss 7.7792 LearningRate 0.2868 Epoch: 4 Global Step: 19850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:45,565-Speed 11687.79 samples/sec Loss 7.7563 LearningRate 0.2867 Epoch: 4 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:49,033-Speed 11815.43 samples/sec Loss 7.7307 LearningRate 0.2866 Epoch: 4 Global Step: 19870 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-01-16 23:44:53,500-Speed 9172.20 samples/sec Loss 7.7339 LearningRate 0.2865 Epoch: 4 Global Step: 19880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:44:56,883-Speed 12111.45 samples/sec Loss 7.7271 LearningRate 0.2865 Epoch: 4 Global Step: 19890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:45:00,276-Speed 12079.33 samples/sec Loss 7.7174 LearningRate 0.2864 Epoch: 4 Global Step: 19900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:45:03,679-Speed 12039.16 samples/sec Loss 7.7515 LearningRate 0.2863 Epoch: 4 Global Step: 19910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:45:07,297-Speed 11325.61 samples/sec Loss 7.6954 LearningRate 0.2862 Epoch: 4 Global Step: 19920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:45:10,922-Speed 11301.88 samples/sec Loss 7.7498 LearningRate 0.2861 Epoch: 4 Global Step: 19930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:45:14,544-Speed 11313.80 samples/sec Loss 7.7668 LearningRate 0.2860 Epoch: 4 Global Step: 19940 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:45:18,373-Speed 10699.82 samples/sec Loss 7.6319 LearningRate 0.2859 Epoch: 4 Global Step: 19950 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:45:21,912-Speed 11577.28 samples/sec Loss 7.6950 LearningRate 0.2858 Epoch: 4 Global Step: 19960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:45:25,680-Speed 10874.37 samples/sec Loss 7.6962 LearningRate 0.2857 Epoch: 4 Global Step: 19970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:45:29,256-Speed 11457.77 samples/sec Loss 7.7148 LearningRate 0.2856 Epoch: 4 Global Step: 19980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:45:32,660-Speed 12036.29 samples/sec Loss 7.6361 LearningRate 0.2856 Epoch: 4 Global Step: 19990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 
23:45:36,054-Speed 12071.98 samples/sec Loss 7.7243 LearningRate 0.2855 Epoch: 4 Global Step: 20000 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-01-16 23:45:57,663-[lfw][20000]XNorm: 12.940398
Training: 2022-01-16 23:45:57,664-[lfw][20000]Accuracy-Flip: 0.99533+-0.00332
Training: 2022-01-16 23:45:57,666-[lfw][20000]Accuracy-Highest: 0.99533
Training: 2022-01-16 23:46:22,465-[cfp_fp][20000]XNorm: 10.867857
Training: 2022-01-16 23:46:22,466-[cfp_fp][20000]Accuracy-Flip: 0.96114+-0.01183
Training: 2022-01-16 23:46:22,466-[cfp_fp][20000]Accuracy-Highest: 0.96114
Training: 2022-01-16 23:46:43,768-[agedb_30][20000]XNorm: 12.406967
Training: 2022-01-16 23:46:43,768-[agedb_30][20000]Accuracy-Flip: 0.95967+-0.00869
Training: 2022-01-16 23:46:43,771-[agedb_30][20000]Accuracy-Highest: 0.95967
Training: 2022-01-16 23:46:47,165-Speed 576.01 samples/sec Loss 7.7259 LearningRate 0.2854 Epoch: 4 Global Step: 20010 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-01-16 23:46:50,526-Speed 12191.14 samples/sec Loss 7.6850 LearningRate 0.2853 Epoch: 4 Global Step: 20020 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-01-16 23:46:53,952-Speed 11962.44 samples/sec Loss 7.6660 LearningRate 0.2852 Epoch: 4 Global Step: 20030 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-01-16 23:46:57,328-Speed 12135.40 samples/sec Loss 7.6887 LearningRate 0.2851 Epoch: 4 Global Step: 20040 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-01-16 23:47:00,784-Speed 11857.42 samples/sec Loss 7.6798 LearningRate 0.2850 Epoch: 4 Global Step: 20050 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-01-16 23:47:04,242-Speed 11850.89 samples/sec Loss 7.6635 LearningRate 0.2849 Epoch: 4 Global Step: 20060 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-01-16 23:47:07,893-Speed 11225.11 samples/sec Loss 7.6587 LearningRate 0.2848 Epoch: 4 Global Step: 20070 Fp16 Grad Scale: 262144 Required: 7 hours
Training: 2022-01-16 23:47:11,339-Speed 11892.78
samples/sec Loss 7.6800 LearningRate 0.2847 Epoch: 4 Global Step: 20080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:47:15,603-Speed 9607.15 samples/sec Loss 7.7202 LearningRate 0.2847 Epoch: 4 Global Step: 20090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:47:19,262-Speed 11200.63 samples/sec Loss 7.7089 LearningRate 0.2846 Epoch: 4 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:22,932-Speed 11164.62 samples/sec Loss 7.7043 LearningRate 0.2845 Epoch: 4 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:26,625-Speed 11094.43 samples/sec Loss 7.7215 LearningRate 0.2844 Epoch: 4 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:30,120-Speed 11723.01 samples/sec Loss 7.5926 LearningRate 0.2843 Epoch: 4 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:33,730-Speed 11350.37 samples/sec Loss 7.6637 LearningRate 0.2842 Epoch: 4 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:37,171-Speed 11907.99 samples/sec Loss 7.7012 LearningRate 0.2841 Epoch: 4 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:40,607-Speed 11925.60 samples/sec Loss 7.5990 LearningRate 0.2840 Epoch: 4 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:44,329-Speed 11010.47 samples/sec Loss 7.6208 LearningRate 0.2839 Epoch: 4 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:47,907-Speed 11453.73 samples/sec Loss 7.6795 LearningRate 0.2838 Epoch: 4 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:51,376-Speed 11812.93 samples/sec Loss 7.7109 LearningRate 0.2838 Epoch: 4 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:47:54,784-Speed 12021.22 samples/sec Loss 7.6365 LearningRate 0.2837 
Epoch: 4 Global Step: 20200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:47:58,126-Speed 12261.15 samples/sec Loss 7.6559 LearningRate 0.2836 Epoch: 4 Global Step: 20210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:02,440-Speed 9496.75 samples/sec Loss 7.6794 LearningRate 0.2835 Epoch: 4 Global Step: 20220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:06,088-Speed 11234.37 samples/sec Loss 7.6576 LearningRate 0.2834 Epoch: 4 Global Step: 20230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:09,725-Speed 11264.75 samples/sec Loss 7.6017 LearningRate 0.2833 Epoch: 4 Global Step: 20240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:13,421-Speed 11086.36 samples/sec Loss 7.6452 LearningRate 0.2832 Epoch: 4 Global Step: 20250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:16,979-Speed 11517.15 samples/sec Loss 7.6520 LearningRate 0.2831 Epoch: 4 Global Step: 20260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:20,401-Speed 11972.25 samples/sec Loss 7.6782 LearningRate 0.2830 Epoch: 4 Global Step: 20270 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:23,800-Speed 12054.93 samples/sec Loss 7.6122 LearningRate 0.2830 Epoch: 4 Global Step: 20280 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:27,328-Speed 11616.37 samples/sec Loss 7.6252 LearningRate 0.2829 Epoch: 4 Global Step: 20290 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:30,776-Speed 11882.72 samples/sec Loss 7.6832 LearningRate 0.2828 Epoch: 4 Global Step: 20300 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:48:34,381-Speed 11363.72 samples/sec Loss 7.6410 LearningRate 0.2827 Epoch: 4 Global Step: 20310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:37,861-Speed 11776.91 samples/sec Loss 7.6408 LearningRate 0.2826 Epoch: 4 Global Step: 20320 Fp16 Grad Scale: 
262144 Required: 7 hours Training: 2022-01-16 23:48:41,484-Speed 11307.55 samples/sec Loss 7.6617 LearningRate 0.2825 Epoch: 4 Global Step: 20330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:45,042-Speed 11519.12 samples/sec Loss 7.6374 LearningRate 0.2824 Epoch: 4 Global Step: 20340 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:48,656-Speed 11336.77 samples/sec Loss 7.6135 LearningRate 0.2823 Epoch: 4 Global Step: 20350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:52,422-Speed 10879.36 samples/sec Loss 7.6057 LearningRate 0.2822 Epoch: 4 Global Step: 20360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:55,866-Speed 11896.87 samples/sec Loss 7.6145 LearningRate 0.2821 Epoch: 4 Global Step: 20370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:48:59,405-Speed 11577.10 samples/sec Loss 7.6508 LearningRate 0.2821 Epoch: 4 Global Step: 20380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:03,668-Speed 9611.62 samples/sec Loss 7.6375 LearningRate 0.2820 Epoch: 4 Global Step: 20390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:07,562-Speed 10522.18 samples/sec Loss 7.6382 LearningRate 0.2819 Epoch: 4 Global Step: 20400 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:11,166-Speed 11371.56 samples/sec Loss 7.6379 LearningRate 0.2818 Epoch: 4 Global Step: 20410 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:49:14,781-Speed 11332.66 samples/sec Loss 7.7005 LearningRate 0.2817 Epoch: 4 Global Step: 20420 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:49:18,389-Speed 11354.33 samples/sec Loss 7.6111 LearningRate 0.2816 Epoch: 4 Global Step: 20430 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:49:22,071-Speed 11128.65 samples/sec Loss 7.6306 LearningRate 0.2815 Epoch: 4 Global Step: 20440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 
2022-01-16 23:49:26,096-Speed 10179.95 samples/sec Loss 7.6451 LearningRate 0.2814 Epoch: 4 Global Step: 20450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:29,520-Speed 11966.57 samples/sec Loss 7.6091 LearningRate 0.2813 Epoch: 4 Global Step: 20460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:33,151-Speed 11286.01 samples/sec Loss 7.5666 LearningRate 0.2813 Epoch: 4 Global Step: 20470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:36,653-Speed 11697.61 samples/sec Loss 7.7235 LearningRate 0.2812 Epoch: 4 Global Step: 20480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:40,107-Speed 11863.02 samples/sec Loss 7.6215 LearningRate 0.2811 Epoch: 4 Global Step: 20490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:43,546-Speed 11915.21 samples/sec Loss 7.6429 LearningRate 0.2810 Epoch: 4 Global Step: 20500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:46,923-Speed 12132.31 samples/sec Loss 7.6390 LearningRate 0.2809 Epoch: 4 Global Step: 20510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:50,340-Speed 11992.43 samples/sec Loss 7.6111 LearningRate 0.2808 Epoch: 4 Global Step: 20520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:54,072-Speed 10976.64 samples/sec Loss 7.5955 LearningRate 0.2807 Epoch: 4 Global Step: 20530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:49:57,727-Speed 11211.92 samples/sec Loss 7.6217 LearningRate 0.2806 Epoch: 4 Global Step: 20540 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:50:01,108-Speed 12119.30 samples/sec Loss 7.5655 LearningRate 0.2805 Epoch: 4 Global Step: 20550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:05,240-Speed 9915.94 samples/sec Loss 7.5943 LearningRate 0.2804 Epoch: 4 Global Step: 20560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:08,604-Speed 12181.72 
samples/sec Loss 7.6235 LearningRate 0.2804 Epoch: 4 Global Step: 20570 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:12,018-Speed 12002.16 samples/sec Loss 7.5685 LearningRate 0.2803 Epoch: 4 Global Step: 20580 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:15,407-Speed 12089.05 samples/sec Loss 7.6109 LearningRate 0.2802 Epoch: 4 Global Step: 20590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:18,910-Speed 11695.51 samples/sec Loss 7.5680 LearningRate 0.2801 Epoch: 4 Global Step: 20600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:22,321-Speed 12012.06 samples/sec Loss 7.6578 LearningRate 0.2800 Epoch: 4 Global Step: 20610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:26,019-Speed 11080.66 samples/sec Loss 7.6128 LearningRate 0.2799 Epoch: 4 Global Step: 20620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:30,199-Speed 9803.28 samples/sec Loss 7.6051 LearningRate 0.2798 Epoch: 4 Global Step: 20630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:33,871-Speed 11156.45 samples/sec Loss 7.5599 LearningRate 0.2797 Epoch: 4 Global Step: 20640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:37,397-Speed 11620.44 samples/sec Loss 7.5785 LearningRate 0.2796 Epoch: 4 Global Step: 20650 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:50:40,802-Speed 12035.99 samples/sec Loss 7.6406 LearningRate 0.2796 Epoch: 4 Global Step: 20660 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:50:44,307-Speed 11687.87 samples/sec Loss 7.5566 LearningRate 0.2795 Epoch: 4 Global Step: 20670 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:50:47,862-Speed 11528.67 samples/sec Loss 7.5527 LearningRate 0.2794 Epoch: 4 Global Step: 20680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:51,409-Speed 11554.81 samples/sec Loss 7.6185 LearningRate 0.2793 
Epoch: 4 Global Step: 20690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:55,264-Speed 10626.09 samples/sec Loss 7.6349 LearningRate 0.2792 Epoch: 4 Global Step: 20700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:50:58,905-Speed 11254.10 samples/sec Loss 7.5902 LearningRate 0.2791 Epoch: 4 Global Step: 20710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:02,295-Speed 12085.45 samples/sec Loss 7.5920 LearningRate 0.2790 Epoch: 4 Global Step: 20720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:05,990-Speed 11090.16 samples/sec Loss 7.5611 LearningRate 0.2789 Epoch: 4 Global Step: 20730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:09,645-Speed 11208.41 samples/sec Loss 7.6080 LearningRate 0.2788 Epoch: 4 Global Step: 20740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:13,051-Speed 12031.73 samples/sec Loss 7.6366 LearningRate 0.2788 Epoch: 4 Global Step: 20750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:16,544-Speed 11730.14 samples/sec Loss 7.5660 LearningRate 0.2787 Epoch: 4 Global Step: 20760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:20,604-Speed 10091.14 samples/sec Loss 7.6303 LearningRate 0.2786 Epoch: 4 Global Step: 20770 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:24,664-Speed 10092.95 samples/sec Loss 7.6168 LearningRate 0.2785 Epoch: 4 Global Step: 20780 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:51:28,113-Speed 11881.41 samples/sec Loss 7.5824 LearningRate 0.2784 Epoch: 4 Global Step: 20790 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:51:31,797-Speed 11121.59 samples/sec Loss 7.5901 LearningRate 0.2783 Epoch: 4 Global Step: 20800 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:51:35,508-Speed 11042.95 samples/sec Loss 7.5416 LearningRate 0.2782 Epoch: 4 Global Step: 20810 Fp16 Grad 
Scale: 524288 Required: 7 hours Training: 2022-01-16 23:51:39,042-Speed 11591.48 samples/sec Loss 7.5363 LearningRate 0.2781 Epoch: 4 Global Step: 20820 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:42,644-Speed 11376.00 samples/sec Loss 7.5115 LearningRate 0.2780 Epoch: 4 Global Step: 20830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:46,206-Speed 11502.16 samples/sec Loss 7.5414 LearningRate 0.2780 Epoch: 4 Global Step: 20840 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:50,148-Speed 10394.39 samples/sec Loss 7.5597 LearningRate 0.2779 Epoch: 4 Global Step: 20850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:51:53,905-Speed 10906.27 samples/sec Loss 7.5460 LearningRate 0.2778 Epoch: 4 Global Step: 20860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:52:26,365-Speed 1261.93 samples/sec Loss 6.7896 LearningRate 0.2777 Epoch: 5 Global Step: 20870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:52:30,599-Speed 9677.57 samples/sec Loss 6.7093 LearningRate 0.2776 Epoch: 5 Global Step: 20880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:52:34,760-Speed 9846.23 samples/sec Loss 6.8049 LearningRate 0.2775 Epoch: 5 Global Step: 20890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:52:38,743-Speed 10287.15 samples/sec Loss 6.7461 LearningRate 0.2774 Epoch: 5 Global Step: 20900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:52:42,880-Speed 9904.84 samples/sec Loss 6.8316 LearningRate 0.2773 Epoch: 5 Global Step: 20910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:52:46,440-Speed 11507.90 samples/sec Loss 6.7732 LearningRate 0.2772 Epoch: 5 Global Step: 20920 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:52:50,064-Speed 11308.44 samples/sec Loss 6.7160 LearningRate 0.2772 Epoch: 5 Global Step: 20930 Fp16 Grad Scale: 524288 Required: 7 hours Training: 
2022-01-16 23:52:53,694-Speed 11288.08 samples/sec Loss 6.8447 LearningRate 0.2771 Epoch: 5 Global Step: 20940 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:52:57,694-Speed 10243.27 samples/sec Loss 6.8505 LearningRate 0.2770 Epoch: 5 Global Step: 20950 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:53:01,568-Speed 10575.33 samples/sec Loss 6.8779 LearningRate 0.2769 Epoch: 5 Global Step: 20960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:53:06,639-Speed 8079.47 samples/sec Loss 6.9340 LearningRate 0.2768 Epoch: 5 Global Step: 20970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:53:10,413-Speed 10858.74 samples/sec Loss 6.8718 LearningRate 0.2767 Epoch: 5 Global Step: 20980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:13,915-Speed 11698.62 samples/sec Loss 6.8555 LearningRate 0.2766 Epoch: 5 Global Step: 20990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:17,585-Speed 11162.75 samples/sec Loss 6.9203 LearningRate 0.2765 Epoch: 5 Global Step: 21000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:21,049-Speed 11830.54 samples/sec Loss 6.9271 LearningRate 0.2764 Epoch: 5 Global Step: 21010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:24,462-Speed 12006.20 samples/sec Loss 7.0023 LearningRate 0.2764 Epoch: 5 Global Step: 21020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:28,318-Speed 10625.18 samples/sec Loss 6.9468 LearningRate 0.2763 Epoch: 5 Global Step: 21030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:32,061-Speed 10947.20 samples/sec Loss 6.9600 LearningRate 0.2762 Epoch: 5 Global Step: 21040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:35,584-Speed 11628.85 samples/sec Loss 6.9497 LearningRate 0.2761 Epoch: 5 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:39,259-Speed 11148.29 
samples/sec Loss 6.9406 LearningRate 0.2760 Epoch: 5 Global Step: 21060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:42,722-Speed 11834.81 samples/sec Loss 6.9879 LearningRate 0.2759 Epoch: 5 Global Step: 21070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:53:46,521-Speed 10785.40 samples/sec Loss 6.9728 LearningRate 0.2758 Epoch: 5 Global Step: 21080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:53:50,226-Speed 11057.59 samples/sec Loss 7.0191 LearningRate 0.2757 Epoch: 5 Global Step: 21090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:53:53,761-Speed 11592.26 samples/sec Loss 6.9803 LearningRate 0.2757 Epoch: 5 Global Step: 21100 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:53:57,285-Speed 11626.49 samples/sec Loss 7.0231 LearningRate 0.2756 Epoch: 5 Global Step: 21110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:00,822-Speed 11587.08 samples/sec Loss 6.9998 LearningRate 0.2755 Epoch: 5 Global Step: 21120 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:04,425-Speed 11370.34 samples/sec Loss 6.9914 LearningRate 0.2754 Epoch: 5 Global Step: 21130 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:07,938-Speed 11662.94 samples/sec Loss 7.0601 LearningRate 0.2753 Epoch: 5 Global Step: 21140 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:11,401-Speed 11834.20 samples/sec Loss 7.0306 LearningRate 0.2752 Epoch: 5 Global Step: 21150 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:14,914-Speed 11663.30 samples/sec Loss 7.0445 LearningRate 0.2751 Epoch: 5 Global Step: 21160 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:18,893-Speed 10297.28 samples/sec Loss 6.9835 LearningRate 0.2750 Epoch: 5 Global Step: 21170 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:22,390-Speed 11717.01 samples/sec Loss 7.0861 LearningRate 0.2749 
Epoch: 5 Global Step: 21180 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:54:26,048-Speed 11201.27 samples/sec Loss 7.0841 LearningRate 0.2749 Epoch: 5 Global Step: 21190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:29,894-Speed 10654.70 samples/sec Loss 7.0364 LearningRate 0.2748 Epoch: 5 Global Step: 21200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:33,513-Speed 11318.94 samples/sec Loss 7.0824 LearningRate 0.2747 Epoch: 5 Global Step: 21210 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:37,168-Speed 11212.67 samples/sec Loss 7.1027 LearningRate 0.2746 Epoch: 5 Global Step: 21220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:40,640-Speed 11799.25 samples/sec Loss 7.0920 LearningRate 0.2745 Epoch: 5 Global Step: 21230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:44,100-Speed 11847.59 samples/sec Loss 7.0923 LearningRate 0.2744 Epoch: 5 Global Step: 21240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:47,697-Speed 11388.04 samples/sec Loss 7.1512 LearningRate 0.2743 Epoch: 5 Global Step: 21250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:51,164-Speed 11817.65 samples/sec Loss 7.0862 LearningRate 0.2742 Epoch: 5 Global Step: 21260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:54,780-Speed 11331.70 samples/sec Loss 7.1243 LearningRate 0.2741 Epoch: 5 Global Step: 21270 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:54:58,163-Speed 12114.64 samples/sec Loss 7.2166 LearningRate 0.2741 Epoch: 5 Global Step: 21280 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:55:01,851-Speed 11108.70 samples/sec Loss 7.1289 LearningRate 0.2740 Epoch: 5 Global Step: 21290 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:55:05,250-Speed 12053.95 samples/sec Loss 7.1562 LearningRate 0.2739 Epoch: 5 Global Step: 21300 Fp16 Grad 
Scale: 524288 Required: 7 hours Training: 2022-01-16 23:55:08,779-Speed 11609.96 samples/sec Loss 7.0909 LearningRate 0.2738 Epoch: 5 Global Step: 21310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:55:12,411-Speed 11281.07 samples/sec Loss 7.1765 LearningRate 0.2737 Epoch: 5 Global Step: 21320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:55:16,175-Speed 10886.95 samples/sec Loss 7.1163 LearningRate 0.2736 Epoch: 5 Global Step: 21330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:55:19,875-Speed 11075.33 samples/sec Loss 7.1325 LearningRate 0.2735 Epoch: 5 Global Step: 21340 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:55:23,273-Speed 12058.95 samples/sec Loss 7.1357 LearningRate 0.2734 Epoch: 5 Global Step: 21350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:55:26,920-Speed 11233.87 samples/sec Loss 7.1164 LearningRate 0.2734 Epoch: 5 Global Step: 21360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:55:30,880-Speed 10345.24 samples/sec Loss 7.1328 LearningRate 0.2733 Epoch: 5 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:55:34,414-Speed 11594.54 samples/sec Loss 7.1777 LearningRate 0.2732 Epoch: 5 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:55:38,037-Speed 11310.53 samples/sec Loss 7.1719 LearningRate 0.2731 Epoch: 5 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:55:41,484-Speed 11887.27 samples/sec Loss 7.1773 LearningRate 0.2730 Epoch: 5 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:55:45,118-Speed 11274.79 samples/sec Loss 7.1461 LearningRate 0.2729 Epoch: 5 Global Step: 21410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:55:48,613-Speed 11723.07 samples/sec Loss 7.1392 LearningRate 0.2728 Epoch: 5 Global Step: 21420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-01-16 23:55:52,401-Speed 10816.67 samples/sec Loss 7.2094 LearningRate 0.2727 Epoch: 5 Global Step: 21430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:55:55,984-Speed 11436.53 samples/sec Loss 7.1903 LearningRate 0.2727 Epoch: 5 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:55:59,425-Speed 11907.91 samples/sec Loss 7.1618 LearningRate 0.2726 Epoch: 5 Global Step: 21450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:56:04,267-Speed 8461.86 samples/sec Loss 7.2342 LearningRate 0.2725 Epoch: 5 Global Step: 21460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:56:08,089-Speed 10720.13 samples/sec Loss 7.1317 LearningRate 0.2724 Epoch: 5 Global Step: 21470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:11,743-Speed 11212.92 samples/sec Loss 7.1687 LearningRate 0.2723 Epoch: 5 Global Step: 21480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:15,217-Speed 11796.47 samples/sec Loss 7.1714 LearningRate 0.2722 Epoch: 5 Global Step: 21490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:18,816-Speed 11387.66 samples/sec Loss 7.1880 LearningRate 0.2721 Epoch: 5 Global Step: 21500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:22,701-Speed 10544.65 samples/sec Loss 7.2290 LearningRate 0.2720 Epoch: 5 Global Step: 21510 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:26,397-Speed 11087.46 samples/sec Loss 7.1987 LearningRate 0.2719 Epoch: 5 Global Step: 21520 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:29,915-Speed 11645.65 samples/sec Loss 7.1885 LearningRate 0.2719 Epoch: 5 Global Step: 21530 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:33,440-Speed 11623.10 samples/sec Loss 7.1577 LearningRate 0.2718 Epoch: 5 Global Step: 21540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:37,035-Speed 11399.23 
samples/sec Loss 7.1936 LearningRate 0.2717 Epoch: 5 Global Step: 21550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:40,580-Speed 11558.51 samples/sec Loss 7.2355 LearningRate 0.2716 Epoch: 5 Global Step: 21560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:43,960-Speed 12118.80 samples/sec Loss 7.1688 LearningRate 0.2715 Epoch: 5 Global Step: 21570 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:56:47,770-Speed 10754.51 samples/sec Loss 7.2801 LearningRate 0.2714 Epoch: 5 Global Step: 21580 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:56:51,463-Speed 11093.96 samples/sec Loss 7.2666 LearningRate 0.2713 Epoch: 5 Global Step: 21590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:55,510-Speed 10124.92 samples/sec Loss 7.1920 LearningRate 0.2712 Epoch: 5 Global Step: 21600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:56:59,175-Speed 11180.79 samples/sec Loss 7.1860 LearningRate 0.2712 Epoch: 5 Global Step: 21610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:57:02,896-Speed 11014.87 samples/sec Loss 7.2732 LearningRate 0.2711 Epoch: 5 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:06,579-Speed 11125.69 samples/sec Loss 7.1620 LearningRate 0.2710 Epoch: 5 Global Step: 21630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:10,247-Speed 11168.96 samples/sec Loss 7.1866 LearningRate 0.2709 Epoch: 5 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:13,859-Speed 11347.06 samples/sec Loss 7.2655 LearningRate 0.2708 Epoch: 5 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:17,587-Speed 10990.82 samples/sec Loss 7.1869 LearningRate 0.2707 Epoch: 5 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:21,058-Speed 11803.04 samples/sec Loss 7.2494 LearningRate 0.2706 
Epoch: 5 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:24,814-Speed 10906.71 samples/sec Loss 7.1898 LearningRate 0.2705 Epoch: 5 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:28,306-Speed 11734.25 samples/sec Loss 7.2003 LearningRate 0.2705 Epoch: 5 Global Step: 21690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:32,012-Speed 11055.08 samples/sec Loss 7.3302 LearningRate 0.2704 Epoch: 5 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:35,552-Speed 11574.89 samples/sec Loss 7.2403 LearningRate 0.2703 Epoch: 5 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:57:39,393-Speed 10668.19 samples/sec Loss 7.2233 LearningRate 0.2702 Epoch: 5 Global Step: 21720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:57:43,172-Speed 10841.84 samples/sec Loss 7.2724 LearningRate 0.2701 Epoch: 5 Global Step: 21730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:57:46,780-Speed 11354.33 samples/sec Loss 7.2630 LearningRate 0.2700 Epoch: 5 Global Step: 21740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:57:50,411-Speed 11287.72 samples/sec Loss 7.2837 LearningRate 0.2699 Epoch: 5 Global Step: 21750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:57:53,960-Speed 11545.41 samples/sec Loss 7.2630 LearningRate 0.2698 Epoch: 5 Global Step: 21760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:57:57,591-Speed 11283.37 samples/sec Loss 7.2934 LearningRate 0.2698 Epoch: 5 Global Step: 21770 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:01,413-Speed 10722.07 samples/sec Loss 7.2522 LearningRate 0.2697 Epoch: 5 Global Step: 21780 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:04,830-Speed 12008.15 samples/sec Loss 7.2893 LearningRate 0.2696 Epoch: 5 Global Step: 21790 Fp16 Grad 
Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:08,380-Speed 11540.20 samples/sec Loss 7.2893 LearningRate 0.2695 Epoch: 5 Global Step: 21800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:11,845-Speed 11825.25 samples/sec Loss 7.1779 LearningRate 0.2694 Epoch: 5 Global Step: 21810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:15,317-Speed 11802.29 samples/sec Loss 7.3041 LearningRate 0.2693 Epoch: 5 Global Step: 21820 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:19,807-Speed 9125.07 samples/sec Loss 7.2100 LearningRate 0.2692 Epoch: 5 Global Step: 21830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:23,355-Speed 11548.87 samples/sec Loss 7.2571 LearningRate 0.2691 Epoch: 5 Global Step: 21840 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:26,814-Speed 11846.11 samples/sec Loss 7.3109 LearningRate 0.2691 Epoch: 5 Global Step: 21850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:30,217-Speed 12041.37 samples/sec Loss 7.3776 LearningRate 0.2690 Epoch: 5 Global Step: 21860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:33,862-Speed 11240.83 samples/sec Loss 7.2761 LearningRate 0.2689 Epoch: 5 Global Step: 21870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:37,365-Speed 11695.91 samples/sec Loss 7.2943 LearningRate 0.2688 Epoch: 5 Global Step: 21880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:58:40,890-Speed 11625.30 samples/sec Loss 7.2683 LearningRate 0.2687 Epoch: 5 Global Step: 21890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:58:44,410-Speed 11639.22 samples/sec Loss 7.2630 LearningRate 0.2686 Epoch: 5 Global Step: 21900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:58:47,940-Speed 11606.74 samples/sec Loss 7.2265 LearningRate 0.2685 Epoch: 5 Global Step: 21910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-01-16 23:58:51,605-Speed 11180.85 samples/sec Loss 7.2512 LearningRate 0.2684 Epoch: 5 Global Step: 21920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:58:55,095-Speed 11740.88 samples/sec Loss 7.2657 LearningRate 0.2684 Epoch: 5 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:58:58,871-Speed 10851.41 samples/sec Loss 7.3077 LearningRate 0.2683 Epoch: 5 Global Step: 21940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:59:02,449-Speed 11451.72 samples/sec Loss 7.3734 LearningRate 0.2682 Epoch: 5 Global Step: 21950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:59:05,925-Speed 11788.58 samples/sec Loss 7.2956 LearningRate 0.2681 Epoch: 5 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:59:09,539-Speed 11335.81 samples/sec Loss 7.3445 LearningRate 0.2680 Epoch: 5 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:59:13,198-Speed 11197.02 samples/sec Loss 7.2504 LearningRate 0.2679 Epoch: 5 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 23:59:17,008-Speed 10754.18 samples/sec Loss 7.2960 LearningRate 0.2678 Epoch: 5 Global Step: 21990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:20,522-Speed 11661.57 samples/sec Loss 7.3101 LearningRate 0.2677 Epoch: 5 Global Step: 22000 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:24,240-Speed 11020.90 samples/sec Loss 7.3552 LearningRate 0.2677 Epoch: 5 Global Step: 22010 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:27,941-Speed 11067.45 samples/sec Loss 7.3774 LearningRate 0.2676 Epoch: 5 Global Step: 22020 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:31,943-Speed 10237.46 samples/sec Loss 7.3584 LearningRate 0.2675 Epoch: 5 Global Step: 22030 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:35,747-Speed 10773.14 
samples/sec Loss 7.3253 LearningRate 0.2674 Epoch: 5 Global Step: 22040 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:39,205-Speed 11847.24 samples/sec Loss 7.3040 LearningRate 0.2673 Epoch: 5 Global Step: 22050 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:42,687-Speed 11769.35 samples/sec Loss 7.2336 LearningRate 0.2672 Epoch: 5 Global Step: 22060 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:46,053-Speed 12174.03 samples/sec Loss 7.3083 LearningRate 0.2671 Epoch: 5 Global Step: 22070 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:49,592-Speed 11579.09 samples/sec Loss 7.3112 LearningRate 0.2671 Epoch: 5 Global Step: 22080 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 23:59:53,243-Speed 11223.24 samples/sec Loss 7.2808 LearningRate 0.2670 Epoch: 5 Global Step: 22090 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-16 23:59:57,058-Speed 10739.31 samples/sec Loss 7.2892 LearningRate 0.2669 Epoch: 5 Global Step: 22100 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-17 00:00:00,548-Speed 11739.23 samples/sec Loss 7.3114 LearningRate 0.2668 Epoch: 5 Global Step: 22110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:00:04,167-Speed 11320.95 samples/sec Loss 7.3044 LearningRate 0.2667 Epoch: 5 Global Step: 22120 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:00:07,655-Speed 11747.31 samples/sec Loss 7.3127 LearningRate 0.2666 Epoch: 5 Global Step: 22130 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:00:11,242-Speed 11423.01 samples/sec Loss 7.3202 LearningRate 0.2665 Epoch: 5 Global Step: 22140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:14,752-Speed 11674.14 samples/sec Loss 7.3468 LearningRate 0.2664 Epoch: 5 Global Step: 22150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:18,415-Speed 11186.49 samples/sec Loss 7.4105 LearningRate 0.2664 
Epoch: 5 Global Step: 22160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:21,938-Speed 11632.29 samples/sec Loss 7.3076 LearningRate 0.2663 Epoch: 5 Global Step: 22170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:25,423-Speed 11756.06 samples/sec Loss 7.3330 LearningRate 0.2662 Epoch: 5 Global Step: 22180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:28,899-Speed 11786.25 samples/sec Loss 7.3131 LearningRate 0.2661 Epoch: 5 Global Step: 22190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:32,487-Speed 11420.10 samples/sec Loss 7.3079 LearningRate 0.2660 Epoch: 5 Global Step: 22200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:36,345-Speed 10620.15 samples/sec Loss 7.2705 LearningRate 0.2659 Epoch: 5 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:39,996-Speed 11223.93 samples/sec Loss 7.2845 LearningRate 0.2658 Epoch: 5 Global Step: 22220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:43,595-Speed 11385.28 samples/sec Loss 7.3312 LearningRate 0.2657 Epoch: 5 Global Step: 22230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:00:47,654-Speed 10093.85 samples/sec Loss 7.3624 LearningRate 0.2657 Epoch: 5 Global Step: 22240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:00:51,761-Speed 9975.31 samples/sec Loss 7.3228 LearningRate 0.2656 Epoch: 5 Global Step: 22250 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:00:55,203-Speed 11905.30 samples/sec Loss 7.3206 LearningRate 0.2655 Epoch: 5 Global Step: 22260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:00:58,998-Speed 10794.15 samples/sec Loss 7.3349 LearningRate 0.2654 Epoch: 5 Global Step: 22270 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:02,510-Speed 11668.19 samples/sec Loss 7.3782 LearningRate 0.2653 Epoch: 5 Global Step: 22280 Fp16 Grad Scale: 
262144 Required: 7 hours Training: 2022-01-17 00:01:06,006-Speed 11719.42 samples/sec Loss 7.3044 LearningRate 0.2652 Epoch: 5 Global Step: 22290 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:09,558-Speed 11537.35 samples/sec Loss 7.3204 LearningRate 0.2651 Epoch: 5 Global Step: 22300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:13,266-Speed 11048.54 samples/sec Loss 7.3079 LearningRate 0.2651 Epoch: 5 Global Step: 22310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:16,819-Speed 11533.06 samples/sec Loss 7.3403 LearningRate 0.2650 Epoch: 5 Global Step: 22320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:20,379-Speed 11507.35 samples/sec Loss 7.3782 LearningRate 0.2649 Epoch: 5 Global Step: 22330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:24,348-Speed 10323.81 samples/sec Loss 7.3061 LearningRate 0.2648 Epoch: 5 Global Step: 22340 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-17 00:01:28,306-Speed 10351.46 samples/sec Loss 7.3644 LearningRate 0.2647 Epoch: 5 Global Step: 22350 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-17 00:01:32,166-Speed 10617.12 samples/sec Loss 7.3424 LearningRate 0.2646 Epoch: 5 Global Step: 22360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:35,683-Speed 11646.94 samples/sec Loss 7.3333 LearningRate 0.2645 Epoch: 5 Global Step: 22370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:39,360-Speed 11144.29 samples/sec Loss 7.3151 LearningRate 0.2644 Epoch: 5 Global Step: 22380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:42,821-Speed 11837.65 samples/sec Loss 7.3559 LearningRate 0.2644 Epoch: 5 Global Step: 22390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:46,785-Speed 10337.30 samples/sec Loss 7.3067 LearningRate 0.2643 Epoch: 5 Global Step: 22400 Fp16 Grad Scale: 262144 Required: 7 hours Training: 
2022-01-17 00:01:50,182-Speed 12059.50 samples/sec Loss 7.3174 LearningRate 0.2642 Epoch: 5 Global Step: 22410 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:53,999-Speed 10735.66 samples/sec Loss 7.2769 LearningRate 0.2641 Epoch: 5 Global Step: 22420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:01:57,659-Speed 11194.14 samples/sec Loss 7.3133 LearningRate 0.2640 Epoch: 5 Global Step: 22430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:02:01,156-Speed 11716.42 samples/sec Loss 7.2696 LearningRate 0.2639 Epoch: 5 Global Step: 22440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:02:04,799-Speed 11247.71 samples/sec Loss 7.3465 LearningRate 0.2638 Epoch: 5 Global Step: 22450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:02:08,382-Speed 11434.01 samples/sec Loss 7.3625 LearningRate 0.2638 Epoch: 5 Global Step: 22460 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-17 00:02:12,129-Speed 10935.31 samples/sec Loss 7.3791 LearningRate 0.2637 Epoch: 5 Global Step: 22470 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-17 00:02:15,628-Speed 11711.09 samples/sec Loss 7.3321 LearningRate 0.2636 Epoch: 5 Global Step: 22480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:02:19,210-Speed 11437.90 samples/sec Loss 7.2564 LearningRate 0.2635 Epoch: 5 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:23,091-Speed 10556.89 samples/sec Loss 7.2509 LearningRate 0.2634 Epoch: 5 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:26,669-Speed 11453.27 samples/sec Loss 7.3716 LearningRate 0.2633 Epoch: 5 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:30,699-Speed 10167.14 samples/sec Loss 7.3274 LearningRate 0.2632 Epoch: 5 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:34,269-Speed 11477.95 
samples/sec Loss 7.2818 LearningRate 0.2632 Epoch: 5 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:38,221-Speed 10368.08 samples/sec Loss 7.3095 LearningRate 0.2631 Epoch: 5 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:42,119-Speed 10511.31 samples/sec Loss 7.3111 LearningRate 0.2630 Epoch: 5 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:45,633-Speed 11662.95 samples/sec Loss 7.3628 LearningRate 0.2629 Epoch: 5 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:49,054-Speed 11974.64 samples/sec Loss 7.3882 LearningRate 0.2628 Epoch: 5 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:52,414-Speed 12193.75 samples/sec Loss 7.3239 LearningRate 0.2627 Epoch: 5 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:02:55,884-Speed 11807.18 samples/sec Loss 7.2970 LearningRate 0.2626 Epoch: 5 Global Step: 22590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:02:59,279-Speed 12071.29 samples/sec Loss 7.2973 LearningRate 0.2625 Epoch: 5 Global Step: 22600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:02,767-Speed 11745.99 samples/sec Loss 7.3134 LearningRate 0.2625 Epoch: 5 Global Step: 22610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:06,199-Speed 11939.90 samples/sec Loss 7.3358 LearningRate 0.2624 Epoch: 5 Global Step: 22620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:09,760-Speed 11508.02 samples/sec Loss 7.3470 LearningRate 0.2623 Epoch: 5 Global Step: 22630 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:13,220-Speed 11841.08 samples/sec Loss 7.3239 LearningRate 0.2622 Epoch: 5 Global Step: 22640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:16,815-Speed 11397.93 samples/sec Loss 7.2716 LearningRate 0.2621 
Epoch: 5 Global Step: 22650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:20,377-Speed 11504.52 samples/sec Loss 7.2866 LearningRate 0.2620 Epoch: 5 Global Step: 22660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:23,979-Speed 11375.44 samples/sec Loss 7.3297 LearningRate 0.2619 Epoch: 5 Global Step: 22670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:27,383-Speed 12037.09 samples/sec Loss 7.3024 LearningRate 0.2619 Epoch: 5 Global Step: 22680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:31,039-Speed 11207.65 samples/sec Loss 7.2661 LearningRate 0.2618 Epoch: 5 Global Step: 22690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:34,699-Speed 11195.31 samples/sec Loss 7.3084 LearningRate 0.2617 Epoch: 5 Global Step: 22700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:38,191-Speed 11731.68 samples/sec Loss 7.2012 LearningRate 0.2616 Epoch: 5 Global Step: 22710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:41,675-Speed 11764.60 samples/sec Loss 7.3087 LearningRate 0.2615 Epoch: 5 Global Step: 22720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:45,196-Speed 11637.21 samples/sec Loss 7.2748 LearningRate 0.2614 Epoch: 5 Global Step: 22730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:49,276-Speed 10042.44 samples/sec Loss 7.3170 LearningRate 0.2613 Epoch: 5 Global Step: 22740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:53,057-Speed 10836.81 samples/sec Loss 7.3402 LearningRate 0.2613 Epoch: 5 Global Step: 22750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:03:56,752-Speed 11088.08 samples/sec Loss 7.2808 LearningRate 0.2612 Epoch: 5 Global Step: 22760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:04:00,441-Speed 11106.43 samples/sec Loss 7.3421 LearningRate 0.2611 Epoch: 5 Global Step: 22770 Fp16 Grad 
Scale: 262144 Required: 7 hours Training: 2022-01-17 00:04:03,908-Speed 11819.37 samples/sec Loss 7.3027 LearningRate 0.2610 Epoch: 5 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:07,485-Speed 11453.90 samples/sec Loss 7.3071 LearningRate 0.2609 Epoch: 5 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:11,196-Speed 11040.30 samples/sec Loss 7.2938 LearningRate 0.2608 Epoch: 5 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:14,792-Speed 11393.49 samples/sec Loss 7.3515 LearningRate 0.2607 Epoch: 5 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:18,470-Speed 11141.16 samples/sec Loss 7.3052 LearningRate 0.2607 Epoch: 5 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:22,022-Speed 11536.13 samples/sec Loss 7.4130 LearningRate 0.2606 Epoch: 5 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:25,586-Speed 11495.37 samples/sec Loss 7.3100 LearningRate 0.2605 Epoch: 5 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:29,800-Speed 9722.03 samples/sec Loss 7.3169 LearningRate 0.2604 Epoch: 5 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:33,301-Speed 11703.88 samples/sec Loss 7.3254 LearningRate 0.2603 Epoch: 5 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:36,824-Speed 11633.37 samples/sec Loss 7.3821 LearningRate 0.2602 Epoch: 5 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:04:40,579-Speed 10909.84 samples/sec Loss 7.4061 LearningRate 0.2601 Epoch: 5 Global Step: 22880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:04:44,233-Speed 11214.81 samples/sec Loss 7.2882 LearningRate 0.2600 Epoch: 5 Global Step: 22890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 
2022-01-17 00:04:47,667-Speed 11930.03 samples/sec Loss 7.3457 LearningRate 0.2600 Epoch: 5 Global Step: 22900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:04:51,161-Speed 11725.06 samples/sec Loss 7.3320 LearningRate 0.2599 Epoch: 5 Global Step: 22910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:04:54,745-Speed 11434.97 samples/sec Loss 7.3485 LearningRate 0.2598 Epoch: 5 Global Step: 22920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:04:58,258-Speed 11662.01 samples/sec Loss 7.3171 LearningRate 0.2597 Epoch: 5 Global Step: 22930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:05:01,832-Speed 11465.81 samples/sec Loss 7.3203 LearningRate 0.2596 Epoch: 5 Global Step: 22940 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:05:05,547-Speed 11029.45 samples/sec Loss 7.2531 LearningRate 0.2595 Epoch: 5 Global Step: 22950 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:05:09,020-Speed 11795.57 samples/sec Loss 7.3002 LearningRate 0.2594 Epoch: 5 Global Step: 22960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:05:12,789-Speed 10873.46 samples/sec Loss 7.2771 LearningRate 0.2594 Epoch: 5 Global Step: 22970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-17 00:05:16,549-Speed 10895.81 samples/sec Loss 7.2928 LearningRate 0.2593 Epoch: 5 Global Step: 22980 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-01-17 00:05:20,019-Speed 11806.97 samples/sec Loss 7.2980 LearningRate 0.2592 Epoch: 5 Global Step: 22990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:05:23,559-Speed 11573.93 samples/sec Loss 7.2950 LearningRate 0.2591 Epoch: 5 Global Step: 23000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:05:26,980-Speed 11977.39 samples/sec Loss 7.3451 LearningRate 0.2590 Epoch: 5 Global Step: 23010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:05:30,941-Speed 10345.29 
samples/sec Loss 7.3775 LearningRate 0.2589 Epoch: 5 Global Step: 23020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:05:34,660-Speed 11015.24 samples/sec Loss 7.3064 LearningRate 0.2588 Epoch: 5 Global Step: 23030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:05:38,053-Speed 12077.38 samples/sec Loss 7.2735 LearningRate 0.2588 Epoch: 5 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:05:41,573-Speed 11641.64 samples/sec Loss 7.2921 LearningRate 0.2587 Epoch: 5 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-17 00:05:45,218-Speed 11240.93 samples/sec Loss 7.3267 LearningRate 0.2586 Epoch: 5 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:05:48,842-Speed 11304.27 samples/sec Loss 7.2958 LearningRate 0.2585 Epoch: 5 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:05:52,349-Speed 11684.11 samples/sec Loss 7.2802 LearningRate 0.2584 Epoch: 5 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:05:55,849-Speed 11708.02 samples/sec Loss 7.2727 LearningRate 0.2583 Epoch: 5 Global Step: 23090 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:05:59,291-Speed 11903.66 samples/sec Loss 7.3488 LearningRate 0.2582 Epoch: 5 Global Step: 23100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:02,889-Speed 11388.06 samples/sec Loss 7.3190 LearningRate 0.2582 Epoch: 5 Global Step: 23110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:06,703-Speed 10742.30 samples/sec Loss 7.3683 LearningRate 0.2581 Epoch: 5 Global Step: 23120 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:10,336-Speed 11277.94 samples/sec Loss 7.2714 LearningRate 0.2580 Epoch: 5 Global Step: 23130 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:14,128-Speed 10806.74 samples/sec Loss 7.3308 LearningRate 0.2579 
Epoch: 5 Global Step: 23140 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:17,580-Speed 11867.60 samples/sec Loss 7.2859 LearningRate 0.2578 Epoch: 5 Global Step: 23150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:21,241-Speed 11193.36 samples/sec Loss 7.2620 LearningRate 0.2577 Epoch: 5 Global Step: 23160 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:24,764-Speed 11628.54 samples/sec Loss 7.2805 LearningRate 0.2576 Epoch: 5 Global Step: 23170 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:28,430-Speed 11175.97 samples/sec Loss 7.2528 LearningRate 0.2576 Epoch: 5 Global Step: 23180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:06:32,161-Speed 10982.84 samples/sec Loss 7.2968 LearningRate 0.2575 Epoch: 5 Global Step: 23190 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:06:35,780-Speed 11321.68 samples/sec Loss 7.3027 LearningRate 0.2574 Epoch: 5 Global Step: 23200 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:06:39,450-Speed 11164.25 samples/sec Loss 7.3119 LearningRate 0.2573 Epoch: 5 Global Step: 23210 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:06:43,000-Speed 11542.37 samples/sec Loss 7.2111 LearningRate 0.2572 Epoch: 5 Global Step: 23220 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:06:46,599-Speed 11385.09 samples/sec Loss 7.2805 LearningRate 0.2571 Epoch: 5 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:06:50,083-Speed 11757.91 samples/sec Loss 7.3292 LearningRate 0.2571 Epoch: 5 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:06:53,567-Speed 11761.36 samples/sec Loss 7.2103 LearningRate 0.2570 Epoch: 5 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:06:57,057-Speed 11749.62 samples/sec Loss 7.2960 LearningRate 0.2569 Epoch: 5 Global Step: 23260 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-01-17 00:07:00,503-Speed 11887.99 samples/sec Loss 7.1972 LearningRate 0.2568 Epoch: 5 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:07:04,017-Speed 11661.68 samples/sec Loss 7.3316 LearningRate 0.2567 Epoch: 5 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:07:07,508-Speed 11736.71 samples/sec Loss 7.3314 LearningRate 0.2566 Epoch: 5 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:07:11,020-Speed 11668.99 samples/sec Loss 7.3345 LearningRate 0.2565 Epoch: 5 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:07:14,703-Speed 11122.07 samples/sec Loss 7.2949 LearningRate 0.2565 Epoch: 5 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:07:18,140-Speed 11924.77 samples/sec Loss 7.2693 LearningRate 0.2564 Epoch: 5 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:07:21,564-Speed 11964.89 samples/sec Loss 7.3050 LearningRate 0.2563 Epoch: 5 Global Step: 23330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:25,034-Speed 11810.35 samples/sec Loss 7.3327 LearningRate 0.2562 Epoch: 5 Global Step: 23340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:28,525-Speed 11736.63 samples/sec Loss 7.3178 LearningRate 0.2561 Epoch: 5 Global Step: 23350 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:32,540-Speed 10204.38 samples/sec Loss 7.2771 LearningRate 0.2560 Epoch: 5 Global Step: 23360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:36,147-Speed 11358.69 samples/sec Loss 7.2751 LearningRate 0.2559 Epoch: 5 Global Step: 23370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:39,898-Speed 10924.22 samples/sec Loss 7.2985 LearningRate 0.2559 Epoch: 5 Global Step: 23380 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:07:43,457-Speed 11513.25 samples/sec Loss 7.2612 LearningRate 0.2558 Epoch: 5 Global Step: 23390 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:47,009-Speed 11532.21 samples/sec Loss 7.2948 LearningRate 0.2557 Epoch: 5 Global Step: 23400 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:50,789-Speed 10841.80 samples/sec Loss 7.3479 LearningRate 0.2556 Epoch: 5 Global Step: 23410 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:54,398-Speed 11350.48 samples/sec Loss 7.2438 LearningRate 0.2555 Epoch: 5 Global Step: 23420 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:07:58,006-Speed 11357.61 samples/sec Loss 7.2887 LearningRate 0.2554 Epoch: 5 Global Step: 23430 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:08:01,844-Speed 10674.76 samples/sec Loss 7.3164 LearningRate 0.2553 Epoch: 5 Global Step: 23440 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:08:05,575-Speed 10981.00 samples/sec Loss 7.3196 LearningRate 0.2553 Epoch: 5 Global Step: 23450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:09,175-Speed 11383.31 samples/sec Loss 7.3131 LearningRate 0.2552 Epoch: 5 Global Step: 23460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:13,037-Speed 10607.90 samples/sec Loss 7.2929 LearningRate 0.2551 Epoch: 5 Global Step: 23470 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:16,548-Speed 11671.14 samples/sec Loss 7.2982 LearningRate 0.2550 Epoch: 5 Global Step: 23480 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:20,033-Speed 11755.22 samples/sec Loss 7.3211 LearningRate 0.2549 Epoch: 5 Global Step: 23490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:23,573-Speed 11574.48 samples/sec Loss 7.2639 LearningRate 0.2548 Epoch: 5 Global Step: 23500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:27,091-Speed 11647.26 
samples/sec Loss 7.3051 LearningRate 0.2548 Epoch: 5 Global Step: 23510 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:30,663-Speed 11472.02 samples/sec Loss 7.2674 LearningRate 0.2547 Epoch: 5 Global Step: 23520 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:34,139-Speed 11786.55 samples/sec Loss 7.3100 LearningRate 0.2546 Epoch: 5 Global Step: 23530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:37,856-Speed 11024.89 samples/sec Loss 7.2839 LearningRate 0.2545 Epoch: 5 Global Step: 23540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:41,641-Speed 10824.58 samples/sec Loss 7.3496 LearningRate 0.2544 Epoch: 5 Global Step: 23550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:45,502-Speed 10612.18 samples/sec Loss 7.2704 LearningRate 0.2543 Epoch: 5 Global Step: 23560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:08:49,110-Speed 11354.01 samples/sec Loss 7.3160 LearningRate 0.2542 Epoch: 5 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:08:53,615-Speed 9095.56 samples/sec Loss 7.2449 LearningRate 0.2542 Epoch: 5 Global Step: 23580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:08:57,126-Speed 11670.16 samples/sec Loss 7.2696 LearningRate 0.2541 Epoch: 5 Global Step: 23590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:09:00,596-Speed 11810.56 samples/sec Loss 7.2949 LearningRate 0.2540 Epoch: 5 Global Step: 23600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:09:04,317-Speed 11011.76 samples/sec Loss 7.2679 LearningRate 0.2539 Epoch: 5 Global Step: 23610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:09:07,920-Speed 11373.55 samples/sec Loss 7.3163 LearningRate 0.2538 Epoch: 5 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:09:11,709-Speed 10811.89 samples/sec Loss 7.2534 LearningRate 0.2537 
Epoch: 5 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:09:15,218-Speed 11679.46 samples/sec Loss 7.3163 LearningRate 0.2536 Epoch: 5 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:09:18,666-Speed 11882.45 samples/sec Loss 7.2411 LearningRate 0.2536 Epoch: 5 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:09:22,119-Speed 11866.78 samples/sec Loss 7.2885 LearningRate 0.2535 Epoch: 5 Global Step: 23660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:09:25,777-Speed 11203.22 samples/sec Loss 7.2440 LearningRate 0.2534 Epoch: 5 Global Step: 23670 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:09:29,573-Speed 10791.12 samples/sec Loss 7.2381 LearningRate 0.2533 Epoch: 5 Global Step: 23680 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:09:33,204-Speed 11284.81 samples/sec Loss 7.2886 LearningRate 0.2532 Epoch: 5 Global Step: 23690 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:09:36,606-Speed 12046.78 samples/sec Loss 7.2412 LearningRate 0.2531 Epoch: 5 Global Step: 23700 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:09:39,990-Speed 12105.39 samples/sec Loss 7.2541 LearningRate 0.2531 Epoch: 5 Global Step: 23710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:09:43,409-Speed 11985.87 samples/sec Loss 7.2107 LearningRate 0.2530 Epoch: 5 Global Step: 23720 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:09:46,797-Speed 12092.12 samples/sec Loss 7.2601 LearningRate 0.2529 Epoch: 5 Global Step: 23730 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:09:50,503-Speed 11058.01 samples/sec Loss 7.2426 LearningRate 0.2528 Epoch: 5 Global Step: 23740 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:09:54,621-Speed 9949.46 samples/sec Loss 7.2025 LearningRate 0.2527 Epoch: 5 Global Step: 23750 Fp16 Grad Scale: 
262144 Required: 6 hours Training: 2022-01-17 00:09:58,622-Speed 10241.26 samples/sec Loss 7.2598 LearningRate 0.2526 Epoch: 5 Global Step: 23760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:02,527-Speed 10492.29 samples/sec Loss 7.3084 LearningRate 0.2525 Epoch: 5 Global Step: 23770 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:10:06,133-Speed 11362.61 samples/sec Loss 7.2355 LearningRate 0.2525 Epoch: 5 Global Step: 23780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:10,204-Speed 10063.79 samples/sec Loss 7.1710 LearningRate 0.2524 Epoch: 5 Global Step: 23790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:13,733-Speed 11611.01 samples/sec Loss 7.2000 LearningRate 0.2523 Epoch: 5 Global Step: 23800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:17,282-Speed 11544.19 samples/sec Loss 7.2019 LearningRate 0.2522 Epoch: 5 Global Step: 23810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:20,886-Speed 11369.94 samples/sec Loss 7.2407 LearningRate 0.2521 Epoch: 5 Global Step: 23820 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:24,329-Speed 11901.63 samples/sec Loss 7.2430 LearningRate 0.2520 Epoch: 5 Global Step: 23830 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:27,726-Speed 12060.18 samples/sec Loss 7.2384 LearningRate 0.2520 Epoch: 5 Global Step: 23840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:31,289-Speed 11500.45 samples/sec Loss 7.2259 LearningRate 0.2519 Epoch: 5 Global Step: 23850 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:35,089-Speed 10781.60 samples/sec Loss 7.2580 LearningRate 0.2518 Epoch: 5 Global Step: 23860 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:38,692-Speed 11372.41 samples/sec Loss 7.2481 LearningRate 0.2517 Epoch: 5 Global Step: 23870 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:10:42,115-Speed 11970.40 samples/sec Loss 7.2306 LearningRate 0.2516 Epoch: 5 Global Step: 23880 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:10:45,787-Speed 11159.87 samples/sec Loss 7.2817 LearningRate 0.2515 Epoch: 5 Global Step: 23890 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:49,237-Speed 11876.01 samples/sec Loss 7.2358 LearningRate 0.2514 Epoch: 5 Global Step: 23900 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:52,747-Speed 11673.15 samples/sec Loss 7.3490 LearningRate 0.2514 Epoch: 5 Global Step: 23910 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:10:56,248-Speed 11703.79 samples/sec Loss 7.2105 LearningRate 0.2513 Epoch: 5 Global Step: 23920 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:11:00,617-Speed 9378.09 samples/sec Loss 7.2642 LearningRate 0.2512 Epoch: 5 Global Step: 23930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:11:04,061-Speed 11898.19 samples/sec Loss 7.2274 LearningRate 0.2511 Epoch: 5 Global Step: 23940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:11:07,440-Speed 12124.69 samples/sec Loss 7.2053 LearningRate 0.2510 Epoch: 5 Global Step: 23950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:11:11,067-Speed 11297.78 samples/sec Loss 7.1977 LearningRate 0.2509 Epoch: 5 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:14,535-Speed 11814.82 samples/sec Loss 7.2561 LearningRate 0.2509 Epoch: 5 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:18,049-Speed 11661.10 samples/sec Loss 7.2485 LearningRate 0.2508 Epoch: 5 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:21,633-Speed 11430.23 samples/sec Loss 7.1791 LearningRate 0.2507 Epoch: 5 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:25,050-Speed 11992.25 
samples/sec Loss 7.2430 LearningRate 0.2506 Epoch: 5 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:28,534-Speed 11763.33 samples/sec Loss 7.2360 LearningRate 0.2505 Epoch: 5 Global Step: 24010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:31,920-Speed 12100.92 samples/sec Loss 7.3313 LearningRate 0.2504 Epoch: 5 Global Step: 24020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:35,405-Speed 11754.55 samples/sec Loss 7.2085 LearningRate 0.2503 Epoch: 5 Global Step: 24030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:38,885-Speed 11775.33 samples/sec Loss 7.2070 LearningRate 0.2503 Epoch: 5 Global Step: 24040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:42,302-Speed 11990.41 samples/sec Loss 7.2498 LearningRate 0.2502 Epoch: 5 Global Step: 24050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:11:45,916-Speed 11340.01 samples/sec Loss 7.2344 LearningRate 0.2501 Epoch: 5 Global Step: 24060 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:11:49,525-Speed 11351.03 samples/sec Loss 7.2049 LearningRate 0.2500 Epoch: 5 Global Step: 24070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:11:53,017-Speed 11734.49 samples/sec Loss 7.2536 LearningRate 0.2499 Epoch: 5 Global Step: 24080 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:11:57,035-Speed 10195.96 samples/sec Loss 7.2180 LearningRate 0.2498 Epoch: 5 Global Step: 24090 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:12:00,659-Speed 11305.97 samples/sec Loss 7.2040 LearningRate 0.2498 Epoch: 5 Global Step: 24100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:12:04,323-Speed 11185.57 samples/sec Loss 7.1875 LearningRate 0.2497 Epoch: 5 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:07,772-Speed 11880.40 samples/sec Loss 7.1891 LearningRate 0.2496 
Epoch: 5 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:11,378-Speed 11361.04 samples/sec Loss 7.2362 LearningRate 0.2495 Epoch: 5 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:15,638-Speed 9618.63 samples/sec Loss 7.2698 LearningRate 0.2494 Epoch: 5 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:19,116-Speed 11782.91 samples/sec Loss 7.2049 LearningRate 0.2493 Epoch: 5 Global Step: 24150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:22,625-Speed 11676.75 samples/sec Loss 7.2823 LearningRate 0.2493 Epoch: 5 Global Step: 24160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:26,025-Speed 12051.24 samples/sec Loss 7.2646 LearningRate 0.2492 Epoch: 5 Global Step: 24170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:29,561-Speed 11587.11 samples/sec Loss 7.2080 LearningRate 0.2491 Epoch: 5 Global Step: 24180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:32,951-Speed 12087.91 samples/sec Loss 7.2012 LearningRate 0.2490 Epoch: 5 Global Step: 24190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:36,398-Speed 11888.09 samples/sec Loss 7.1684 LearningRate 0.2489 Epoch: 5 Global Step: 24200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:12:40,687-Speed 9552.45 samples/sec Loss 7.1859 LearningRate 0.2488 Epoch: 5 Global Step: 24210 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:12:44,755-Speed 10070.74 samples/sec Loss 7.2536 LearningRate 0.2488 Epoch: 5 Global Step: 24220 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:12:48,214-Speed 11846.17 samples/sec Loss 7.1896 LearningRate 0.2487 Epoch: 5 Global Step: 24230 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:12:51,636-Speed 11973.89 samples/sec Loss 7.1440 LearningRate 0.2486 Epoch: 5 Global Step: 24240 Fp16 Grad Scale: 
262144 Required: 6 hours Training: 2022-01-17 00:12:55,008-Speed 12152.34 samples/sec Loss 7.2332 LearningRate 0.2485 Epoch: 5 Global Step: 24250 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:12:58,369-Speed 12191.34 samples/sec Loss 7.2310 LearningRate 0.2484 Epoch: 5 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:01,797-Speed 11952.98 samples/sec Loss 7.1885 LearningRate 0.2483 Epoch: 5 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:05,327-Speed 11605.94 samples/sec Loss 7.2001 LearningRate 0.2482 Epoch: 5 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:08,824-Speed 11716.97 samples/sec Loss 7.2937 LearningRate 0.2482 Epoch: 5 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:12,260-Speed 11925.16 samples/sec Loss 7.1907 LearningRate 0.2481 Epoch: 5 Global Step: 24300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:15,684-Speed 11968.85 samples/sec Loss 7.2084 LearningRate 0.2480 Epoch: 5 Global Step: 24310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:19,329-Speed 11239.79 samples/sec Loss 7.1951 LearningRate 0.2479 Epoch: 5 Global Step: 24320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:22,964-Speed 11271.62 samples/sec Loss 7.1739 LearningRate 0.2478 Epoch: 5 Global Step: 24330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:26,382-Speed 11991.05 samples/sec Loss 7.1709 LearningRate 0.2477 Epoch: 5 Global Step: 24340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:29,925-Speed 11561.30 samples/sec Loss 7.1194 LearningRate 0.2477 Epoch: 5 Global Step: 24350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:33,483-Speed 11517.32 samples/sec Loss 7.2022 LearningRate 0.2476 Epoch: 5 Global Step: 24360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:13:37,094-Speed 11348.34 samples/sec Loss 7.2241 LearningRate 0.2475 Epoch: 5 Global Step: 24370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:13:40,637-Speed 11562.45 samples/sec Loss 7.1353 LearningRate 0.2474 Epoch: 5 Global Step: 24380 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:13:44,613-Speed 10306.74 samples/sec Loss 7.2057 LearningRate 0.2473 Epoch: 5 Global Step: 24390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:48,105-Speed 11733.50 samples/sec Loss 7.2058 LearningRate 0.2472 Epoch: 5 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:51,601-Speed 11721.06 samples/sec Loss 7.1995 LearningRate 0.2472 Epoch: 5 Global Step: 24410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:55,324-Speed 11004.39 samples/sec Loss 7.1853 LearningRate 0.2471 Epoch: 5 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:13:59,448-Speed 9935.75 samples/sec Loss 7.1804 LearningRate 0.2470 Epoch: 5 Global Step: 24430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:14:03,016-Speed 11483.96 samples/sec Loss 7.1973 LearningRate 0.2469 Epoch: 5 Global Step: 24440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:14:06,550-Speed 11592.72 samples/sec Loss 7.2229 LearningRate 0.2468 Epoch: 5 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:14:10,172-Speed 11315.15 samples/sec Loss 7.2248 LearningRate 0.2467 Epoch: 5 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:14:13,699-Speed 11614.91 samples/sec Loss 7.1798 LearningRate 0.2467 Epoch: 5 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:14:17,661-Speed 10341.95 samples/sec Loss 7.1676 LearningRate 0.2466 Epoch: 5 Global Step: 24480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:14:21,476-Speed 10739.94 
samples/sec Loss 7.1769 LearningRate 0.2465 Epoch: 5 Global Step: 24490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:25,409-Speed 10417.93 samples/sec Loss 7.1576 LearningRate 0.2464 Epoch: 5 Global Step: 24500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:28,906-Speed 11714.21 samples/sec Loss 7.1597 LearningRate 0.2463 Epoch: 5 Global Step: 24510 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:32,409-Speed 11696.87 samples/sec Loss 7.2058 LearningRate 0.2462 Epoch: 5 Global Step: 24520 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:35,954-Speed 11558.40 samples/sec Loss 7.1837 LearningRate 0.2462 Epoch: 5 Global Step: 24530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:39,650-Speed 11085.72 samples/sec Loss 7.1817 LearningRate 0.2461 Epoch: 5 Global Step: 24540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:43,177-Speed 11615.90 samples/sec Loss 7.1904 LearningRate 0.2460 Epoch: 5 Global Step: 24550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:47,287-Speed 9969.28 samples/sec Loss 7.2121 LearningRate 0.2459 Epoch: 5 Global Step: 24560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:50,788-Speed 11705.92 samples/sec Loss 7.2359 LearningRate 0.2458 Epoch: 5 Global Step: 24570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:14:54,324-Speed 11587.04 samples/sec Loss 7.2129 LearningRate 0.2457 Epoch: 5 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:14:57,760-Speed 11927.83 samples/sec Loss 7.1937 LearningRate 0.2457 Epoch: 5 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:01,334-Speed 11466.70 samples/sec Loss 7.1449 LearningRate 0.2456 Epoch: 5 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:04,756-Speed 11972.18 samples/sec Loss 7.1335 LearningRate 0.2455 
Epoch: 5 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:08,134-Speed 12128.06 samples/sec Loss 7.1479 LearningRate 0.2454 Epoch: 5 Global Step: 24620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:11,546-Speed 12012.46 samples/sec Loss 7.1601 LearningRate 0.2453 Epoch: 5 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:15,470-Speed 10440.31 samples/sec Loss 7.1801 LearningRate 0.2452 Epoch: 5 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:18,991-Speed 11638.59 samples/sec Loss 7.1986 LearningRate 0.2452 Epoch: 5 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:22,878-Speed 10538.86 samples/sec Loss 7.1207 LearningRate 0.2451 Epoch: 5 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:26,635-Speed 10905.98 samples/sec Loss 7.2509 LearningRate 0.2450 Epoch: 5 Global Step: 24670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:15:30,073-Speed 11916.90 samples/sec Loss 7.1756 LearningRate 0.2449 Epoch: 5 Global Step: 24680 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:15:33,563-Speed 11741.27 samples/sec Loss 7.1794 LearningRate 0.2448 Epoch: 5 Global Step: 24690 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:15:36,968-Speed 12035.11 samples/sec Loss 7.1591 LearningRate 0.2447 Epoch: 5 Global Step: 24700 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:15:41,160-Speed 9773.20 samples/sec Loss 7.1928 LearningRate 0.2447 Epoch: 5 Global Step: 24710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:15:44,590-Speed 11947.59 samples/sec Loss 7.2063 LearningRate 0.2446 Epoch: 5 Global Step: 24720 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:15:48,127-Speed 11583.78 samples/sec Loss 7.1692 LearningRate 0.2445 Epoch: 5 Global Step: 24730 Fp16 Grad Scale: 
262144 Required: 6 hours Training: 2022-01-17 00:15:51,794-Speed 11173.03 samples/sec Loss 7.1692 LearningRate 0.2444 Epoch: 5 Global Step: 24740 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:15:55,369-Speed 11460.08 samples/sec Loss 7.2243 LearningRate 0.2443 Epoch: 5 Global Step: 24750 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:15:58,786-Speed 11990.26 samples/sec Loss 7.1263 LearningRate 0.2442 Epoch: 5 Global Step: 24760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:16:02,340-Speed 11529.14 samples/sec Loss 7.1926 LearningRate 0.2442 Epoch: 5 Global Step: 24770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:06,036-Speed 11086.07 samples/sec Loss 7.0984 LearningRate 0.2441 Epoch: 5 Global Step: 24780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:10,298-Speed 9615.12 samples/sec Loss 7.1625 LearningRate 0.2440 Epoch: 5 Global Step: 24790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:13,822-Speed 11627.04 samples/sec Loss 7.1340 LearningRate 0.2439 Epoch: 5 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:17,530-Speed 11053.70 samples/sec Loss 7.1461 LearningRate 0.2438 Epoch: 5 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:20,991-Speed 11838.49 samples/sec Loss 7.1547 LearningRate 0.2437 Epoch: 5 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:24,447-Speed 11854.90 samples/sec Loss 7.1557 LearningRate 0.2437 Epoch: 5 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:28,056-Speed 11352.71 samples/sec Loss 7.1650 LearningRate 0.2436 Epoch: 5 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:31,953-Speed 10513.96 samples/sec Loss 7.1520 LearningRate 0.2435 Epoch: 5 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-01-17 00:16:35,465-Speed 11667.99 samples/sec Loss 7.1817 LearningRate 0.2434 Epoch: 5 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:16:39,078-Speed 11339.00 samples/sec Loss 7.1425 LearningRate 0.2433 Epoch: 5 Global Step: 24870 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:16:42,666-Speed 11421.61 samples/sec Loss 7.2035 LearningRate 0.2432 Epoch: 5 Global Step: 24880 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:16:46,467-Speed 10781.05 samples/sec Loss 7.1652 LearningRate 0.2432 Epoch: 5 Global Step: 24890 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:16:50,292-Speed 10712.44 samples/sec Loss 7.0818 LearningRate 0.2431 Epoch: 5 Global Step: 24900 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:16:53,732-Speed 11912.02 samples/sec Loss 7.1493 LearningRate 0.2430 Epoch: 5 Global Step: 24910 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:16:57,240-Speed 11679.65 samples/sec Loss 7.1477 LearningRate 0.2429 Epoch: 5 Global Step: 24920 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:17:00,680-Speed 11909.80 samples/sec Loss 7.1939 LearningRate 0.2428 Epoch: 5 Global Step: 24930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:17:04,053-Speed 12146.11 samples/sec Loss 7.1322 LearningRate 0.2427 Epoch: 5 Global Step: 24940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:17:07,630-Speed 11455.18 samples/sec Loss 7.0950 LearningRate 0.2427 Epoch: 5 Global Step: 24950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:17:11,209-Speed 11452.77 samples/sec Loss 7.1729 LearningRate 0.2426 Epoch: 5 Global Step: 24960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:17:15,307-Speed 9996.82 samples/sec Loss 7.1891 LearningRate 0.2425 Epoch: 5 Global Step: 24970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:17:19,005-Speed 11079.91 
samples/sec Loss 7.1112 LearningRate 0.2424 Epoch: 5 Global Step: 24980 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:17:22,496-Speed 11736.36 samples/sec Loss 7.0814 LearningRate 0.2423 Epoch: 5 Global Step: 24990 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:17:25,981-Speed 11757.58 samples/sec Loss 7.1637 LearningRate 0.2422 Epoch: 5 Global Step: 25000 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:17:47,210-[lfw][25000]XNorm: 12.137679
Training: 2022-01-17 00:17:47,210-[lfw][25000]Accuracy-Flip: 0.99500+-0.00350
Training: 2022-01-17 00:17:47,211-[lfw][25000]Accuracy-Highest: 0.99533
Training: 2022-01-17 00:18:11,600-[cfp_fp][25000]XNorm: 10.114393
Training: 2022-01-17 00:18:11,600-[cfp_fp][25000]Accuracy-Flip: 0.96457+-0.00966
Training: 2022-01-17 00:18:11,601-[cfp_fp][25000]Accuracy-Highest: 0.96457
Training: 2022-01-17 00:18:32,620-[agedb_30][25000]XNorm: 11.651527
Training: 2022-01-17 00:18:32,620-[agedb_30][25000]Accuracy-Flip: 0.96383+-0.00943
Training: 2022-01-17 00:18:32,621-[agedb_30][25000]Accuracy-Highest: 0.96383
Training: 2022-01-17 00:18:35,989-Speed 585.09 samples/sec Loss 7.0958 LearningRate 0.2422 Epoch: 5 Global Step: 25010 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:18:39,323-Speed 12288.59 samples/sec Loss 7.1462 LearningRate 0.2421 Epoch: 5 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:18:43,518-Speed 9766.42 samples/sec Loss 7.1568 LearningRate 0.2420 Epoch: 5 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:19:16,032-Speed 1259.84 samples/sec Loss 6.4903 LearningRate 0.2419 Epoch: 6 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:19:20,778-Speed 8633.85 samples/sec Loss 6.3196 LearningRate 0.2418 Epoch: 6 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:19:24,491-Speed 11036.32 samples/sec Loss 6.3583 LearningRate 
0.2417 Epoch: 6 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:19:28,033-Speed 11567.07 samples/sec Loss 6.2783 LearningRate 0.2417 Epoch: 6 Global Step: 25070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:19:32,201-Speed 9830.07 samples/sec Loss 6.3345 LearningRate 0.2416 Epoch: 6 Global Step: 25080 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:19:35,693-Speed 11734.68 samples/sec Loss 6.3529 LearningRate 0.2415 Epoch: 6 Global Step: 25090 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:19:39,213-Speed 11639.61 samples/sec Loss 6.3844 LearningRate 0.2414 Epoch: 6 Global Step: 25100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:19:42,767-Speed 11526.93 samples/sec Loss 6.3811 LearningRate 0.2413 Epoch: 6 Global Step: 25110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:19:46,426-Speed 11198.42 samples/sec Loss 6.4003 LearningRate 0.2412 Epoch: 6 Global Step: 25120 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:19:51,095-Speed 8775.68 samples/sec Loss 6.4475 LearningRate 0.2412 Epoch: 6 Global Step: 25130 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:19:54,871-Speed 10849.17 samples/sec Loss 6.4631 LearningRate 0.2411 Epoch: 6 Global Step: 25140 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:19:58,496-Speed 11305.42 samples/sec Loss 6.4465 LearningRate 0.2410 Epoch: 6 Global Step: 25150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:20:02,179-Speed 11123.04 samples/sec Loss 6.4748 LearningRate 0.2409 Epoch: 6 Global Step: 25160 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:20:05,599-Speed 11981.88 samples/sec Loss 6.4723 LearningRate 0.2408 Epoch: 6 Global Step: 25170 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:20:09,457-Speed 10620.60 samples/sec Loss 6.4626 LearningRate 0.2408 Epoch: 6 Global Step: 25180 Fp16 Grad 
Scale: 524288 Required: 6 hours Training: 2022-01-17 00:20:13,166-Speed 11061.58 samples/sec Loss 6.4848 LearningRate 0.2407 Epoch: 6 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:16,617-Speed 11872.58 samples/sec Loss 6.4903 LearningRate 0.2406 Epoch: 6 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:20,190-Speed 11469.01 samples/sec Loss 6.4803 LearningRate 0.2405 Epoch: 6 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:23,897-Speed 11050.83 samples/sec Loss 6.5339 LearningRate 0.2404 Epoch: 6 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:27,474-Speed 11454.64 samples/sec Loss 6.4832 LearningRate 0.2403 Epoch: 6 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:31,352-Speed 10565.28 samples/sec Loss 6.5299 LearningRate 0.2403 Epoch: 6 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:35,043-Speed 11102.11 samples/sec Loss 6.5010 LearningRate 0.2402 Epoch: 6 Global Step: 25250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:38,888-Speed 10655.04 samples/sec Loss 6.4774 LearningRate 0.2401 Epoch: 6 Global Step: 25260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:42,502-Speed 11339.63 samples/sec Loss 6.4951 LearningRate 0.2400 Epoch: 6 Global Step: 25270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:46,162-Speed 11194.52 samples/sec Loss 6.5234 LearningRate 0.2399 Epoch: 6 Global Step: 25280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:20:49,876-Speed 11030.87 samples/sec Loss 6.5271 LearningRate 0.2398 Epoch: 6 Global Step: 25290 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:20:53,694-Speed 10732.05 samples/sec Loss 6.5037 LearningRate 0.2398 Epoch: 6 Global Step: 25300 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:20:57,720-Speed 10177.58 samples/sec Loss 6.6334 LearningRate 0.2397 Epoch: 6 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:01,399-Speed 11135.83 samples/sec Loss 6.5933 LearningRate 0.2396 Epoch: 6 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:05,042-Speed 11248.12 samples/sec Loss 6.5673 LearningRate 0.2395 Epoch: 6 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:08,418-Speed 12134.77 samples/sec Loss 6.6340 LearningRate 0.2394 Epoch: 6 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:12,001-Speed 11436.71 samples/sec Loss 6.6163 LearningRate 0.2393 Epoch: 6 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:15,598-Speed 11390.49 samples/sec Loss 6.5782 LearningRate 0.2393 Epoch: 6 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:19,382-Speed 10830.65 samples/sec Loss 6.6162 LearningRate 0.2392 Epoch: 6 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:22,895-Speed 11663.03 samples/sec Loss 6.5607 LearningRate 0.2391 Epoch: 6 Global Step: 25380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:26,336-Speed 11905.63 samples/sec Loss 6.6160 LearningRate 0.2390 Epoch: 6 Global Step: 25390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:29,824-Speed 11747.76 samples/sec Loss 6.6199 LearningRate 0.2389 Epoch: 6 Global Step: 25400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:21:33,371-Speed 11551.85 samples/sec Loss 6.6243 LearningRate 0.2389 Epoch: 6 Global Step: 25410 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:21:36,954-Speed 11436.05 samples/sec Loss 6.6252 LearningRate 0.2388 Epoch: 6 Global Step: 25420 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:21:40,776-Speed 10719.31 
samples/sec Loss 6.6552 LearningRate 0.2387 Epoch: 6 Global Step: 25430 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:21:44,261-Speed 11758.62 samples/sec Loss 6.6243 LearningRate 0.2386 Epoch: 6 Global Step: 25440 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:21:47,687-Speed 11961.14 samples/sec Loss 6.6853 LearningRate 0.2385 Epoch: 6 Global Step: 25450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:21:51,554-Speed 10596.26 samples/sec Loss 6.6017 LearningRate 0.2384 Epoch: 6 Global Step: 25460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:21:55,161-Speed 11360.33 samples/sec Loss 6.6550 LearningRate 0.2384 Epoch: 6 Global Step: 25470 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:21:58,765-Speed 11368.44 samples/sec Loss 6.6951 LearningRate 0.2383 Epoch: 6 Global Step: 25480 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:02,222-Speed 11853.47 samples/sec Loss 6.6095 LearningRate 0.2382 Epoch: 6 Global Step: 25490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:05,728-Speed 11688.28 samples/sec Loss 6.6625 LearningRate 0.2381 Epoch: 6 Global Step: 25500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:09,429-Speed 11072.17 samples/sec Loss 6.7049 LearningRate 0.2380 Epoch: 6 Global Step: 25510 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:22:13,029-Speed 11381.89 samples/sec Loss 6.6577 LearningRate 0.2379 Epoch: 6 Global Step: 25520 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:22:16,526-Speed 11714.88 samples/sec Loss 6.6685 LearningRate 0.2379 Epoch: 6 Global Step: 25530 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:22:20,027-Speed 11704.95 samples/sec Loss 6.7304 LearningRate 0.2378 Epoch: 6 Global Step: 25540 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:22:23,510-Speed 11763.19 samples/sec Loss 6.6845 LearningRate 0.2377 
Epoch: 6 Global Step: 25550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:27,043-Speed 11596.81 samples/sec Loss 6.6485 LearningRate 0.2376 Epoch: 6 Global Step: 25560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:31,063-Speed 10193.09 samples/sec Loss 6.6716 LearningRate 0.2375 Epoch: 6 Global Step: 25570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:34,661-Speed 11386.99 samples/sec Loss 6.7110 LearningRate 0.2375 Epoch: 6 Global Step: 25580 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:38,165-Speed 11694.73 samples/sec Loss 6.6901 LearningRate 0.2374 Epoch: 6 Global Step: 25590 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:41,571-Speed 12029.83 samples/sec Loss 6.6834 LearningRate 0.2373 Epoch: 6 Global Step: 25600 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:45,031-Speed 11845.39 samples/sec Loss 6.7445 LearningRate 0.2372 Epoch: 6 Global Step: 25610 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:48,480-Speed 11876.82 samples/sec Loss 6.7140 LearningRate 0.2371 Epoch: 6 Global Step: 25620 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:51,931-Speed 11874.54 samples/sec Loss 6.7423 LearningRate 0.2370 Epoch: 6 Global Step: 25630 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:55,615-Speed 11121.30 samples/sec Loss 6.7364 LearningRate 0.2370 Epoch: 6 Global Step: 25640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:22:59,038-Speed 11971.64 samples/sec Loss 6.7355 LearningRate 0.2369 Epoch: 6 Global Step: 25650 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:23:02,533-Speed 11723.13 samples/sec Loss 6.6902 LearningRate 0.2368 Epoch: 6 Global Step: 25660 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:06,129-Speed 11394.51 samples/sec Loss 6.7456 LearningRate 0.2367 Epoch: 6 Global Step: 25670 Fp16 Grad 
Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:09,855-Speed 10995.16 samples/sec Loss 6.6668 LearningRate 0.2366 Epoch: 6 Global Step: 25680 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:13,232-Speed 12135.43 samples/sec Loss 6.7874 LearningRate 0.2366 Epoch: 6 Global Step: 25690 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:16,655-Speed 11967.18 samples/sec Loss 6.7525 LearningRate 0.2365 Epoch: 6 Global Step: 25700 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:20,362-Speed 11054.52 samples/sec Loss 6.7376 LearningRate 0.2364 Epoch: 6 Global Step: 25710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:23,900-Speed 11578.69 samples/sec Loss 6.7772 LearningRate 0.2363 Epoch: 6 Global Step: 25720 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:28,254-Speed 9410.56 samples/sec Loss 6.6894 LearningRate 0.2362 Epoch: 6 Global Step: 25730 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:31,657-Speed 12041.16 samples/sec Loss 6.7562 LearningRate 0.2361 Epoch: 6 Global Step: 25740 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:35,060-Speed 12043.17 samples/sec Loss 6.7433 LearningRate 0.2361 Epoch: 6 Global Step: 25750 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:38,436-Speed 12135.66 samples/sec Loss 6.7760 LearningRate 0.2360 Epoch: 6 Global Step: 25760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:41,946-Speed 11671.94 samples/sec Loss 6.7526 LearningRate 0.2359 Epoch: 6 Global Step: 25770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:45,351-Speed 12033.29 samples/sec Loss 6.7862 LearningRate 0.2358 Epoch: 6 Global Step: 25780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:49,051-Speed 11075.10 samples/sec Loss 6.7890 LearningRate 0.2357 Epoch: 6 Global Step: 25790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:23:52,688-Speed 11274.04 samples/sec Loss 6.7810 LearningRate 0.2357 Epoch: 6 Global Step: 25800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:56,273-Speed 11430.76 samples/sec Loss 6.7861 LearningRate 0.2356 Epoch: 6 Global Step: 25810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:23:59,709-Speed 11922.50 samples/sec Loss 6.8061 LearningRate 0.2355 Epoch: 6 Global Step: 25820 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:03,096-Speed 12099.47 samples/sec Loss 6.7800 LearningRate 0.2354 Epoch: 6 Global Step: 25830 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:07,800-Speed 8709.19 samples/sec Loss 6.7632 LearningRate 0.2353 Epoch: 6 Global Step: 25840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:11,336-Speed 11587.05 samples/sec Loss 6.7784 LearningRate 0.2352 Epoch: 6 Global Step: 25850 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:14,746-Speed 12017.45 samples/sec Loss 6.7952 LearningRate 0.2352 Epoch: 6 Global Step: 25860 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:24:18,234-Speed 11746.02 samples/sec Loss 6.7093 LearningRate 0.2351 Epoch: 6 Global Step: 25870 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:24:21,923-Speed 11106.64 samples/sec Loss 6.7798 LearningRate 0.2350 Epoch: 6 Global Step: 25880 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:24:25,791-Speed 10592.43 samples/sec Loss 6.8863 LearningRate 0.2349 Epoch: 6 Global Step: 25890 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:24:29,218-Speed 11956.08 samples/sec Loss 6.7789 LearningRate 0.2348 Epoch: 6 Global Step: 25900 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:32,768-Speed 11543.38 samples/sec Loss 6.8246 LearningRate 0.2348 Epoch: 6 Global Step: 25910 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:36,220-Speed 11867.42 
samples/sec Loss 6.8839 LearningRate 0.2347 Epoch: 6 Global Step: 25920 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:39,654-Speed 11931.14 samples/sec Loss 6.8437 LearningRate 0.2346 Epoch: 6 Global Step: 25930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:43,486-Speed 10691.46 samples/sec Loss 6.8909 LearningRate 0.2345 Epoch: 6 Global Step: 25940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:47,135-Speed 11230.45 samples/sec Loss 6.8124 LearningRate 0.2344 Epoch: 6 Global Step: 25950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:50,576-Speed 11910.92 samples/sec Loss 6.8116 LearningRate 0.2343 Epoch: 6 Global Step: 25960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:54,412-Speed 10680.54 samples/sec Loss 6.8304 LearningRate 0.2343 Epoch: 6 Global Step: 25970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:24:57,903-Speed 11735.62 samples/sec Loss 6.8175 LearningRate 0.2342 Epoch: 6 Global Step: 25980 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:01,378-Speed 11791.12 samples/sec Loss 6.8632 LearningRate 0.2341 Epoch: 6 Global Step: 25990 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:04,880-Speed 11699.93 samples/sec Loss 6.8870 LearningRate 0.2340 Epoch: 6 Global Step: 26000 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:25:08,663-Speed 10832.28 samples/sec Loss 6.7747 LearningRate 0.2339 Epoch: 6 Global Step: 26010 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:12,115-Speed 11868.34 samples/sec Loss 6.7855 LearningRate 0.2339 Epoch: 6 Global Step: 26020 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:15,896-Speed 10836.02 samples/sec Loss 6.8768 LearningRate 0.2338 Epoch: 6 Global Step: 26030 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:19,646-Speed 10926.27 samples/sec Loss 6.8071 LearningRate 0.2337 
Epoch: 6 Global Step: 26040 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:24,006-Speed 9398.73 samples/sec Loss 6.8908 LearningRate 0.2336 Epoch: 6 Global Step: 26050 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:27,388-Speed 12112.46 samples/sec Loss 6.9098 LearningRate 0.2335 Epoch: 6 Global Step: 26060 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:30,746-Speed 12202.50 samples/sec Loss 6.8173 LearningRate 0.2335 Epoch: 6 Global Step: 26070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:34,170-Speed 11968.06 samples/sec Loss 6.8619 LearningRate 0.2334 Epoch: 6 Global Step: 26080 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:37,572-Speed 12044.34 samples/sec Loss 6.8964 LearningRate 0.2333 Epoch: 6 Global Step: 26090 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:41,178-Speed 11362.43 samples/sec Loss 6.8790 LearningRate 0.2332 Epoch: 6 Global Step: 26100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:25:44,944-Speed 10880.15 samples/sec Loss 6.8879 LearningRate 0.2331 Epoch: 6 Global Step: 26110 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:25:48,352-Speed 12024.38 samples/sec Loss 6.8561 LearningRate 0.2330 Epoch: 6 Global Step: 26120 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:25:51,838-Speed 11754.75 samples/sec Loss 6.8242 LearningRate 0.2330 Epoch: 6 Global Step: 26130 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:25:55,363-Speed 11622.45 samples/sec Loss 6.7976 LearningRate 0.2329 Epoch: 6 Global Step: 26140 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:25:58,876-Speed 11664.06 samples/sec Loss 6.8173 LearningRate 0.2328 Epoch: 6 Global Step: 26150 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:26:02,676-Speed 10783.35 samples/sec Loss 6.8204 LearningRate 0.2327 Epoch: 6 Global Step: 26160 Fp16 Grad Scale: 
262144 Required: 6 hours Training: 2022-01-17 00:26:06,100-Speed 11965.59 samples/sec Loss 6.8247 LearningRate 0.2326 Epoch: 6 Global Step: 26170 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:09,651-Speed 11540.21 samples/sec Loss 6.8806 LearningRate 0.2326 Epoch: 6 Global Step: 26180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:13,277-Speed 11298.02 samples/sec Loss 6.8560 LearningRate 0.2325 Epoch: 6 Global Step: 26190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:16,717-Speed 11911.02 samples/sec Loss 6.8251 LearningRate 0.2324 Epoch: 6 Global Step: 26200 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:20,195-Speed 11780.15 samples/sec Loss 6.8277 LearningRate 0.2323 Epoch: 6 Global Step: 26210 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:24,151-Speed 10357.65 samples/sec Loss 6.8358 LearningRate 0.2322 Epoch: 6 Global Step: 26220 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:27,787-Speed 11268.35 samples/sec Loss 6.8734 LearningRate 0.2322 Epoch: 6 Global Step: 26230 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:31,956-Speed 9827.47 samples/sec Loss 6.8383 LearningRate 0.2321 Epoch: 6 Global Step: 26240 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:35,409-Speed 11866.21 samples/sec Loss 6.8563 LearningRate 0.2320 Epoch: 6 Global Step: 26250 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:38,974-Speed 11493.83 samples/sec Loss 6.8621 LearningRate 0.2319 Epoch: 6 Global Step: 26260 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:26:42,632-Speed 11200.15 samples/sec Loss 6.8895 LearningRate 0.2318 Epoch: 6 Global Step: 26270 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:26:46,186-Speed 11529.61 samples/sec Loss 6.9227 LearningRate 0.2317 Epoch: 6 Global Step: 26280 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:26:49,734-Speed 11547.91 samples/sec Loss 6.8797 LearningRate 0.2317 Epoch: 6 Global Step: 26290 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:53,198-Speed 11830.95 samples/sec Loss 6.8679 LearningRate 0.2316 Epoch: 6 Global Step: 26300 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:26:56,727-Speed 11608.06 samples/sec Loss 6.8860 LearningRate 0.2315 Epoch: 6 Global Step: 26310 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:27:00,199-Speed 11802.42 samples/sec Loss 6.9245 LearningRate 0.2314 Epoch: 6 Global Step: 26320 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:27:03,879-Speed 11133.35 samples/sec Loss 6.8809 LearningRate 0.2313 Epoch: 6 Global Step: 26330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:27:07,399-Speed 11643.07 samples/sec Loss 6.8471 LearningRate 0.2313 Epoch: 6 Global Step: 26340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:27:10,894-Speed 11720.42 samples/sec Loss 6.9103 LearningRate 0.2312 Epoch: 6 Global Step: 26350 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:27:14,576-Speed 11131.89 samples/sec Loss 6.8977 LearningRate 0.2311 Epoch: 6 Global Step: 26360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:27:18,454-Speed 10564.19 samples/sec Loss 6.8962 LearningRate 0.2310 Epoch: 6 Global Step: 26370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:27:22,028-Speed 11464.68 samples/sec Loss 6.9116 LearningRate 0.2309 Epoch: 6 Global Step: 26380 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:27:26,578-Speed 9004.43 samples/sec Loss 6.8241 LearningRate 0.2309 Epoch: 6 Global Step: 26390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:30,436-Speed 10618.34 samples/sec Loss 6.8245 LearningRate 0.2308 Epoch: 6 Global Step: 26400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:33,935-Speed 11712.31 
samples/sec Loss 6.8437 LearningRate 0.2307 Epoch: 6 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:37,384-Speed 11879.39 samples/sec Loss 6.8859 LearningRate 0.2306 Epoch: 6 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:41,014-Speed 11289.10 samples/sec Loss 6.8514 LearningRate 0.2305 Epoch: 6 Global Step: 26430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:45,115-Speed 9989.79 samples/sec Loss 6.8959 LearningRate 0.2304 Epoch: 6 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:48,521-Speed 12031.15 samples/sec Loss 6.8909 LearningRate 0.2304 Epoch: 6 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:52,067-Speed 11557.49 samples/sec Loss 6.8949 LearningRate 0.2303 Epoch: 6 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:55,510-Speed 11912.35 samples/sec Loss 6.8826 LearningRate 0.2302 Epoch: 6 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:27:59,266-Speed 10908.90 samples/sec Loss 6.8992 LearningRate 0.2301 Epoch: 6 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:28:02,764-Speed 11716.22 samples/sec Loss 6.9239 LearningRate 0.2300 Epoch: 6 Global Step: 26490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:06,392-Speed 11292.36 samples/sec Loss 6.8924 LearningRate 0.2300 Epoch: 6 Global Step: 26500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:10,168-Speed 10853.55 samples/sec Loss 6.8522 LearningRate 0.2299 Epoch: 6 Global Step: 26510 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:13,849-Speed 11130.12 samples/sec Loss 6.8493 LearningRate 0.2298 Epoch: 6 Global Step: 26520 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:17,369-Speed 11640.82 samples/sec Loss 6.8768 LearningRate 0.2297 
Epoch: 6 Global Step: 26530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:20,876-Speed 11682.17 samples/sec Loss 6.8920 LearningRate 0.2296 Epoch: 6 Global Step: 26540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:24,714-Speed 10674.91 samples/sec Loss 6.9660 LearningRate 0.2296 Epoch: 6 Global Step: 26550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:28,416-Speed 11069.23 samples/sec Loss 6.8761 LearningRate 0.2295 Epoch: 6 Global Step: 26560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:32,098-Speed 11128.03 samples/sec Loss 6.9606 LearningRate 0.2294 Epoch: 6 Global Step: 26570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:35,992-Speed 10521.99 samples/sec Loss 6.8925 LearningRate 0.2293 Epoch: 6 Global Step: 26580 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:28:40,467-Speed 9162.32 samples/sec Loss 6.9054 LearningRate 0.2292 Epoch: 6 Global Step: 26590 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:28:43,937-Speed 11808.85 samples/sec Loss 6.8757 LearningRate 0.2292 Epoch: 6 Global Step: 26600 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:28:47,620-Speed 11127.39 samples/sec Loss 6.9261 LearningRate 0.2291 Epoch: 6 Global Step: 26610 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:28:51,157-Speed 11582.78 samples/sec Loss 6.8944 LearningRate 0.2290 Epoch: 6 Global Step: 26620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:28:54,980-Speed 10716.52 samples/sec Loss 6.9244 LearningRate 0.2289 Epoch: 6 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:28:58,670-Speed 11105.85 samples/sec Loss 6.8652 LearningRate 0.2288 Epoch: 6 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:29:02,206-Speed 11588.22 samples/sec Loss 6.8781 LearningRate 0.2288 Epoch: 6 Global Step: 26650 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-01-17 00:29:05,681-Speed 11790.76 samples/sec Loss 6.9183 LearningRate 0.2287 Epoch: 6 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:29:09,100-Speed 11984.46 samples/sec Loss 6.8895 LearningRate 0.2286 Epoch: 6 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:29:12,554-Speed 11861.19 samples/sec Loss 6.8836 LearningRate 0.2285 Epoch: 6 Global Step: 26680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:29:15,988-Speed 11931.05 samples/sec Loss 6.8996 LearningRate 0.2284 Epoch: 6 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:29:19,479-Speed 11736.94 samples/sec Loss 6.9303 LearningRate 0.2284 Epoch: 6 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:29:23,017-Speed 11581.80 samples/sec Loss 6.8658 LearningRate 0.2283 Epoch: 6 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:29:27,354-Speed 9447.77 samples/sec Loss 6.8519 LearningRate 0.2282 Epoch: 6 Global Step: 26720 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:29:30,963-Speed 11353.29 samples/sec Loss 6.8796 LearningRate 0.2281 Epoch: 6 Global Step: 26730 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:29:34,449-Speed 11752.53 samples/sec Loss 6.8551 LearningRate 0.2280 Epoch: 6 Global Step: 26740 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:29:37,937-Speed 11745.65 samples/sec Loss 6.8960 LearningRate 0.2279 Epoch: 6 Global Step: 26750 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:29:42,013-Speed 10053.45 samples/sec Loss 6.9275 LearningRate 0.2279 Epoch: 6 Global Step: 26760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:29:45,876-Speed 10607.17 samples/sec Loss 6.8921 LearningRate 0.2278 Epoch: 6 Global Step: 26770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:29:49,778-Speed 10499.99 samples/sec Loss 6.9162 LearningRate 0.2277 Epoch: 6 Global Step: 26780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:29:53,235-Speed 11851.26 samples/sec Loss 6.9098 LearningRate 0.2276 Epoch: 6 Global Step: 26790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:29:56,661-Speed 11958.20 samples/sec Loss 6.9153 LearningRate 0.2275 Epoch: 6 Global Step: 26800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:00,098-Speed 11921.64 samples/sec Loss 6.9269 LearningRate 0.2275 Epoch: 6 Global Step: 26810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:03,530-Speed 11938.68 samples/sec Loss 6.8732 LearningRate 0.2274 Epoch: 6 Global Step: 26820 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:30:07,036-Speed 11687.24 samples/sec Loss 6.9685 LearningRate 0.2273 Epoch: 6 Global Step: 26830 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:30:10,619-Speed 11434.22 samples/sec Loss 6.9279 LearningRate 0.2272 Epoch: 6 Global Step: 26840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:14,103-Speed 11761.21 samples/sec Loss 6.9076 LearningRate 0.2271 Epoch: 6 Global Step: 26850 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:17,671-Speed 11484.06 samples/sec Loss 6.8778 LearningRate 0.2271 Epoch: 6 Global Step: 26860 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:21,231-Speed 11510.54 samples/sec Loss 6.9226 LearningRate 0.2270 Epoch: 6 Global Step: 26870 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:25,839-Speed 8891.50 samples/sec Loss 6.8644 LearningRate 0.2269 Epoch: 6 Global Step: 26880 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:29,413-Speed 11462.74 samples/sec Loss 6.9385 LearningRate 0.2268 Epoch: 6 Global Step: 26890 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:32,827-Speed 12002.58 
samples/sec Loss 6.9154 LearningRate 0.2267 Epoch: 6 Global Step: 26900 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:36,315-Speed 11746.37 samples/sec Loss 6.8728 LearningRate 0.2267 Epoch: 6 Global Step: 26910 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:39,765-Speed 11876.07 samples/sec Loss 6.8624 LearningRate 0.2266 Epoch: 6 Global Step: 26920 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:44,140-Speed 9364.76 samples/sec Loss 6.9047 LearningRate 0.2265 Epoch: 6 Global Step: 26930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:47,672-Speed 11602.05 samples/sec Loss 6.8179 LearningRate 0.2264 Epoch: 6 Global Step: 26940 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:30:51,128-Speed 11853.59 samples/sec Loss 6.9495 LearningRate 0.2263 Epoch: 6 Global Step: 26950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:54,503-Speed 12141.95 samples/sec Loss 6.8582 LearningRate 0.2263 Epoch: 6 Global Step: 26960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:30:57,967-Speed 11826.47 samples/sec Loss 6.9331 LearningRate 0.2262 Epoch: 6 Global Step: 26970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:01,598-Speed 11286.05 samples/sec Loss 6.8807 LearningRate 0.2261 Epoch: 6 Global Step: 26980 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:05,281-Speed 11126.19 samples/sec Loss 6.8550 LearningRate 0.2260 Epoch: 6 Global Step: 26990 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:09,084-Speed 10772.09 samples/sec Loss 6.8476 LearningRate 0.2259 Epoch: 6 Global Step: 27000 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:12,855-Speed 10867.09 samples/sec Loss 6.8759 LearningRate 0.2259 Epoch: 6 Global Step: 27010 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:16,274-Speed 11982.34 samples/sec Loss 6.9343 LearningRate 0.2258 
Epoch: 6 Global Step: 27020 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:19,817-Speed 11565.23 samples/sec Loss 6.9292 LearningRate 0.2257 Epoch: 6 Global Step: 27030 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:23,286-Speed 11811.79 samples/sec Loss 6.8919 LearningRate 0.2256 Epoch: 6 Global Step: 27040 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:27,765-Speed 9146.08 samples/sec Loss 6.8839 LearningRate 0.2255 Epoch: 6 Global Step: 27050 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:31:31,431-Speed 11177.88 samples/sec Loss 6.9138 LearningRate 0.2255 Epoch: 6 Global Step: 27060 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:34,943-Speed 11668.27 samples/sec Loss 6.8612 LearningRate 0.2254 Epoch: 6 Global Step: 27070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:38,558-Speed 11331.48 samples/sec Loss 6.8735 LearningRate 0.2253 Epoch: 6 Global Step: 27080 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:42,182-Speed 11307.71 samples/sec Loss 6.9305 LearningRate 0.2252 Epoch: 6 Global Step: 27090 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:45,800-Speed 11324.09 samples/sec Loss 6.9209 LearningRate 0.2251 Epoch: 6 Global Step: 27100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:49,353-Speed 11532.40 samples/sec Loss 6.9626 LearningRate 0.2251 Epoch: 6 Global Step: 27110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:52,776-Speed 11970.26 samples/sec Loss 6.8640 LearningRate 0.2250 Epoch: 6 Global Step: 27120 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:31:56,473-Speed 11081.13 samples/sec Loss 6.8826 LearningRate 0.2249 Epoch: 6 Global Step: 27130 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:00,021-Speed 11550.53 samples/sec Loss 6.9094 LearningRate 0.2248 Epoch: 6 Global Step: 27140 Fp16 Grad Scale: 
262144 Required: 6 hours Training: 2022-01-17 00:32:03,503-Speed 11764.39 samples/sec Loss 6.8642 LearningRate 0.2247 Epoch: 6 Global Step: 27150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:06,969-Speed 11821.92 samples/sec Loss 6.8363 LearningRate 0.2247 Epoch: 6 Global Step: 27160 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:32:10,542-Speed 11468.27 samples/sec Loss 6.8350 LearningRate 0.2246 Epoch: 6 Global Step: 27170 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:32:14,284-Speed 10950.73 samples/sec Loss 6.8108 LearningRate 0.2245 Epoch: 6 Global Step: 27180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:17,715-Speed 11939.22 samples/sec Loss 6.8480 LearningRate 0.2244 Epoch: 6 Global Step: 27190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:21,133-Speed 11988.98 samples/sec Loss 6.8869 LearningRate 0.2243 Epoch: 6 Global Step: 27200 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:24,512-Speed 12125.11 samples/sec Loss 6.8269 LearningRate 0.2243 Epoch: 6 Global Step: 27210 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:27,900-Speed 12096.50 samples/sec Loss 6.8813 LearningRate 0.2242 Epoch: 6 Global Step: 27220 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:31,310-Speed 12016.50 samples/sec Loss 6.9004 LearningRate 0.2241 Epoch: 6 Global Step: 27230 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:34,945-Speed 11271.82 samples/sec Loss 6.8784 LearningRate 0.2240 Epoch: 6 Global Step: 27240 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:38,425-Speed 11773.53 samples/sec Loss 6.8972 LearningRate 0.2239 Epoch: 6 Global Step: 27250 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:41,899-Speed 11792.81 samples/sec Loss 6.9443 LearningRate 0.2239 Epoch: 6 Global Step: 27260 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:32:45,988-Speed 10020.36 samples/sec Loss 6.8558 LearningRate 0.2238 Epoch: 6 Global Step: 27270 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:32:49,514-Speed 11621.85 samples/sec Loss 6.9123 LearningRate 0.2237 Epoch: 6 Global Step: 27280 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:32:52,975-Speed 11838.93 samples/sec Loss 6.8590 LearningRate 0.2236 Epoch: 6 Global Step: 27290 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:32:56,611-Speed 11268.91 samples/sec Loss 6.8707 LearningRate 0.2235 Epoch: 6 Global Step: 27300 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:33:00,344-Speed 10976.18 samples/sec Loss 6.9547 LearningRate 0.2235 Epoch: 6 Global Step: 27310 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:33:04,010-Speed 11177.53 samples/sec Loss 6.9290 LearningRate 0.2234 Epoch: 6 Global Step: 27320 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:33:07,590-Speed 11445.64 samples/sec Loss 6.9240 LearningRate 0.2233 Epoch: 6 Global Step: 27330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:33:11,270-Speed 11133.39 samples/sec Loss 6.9330 LearningRate 0.2232 Epoch: 6 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:14,787-Speed 11650.03 samples/sec Loss 6.8697 LearningRate 0.2232 Epoch: 6 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:18,266-Speed 11778.12 samples/sec Loss 6.8990 LearningRate 0.2231 Epoch: 6 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:21,960-Speed 11092.03 samples/sec Loss 6.8786 LearningRate 0.2230 Epoch: 6 Global Step: 27370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:25,395-Speed 11927.69 samples/sec Loss 6.8699 LearningRate 0.2229 Epoch: 6 Global Step: 27380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:28,968-Speed 11466.32 
samples/sec Loss 6.8207 LearningRate 0.2228 Epoch: 6 Global Step: 27390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:32,495-Speed 11616.46 samples/sec Loss 6.8792 LearningRate 0.2228 Epoch: 6 Global Step: 27400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:36,009-Speed 11660.96 samples/sec Loss 6.8890 LearningRate 0.2227 Epoch: 6 Global Step: 27410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:39,456-Speed 11887.62 samples/sec Loss 6.8867 LearningRate 0.2226 Epoch: 6 Global Step: 27420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:43,117-Speed 11193.53 samples/sec Loss 6.8767 LearningRate 0.2225 Epoch: 6 Global Step: 27430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:47,357-Speed 9663.35 samples/sec Loss 6.8598 LearningRate 0.2224 Epoch: 6 Global Step: 27440 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:33:51,430-Speed 10059.90 samples/sec Loss 6.8682 LearningRate 0.2224 Epoch: 6 Global Step: 27450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:33:54,870-Speed 11909.25 samples/sec Loss 6.8295 LearningRate 0.2223 Epoch: 6 Global Step: 27460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:33:58,296-Speed 11960.43 samples/sec Loss 6.8582 LearningRate 0.2222 Epoch: 6 Global Step: 27470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:02,206-Speed 10478.54 samples/sec Loss 6.8708 LearningRate 0.2221 Epoch: 6 Global Step: 27480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:05,948-Speed 10951.31 samples/sec Loss 6.8751 LearningRate 0.2220 Epoch: 6 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:09,778-Speed 10697.66 samples/sec Loss 6.9206 LearningRate 0.2220 Epoch: 6 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:13,297-Speed 11642.58 samples/sec Loss 6.9154 LearningRate 0.2219 
Epoch: 6 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:16,842-Speed 11557.99 samples/sec Loss 6.9022 LearningRate 0.2218 Epoch: 6 Global Step: 27520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:20,261-Speed 11984.28 samples/sec Loss 6.9144 LearningRate 0.2217 Epoch: 6 Global Step: 27530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:23,905-Speed 11243.12 samples/sec Loss 6.8103 LearningRate 0.2216 Epoch: 6 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:27,459-Speed 11528.06 samples/sec Loss 6.8732 LearningRate 0.2216 Epoch: 6 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:31,373-Speed 10468.31 samples/sec Loss 6.8811 LearningRate 0.2215 Epoch: 6 Global Step: 27560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:34:34,849-Speed 11789.51 samples/sec Loss 6.8627 LearningRate 0.2214 Epoch: 6 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:38,258-Speed 12021.02 samples/sec Loss 6.8929 LearningRate 0.2213 Epoch: 6 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:41,677-Speed 11983.81 samples/sec Loss 6.8607 LearningRate 0.2212 Epoch: 6 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:45,128-Speed 11914.12 samples/sec Loss 6.9064 LearningRate 0.2212 Epoch: 6 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:48,581-Speed 11867.26 samples/sec Loss 6.9276 LearningRate 0.2211 Epoch: 6 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:53,286-Speed 8706.57 samples/sec Loss 6.7892 LearningRate 0.2210 Epoch: 6 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:34:56,739-Speed 11867.06 samples/sec Loss 6.8718 LearningRate 0.2209 Epoch: 6 Global Step: 27630 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-01-17 00:35:00,448-Speed 11046.51 samples/sec Loss 6.8897 LearningRate 0.2208 Epoch: 6 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:04,100-Speed 11221.88 samples/sec Loss 6.8484 LearningRate 0.2208 Epoch: 6 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:07,602-Speed 11700.16 samples/sec Loss 6.8860 LearningRate 0.2207 Epoch: 6 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:11,068-Speed 11826.39 samples/sec Loss 6.8279 LearningRate 0.2206 Epoch: 6 Global Step: 27670 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:35:14,472-Speed 12038.23 samples/sec Loss 6.7846 LearningRate 0.2205 Epoch: 6 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:17,946-Speed 11798.17 samples/sec Loss 6.8842 LearningRate 0.2205 Epoch: 6 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:21,445-Speed 11708.02 samples/sec Loss 6.8861 LearningRate 0.2204 Epoch: 6 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:25,055-Speed 11351.53 samples/sec Loss 6.8436 LearningRate 0.2203 Epoch: 6 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:28,500-Speed 11893.08 samples/sec Loss 6.8842 LearningRate 0.2202 Epoch: 6 Global Step: 27720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:31,996-Speed 11719.38 samples/sec Loss 6.9138 LearningRate 0.2201 Epoch: 6 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:35,481-Speed 11758.64 samples/sec Loss 6.8173 LearningRate 0.2201 Epoch: 6 Global Step: 27740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:38,920-Speed 11917.09 samples/sec Loss 6.8334 LearningRate 0.2200 Epoch: 6 Global Step: 27750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-01-17 00:35:42,657-Speed 10963.64 samples/sec Loss 6.9088 LearningRate 0.2199 Epoch: 6 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:46,377-Speed 11014.19 samples/sec Loss 6.8856 LearningRate 0.2198 Epoch: 6 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:35:49,932-Speed 11524.74 samples/sec Loss 6.8521 LearningRate 0.2197 Epoch: 6 Global Step: 27780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:35:53,425-Speed 11729.75 samples/sec Loss 6.8273 LearningRate 0.2197 Epoch: 6 Global Step: 27790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:35:57,304-Speed 10565.02 samples/sec Loss 6.8373 LearningRate 0.2196 Epoch: 6 Global Step: 27800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:01,135-Speed 10693.15 samples/sec Loss 6.8632 LearningRate 0.2195 Epoch: 6 Global Step: 27810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:04,574-Speed 11914.70 samples/sec Loss 6.7997 LearningRate 0.2194 Epoch: 6 Global Step: 27820 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:07,997-Speed 11968.81 samples/sec Loss 6.9154 LearningRate 0.2193 Epoch: 6 Global Step: 27830 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:12,114-Speed 9955.16 samples/sec Loss 6.8717 LearningRate 0.2193 Epoch: 6 Global Step: 27840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:15,710-Speed 11394.50 samples/sec Loss 6.8407 LearningRate 0.2192 Epoch: 6 Global Step: 27850 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:19,243-Speed 11595.48 samples/sec Loss 6.8519 LearningRate 0.2191 Epoch: 6 Global Step: 27860 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:22,660-Speed 11993.00 samples/sec Loss 6.8719 LearningRate 0.2190 Epoch: 6 Global Step: 27870 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:26,471-Speed 10749.36 
samples/sec Loss 6.8901 LearningRate 0.2190 Epoch: 6 Global Step: 27880 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:36:29,994-Speed 11631.86 samples/sec Loss 6.7861 LearningRate 0.2189 Epoch: 6 Global Step: 27890 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:36:33,485-Speed 11738.21 samples/sec Loss 6.8539 LearningRate 0.2188 Epoch: 6 Global Step: 27900 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:36:37,164-Speed 11142.63 samples/sec Loss 6.8271 LearningRate 0.2187 Epoch: 6 Global Step: 27910 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:36:40,671-Speed 11684.19 samples/sec Loss 6.7634 LearningRate 0.2186 Epoch: 6 Global Step: 27920 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:44,400-Speed 10986.39 samples/sec Loss 6.8227 LearningRate 0.2186 Epoch: 6 Global Step: 27930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:36:47,924-Speed 11628.20 samples/sec Loss 6.7915 LearningRate 0.2185 Epoch: 6 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:36:51,338-Speed 12002.40 samples/sec Loss 6.8204 LearningRate 0.2184 Epoch: 6 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:36:54,757-Speed 11987.18 samples/sec Loss 6.8122 LearningRate 0.2183 Epoch: 6 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:36:58,220-Speed 11831.48 samples/sec Loss 6.8590 LearningRate 0.2182 Epoch: 6 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:01,893-Speed 11154.43 samples/sec Loss 6.8513 LearningRate 0.2182 Epoch: 6 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:05,605-Speed 11037.94 samples/sec Loss 6.8856 LearningRate 0.2181 Epoch: 6 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:09,297-Speed 11097.63 samples/sec Loss 6.8647 LearningRate 0.2180 
Epoch: 6 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:12,998-Speed 11072.40 samples/sec Loss 6.8922 LearningRate 0.2179 Epoch: 6 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:17,425-Speed 9255.14 samples/sec Loss 6.8791 LearningRate 0.2179 Epoch: 6 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:20,908-Speed 11761.65 samples/sec Loss 6.7993 LearningRate 0.2178 Epoch: 6 Global Step: 28030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:24,662-Speed 10915.44 samples/sec Loss 6.8686 LearningRate 0.2177 Epoch: 6 Global Step: 28040 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:37:28,393-Speed 10983.18 samples/sec Loss 6.7835 LearningRate 0.2176 Epoch: 6 Global Step: 28050 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:37:31,861-Speed 11812.25 samples/sec Loss 6.8564 LearningRate 0.2175 Epoch: 6 Global Step: 28060 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:37:35,264-Speed 12044.96 samples/sec Loss 6.8757 LearningRate 0.2175 Epoch: 6 Global Step: 28070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:37:38,721-Speed 11850.47 samples/sec Loss 6.8049 LearningRate 0.2174 Epoch: 6 Global Step: 28080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:42,114-Speed 12076.88 samples/sec Loss 6.7916 LearningRate 0.2173 Epoch: 6 Global Step: 28090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:45,511-Speed 12061.25 samples/sec Loss 6.8229 LearningRate 0.2172 Epoch: 6 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:48,913-Speed 12045.10 samples/sec Loss 6.8335 LearningRate 0.2171 Epoch: 6 Global Step: 28110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:52,379-Speed 11821.49 samples/sec Loss 6.9122 LearningRate 0.2171 Epoch: 6 Global Step: 28120 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-01-17 00:37:55,813-Speed 11934.28 samples/sec Loss 6.8161 LearningRate 0.2170 Epoch: 6 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:37:59,218-Speed 12032.67 samples/sec Loss 6.8038 LearningRate 0.2169 Epoch: 6 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:38:02,690-Speed 11801.97 samples/sec Loss 6.7888 LearningRate 0.2168 Epoch: 6 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:38:06,583-Speed 10523.85 samples/sec Loss 6.8185 LearningRate 0.2168 Epoch: 6 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:38:10,012-Speed 11952.02 samples/sec Loss 6.8170 LearningRate 0.2167 Epoch: 6 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:38:14,284-Speed 9588.82 samples/sec Loss 6.8682 LearningRate 0.2166 Epoch: 6 Global Step: 28180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:17,857-Speed 11466.74 samples/sec Loss 6.8552 LearningRate 0.2165 Epoch: 6 Global Step: 28190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:21,305-Speed 11882.80 samples/sec Loss 6.8877 LearningRate 0.2164 Epoch: 6 Global Step: 28200 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:24,898-Speed 11406.85 samples/sec Loss 6.8217 LearningRate 0.2164 Epoch: 6 Global Step: 28210 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:28,308-Speed 12016.19 samples/sec Loss 6.8026 LearningRate 0.2163 Epoch: 6 Global Step: 28220 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:31,902-Speed 11398.56 samples/sec Loss 6.8488 LearningRate 0.2162 Epoch: 6 Global Step: 28230 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:35,474-Speed 11470.94 samples/sec Loss 6.8009 LearningRate 0.2161 Epoch: 6 Global Step: 28240 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:38:39,199-Speed 11000.68 samples/sec Loss 6.7993 LearningRate 0.2160 Epoch: 6 Global Step: 28250 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:42,664-Speed 11822.80 samples/sec Loss 6.8773 LearningRate 0.2160 Epoch: 6 Global Step: 28260 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:46,054-Speed 12088.89 samples/sec Loss 6.8972 LearningRate 0.2159 Epoch: 6 Global Step: 28270 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:49,563-Speed 11674.94 samples/sec Loss 6.8571 LearningRate 0.2158 Epoch: 6 Global Step: 28280 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:38:53,980-Speed 9276.15 samples/sec Loss 6.8367 LearningRate 0.2157 Epoch: 6 Global Step: 28290 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:38:57,962-Speed 10287.98 samples/sec Loss 6.8253 LearningRate 0.2157 Epoch: 6 Global Step: 28300 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:01,614-Speed 11219.22 samples/sec Loss 6.8968 LearningRate 0.2156 Epoch: 6 Global Step: 28310 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:05,599-Speed 10283.58 samples/sec Loss 6.8852 LearningRate 0.2155 Epoch: 6 Global Step: 28320 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:09,199-Speed 11381.94 samples/sec Loss 6.8289 LearningRate 0.2154 Epoch: 6 Global Step: 28330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:12,793-Speed 11400.41 samples/sec Loss 6.8496 LearningRate 0.2153 Epoch: 6 Global Step: 28340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:16,172-Speed 12124.65 samples/sec Loss 6.7883 LearningRate 0.2153 Epoch: 6 Global Step: 28350 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:19,697-Speed 11622.42 samples/sec Loss 6.8221 LearningRate 0.2152 Epoch: 6 Global Step: 28360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:23,213-Speed 11655.73 
samples/sec Loss 6.7722 LearningRate 0.2151 Epoch: 6 Global Step: 28370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:26,795-Speed 11438.59 samples/sec Loss 6.7772 LearningRate 0.2150 Epoch: 6 Global Step: 28380 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:30,253-Speed 11848.81 samples/sec Loss 6.8126 LearningRate 0.2150 Epoch: 6 Global Step: 28390 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:39:34,162-Speed 10479.38 samples/sec Loss 6.8237 LearningRate 0.2149 Epoch: 6 Global Step: 28400 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:39:37,913-Speed 10923.24 samples/sec Loss 6.8022 LearningRate 0.2148 Epoch: 6 Global Step: 28410 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:39:41,566-Speed 11216.51 samples/sec Loss 6.7971 LearningRate 0.2147 Epoch: 6 Global Step: 28420 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:45,122-Speed 11523.57 samples/sec Loss 6.8161 LearningRate 0.2146 Epoch: 6 Global Step: 28430 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:48,600-Speed 11782.52 samples/sec Loss 6.8251 LearningRate 0.2146 Epoch: 6 Global Step: 28440 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:52,244-Speed 11243.79 samples/sec Loss 6.8247 LearningRate 0.2145 Epoch: 6 Global Step: 28450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:39:56,299-Speed 10102.57 samples/sec Loss 6.8412 LearningRate 0.2144 Epoch: 6 Global Step: 28460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:00,475-Speed 9811.70 samples/sec Loss 6.8351 LearningRate 0.2143 Epoch: 6 Global Step: 28470 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:03,854-Speed 12125.59 samples/sec Loss 6.8412 LearningRate 0.2142 Epoch: 6 Global Step: 28480 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:08,946-Speed 8046.41 samples/sec Loss 6.7252 LearningRate 0.2142 
Epoch: 6 Global Step: 28490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:13,069-Speed 9937.02 samples/sec Loss 6.8415 LearningRate 0.2141 Epoch: 6 Global Step: 28500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:16,498-Speed 11949.87 samples/sec Loss 6.8331 LearningRate 0.2140 Epoch: 6 Global Step: 28510 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:19,883-Speed 12104.62 samples/sec Loss 6.8312 LearningRate 0.2139 Epoch: 6 Global Step: 28520 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:23,273-Speed 12090.30 samples/sec Loss 6.7780 LearningRate 0.2139 Epoch: 6 Global Step: 28530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:26,650-Speed 12130.72 samples/sec Loss 6.7646 LearningRate 0.2138 Epoch: 6 Global Step: 28540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:30,008-Speed 12201.45 samples/sec Loss 6.7612 LearningRate 0.2137 Epoch: 6 Global Step: 28550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:33,407-Speed 12056.64 samples/sec Loss 6.7845 LearningRate 0.2136 Epoch: 6 Global Step: 28560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:36,771-Speed 12178.76 samples/sec Loss 6.7682 LearningRate 0.2135 Epoch: 6 Global Step: 28570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:40,638-Speed 10595.94 samples/sec Loss 6.8170 LearningRate 0.2135 Epoch: 6 Global Step: 28580 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:44,612-Speed 10309.65 samples/sec Loss 6.7941 LearningRate 0.2134 Epoch: 6 Global Step: 28590 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:48,035-Speed 11972.60 samples/sec Loss 6.7896 LearningRate 0.2133 Epoch: 6 Global Step: 28600 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:40:51,726-Speed 11101.01 samples/sec Loss 6.8167 LearningRate 0.2132 Epoch: 6 Global Step: 28610 Fp16 Grad Scale: 
262144 Required: 6 hours Training: 2022-01-17 00:40:55,657-Speed 10422.33 samples/sec Loss 6.7636 LearningRate 0.2132 Epoch: 6 Global Step: 28620 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:40:59,184-Speed 11614.92 samples/sec Loss 6.7574 LearningRate 0.2131 Epoch: 6 Global Step: 28630 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:02,872-Speed 11111.61 samples/sec Loss 6.7279 LearningRate 0.2130 Epoch: 6 Global Step: 28640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:06,421-Speed 11545.95 samples/sec Loss 6.7316 LearningRate 0.2129 Epoch: 6 Global Step: 28650 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:10,161-Speed 10953.88 samples/sec Loss 6.8117 LearningRate 0.2128 Epoch: 6 Global Step: 28660 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:13,760-Speed 11386.16 samples/sec Loss 6.7668 LearningRate 0.2128 Epoch: 6 Global Step: 28670 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:17,449-Speed 11106.95 samples/sec Loss 6.7985 LearningRate 0.2127 Epoch: 6 Global Step: 28680 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:21,458-Speed 10218.91 samples/sec Loss 6.7901 LearningRate 0.2126 Epoch: 6 Global Step: 28690 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:25,045-Speed 11422.70 samples/sec Loss 6.8078 LearningRate 0.2125 Epoch: 6 Global Step: 28700 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:28,947-Speed 10501.85 samples/sec Loss 6.8080 LearningRate 0.2125 Epoch: 6 Global Step: 28710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:32,555-Speed 11354.74 samples/sec Loss 6.8060 LearningRate 0.2124 Epoch: 6 Global Step: 28720 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:36,149-Speed 11401.53 samples/sec Loss 6.7971 LearningRate 0.2123 Epoch: 6 Global Step: 28730 Fp16 Grad Scale: 524288 Required: 6 hours Training: 
2022-01-17 00:41:40,231-Speed 10036.33 samples/sec Loss 6.7758 LearningRate 0.2122 Epoch: 6 Global Step: 28740 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:41:43,691-Speed 11842.04 samples/sec Loss 6.7517 LearningRate 0.2121 Epoch: 6 Global Step: 28750 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:41:47,463-Speed 10865.46 samples/sec Loss 6.7901 LearningRate 0.2121 Epoch: 6 Global Step: 28760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:50,969-Speed 11687.91 samples/sec Loss 6.7788 LearningRate 0.2120 Epoch: 6 Global Step: 28770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:54,435-Speed 11820.16 samples/sec Loss 6.8105 LearningRate 0.2119 Epoch: 6 Global Step: 28780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:41:57,893-Speed 11853.05 samples/sec Loss 6.8046 LearningRate 0.2118 Epoch: 6 Global Step: 28790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:01,411-Speed 11646.08 samples/sec Loss 6.8281 LearningRate 0.2118 Epoch: 6 Global Step: 28800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:05,096-Speed 11119.27 samples/sec Loss 6.7626 LearningRate 0.2117 Epoch: 6 Global Step: 28810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:08,862-Speed 10880.60 samples/sec Loss 6.8033 LearningRate 0.2116 Epoch: 6 Global Step: 28820 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:12,657-Speed 10797.25 samples/sec Loss 6.7210 LearningRate 0.2115 Epoch: 6 Global Step: 28830 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:16,168-Speed 11666.98 samples/sec Loss 6.7185 LearningRate 0.2115 Epoch: 6 Global Step: 28840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:19,957-Speed 10814.82 samples/sec Loss 6.7527 LearningRate 0.2114 Epoch: 6 Global Step: 28850 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:24,197-Speed 9662.49 
samples/sec Loss 6.8007 LearningRate 0.2113 Epoch: 6 Global Step: 28860 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:42:28,005-Speed 10761.56 samples/sec Loss 6.7275 LearningRate 0.2112 Epoch: 6 Global Step: 28870 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:31,637-Speed 11279.95 samples/sec Loss 6.7360 LearningRate 0.2111 Epoch: 6 Global Step: 28880 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:35,345-Speed 11050.39 samples/sec Loss 6.7176 LearningRate 0.2111 Epoch: 6 Global Step: 28890 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:38,773-Speed 11953.68 samples/sec Loss 6.7428 LearningRate 0.2110 Epoch: 6 Global Step: 28900 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:42,325-Speed 11536.95 samples/sec Loss 6.7896 LearningRate 0.2109 Epoch: 6 Global Step: 28910 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:46,343-Speed 10198.03 samples/sec Loss 6.7264 LearningRate 0.2108 Epoch: 6 Global Step: 28920 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:49,755-Speed 12013.32 samples/sec Loss 6.7751 LearningRate 0.2108 Epoch: 6 Global Step: 28930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:53,276-Speed 11635.99 samples/sec Loss 6.8306 LearningRate 0.2107 Epoch: 6 Global Step: 28940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:42:56,770-Speed 11728.46 samples/sec Loss 6.8129 LearningRate 0.2106 Epoch: 6 Global Step: 28950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:00,281-Speed 11668.36 samples/sec Loss 6.6988 LearningRate 0.2105 Epoch: 6 Global Step: 28960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:03,966-Speed 11118.43 samples/sec Loss 6.7587 LearningRate 0.2104 Epoch: 6 Global Step: 28970 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:43:07,475-Speed 11680.42 samples/sec Loss 6.8068 LearningRate 0.2104 
Epoch: 6 Global Step: 28980 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:11,088-Speed 11339.76 samples/sec Loss 6.7505 LearningRate 0.2103 Epoch: 6 Global Step: 28990 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:14,615-Speed 11617.96 samples/sec Loss 6.7916 LearningRate 0.2102 Epoch: 6 Global Step: 29000 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:18,233-Speed 11325.20 samples/sec Loss 6.7743 LearningRate 0.2101 Epoch: 6 Global Step: 29010 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:21,765-Speed 11601.02 samples/sec Loss 6.7269 LearningRate 0.2101 Epoch: 6 Global Step: 29020 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:25,447-Speed 11127.00 samples/sec Loss 6.7036 LearningRate 0.2100 Epoch: 6 Global Step: 29030 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:28,987-Speed 11577.77 samples/sec Loss 6.7393 LearningRate 0.2099 Epoch: 6 Global Step: 29040 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:32,378-Speed 12079.54 samples/sec Loss 6.7614 LearningRate 0.2098 Epoch: 6 Global Step: 29050 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:35,982-Speed 11369.09 samples/sec Loss 6.6855 LearningRate 0.2098 Epoch: 6 Global Step: 29060 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:39,491-Speed 11675.53 samples/sec Loss 6.7395 LearningRate 0.2097 Epoch: 6 Global Step: 29070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:43,414-Speed 10448.51 samples/sec Loss 6.7120 LearningRate 0.2096 Epoch: 6 Global Step: 29080 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:43:47,215-Speed 10778.45 samples/sec Loss 6.8232 LearningRate 0.2095 Epoch: 6 Global Step: 29090 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:51,045-Speed 10700.22 samples/sec Loss 6.7953 LearningRate 0.2094 Epoch: 6 Global Step: 29100 Fp16 Grad 
Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:54,536-Speed 11736.94 samples/sec Loss 6.7536 LearningRate 0.2094 Epoch: 6 Global Step: 29110 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:43:58,446-Speed 10476.99 samples/sec Loss 6.7563 LearningRate 0.2093 Epoch: 6 Global Step: 29120 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:44:02,635-Speed 9780.54 samples/sec Loss 6.7553 LearningRate 0.2092 Epoch: 6 Global Step: 29130 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:44:06,576-Speed 10397.75 samples/sec Loss 6.7311 LearningRate 0.2091 Epoch: 6 Global Step: 29140 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:44:10,387-Speed 10753.44 samples/sec Loss 6.8025 LearningRate 0.2091 Epoch: 6 Global Step: 29150 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:44:14,037-Speed 11222.83 samples/sec Loss 6.7768 LearningRate 0.2090 Epoch: 6 Global Step: 29160 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:44:17,724-Speed 11112.50 samples/sec Loss 6.7516 LearningRate 0.2089 Epoch: 6 Global Step: 29170 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:44:21,196-Speed 11801.85 samples/sec Loss 6.6831 LearningRate 0.2088 Epoch: 6 Global Step: 29180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:44:25,121-Speed 10439.56 samples/sec Loss 6.7934 LearningRate 0.2087 Epoch: 6 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:44:29,332-Speed 9728.61 samples/sec Loss 6.7568 LearningRate 0.2087 Epoch: 6 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:45:01,504-Speed 1273.23 samples/sec Loss 6.2967 LearningRate 0.2086 Epoch: 7 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:45:05,482-Speed 10299.85 samples/sec Loss 5.9527 LearningRate 0.2085 Epoch: 7 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-01-17 00:45:10,148-Speed 8780.90 samples/sec Loss 5.9318 LearningRate 0.2084 Epoch: 7 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:45:14,087-Speed 10403.39 samples/sec Loss 5.9282 LearningRate 0.2084 Epoch: 7 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:45:18,779-Speed 8733.32 samples/sec Loss 5.9860 LearningRate 0.2083 Epoch: 7 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:45:23,072-Speed 9544.36 samples/sec Loss 5.9823 LearningRate 0.2082 Epoch: 7 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:45:26,466-Speed 12072.97 samples/sec Loss 5.9440 LearningRate 0.2081 Epoch: 7 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:45:29,830-Speed 12179.97 samples/sec Loss 5.9885 LearningRate 0.2081 Epoch: 7 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:45:33,211-Speed 12117.23 samples/sec Loss 5.9599 LearningRate 0.2080 Epoch: 7 Global Step: 29290 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:45:37,353-Speed 9891.48 samples/sec Loss 5.9789 LearningRate 0.2079 Epoch: 7 Global Step: 29300 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:45:40,758-Speed 12033.01 samples/sec Loss 6.0301 LearningRate 0.2078 Epoch: 7 Global Step: 29310 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:45:44,421-Speed 11187.49 samples/sec Loss 6.0034 LearningRate 0.2078 Epoch: 7 Global Step: 29320 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:45:48,150-Speed 10987.42 samples/sec Loss 6.0071 LearningRate 0.2077 Epoch: 7 Global Step: 29330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:45:51,648-Speed 11714.00 samples/sec Loss 6.0750 LearningRate 0.2076 Epoch: 7 Global Step: 29340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:45:55,366-Speed 11020.06 
samples/sec Loss 6.0610 LearningRate 0.2075 Epoch: 7 Global Step: 29350 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:45:59,134-Speed 10872.23 samples/sec Loss 6.0653 LearningRate 0.2074 Epoch: 7 Global Step: 29360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:02,630-Speed 11721.89 samples/sec Loss 6.0585 LearningRate 0.2074 Epoch: 7 Global Step: 29370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:06,654-Speed 10181.65 samples/sec Loss 6.0818 LearningRate 0.2073 Epoch: 7 Global Step: 29380 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:10,216-Speed 11503.63 samples/sec Loss 6.0400 LearningRate 0.2072 Epoch: 7 Global Step: 29390 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:13,693-Speed 11783.97 samples/sec Loss 6.1236 LearningRate 0.2071 Epoch: 7 Global Step: 29400 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:17,512-Speed 10726.03 samples/sec Loss 6.1393 LearningRate 0.2071 Epoch: 7 Global Step: 29410 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:21,382-Speed 10590.05 samples/sec Loss 6.1499 LearningRate 0.2070 Epoch: 7 Global Step: 29420 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:24,979-Speed 11389.41 samples/sec Loss 6.1272 LearningRate 0.2069 Epoch: 7 Global Step: 29430 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:28,612-Speed 11279.27 samples/sec Loss 6.1252 LearningRate 0.2068 Epoch: 7 Global Step: 29440 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:32,207-Speed 11396.86 samples/sec Loss 6.1293 LearningRate 0.2068 Epoch: 7 Global Step: 29450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:35,820-Speed 11339.82 samples/sec Loss 6.2270 LearningRate 0.2067 Epoch: 7 Global Step: 29460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:39,373-Speed 11531.70 samples/sec Loss 6.1639 LearningRate 0.2066 
Epoch: 7 Global Step: 29470 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:42,957-Speed 11432.69 samples/sec Loss 6.2220 LearningRate 0.2065 Epoch: 7 Global Step: 29480 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:46,421-Speed 11829.11 samples/sec Loss 6.1135 LearningRate 0.2064 Epoch: 7 Global Step: 29490 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:46:49,851-Speed 11942.74 samples/sec Loss 6.1588 LearningRate 0.2064 Epoch: 7 Global Step: 29500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:53,511-Speed 11195.14 samples/sec Loss 6.1896 LearningRate 0.2063 Epoch: 7 Global Step: 29510 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:46:57,049-Speed 11581.86 samples/sec Loss 6.2265 LearningRate 0.2062 Epoch: 7 Global Step: 29520 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:00,800-Speed 10926.73 samples/sec Loss 6.1776 LearningRate 0.2061 Epoch: 7 Global Step: 29530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:04,317-Speed 11648.93 samples/sec Loss 6.2188 LearningRate 0.2061 Epoch: 7 Global Step: 29540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:07,908-Speed 11411.22 samples/sec Loss 6.1836 LearningRate 0.2060 Epoch: 7 Global Step: 29550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:11,506-Speed 11386.20 samples/sec Loss 6.2205 LearningRate 0.2059 Epoch: 7 Global Step: 29560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:15,005-Speed 11710.08 samples/sec Loss 6.1979 LearningRate 0.2058 Epoch: 7 Global Step: 29570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:18,806-Speed 10781.47 samples/sec Loss 6.2442 LearningRate 0.2058 Epoch: 7 Global Step: 29580 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:22,913-Speed 9975.73 samples/sec Loss 6.2325 LearningRate 0.2057 Epoch: 7 Global Step: 29590 Fp16 Grad Scale: 
262144 Required: 6 hours Training: 2022-01-17 00:47:26,403-Speed 11741.53 samples/sec Loss 6.1674 LearningRate 0.2056 Epoch: 7 Global Step: 29600 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:47:30,161-Speed 10901.79 samples/sec Loss 6.2299 LearningRate 0.2055 Epoch: 7 Global Step: 29610 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:33,934-Speed 10859.56 samples/sec Loss 6.2216 LearningRate 0.2055 Epoch: 7 Global Step: 29620 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:37,430-Speed 11736.70 samples/sec Loss 6.2440 LearningRate 0.2054 Epoch: 7 Global Step: 29630 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:41,282-Speed 10636.60 samples/sec Loss 6.2728 LearningRate 0.2053 Epoch: 7 Global Step: 29640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:44,936-Speed 11225.07 samples/sec Loss 6.2732 LearningRate 0.2052 Epoch: 7 Global Step: 29650 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:48,366-Speed 11944.45 samples/sec Loss 6.2582 LearningRate 0.2051 Epoch: 7 Global Step: 29660 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:52,009-Speed 11249.12 samples/sec Loss 6.2363 LearningRate 0.2051 Epoch: 7 Global Step: 29670 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:55,453-Speed 11896.59 samples/sec Loss 6.2434 LearningRate 0.2050 Epoch: 7 Global Step: 29680 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:47:59,261-Speed 10760.28 samples/sec Loss 6.2787 LearningRate 0.2049 Epoch: 7 Global Step: 29690 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:02,985-Speed 11002.13 samples/sec Loss 6.2959 LearningRate 0.2048 Epoch: 7 Global Step: 29700 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:06,423-Speed 11916.38 samples/sec Loss 6.2665 LearningRate 0.2048 Epoch: 7 Global Step: 29710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 
2022-01-17 00:48:09,833-Speed 12015.12 samples/sec Loss 6.3784 LearningRate 0.2047 Epoch: 7 Global Step: 29720 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:13,571-Speed 10961.32 samples/sec Loss 6.3330 LearningRate 0.2046 Epoch: 7 Global Step: 29730 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:17,012-Speed 11908.93 samples/sec Loss 6.3279 LearningRate 0.2045 Epoch: 7 Global Step: 29740 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:20,634-Speed 11312.29 samples/sec Loss 6.3403 LearningRate 0.2045 Epoch: 7 Global Step: 29750 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:24,223-Speed 11416.45 samples/sec Loss 6.2804 LearningRate 0.2044 Epoch: 7 Global Step: 29760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:27,672-Speed 11878.57 samples/sec Loss 6.3564 LearningRate 0.2043 Epoch: 7 Global Step: 29770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:31,293-Speed 11314.50 samples/sec Loss 6.3321 LearningRate 0.2042 Epoch: 7 Global Step: 29780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:34,796-Speed 11696.89 samples/sec Loss 6.3190 LearningRate 0.2042 Epoch: 7 Global Step: 29790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:38,872-Speed 10051.62 samples/sec Loss 6.3578 LearningRate 0.2041 Epoch: 7 Global Step: 29800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:42,436-Speed 11496.73 samples/sec Loss 6.4126 LearningRate 0.2040 Epoch: 7 Global Step: 29810 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:48:46,069-Speed 11279.12 samples/sec Loss 6.3673 LearningRate 0.2039 Epoch: 7 Global Step: 29820 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:49,661-Speed 11408.47 samples/sec Loss 6.3562 LearningRate 0.2039 Epoch: 7 Global Step: 29830 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:48:53,374-Speed 11033.96 
samples/sec Loss 6.3915 LearningRate 0.2038 Epoch: 7 Global Step: 29840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:48:56,845-Speed 11805.01 samples/sec Loss 6.3750 LearningRate 0.2037 Epoch: 7 Global Step: 29850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:00,492-Speed 11238.59 samples/sec Loss 6.3903 LearningRate 0.2036 Epoch: 7 Global Step: 29860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:04,214-Speed 11010.31 samples/sec Loss 6.3411 LearningRate 0.2035 Epoch: 7 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:08,140-Speed 10436.04 samples/sec Loss 6.4200 LearningRate 0.2035 Epoch: 7 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:11,968-Speed 10703.60 samples/sec Loss 6.3577 LearningRate 0.2034 Epoch: 7 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:15,449-Speed 11774.50 samples/sec Loss 6.3260 LearningRate 0.2033 Epoch: 7 Global Step: 29900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:19,107-Speed 11200.72 samples/sec Loss 6.3688 LearningRate 0.2032 Epoch: 7 Global Step: 29910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:22,703-Speed 11393.80 samples/sec Loss 6.3917 LearningRate 0.2032 Epoch: 7 Global Step: 29920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:26,356-Speed 11216.21 samples/sec Loss 6.3300 LearningRate 0.2031 Epoch: 7 Global Step: 29930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:29,883-Speed 11614.80 samples/sec Loss 6.4064 LearningRate 0.2030 Epoch: 7 Global Step: 29940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:49:33,712-Speed 10702.70 samples/sec Loss 6.3689 LearningRate 0.2029 Epoch: 7 Global Step: 29950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:49:37,268-Speed 11522.38 samples/sec Loss 6.3582 LearningRate 0.2029 
Epoch: 7 Global Step: 29960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:49:40,851-Speed 11435.69 samples/sec Loss 6.4011 LearningRate 0.2028 Epoch: 7 Global Step: 29970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:49:44,458-Speed 11359.44 samples/sec Loss 6.3874 LearningRate 0.2027 Epoch: 7 Global Step: 29980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:48,117-Speed 11198.72 samples/sec Loss 6.3967 LearningRate 0.2026 Epoch: 7 Global Step: 29990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:49:51,570-Speed 11865.70 samples/sec Loss 6.3988 LearningRate 0.2026 Epoch: 7 Global Step: 30000 Fp16 Grad Scale: 131072 Required: 6 hours
Training: 2022-01-17 00:50:12,790-[lfw][30000]XNorm: 11.389398
Training: 2022-01-17 00:50:12,791-[lfw][30000]Accuracy-Flip: 0.99633+-0.00306
Training: 2022-01-17 00:50:12,791-[lfw][30000]Accuracy-Highest: 0.99633
Training: 2022-01-17 00:50:37,098-[cfp_fp][30000]XNorm: 9.507807
Training: 2022-01-17 00:50:37,099-[cfp_fp][30000]Accuracy-Flip: 0.96214+-0.01209
Training: 2022-01-17 00:50:37,102-[cfp_fp][30000]Accuracy-Highest: 0.96457
Training: 2022-01-17 00:50:58,060-[agedb_30][30000]XNorm: 10.894418
Training: 2022-01-17 00:50:58,061-[agedb_30][30000]Accuracy-Flip: 0.96267+-0.00913
Training: 2022-01-17 00:50:58,062-[agedb_30][30000]Accuracy-Highest: 0.96383
Training: 2022-01-17 00:51:01,441-Speed 586.23 samples/sec Loss 6.4233 LearningRate 0.2025 Epoch: 7 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:04,801-Speed 12194.39 samples/sec Loss 6.3925 LearningRate 0.2024 Epoch: 7 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:08,184-Speed 12112.27 samples/sec Loss 6.4120 LearningRate 0.2023 Epoch: 7 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:11,611-Speed 11954.74 samples/sec Loss 6.4283 LearningRate 0.2023 Epoch: 7 Global Step: 30040 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:14,998-Speed 12097.38 samples/sec Loss 6.3839 LearningRate 0.2022 Epoch: 7 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:18,836-Speed 10677.74 samples/sec Loss 6.4152 LearningRate 0.2021 Epoch: 7 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:22,391-Speed 11527.46 samples/sec Loss 6.3762 LearningRate 0.2020 Epoch: 7 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:25,966-Speed 11467.19 samples/sec Loss 6.4150 LearningRate 0.2020 Epoch: 7 Global Step: 30080 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:51:29,483-Speed 11654.28 samples/sec Loss 6.4762 LearningRate 0.2019 Epoch: 7 Global Step: 30090 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:51:33,159-Speed 11145.28 samples/sec Loss 6.4504 LearningRate 0.2018 Epoch: 7 Global Step: 30100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:51:36,846-Speed 11112.80 samples/sec Loss 6.3906 LearningRate 0.2017 Epoch: 7 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:40,375-Speed 11609.29 samples/sec Loss 6.4215 LearningRate 0.2017 Epoch: 7 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:44,186-Speed 10754.33 samples/sec Loss 6.4281 LearningRate 0.2016 Epoch: 7 Global Step: 30130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:47,572-Speed 12102.43 samples/sec Loss 6.4848 LearningRate 0.2015 Epoch: 7 Global Step: 30140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:51,297-Speed 10999.04 samples/sec Loss 6.4420 LearningRate 0.2014 Epoch: 7 Global Step: 30150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:51:55,156-Speed 10617.21 samples/sec Loss 6.4802 LearningRate 0.2014 Epoch: 7 Global Step: 30160 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-01-17 00:51:58,643-Speed 11751.72 samples/sec Loss 6.4330 LearningRate 0.2013 Epoch: 7 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:02,427-Speed 10827.92 samples/sec Loss 6.4045 LearningRate 0.2012 Epoch: 7 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:05,987-Speed 11508.99 samples/sec Loss 6.4613 LearningRate 0.2011 Epoch: 7 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:09,964-Speed 10302.97 samples/sec Loss 6.4163 LearningRate 0.2010 Epoch: 7 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:13,456-Speed 11736.33 samples/sec Loss 6.4307 LearningRate 0.2010 Epoch: 7 Global Step: 30210 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:52:17,582-Speed 9929.01 samples/sec Loss 6.4131 LearningRate 0.2009 Epoch: 7 Global Step: 30220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:21,055-Speed 11797.26 samples/sec Loss 6.4244 LearningRate 0.2008 Epoch: 7 Global Step: 30230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:24,547-Speed 11734.16 samples/sec Loss 6.3398 LearningRate 0.2007 Epoch: 7 Global Step: 30240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:28,029-Speed 11766.86 samples/sec Loss 6.3777 LearningRate 0.2007 Epoch: 7 Global Step: 30250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:31,532-Speed 11699.21 samples/sec Loss 6.4579 LearningRate 0.2006 Epoch: 7 Global Step: 30260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:35,228-Speed 11085.52 samples/sec Loss 6.4350 LearningRate 0.2005 Epoch: 7 Global Step: 30270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:38,952-Speed 11002.08 samples/sec Loss 6.4542 LearningRate 0.2004 Epoch: 7 Global Step: 30280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:42,340-Speed 
12095.71 samples/sec Loss 6.4505 LearningRate 0.2004 Epoch: 7 Global Step: 30290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:45,793-Speed 11864.75 samples/sec Loss 6.4374 LearningRate 0.2003 Epoch: 7 Global Step: 30300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:49,308-Speed 11656.83 samples/sec Loss 6.4680 LearningRate 0.2002 Epoch: 7 Global Step: 30310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:52:53,031-Speed 11006.89 samples/sec Loss 6.4244 LearningRate 0.2001 Epoch: 7 Global Step: 30320 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:52:57,220-Speed 9781.41 samples/sec Loss 6.4787 LearningRate 0.2001 Epoch: 7 Global Step: 30330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:53:00,743-Speed 11628.35 samples/sec Loss 6.5192 LearningRate 0.2000 Epoch: 7 Global Step: 30340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:53:04,196-Speed 11866.81 samples/sec Loss 6.4860 LearningRate 0.1999 Epoch: 7 Global Step: 30350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:07,623-Speed 11954.87 samples/sec Loss 6.4359 LearningRate 0.1998 Epoch: 7 Global Step: 30360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:11,243-Speed 11320.87 samples/sec Loss 6.4286 LearningRate 0.1998 Epoch: 7 Global Step: 30370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:14,979-Speed 10968.58 samples/sec Loss 6.4569 LearningRate 0.1997 Epoch: 7 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:18,512-Speed 11594.80 samples/sec Loss 6.4528 LearningRate 0.1996 Epoch: 7 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:22,397-Speed 10549.08 samples/sec Loss 6.4789 LearningRate 0.1995 Epoch: 7 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:26,017-Speed 11317.27 samples/sec Loss 6.4443 
LearningRate 0.1995 Epoch: 7 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:29,952-Speed 10413.59 samples/sec Loss 6.4562 LearningRate 0.1994 Epoch: 7 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:33,597-Speed 11240.60 samples/sec Loss 6.4018 LearningRate 0.1993 Epoch: 7 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:37,054-Speed 11850.86 samples/sec Loss 6.4738 LearningRate 0.1992 Epoch: 7 Global Step: 30440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:53:40,495-Speed 11905.86 samples/sec Loss 6.4969 LearningRate 0.1992 Epoch: 7 Global Step: 30450 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:53:43,884-Speed 12091.20 samples/sec Loss 6.4498 LearningRate 0.1991 Epoch: 7 Global Step: 30460 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:53:47,304-Speed 11981.33 samples/sec Loss 6.4961 LearningRate 0.1990 Epoch: 7 Global Step: 30470 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:53:50,666-Speed 12186.60 samples/sec Loss 6.4705 LearningRate 0.1989 Epoch: 7 Global Step: 30480 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:53:54,125-Speed 11847.91 samples/sec Loss 6.4672 LearningRate 0.1989 Epoch: 7 Global Step: 30490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:53:58,509-Speed 9344.46 samples/sec Loss 6.4740 LearningRate 0.1988 Epoch: 7 Global Step: 30500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:54:02,164-Speed 11210.72 samples/sec Loss 6.4984 LearningRate 0.1987 Epoch: 7 Global Step: 30510 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:54:05,833-Speed 11166.45 samples/sec Loss 6.4742 LearningRate 0.1986 Epoch: 7 Global Step: 30520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:09,371-Speed 11582.85 samples/sec Loss 6.4917 LearningRate 0.1986 Epoch: 7 Global Step: 
30530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:13,123-Speed 10920.97 samples/sec Loss 6.4435 LearningRate 0.1985 Epoch: 7 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:16,712-Speed 11416.25 samples/sec Loss 6.4757 LearningRate 0.1984 Epoch: 7 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:20,328-Speed 11330.02 samples/sec Loss 6.5186 LearningRate 0.1983 Epoch: 7 Global Step: 30560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:23,905-Speed 11457.20 samples/sec Loss 6.4654 LearningRate 0.1983 Epoch: 7 Global Step: 30570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:27,370-Speed 11824.57 samples/sec Loss 6.5536 LearningRate 0.1982 Epoch: 7 Global Step: 30580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:30,816-Speed 11888.33 samples/sec Loss 6.5326 LearningRate 0.1981 Epoch: 7 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:34,346-Speed 11607.68 samples/sec Loss 6.4904 LearningRate 0.1980 Epoch: 7 Global Step: 30600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:38,024-Speed 11139.58 samples/sec Loss 6.5004 LearningRate 0.1980 Epoch: 7 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:54:41,605-Speed 11443.90 samples/sec Loss 6.5213 LearningRate 0.1979 Epoch: 7 Global Step: 30620 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:54:45,336-Speed 10982.83 samples/sec Loss 6.4834 LearningRate 0.1978 Epoch: 7 Global Step: 30630 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:54:48,982-Speed 11237.12 samples/sec Loss 6.5335 LearningRate 0.1977 Epoch: 7 Global Step: 30640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:54:52,551-Speed 11481.90 samples/sec Loss 6.5126 LearningRate 0.1977 Epoch: 7 Global Step: 30650 Fp16 Grad Scale: 262144 Required: 6 
hours Training: 2022-01-17 00:54:56,464-Speed 10471.12 samples/sec Loss 6.4714 LearningRate 0.1976 Epoch: 7 Global Step: 30660 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:00,625-Speed 9845.48 samples/sec Loss 6.5117 LearningRate 0.1975 Epoch: 7 Global Step: 30670 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:04,321-Speed 11085.39 samples/sec Loss 6.5025 LearningRate 0.1974 Epoch: 7 Global Step: 30680 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:07,707-Speed 12100.65 samples/sec Loss 6.4641 LearningRate 0.1974 Epoch: 7 Global Step: 30690 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:11,191-Speed 11760.14 samples/sec Loss 6.4943 LearningRate 0.1973 Epoch: 7 Global Step: 30700 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:14,891-Speed 11074.29 samples/sec Loss 6.5024 LearningRate 0.1972 Epoch: 7 Global Step: 30710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:18,557-Speed 11179.61 samples/sec Loss 6.5130 LearningRate 0.1971 Epoch: 7 Global Step: 30720 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:55:21,983-Speed 11962.77 samples/sec Loss 6.4740 LearningRate 0.1971 Epoch: 7 Global Step: 30730 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 00:55:25,445-Speed 11833.93 samples/sec Loss 6.4651 LearningRate 0.1970 Epoch: 7 Global Step: 30740 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:28,862-Speed 11993.81 samples/sec Loss 6.4890 LearningRate 0.1969 Epoch: 7 Global Step: 30750 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:32,321-Speed 11845.54 samples/sec Loss 6.4923 LearningRate 0.1968 Epoch: 7 Global Step: 30760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:35,793-Speed 11797.90 samples/sec Loss 6.5200 LearningRate 0.1968 Epoch: 7 Global Step: 30770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 
00:55:39,405-Speed 11343.94 samples/sec Loss 6.4844 LearningRate 0.1967 Epoch: 7 Global Step: 30780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:43,185-Speed 10841.70 samples/sec Loss 6.5030 LearningRate 0.1966 Epoch: 7 Global Step: 30790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:46,743-Speed 11513.62 samples/sec Loss 6.4618 LearningRate 0.1965 Epoch: 7 Global Step: 30800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:50,477-Speed 10971.32 samples/sec Loss 6.5329 LearningRate 0.1965 Epoch: 7 Global Step: 30810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:55:53,952-Speed 11789.41 samples/sec Loss 6.5254 LearningRate 0.1964 Epoch: 7 Global Step: 30820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:55:57,407-Speed 11858.89 samples/sec Loss 6.5614 LearningRate 0.1963 Epoch: 7 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:01,439-Speed 10162.47 samples/sec Loss 6.4858 LearningRate 0.1962 Epoch: 7 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:05,005-Speed 11488.36 samples/sec Loss 6.5234 LearningRate 0.1962 Epoch: 7 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:09,078-Speed 10059.13 samples/sec Loss 6.5079 LearningRate 0.1961 Epoch: 7 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:12,518-Speed 11910.65 samples/sec Loss 6.4973 LearningRate 0.1960 Epoch: 7 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:16,051-Speed 11595.14 samples/sec Loss 6.4795 LearningRate 0.1959 Epoch: 7 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:19,527-Speed 11787.80 samples/sec Loss 6.4805 LearningRate 0.1959 Epoch: 7 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:23,275-Speed 10930.64 samples/sec 
Loss 6.5190 LearningRate 0.1958 Epoch: 7 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:26,794-Speed 11644.29 samples/sec Loss 6.5400 LearningRate 0.1957 Epoch: 7 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:30,327-Speed 11595.99 samples/sec Loss 6.5559 LearningRate 0.1956 Epoch: 7 Global Step: 30920 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:56:33,864-Speed 11582.62 samples/sec Loss 6.5304 LearningRate 0.1956 Epoch: 7 Global Step: 30930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:56:37,532-Speed 11169.23 samples/sec Loss 6.5137 LearningRate 0.1955 Epoch: 7 Global Step: 30940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:56:41,224-Speed 11097.88 samples/sec Loss 6.4714 LearningRate 0.1954 Epoch: 7 Global Step: 30950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:56:45,095-Speed 10582.27 samples/sec Loss 6.5418 LearningRate 0.1954 Epoch: 7 Global Step: 30960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:56:48,620-Speed 11625.67 samples/sec Loss 6.5050 LearningRate 0.1953 Epoch: 7 Global Step: 30970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:56:52,265-Speed 11238.40 samples/sec Loss 6.4972 LearningRate 0.1952 Epoch: 7 Global Step: 30980 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:56:55,728-Speed 11832.36 samples/sec Loss 6.5128 LearningRate 0.1951 Epoch: 7 Global Step: 30990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:56:59,102-Speed 12144.47 samples/sec Loss 6.5417 LearningRate 0.1951 Epoch: 7 Global Step: 31000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:02,934-Speed 10691.46 samples/sec Loss 6.5190 LearningRate 0.1950 Epoch: 7 Global Step: 31010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:06,995-Speed 10088.30 samples/sec Loss 6.4743 LearningRate 0.1949 Epoch: 7 
Global Step: 31020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:10,613-Speed 11323.38 samples/sec Loss 6.5162 LearningRate 0.1948 Epoch: 7 Global Step: 31030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:14,059-Speed 11886.87 samples/sec Loss 6.5110 LearningRate 0.1948 Epoch: 7 Global Step: 31040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:17,748-Speed 11108.67 samples/sec Loss 6.4565 LearningRate 0.1947 Epoch: 7 Global Step: 31050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:21,279-Speed 11601.58 samples/sec Loss 6.5397 LearningRate 0.1946 Epoch: 7 Global Step: 31060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:24,660-Speed 12118.39 samples/sec Loss 6.5163 LearningRate 0.1945 Epoch: 7 Global Step: 31070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:28,028-Speed 12163.70 samples/sec Loss 6.5529 LearningRate 0.1945 Epoch: 7 Global Step: 31080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:31,742-Speed 11031.79 samples/sec Loss 6.4967 LearningRate 0.1944 Epoch: 7 Global Step: 31090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:35,698-Speed 10359.73 samples/sec Loss 6.5175 LearningRate 0.1943 Epoch: 7 Global Step: 31100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:39,405-Speed 11049.94 samples/sec Loss 6.5364 LearningRate 0.1942 Epoch: 7 Global Step: 31110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:42,835-Speed 11943.85 samples/sec Loss 6.5206 LearningRate 0.1942 Epoch: 7 Global Step: 31120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:46,499-Speed 11182.60 samples/sec Loss 6.4768 LearningRate 0.1941 Epoch: 7 Global Step: 31130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:50,018-Speed 11643.23 samples/sec Loss 6.4781 LearningRate 0.1940 Epoch: 7 Global Step: 31140 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-01-17 00:57:53,481-Speed 11829.14 samples/sec Loss 6.5170 LearningRate 0.1939 Epoch: 7 Global Step: 31150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:57:56,923-Speed 11904.42 samples/sec Loss 6.5537 LearningRate 0.1939 Epoch: 7 Global Step: 31160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:00,494-Speed 11473.74 samples/sec Loss 6.5173 LearningRate 0.1938 Epoch: 7 Global Step: 31170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:04,453-Speed 10349.22 samples/sec Loss 6.4922 LearningRate 0.1937 Epoch: 7 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:07,999-Speed 11551.27 samples/sec Loss 6.4880 LearningRate 0.1936 Epoch: 7 Global Step: 31190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:58:11,991-Speed 10263.64 samples/sec Loss 6.4943 LearningRate 0.1936 Epoch: 7 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:15,725-Speed 10971.64 samples/sec Loss 6.4860 LearningRate 0.1935 Epoch: 7 Global Step: 31210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:19,538-Speed 10746.37 samples/sec Loss 6.5201 LearningRate 0.1934 Epoch: 7 Global Step: 31220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:23,119-Speed 11441.65 samples/sec Loss 6.4543 LearningRate 0.1933 Epoch: 7 Global Step: 31230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:26,720-Speed 11379.06 samples/sec Loss 6.4907 LearningRate 0.1933 Epoch: 7 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:30,251-Speed 11602.22 samples/sec Loss 6.4915 LearningRate 0.1932 Epoch: 7 Global Step: 31250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:33,827-Speed 11455.62 samples/sec Loss 6.5379 LearningRate 0.1931 Epoch: 7 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 
00:58:37,505-Speed 11141.45 samples/sec Loss 6.4988 LearningRate 0.1930 Epoch: 7 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:41,843-Speed 9442.41 samples/sec Loss 6.5717 LearningRate 0.1930 Epoch: 7 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:45,210-Speed 12166.83 samples/sec Loss 6.4535 LearningRate 0.1929 Epoch: 7 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:58:48,720-Speed 11674.79 samples/sec Loss 6.4739 LearningRate 0.1928 Epoch: 7 Global Step: 31300 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:58:52,291-Speed 11473.84 samples/sec Loss 6.5218 LearningRate 0.1928 Epoch: 7 Global Step: 31310 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:58:55,745-Speed 11861.97 samples/sec Loss 6.6003 LearningRate 0.1927 Epoch: 7 Global Step: 31320 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:58:59,167-Speed 11969.45 samples/sec Loss 6.5150 LearningRate 0.1926 Epoch: 7 Global Step: 31330 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:59:02,543-Speed 12139.28 samples/sec Loss 6.5045 LearningRate 0.1925 Epoch: 7 Global Step: 31340 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:59:05,931-Speed 12089.88 samples/sec Loss 6.4515 LearningRate 0.1925 Epoch: 7 Global Step: 31350 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:59:09,581-Speed 11224.24 samples/sec Loss 6.4650 LearningRate 0.1924 Epoch: 7 Global Step: 31360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:59:13,264-Speed 11124.06 samples/sec Loss 6.4876 LearningRate 0.1923 Epoch: 7 Global Step: 31370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:59:16,972-Speed 11050.78 samples/sec Loss 6.5374 LearningRate 0.1922 Epoch: 7 Global Step: 31380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:20,729-Speed 10904.98 samples/sec Loss 
6.4885 LearningRate 0.1922 Epoch: 7 Global Step: 31390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:24,140-Speed 12009.15 samples/sec Loss 6.5146 LearningRate 0.1921 Epoch: 7 Global Step: 31400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:27,695-Speed 11526.20 samples/sec Loss 6.5499 LearningRate 0.1920 Epoch: 7 Global Step: 31410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:31,501-Speed 10763.51 samples/sec Loss 6.5068 LearningRate 0.1919 Epoch: 7 Global Step: 31420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:35,322-Speed 10722.48 samples/sec Loss 6.5016 LearningRate 0.1919 Epoch: 7 Global Step: 31430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:38,710-Speed 12092.71 samples/sec Loss 6.5435 LearningRate 0.1918 Epoch: 7 Global Step: 31440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:42,404-Speed 11089.97 samples/sec Loss 6.5005 LearningRate 0.1917 Epoch: 7 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:46,399-Speed 10255.80 samples/sec Loss 6.4701 LearningRate 0.1916 Epoch: 7 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:49,971-Speed 11472.41 samples/sec Loss 6.5356 LearningRate 0.1916 Epoch: 7 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 00:59:53,449-Speed 11779.76 samples/sec Loss 6.5467 LearningRate 0.1915 Epoch: 7 Global Step: 31480 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 00:59:57,360-Speed 10476.87 samples/sec Loss 6.5634 LearningRate 0.1914 Epoch: 7 Global Step: 31490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:00:00,819-Speed 11842.68 samples/sec Loss 6.5314 LearningRate 0.1914 Epoch: 7 Global Step: 31500 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:00:04,351-Speed 11602.79 samples/sec Loss 6.5265 LearningRate 0.1913 Epoch: 7 Global 
Step: 31510 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:00:07,860-Speed 11674.56 samples/sec Loss 6.4955 LearningRate 0.1912 Epoch: 7 Global Step: 31520 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:00:11,283-Speed 11969.96 samples/sec Loss 6.4600 LearningRate 0.1911 Epoch: 7 Global Step: 31530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:00:14,666-Speed 12110.65 samples/sec Loss 6.4718 LearningRate 0.1911 Epoch: 7 Global Step: 31540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:00:18,138-Speed 11799.14 samples/sec Loss 6.5324 LearningRate 0.1910 Epoch: 7 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:21,891-Speed 10918.14 samples/sec Loss 6.5136 LearningRate 0.1909 Epoch: 7 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:26,149-Speed 9622.07 samples/sec Loss 6.5031 LearningRate 0.1908 Epoch: 7 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:29,924-Speed 10852.53 samples/sec Loss 6.4705 LearningRate 0.1908 Epoch: 7 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:33,739-Speed 10738.65 samples/sec Loss 6.4911 LearningRate 0.1907 Epoch: 7 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:37,464-Speed 11000.79 samples/sec Loss 6.5057 LearningRate 0.1906 Epoch: 7 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:40,936-Speed 11798.67 samples/sec Loss 6.4955 LearningRate 0.1905 Epoch: 7 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:44,985-Speed 10118.56 samples/sec Loss 6.5222 LearningRate 0.1905 Epoch: 7 Global Step: 31620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:48,409-Speed 11965.84 samples/sec Loss 6.4884 LearningRate 0.1904 Epoch: 7 Global Step: 31630 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-01-17 01:00:51,951-Speed 11566.40 samples/sec Loss 6.5354 LearningRate 0.1903 Epoch: 7 Global Step: 31640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:00:55,543-Speed 11406.82 samples/sec Loss 6.5083 LearningRate 0.1902 Epoch: 7 Global Step: 31650 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:00:59,359-Speed 10737.46 samples/sec Loss 6.5014 LearningRate 0.1902 Epoch: 7 Global Step: 31660 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:01:02,884-Speed 11621.60 samples/sec Loss 6.5011 LearningRate 0.1901 Epoch: 7 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:06,612-Speed 10994.32 samples/sec Loss 6.5284 LearningRate 0.1900 Epoch: 7 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:10,221-Speed 11353.67 samples/sec Loss 6.5265 LearningRate 0.1900 Epoch: 7 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:13,630-Speed 12014.43 samples/sec Loss 6.4669 LearningRate 0.1899 Epoch: 7 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:17,017-Speed 12099.29 samples/sec Loss 6.5402 LearningRate 0.1898 Epoch: 7 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:20,486-Speed 11810.10 samples/sec Loss 6.4873 LearningRate 0.1897 Epoch: 7 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:23,951-Speed 11822.97 samples/sec Loss 6.4537 LearningRate 0.1897 Epoch: 7 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:27,723-Speed 10864.29 samples/sec Loss 6.4407 LearningRate 0.1896 Epoch: 7 Global Step: 31740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:31,381-Speed 11196.85 samples/sec Loss 6.4626 LearningRate 0.1895 Epoch: 7 Global Step: 31750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 
01:01:35,510-Speed 9925.51 samples/sec Loss 6.4892 LearningRate 0.1894 Epoch: 7 Global Step: 31760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:01:39,056-Speed 11553.13 samples/sec Loss 6.4968 LearningRate 0.1894 Epoch: 7 Global Step: 31770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:01:42,527-Speed 11802.75 samples/sec Loss 6.4660 LearningRate 0.1893 Epoch: 7 Global Step: 31780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:01:46,527-Speed 10241.61 samples/sec Loss 6.5610 LearningRate 0.1892 Epoch: 7 Global Step: 31790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:01:49,954-Speed 11954.07 samples/sec Loss 6.4859 LearningRate 0.1891 Epoch: 7 Global Step: 31800 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:01:53,436-Speed 11770.06 samples/sec Loss 6.4797 LearningRate 0.1891 Epoch: 7 Global Step: 31810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:01:56,849-Speed 12003.69 samples/sec Loss 6.4293 LearningRate 0.1890 Epoch: 7 Global Step: 31820 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:00,266-Speed 11990.53 samples/sec Loss 6.5631 LearningRate 0.1889 Epoch: 7 Global Step: 31830 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:03,762-Speed 11720.64 samples/sec Loss 6.4813 LearningRate 0.1889 Epoch: 7 Global Step: 31840 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:07,301-Speed 11576.12 samples/sec Loss 6.4259 LearningRate 0.1888 Epoch: 7 Global Step: 31850 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:10,783-Speed 11764.54 samples/sec Loss 6.5067 LearningRate 0.1887 Epoch: 7 Global Step: 31860 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:14,951-Speed 9830.52 samples/sec Loss 6.5184 LearningRate 0.1886 Epoch: 7 Global Step: 31870 Fp16 Grad Scale: 524288 Required: 6 hours Training: 2022-01-17 01:02:18,894-Speed 10389.61 samples/sec Loss 
6.4622 LearningRate 0.1886 Epoch: 7 Global Step: 31880 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:22,422-Speed 11614.70 samples/sec Loss 6.4795 LearningRate 0.1885 Epoch: 7 Global Step: 31890 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:25,820-Speed 12056.07 samples/sec Loss 6.4800 LearningRate 0.1884 Epoch: 7 Global Step: 31900 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:29,270-Speed 11877.00 samples/sec Loss 6.5452 LearningRate 0.1883 Epoch: 7 Global Step: 31910 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:32,636-Speed 12171.37 samples/sec Loss 6.4845 LearningRate 0.1883 Epoch: 7 Global Step: 31920 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:36,674-Speed 10146.51 samples/sec Loss 6.5249 LearningRate 0.1882 Epoch: 7 Global Step: 31930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:40,154-Speed 11772.98 samples/sec Loss 6.4856 LearningRate 0.1881 Epoch: 7 Global Step: 31940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:43,818-Speed 11181.29 samples/sec Loss 6.4849 LearningRate 0.1880 Epoch: 7 Global Step: 31950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:47,621-Speed 10772.54 samples/sec Loss 6.5031 LearningRate 0.1880 Epoch: 7 Global Step: 31960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:51,303-Speed 11128.45 samples/sec Loss 6.5249 LearningRate 0.1879 Epoch: 7 Global Step: 31970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:54,719-Speed 11992.96 samples/sec Loss 6.4922 LearningRate 0.1878 Epoch: 7 Global Step: 31980 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:02:58,143-Speed 11966.34 samples/sec Loss 6.4797 LearningRate 0.1878 Epoch: 7 Global Step: 31990 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:01,797-Speed 11211.22 samples/sec Loss 6.4745 LearningRate 0.1877 Epoch: 7 Global 
Step: 32000 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:05,523-Speed 10994.26 samples/sec Loss 6.4389 LearningRate 0.1876 Epoch: 7 Global Step: 32010 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:08,944-Speed 11980.95 samples/sec Loss 6.4115 LearningRate 0.1875 Epoch: 7 Global Step: 32020 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:12,440-Speed 11716.45 samples/sec Loss 6.4575 LearningRate 0.1875 Epoch: 7 Global Step: 32030 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:16,566-Speed 9928.90 samples/sec Loss 6.4781 LearningRate 0.1874 Epoch: 7 Global Step: 32040 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:20,171-Speed 11366.21 samples/sec Loss 6.4815 LearningRate 0.1873 Epoch: 7 Global Step: 32050 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:23,834-Speed 11185.00 samples/sec Loss 6.5237 LearningRate 0.1872 Epoch: 7 Global Step: 32060 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:27,458-Speed 11305.49 samples/sec Loss 6.4701 LearningRate 0.1872 Epoch: 7 Global Step: 32070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:03:31,086-Speed 11296.01 samples/sec Loss 6.4675 LearningRate 0.1871 Epoch: 7 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:03:34,515-Speed 11948.08 samples/sec Loss 6.5333 LearningRate 0.1870 Epoch: 7 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:03:38,046-Speed 11603.29 samples/sec Loss 6.4886 LearningRate 0.1870 Epoch: 7 Global Step: 32100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:03:41,743-Speed 11080.73 samples/sec Loss 6.4960 LearningRate 0.1869 Epoch: 7 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:03:45,550-Speed 10783.62 samples/sec Loss 6.4909 LearningRate 0.1868 Epoch: 7 Global Step: 32120 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-01-17 01:03:49,131-Speed 11440.59 samples/sec Loss 6.4486 LearningRate 0.1867 Epoch: 7 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:03:52,666-Speed 11589.84 samples/sec Loss 6.4863 LearningRate 0.1867 Epoch: 7 Global Step: 32140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:03:56,337-Speed 11159.10 samples/sec Loss 6.4777 LearningRate 0.1866 Epoch: 7 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:03:59,862-Speed 11628.51 samples/sec Loss 6.4926 LearningRate 0.1865 Epoch: 7 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:04:03,518-Speed 11207.52 samples/sec Loss 6.4645 LearningRate 0.1864 Epoch: 7 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:04:07,062-Speed 11561.36 samples/sec Loss 6.5159 LearningRate 0.1864 Epoch: 7 Global Step: 32180 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:04:10,559-Speed 11715.60 samples/sec Loss 6.4161 LearningRate 0.1863 Epoch: 7 Global Step: 32190 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:04:14,151-Speed 11405.03 samples/sec Loss 6.4828 LearningRate 0.1862 Epoch: 7 Global Step: 32200 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-17 01:04:18,311-Speed 9848.49 samples/sec Loss 6.4255 LearningRate 0.1862 Epoch: 7 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-17 01:04:22,146-Speed 10683.83 samples/sec Loss 6.5438 LearningRate 0.1861 Epoch: 7 Global Step: 32220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:04:25,670-Speed 11628.14 samples/sec Loss 6.4955 LearningRate 0.1860 Epoch: 7 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:04:29,635-Speed 10331.26 samples/sec Loss 6.4586 LearningRate 0.1859 Epoch: 7 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 
01:04:33,067-Speed 11936.20 samples/sec Loss 6.4731 LearningRate 0.1859 Epoch: 7 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:04:36,730-Speed 11186.37 samples/sec Loss 6.4995 LearningRate 0.1858 Epoch: 7 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:04:40,277-Speed 11551.99 samples/sec Loss 6.4684 LearningRate 0.1857 Epoch: 7 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:04:43,828-Speed 11540.01 samples/sec Loss 6.5027 LearningRate 0.1856 Epoch: 7 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:04:47,575-Speed 10932.73 samples/sec Loss 6.4558 LearningRate 0.1856 Epoch: 7 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:04:51,214-Speed 11256.31 samples/sec Loss 6.5075 LearningRate 0.1855 Epoch: 7 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:04:54,878-Speed 11183.05 samples/sec Loss 6.5173 LearningRate 0.1854 Epoch: 7 Global Step: 32310 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:04:58,709-Speed 10693.47 samples/sec Loss 6.4663 LearningRate 0.1854 Epoch: 7 Global Step: 32320 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:02,176-Speed 11816.29 samples/sec Loss 6.4575 LearningRate 0.1853 Epoch: 7 Global Step: 32330 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:05,648-Speed 11800.70 samples/sec Loss 6.4793 LearningRate 0.1852 Epoch: 7 Global Step: 32340 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:09,300-Speed 11218.60 samples/sec Loss 6.4243 LearningRate 0.1851 Epoch: 7 Global Step: 32350 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:12,766-Speed 11820.91 samples/sec Loss 6.5302 LearningRate 0.1851 Epoch: 7 Global Step: 32360 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:16,328-Speed 11502.06 samples/sec 
Loss 6.4860 LearningRate 0.1850 Epoch: 7 Global Step: 32370 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:20,474-Speed 9883.00 samples/sec Loss 6.4443 LearningRate 0.1849 Epoch: 7 Global Step: 32380 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:23,931-Speed 11848.61 samples/sec Loss 6.4718 LearningRate 0.1848 Epoch: 7 Global Step: 32390 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:27,696-Speed 10882.32 samples/sec Loss 6.4405 LearningRate 0.1848 Epoch: 7 Global Step: 32400 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:31,249-Speed 11533.22 samples/sec Loss 6.4772 LearningRate 0.1847 Epoch: 7 Global Step: 32410 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 01:05:34,684-Speed 11925.42 samples/sec Loss 6.4741 LearningRate 0.1846 Epoch: 7 Global Step: 32420 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:38,132-Speed 11883.12 samples/sec Loss 6.4874 LearningRate 0.1846 Epoch: 7 Global Step: 32430 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:41,836-Speed 11060.95 samples/sec Loss 6.4825 LearningRate 0.1845 Epoch: 7 Global Step: 32440 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:45,745-Speed 10482.93 samples/sec Loss 6.4865 LearningRate 0.1844 Epoch: 7 Global Step: 32450 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:49,223-Speed 11777.28 samples/sec Loss 6.4533 LearningRate 0.1843 Epoch: 7 Global Step: 32460 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:05:52,816-Speed 11400.92 samples/sec Loss 6.4548 LearningRate 0.1843 Epoch: 7 Global Step: 32470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:05:56,539-Speed 11005.90 samples/sec Loss 6.4367 LearningRate 0.1842 Epoch: 7 Global Step: 32480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:00,008-Speed 11811.99 samples/sec Loss 6.4360 LearningRate 0.1841 Epoch: 7 
Global Step: 32490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:03,526-Speed 11646.50 samples/sec Loss 6.4134 LearningRate 0.1841 Epoch: 7 Global Step: 32500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:07,664-Speed 9900.55 samples/sec Loss 6.4374 LearningRate 0.1840 Epoch: 7 Global Step: 32510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:11,184-Speed 11636.89 samples/sec Loss 6.4486 LearningRate 0.1839 Epoch: 7 Global Step: 32520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:14,595-Speed 12014.24 samples/sec Loss 6.5587 LearningRate 0.1838 Epoch: 7 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:17,976-Speed 12117.41 samples/sec Loss 6.4988 LearningRate 0.1838 Epoch: 7 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:22,346-Speed 9376.33 samples/sec Loss 6.4763 LearningRate 0.1837 Epoch: 7 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:25,814-Speed 11813.64 samples/sec Loss 6.4496 LearningRate 0.1836 Epoch: 7 Global Step: 32560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:29,284-Speed 11807.85 samples/sec Loss 6.3952 LearningRate 0.1835 Epoch: 7 Global Step: 32570 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:06:32,761-Speed 11781.21 samples/sec Loss 6.4536 LearningRate 0.1835 Epoch: 7 Global Step: 32580 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:06:36,330-Speed 11480.68 samples/sec Loss 6.4199 LearningRate 0.1834 Epoch: 7 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:39,803-Speed 11793.45 samples/sec Loss 6.3892 LearningRate 0.1833 Epoch: 7 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:43,301-Speed 11715.07 samples/sec Loss 6.4790 LearningRate 0.1833 Epoch: 7 Global Step: 32610 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:06:46,870-Speed 11478.71 samples/sec Loss 6.4838 LearningRate 0.1832 Epoch: 7 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:50,290-Speed 11980.01 samples/sec Loss 6.4673 LearningRate 0.1831 Epoch: 7 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:53,784-Speed 11726.02 samples/sec Loss 6.5147 LearningRate 0.1830 Epoch: 7 Global Step: 32640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:06:57,455-Speed 11161.62 samples/sec Loss 6.4527 LearningRate 0.1830 Epoch: 7 Global Step: 32650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:01,554-Speed 9995.68 samples/sec Loss 6.4289 LearningRate 0.1829 Epoch: 7 Global Step: 32660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:05,287-Speed 10976.22 samples/sec Loss 6.4781 LearningRate 0.1828 Epoch: 7 Global Step: 32670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:08,854-Speed 11484.49 samples/sec Loss 6.4024 LearningRate 0.1828 Epoch: 7 Global Step: 32680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:12,584-Speed 10984.16 samples/sec Loss 6.4716 LearningRate 0.1827 Epoch: 7 Global Step: 32690 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:07:16,324-Speed 10956.12 samples/sec Loss 6.4035 LearningRate 0.1826 Epoch: 7 Global Step: 32700 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:07:20,092-Speed 10872.04 samples/sec Loss 6.4143 LearningRate 0.1825 Epoch: 7 Global Step: 32710 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:07:23,611-Speed 11641.63 samples/sec Loss 6.4065 LearningRate 0.1825 Epoch: 7 Global Step: 32720 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:07:27,318-Speed 11053.98 samples/sec Loss 6.4104 LearningRate 0.1824 Epoch: 7 Global Step: 32730 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 
01:07:30,750-Speed 11936.10 samples/sec Loss 6.4689 LearningRate 0.1823 Epoch: 7 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:34,163-Speed 12006.09 samples/sec Loss 6.4285 LearningRate 0.1823 Epoch: 7 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:38,014-Speed 10637.71 samples/sec Loss 6.4453 LearningRate 0.1822 Epoch: 7 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:41,555-Speed 11570.39 samples/sec Loss 6.4104 LearningRate 0.1821 Epoch: 7 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:45,064-Speed 11678.92 samples/sec Loss 6.4440 LearningRate 0.1820 Epoch: 7 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:48,782-Speed 11018.66 samples/sec Loss 6.3848 LearningRate 0.1820 Epoch: 7 Global Step: 32790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:52,303-Speed 11635.07 samples/sec Loss 6.3810 LearningRate 0.1819 Epoch: 7 Global Step: 32800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:07:56,681-Speed 9357.61 samples/sec Loss 6.4132 LearningRate 0.1818 Epoch: 7 Global Step: 32810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:08:00,221-Speed 11574.95 samples/sec Loss 6.4512 LearningRate 0.1817 Epoch: 7 Global Step: 32820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:08:03,658-Speed 11919.48 samples/sec Loss 6.4162 LearningRate 0.1817 Epoch: 7 Global Step: 32830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:08:07,040-Speed 12117.26 samples/sec Loss 6.4206 LearningRate 0.1816 Epoch: 7 Global Step: 32840 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:10,418-Speed 12127.88 samples/sec Loss 6.3828 LearningRate 0.1815 Epoch: 7 Global Step: 32850 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:13,867-Speed 11878.41 samples/sec Loss 
6.4426 LearningRate 0.1815 Epoch: 7 Global Step: 32860 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:17,432-Speed 11491.29 samples/sec Loss 6.4517 LearningRate 0.1814 Epoch: 7 Global Step: 32870 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:21,195-Speed 10887.73 samples/sec Loss 6.4500 LearningRate 0.1813 Epoch: 7 Global Step: 32880 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:25,171-Speed 10303.67 samples/sec Loss 6.4729 LearningRate 0.1812 Epoch: 7 Global Step: 32890 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:28,891-Speed 11014.20 samples/sec Loss 6.4064 LearningRate 0.1812 Epoch: 7 Global Step: 32900 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:32,497-Speed 11361.66 samples/sec Loss 6.4295 LearningRate 0.1811 Epoch: 7 Global Step: 32910 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:36,166-Speed 11166.02 samples/sec Loss 6.4782 LearningRate 0.1810 Epoch: 7 Global Step: 32920 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:39,803-Speed 11266.22 samples/sec Loss 6.3934 LearningRate 0.1810 Epoch: 7 Global Step: 32930 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:43,366-Speed 11500.36 samples/sec Loss 6.4363 LearningRate 0.1809 Epoch: 7 Global Step: 32940 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 01:08:46,799-Speed 11934.86 samples/sec Loss 6.3884 LearningRate 0.1808 Epoch: 7 Global Step: 32950 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:50,355-Speed 11519.95 samples/sec Loss 6.4467 LearningRate 0.1807 Epoch: 7 Global Step: 32960 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:54,030-Speed 11147.91 samples/sec Loss 6.3785 LearningRate 0.1807 Epoch: 7 Global Step: 32970 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:08:57,486-Speed 11854.88 samples/sec Loss 6.4612 LearningRate 0.1806 Epoch: 7 Global 
Step: 32980 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:01,538-Speed 10111.98 samples/sec Loss 6.4408 LearningRate 0.1805 Epoch: 7 Global Step: 32990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:05,050-Speed 11666.65 samples/sec Loss 6.4135 LearningRate 0.1805 Epoch: 7 Global Step: 33000 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:08,705-Speed 11209.91 samples/sec Loss 6.4340 LearningRate 0.1804 Epoch: 7 Global Step: 33010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:12,298-Speed 11401.88 samples/sec Loss 6.4528 LearningRate 0.1803 Epoch: 7 Global Step: 33020 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:15,934-Speed 11268.89 samples/sec Loss 6.4215 LearningRate 0.1802 Epoch: 7 Global Step: 33030 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:19,376-Speed 11903.25 samples/sec Loss 6.4263 LearningRate 0.1802 Epoch: 7 Global Step: 33040 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:22,802-Speed 11956.21 samples/sec Loss 6.4005 LearningRate 0.1801 Epoch: 7 Global Step: 33050 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 01:09:26,689-Speed 10539.46 samples/sec Loss 6.3816 LearningRate 0.1800 Epoch: 7 Global Step: 33060 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:30,337-Speed 11232.29 samples/sec Loss 6.4053 LearningRate 0.1800 Epoch: 7 Global Step: 33070 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:33,756-Speed 11984.07 samples/sec Loss 6.4203 LearningRate 0.1799 Epoch: 7 Global Step: 33080 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:37,178-Speed 11972.38 samples/sec Loss 6.4266 LearningRate 0.1798 Epoch: 7 Global Step: 33090 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:40,968-Speed 10809.51 samples/sec Loss 6.3529 LearningRate 0.1797 Epoch: 7 Global Step: 33100 Fp16 Grad Scale: 262144 
Required: 5 hours Training: 2022-01-17 01:09:44,369-Speed 12049.76 samples/sec Loss 6.4371 LearningRate 0.1797 Epoch: 7 Global Step: 33110 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:47,743-Speed 12140.98 samples/sec Loss 6.3785 LearningRate 0.1796 Epoch: 7 Global Step: 33120 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:09:51,305-Speed 11502.60 samples/sec Loss 6.3753 LearningRate 0.1795 Epoch: 7 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:09:55,119-Speed 10742.67 samples/sec Loss 6.5033 LearningRate 0.1795 Epoch: 7 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:09:59,013-Speed 10520.83 samples/sec Loss 6.4227 LearningRate 0.1794 Epoch: 7 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:03,554-Speed 9021.68 samples/sec Loss 6.3820 LearningRate 0.1793 Epoch: 7 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:07,165-Speed 11347.80 samples/sec Loss 6.3869 LearningRate 0.1792 Epoch: 7 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:10,621-Speed 11853.27 samples/sec Loss 6.3970 LearningRate 0.1792 Epoch: 7 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:14,137-Speed 11652.59 samples/sec Loss 6.4181 LearningRate 0.1791 Epoch: 7 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:17,690-Speed 11532.96 samples/sec Loss 6.3931 LearningRate 0.1790 Epoch: 7 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:21,174-Speed 11759.68 samples/sec Loss 6.4153 LearningRate 0.1790 Epoch: 7 Global Step: 33210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:24,639-Speed 11824.42 samples/sec Loss 6.3575 LearningRate 0.1789 Epoch: 7 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 
01:10:28,573-Speed 10414.53 samples/sec Loss 6.4317 LearningRate 0.1788 Epoch: 7 Global Step: 33230 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:10:31,981-Speed 12020.99 samples/sec Loss 6.3709 LearningRate 0.1787 Epoch: 7 Global Step: 33240 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:10:35,668-Speed 11111.44 samples/sec Loss 6.3932 LearningRate 0.1787 Epoch: 7 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:39,224-Speed 11524.76 samples/sec Loss 6.3946 LearningRate 0.1786 Epoch: 7 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:42,797-Speed 11466.50 samples/sec Loss 6.4030 LearningRate 0.1785 Epoch: 7 Global Step: 33270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:46,369-Speed 11470.57 samples/sec Loss 6.4523 LearningRate 0.1785 Epoch: 7 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:49,814-Speed 11889.74 samples/sec Loss 6.3584 LearningRate 0.1784 Epoch: 7 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:53,357-Speed 11563.72 samples/sec Loss 6.3859 LearningRate 0.1783 Epoch: 7 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:10:56,798-Speed 11906.06 samples/sec Loss 6.4109 LearningRate 0.1782 Epoch: 7 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:11:00,366-Speed 11484.05 samples/sec Loss 6.3518 LearningRate 0.1782 Epoch: 7 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:11:04,193-Speed 10706.87 samples/sec Loss 6.3852 LearningRate 0.1781 Epoch: 7 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:11:08,730-Speed 9028.65 samples/sec Loss 6.3741 LearningRate 0.1780 Epoch: 7 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:11:12,237-Speed 11696.89 samples/sec Loss 
6.4074 LearningRate 0.1780 Epoch: 7 Global Step: 33350 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:11:15,613-Speed 12136.57 samples/sec Loss 6.3716 LearningRate 0.1779 Epoch: 7 Global Step: 33360 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:11:20,185-Speed 8960.66 samples/sec Loss 6.3719 LearningRate 0.1778 Epoch: 7 Global Step: 33370 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:11:52,106-Speed 1283.22 samples/sec Loss 6.0940 LearningRate 0.1777 Epoch: 8 Global Step: 33380 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:11:57,198-Speed 8045.82 samples/sec Loss 5.5792 LearningRate 0.1777 Epoch: 8 Global Step: 33390 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:12:00,867-Speed 11167.41 samples/sec Loss 5.5920 LearningRate 0.1776 Epoch: 8 Global Step: 33400 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:12:05,405-Speed 9028.78 samples/sec Loss 5.6203 LearningRate 0.1775 Epoch: 8 Global Step: 33410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:12:09,393-Speed 10271.60 samples/sec Loss 5.5823 LearningRate 0.1775 Epoch: 8 Global Step: 33420 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:12:12,910-Speed 11652.32 samples/sec Loss 5.6491 LearningRate 0.1774 Epoch: 8 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:17,223-Speed 9500.00 samples/sec Loss 5.5841 LearningRate 0.1773 Epoch: 8 Global Step: 33440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:20,581-Speed 12200.74 samples/sec Loss 5.6341 LearningRate 0.1773 Epoch: 8 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:24,235-Speed 11213.75 samples/sec Loss 5.6902 LearningRate 0.1772 Epoch: 8 Global Step: 33460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:28,129-Speed 10520.48 samples/sec Loss 5.6394 LearningRate 0.1771 Epoch: 8 Global Step: 
33470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:31,757-Speed 11291.53 samples/sec Loss 5.6665 LearningRate 0.1770 Epoch: 8 Global Step: 33480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:35,370-Speed 11342.14 samples/sec Loss 5.6281 LearningRate 0.1770 Epoch: 8 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:39,483-Speed 9959.81 samples/sec Loss 5.7125 LearningRate 0.1769 Epoch: 8 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:42,923-Speed 11912.75 samples/sec Loss 5.6249 LearningRate 0.1768 Epoch: 8 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:46,766-Speed 10659.40 samples/sec Loss 5.6970 LearningRate 0.1768 Epoch: 8 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:50,404-Speed 11263.47 samples/sec Loss 5.7170 LearningRate 0.1767 Epoch: 8 Global Step: 33530 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:12:54,535-Speed 9918.12 samples/sec Loss 5.7433 LearningRate 0.1766 Epoch: 8 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:12:58,414-Speed 10560.49 samples/sec Loss 5.7096 LearningRate 0.1765 Epoch: 8 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:13:01,876-Speed 11834.83 samples/sec Loss 5.7064 LearningRate 0.1765 Epoch: 8 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:13:05,600-Speed 11003.77 samples/sec Loss 5.7324 LearningRate 0.1764 Epoch: 8 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:13:09,165-Speed 11495.45 samples/sec Loss 5.7354 LearningRate 0.1763 Epoch: 8 Global Step: 33580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:13:12,918-Speed 10915.78 samples/sec Loss 5.6819 LearningRate 0.1763 Epoch: 8 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-01-17 01:13:16,519-Speed 11379.42 samples/sec Loss 5.7302 LearningRate 0.1762 Epoch: 8 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:13:20,262-Speed 10944.75 samples/sec Loss 5.7344 LearningRate 0.1761 Epoch: 8 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:13:23,901-Speed 11258.78 samples/sec Loss 5.8149 LearningRate 0.1760 Epoch: 8 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:13:27,589-Speed 11110.25 samples/sec Loss 5.7759 LearningRate 0.1760 Epoch: 8 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:13:31,160-Speed 11473.99 samples/sec Loss 5.7988 LearningRate 0.1759 Epoch: 8 Global Step: 33640 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:13:34,932-Speed 10863.55 samples/sec Loss 5.7305 LearningRate 0.1758 Epoch: 8 Global Step: 33650 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:13:38,730-Speed 10787.69 samples/sec Loss 5.8261 LearningRate 0.1758 Epoch: 8 Global Step: 33660 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:13:42,432-Speed 11067.76 samples/sec Loss 5.8201 LearningRate 0.1757 Epoch: 8 Global Step: 33670 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:13:46,053-Speed 11315.14 samples/sec Loss 5.8106 LearningRate 0.1756 Epoch: 8 Global Step: 33680 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:13:49,551-Speed 11714.29 samples/sec Loss 5.8517 LearningRate 0.1756 Epoch: 8 Global Step: 33690 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:13:53,143-Speed 11404.83 samples/sec Loss 5.8380 LearningRate 0.1755 Epoch: 8 Global Step: 33700 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:13:56,899-Speed 10910.23 samples/sec Loss 5.8574 LearningRate 0.1754 Epoch: 8 Global Step: 33710 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 
01:14:00,388-Speed 11742.18 samples/sec Loss 5.7931 LearningRate 0.1753 Epoch: 8 Global Step: 33720 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:14:03,858-Speed 11806.88 samples/sec Loss 5.7473 LearningRate 0.1753 Epoch: 8 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:08,150-Speed 9546.53 samples/sec Loss 5.8224 LearningRate 0.1752 Epoch: 8 Global Step: 33740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:11,710-Speed 11511.40 samples/sec Loss 5.8383 LearningRate 0.1751 Epoch: 8 Global Step: 33750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:15,497-Speed 10817.22 samples/sec Loss 5.7979 LearningRate 0.1751 Epoch: 8 Global Step: 33760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:18,999-Speed 11699.84 samples/sec Loss 5.8770 LearningRate 0.1750 Epoch: 8 Global Step: 33770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:22,504-Speed 11688.91 samples/sec Loss 5.8659 LearningRate 0.1749 Epoch: 8 Global Step: 33780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:26,123-Speed 11322.55 samples/sec Loss 5.8696 LearningRate 0.1748 Epoch: 8 Global Step: 33790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:29,688-Speed 11492.37 samples/sec Loss 5.9132 LearningRate 0.1748 Epoch: 8 Global Step: 33800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:33,411-Speed 11001.26 samples/sec Loss 5.8854 LearningRate 0.1747 Epoch: 8 Global Step: 33810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:37,496-Speed 10031.58 samples/sec Loss 5.9335 LearningRate 0.1746 Epoch: 8 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:14:41,102-Speed 11360.27 samples/sec Loss 5.9189 LearningRate 0.1746 Epoch: 8 Global Step: 33830 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:14:44,611-Speed 11676.01 samples/sec Loss 
5.9291 LearningRate 0.1745 Epoch: 8 Global Step: 33840 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:14:48,470-Speed 10616.35 samples/sec Loss 5.8654 LearningRate 0.1744 Epoch: 8 Global Step: 33850 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:14:52,192-Speed 11006.15 samples/sec Loss 5.9325 LearningRate 0.1744 Epoch: 8 Global Step: 33860 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:14:55,731-Speed 11579.04 samples/sec Loss 5.9018 LearningRate 0.1743 Epoch: 8 Global Step: 33870 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:14:59,285-Speed 11529.65 samples/sec Loss 5.9101 LearningRate 0.1742 Epoch: 8 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:02,692-Speed 12027.79 samples/sec Loss 5.9241 LearningRate 0.1741 Epoch: 8 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:06,430-Speed 10957.45 samples/sec Loss 5.9127 LearningRate 0.1741 Epoch: 8 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:10,014-Speed 11432.60 samples/sec Loss 6.0166 LearningRate 0.1740 Epoch: 8 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:13,826-Speed 10746.69 samples/sec Loss 5.9772 LearningRate 0.1739 Epoch: 8 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:17,497-Speed 11163.57 samples/sec Loss 5.9292 LearningRate 0.1739 Epoch: 8 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:21,416-Speed 10453.47 samples/sec Loss 5.9325 LearningRate 0.1738 Epoch: 8 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:24,853-Speed 11919.77 samples/sec Loss 5.9680 LearningRate 0.1737 Epoch: 8 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:28,585-Speed 10978.56 samples/sec Loss 5.8993 LearningRate 0.1737 Epoch: 8 Global 
Step: 33960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:32,224-Speed 11258.26 samples/sec Loss 5.9679 LearningRate 0.1736 Epoch: 8 Global Step: 33970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:35,914-Speed 11101.28 samples/sec Loss 6.0231 LearningRate 0.1735 Epoch: 8 Global Step: 33980 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:15:40,007-Speed 10010.58 samples/sec Loss 5.9799 LearningRate 0.1734 Epoch: 8 Global Step: 33990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:15:43,632-Speed 11300.96 samples/sec Loss 5.9843 LearningRate 0.1734 Epoch: 8 Global Step: 34000 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:15:47,314-Speed 11127.58 samples/sec Loss 5.9380 LearningRate 0.1733 Epoch: 8 Global Step: 34010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:15:51,036-Speed 11007.19 samples/sec Loss 5.9776 LearningRate 0.1732 Epoch: 8 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:54,607-Speed 11477.26 samples/sec Loss 6.0069 LearningRate 0.1732 Epoch: 8 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:15:58,026-Speed 11982.82 samples/sec Loss 5.9331 LearningRate 0.1731 Epoch: 8 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:01,535-Speed 11675.37 samples/sec Loss 5.9415 LearningRate 0.1730 Epoch: 8 Global Step: 34050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:05,119-Speed 11433.42 samples/sec Loss 6.0003 LearningRate 0.1730 Epoch: 8 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:08,778-Speed 11197.48 samples/sec Loss 5.9516 LearningRate 0.1729 Epoch: 8 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:12,334-Speed 11522.35 samples/sec Loss 5.9140 LearningRate 0.1728 Epoch: 8 Global Step: 34080 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:16:15,924-Speed 11414.06 samples/sec Loss 5.9416 LearningRate 0.1727 Epoch: 8 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:20,681-Speed 8613.30 samples/sec Loss 6.0224 LearningRate 0.1727 Epoch: 8 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:24,523-Speed 10662.91 samples/sec Loss 5.9541 LearningRate 0.1726 Epoch: 8 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:28,199-Speed 11149.45 samples/sec Loss 6.0113 LearningRate 0.1725 Epoch: 8 Global Step: 34120 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:16:31,768-Speed 11479.35 samples/sec Loss 5.9593 LearningRate 0.1725 Epoch: 8 Global Step: 34130 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:16:35,309-Speed 11571.29 samples/sec Loss 6.0270 LearningRate 0.1724 Epoch: 8 Global Step: 34140 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:16:38,963-Speed 11211.15 samples/sec Loss 5.9982 LearningRate 0.1723 Epoch: 8 Global Step: 34150 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:16:42,632-Speed 11166.60 samples/sec Loss 6.0081 LearningRate 0.1723 Epoch: 8 Global Step: 34160 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:16:46,189-Speed 11517.72 samples/sec Loss 6.0832 LearningRate 0.1722 Epoch: 8 Global Step: 34170 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:16:49,799-Speed 11350.57 samples/sec Loss 6.0278 LearningRate 0.1721 Epoch: 8 Global Step: 34180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:53,380-Speed 11441.22 samples/sec Loss 5.9783 LearningRate 0.1720 Epoch: 8 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:16:57,436-Speed 10101.17 samples/sec Loss 6.0258 LearningRate 0.1720 Epoch: 8 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 
01:17:01,032-Speed 11393.96 samples/sec Loss 6.0265 LearningRate 0.1719 Epoch: 8 Global Step: 34210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:17:04,898-Speed 10596.70 samples/sec Loss 6.0334 LearningRate 0.1718 Epoch: 8 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:17:08,554-Speed 11206.69 samples/sec Loss 6.0929 LearningRate 0.1718 Epoch: 8 Global Step: 34230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:17:12,241-Speed 11112.90 samples/sec Loss 6.0538 LearningRate 0.1717 Epoch: 8 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:17:16,098-Speed 10622.53 samples/sec Loss 6.1398 LearningRate 0.1716 Epoch: 8 Global Step: 34250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:17:19,646-Speed 11545.51 samples/sec Loss 6.0135 LearningRate 0.1716 Epoch: 8 Global Step: 34260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:17:23,100-Speed 11863.13 samples/sec Loss 6.0533 LearningRate 0.1715 Epoch: 8 Global Step: 34270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:17:26,704-Speed 11367.23 samples/sec Loss 6.1156 LearningRate 0.1714 Epoch: 8 Global Step: 34280 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:30,114-Speed 12018.80 samples/sec Loss 6.0899 LearningRate 0.1713 Epoch: 8 Global Step: 34290 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:33,497-Speed 12110.04 samples/sec Loss 6.0332 LearningRate 0.1713 Epoch: 8 Global Step: 34300 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:37,659-Speed 9843.72 samples/sec Loss 6.0662 LearningRate 0.1712 Epoch: 8 Global Step: 34310 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:41,066-Speed 12026.75 samples/sec Loss 6.1062 LearningRate 0.1711 Epoch: 8 Global Step: 34320 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:44,410-Speed 12249.41 samples/sec Loss 
6.0581 LearningRate 0.1711 Epoch: 8 Global Step: 34330 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:48,025-Speed 11336.15 samples/sec Loss 6.0971 LearningRate 0.1710 Epoch: 8 Global Step: 34340 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:51,934-Speed 10479.74 samples/sec Loss 6.0866 LearningRate 0.1709 Epoch: 8 Global Step: 34350 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:55,895-Speed 10343.91 samples/sec Loss 6.0137 LearningRate 0.1709 Epoch: 8 Global Step: 34360 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:17:59,525-Speed 11285.91 samples/sec Loss 6.0882 LearningRate 0.1708 Epoch: 8 Global Step: 34370 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:18:03,224-Speed 11075.47 samples/sec Loss 6.0895 LearningRate 0.1707 Epoch: 8 Global Step: 34380 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 01:18:06,895-Speed 11158.56 samples/sec Loss 6.0541 LearningRate 0.1706 Epoch: 8 Global Step: 34390 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 01:18:10,753-Speed 10621.34 samples/sec Loss 6.0557 LearningRate 0.1706 Epoch: 8 Global Step: 34400 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:18:14,730-Speed 10300.27 samples/sec Loss 6.0271 LearningRate 0.1705 Epoch: 8 Global Step: 34410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:18:18,270-Speed 11576.00 samples/sec Loss 6.0796 LearningRate 0.1704 Epoch: 8 Global Step: 34420 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:18:21,702-Speed 11938.25 samples/sec Loss 6.0740 LearningRate 0.1704 Epoch: 8 Global Step: 34430 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:18:26,067-Speed 9386.16 samples/sec Loss 6.0551 LearningRate 0.1703 Epoch: 8 Global Step: 34440 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:18:29,691-Speed 11303.79 samples/sec Loss 6.0693 LearningRate 0.1702 Epoch: 8 Global 
Step: 34450 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:18:33,394-Speed 11063.16 samples/sec Loss 6.1114 LearningRate 0.1702 Epoch: 8 Global Step: 34460 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:18:36,981-Speed 11421.22 samples/sec Loss 6.1097 LearningRate 0.1701 Epoch: 8 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:18:40,769-Speed 10816.66 samples/sec Loss 6.0992 LearningRate 0.1700 Epoch: 8 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:18:44,357-Speed 11417.68 samples/sec Loss 6.1391 LearningRate 0.1700 Epoch: 8 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:18:48,034-Speed 11142.99 samples/sec Loss 6.1250 LearningRate 0.1699 Epoch: 8 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:18:51,468-Speed 11933.79 samples/sec Loss 6.1210 LearningRate 0.1698 Epoch: 8 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:18:55,033-Speed 11493.48 samples/sec Loss 6.0561 LearningRate 0.1697 Epoch: 8 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:18:58,872-Speed 10669.55 samples/sec Loss 6.0611 LearningRate 0.1697 Epoch: 8 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:02,657-Speed 10826.39 samples/sec Loss 6.0504 LearningRate 0.1696 Epoch: 8 Global Step: 34540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:06,209-Speed 11533.52 samples/sec Loss 6.1411 LearningRate 0.1695 Epoch: 8 Global Step: 34550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:09,718-Speed 11678.68 samples/sec Loss 6.1170 LearningRate 0.1695 Epoch: 8 Global Step: 34560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:13,208-Speed 11739.68 samples/sec Loss 6.1137 LearningRate 0.1694 Epoch: 8 Global Step: 34570 Fp16 Grad Scale: 262144 
Required: 5 hours Training: 2022-01-17 01:19:16,889-Speed 11128.03 samples/sec Loss 6.1491 LearningRate 0.1693 Epoch: 8 Global Step: 34580 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:19:20,532-Speed 11248.95 samples/sec Loss 6.1214 LearningRate 0.1693 Epoch: 8 Global Step: 34590 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:19:24,613-Speed 10037.80 samples/sec Loss 6.1228 LearningRate 0.1692 Epoch: 8 Global Step: 34600 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:19:28,321-Speed 11049.12 samples/sec Loss 6.0546 LearningRate 0.1691 Epoch: 8 Global Step: 34610 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:19:32,186-Speed 10599.31 samples/sec Loss 6.0829 LearningRate 0.1691 Epoch: 8 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:35,671-Speed 11757.30 samples/sec Loss 6.1498 LearningRate 0.1690 Epoch: 8 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:39,159-Speed 11744.32 samples/sec Loss 6.1083 LearningRate 0.1689 Epoch: 8 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:42,673-Speed 11661.41 samples/sec Loss 6.0738 LearningRate 0.1688 Epoch: 8 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:46,221-Speed 11548.65 samples/sec Loss 6.0861 LearningRate 0.1688 Epoch: 8 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:49,817-Speed 11392.75 samples/sec Loss 6.1421 LearningRate 0.1687 Epoch: 8 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:53,513-Speed 11085.44 samples/sec Loss 6.1509 LearningRate 0.1686 Epoch: 8 Global Step: 34680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:19:57,213-Speed 11073.81 samples/sec Loss 6.0860 LearningRate 0.1686 Epoch: 8 Global Step: 34690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 
01:20:00,719-Speed 11685.76 samples/sec Loss 6.1071 LearningRate 0.1685 Epoch: 8 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:04,162-Speed 11900.75 samples/sec Loss 6.1359 LearningRate 0.1684 Epoch: 8 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:07,874-Speed 11036.38 samples/sec Loss 6.0994 LearningRate 0.1684 Epoch: 8 Global Step: 34720 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:20:11,379-Speed 11691.30 samples/sec Loss 6.0727 LearningRate 0.1683 Epoch: 8 Global Step: 34730 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:20:15,317-Speed 10401.69 samples/sec Loss 6.1254 LearningRate 0.1682 Epoch: 8 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:18,938-Speed 11317.78 samples/sec Loss 6.0621 LearningRate 0.1682 Epoch: 8 Global Step: 34750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:23,328-Speed 9330.75 samples/sec Loss 6.0921 LearningRate 0.1681 Epoch: 8 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:26,765-Speed 11924.10 samples/sec Loss 6.1141 LearningRate 0.1680 Epoch: 8 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:30,272-Speed 11681.51 samples/sec Loss 6.1065 LearningRate 0.1679 Epoch: 8 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:33,857-Speed 11428.46 samples/sec Loss 6.1038 LearningRate 0.1679 Epoch: 8 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:37,510-Speed 11214.17 samples/sec Loss 6.0698 LearningRate 0.1678 Epoch: 8 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:41,117-Speed 11359.54 samples/sec Loss 6.1086 LearningRate 0.1677 Epoch: 8 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:44,758-Speed 11252.36 samples/sec Loss 
6.1647 LearningRate 0.1677 Epoch: 8 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:48,341-Speed 11436.80 samples/sec Loss 6.1067 LearningRate 0.1676 Epoch: 8 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:51,969-Speed 11292.11 samples/sec Loss 6.0993 LearningRate 0.1675 Epoch: 8 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:55,409-Speed 11910.09 samples/sec Loss 6.1020 LearningRate 0.1675 Epoch: 8 Global Step: 34850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:20:58,923-Speed 11658.13 samples/sec Loss 6.1301 LearningRate 0.1674 Epoch: 8 Global Step: 34860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:21:02,641-Speed 11018.25 samples/sec Loss 6.1775 LearningRate 0.1673 Epoch: 8 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:21:06,091-Speed 11875.24 samples/sec Loss 6.1303 LearningRate 0.1673 Epoch: 8 Global Step: 34880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:21:09,788-Speed 11083.45 samples/sec Loss 6.2100 LearningRate 0.1672 Epoch: 8 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:21:13,276-Speed 11746.38 samples/sec Loss 6.1300 LearningRate 0.1671 Epoch: 8 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:21:17,014-Speed 10958.41 samples/sec Loss 6.1070 LearningRate 0.1671 Epoch: 8 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:21:21,563-Speed 9007.39 samples/sec Loss 6.1189 LearningRate 0.1670 Epoch: 8 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:21:25,360-Speed 10789.63 samples/sec Loss 6.1198 LearningRate 0.1669 Epoch: 8 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:21:28,811-Speed 11874.58 samples/sec Loss 6.1426 LearningRate 0.1668 Epoch: 8 Global 
Step: 34940 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:21:32,272-Speed 11835.19 samples/sec Loss 6.1436 LearningRate 0.1668 Epoch: 8 Global Step: 34950 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:21:35,894-Speed 11311.98 samples/sec Loss 6.1433 LearningRate 0.1667 Epoch: 8 Global Step: 34960 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:21:39,560-Speed 11177.51 samples/sec Loss 6.1244 LearningRate 0.1666 Epoch: 8 Global Step: 34970 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:21:43,080-Speed 11642.36 samples/sec Loss 6.1550 LearningRate 0.1666 Epoch: 8 Global Step: 34980 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:21:46,874-Speed 10796.17 samples/sec Loss 6.1573 LearningRate 0.1665 Epoch: 8 Global Step: 34990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:21:50,756-Speed 10553.86 samples/sec Loss 6.1772 LearningRate 0.1664 Epoch: 8 Global Step: 35000 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:22:11,870-[lfw][35000]XNorm: 10.840091 Training: 2022-01-17 01:22:11,870-[lfw][35000]Accuracy-Flip: 0.99650+-0.00337 Training: 2022-01-17 01:22:11,871-[lfw][35000]Accuracy-Highest: 0.99650 Training: 2022-01-17 01:22:36,285-[cfp_fp][35000]XNorm: 9.077263 Training: 2022-01-17 01:22:36,286-[cfp_fp][35000]Accuracy-Flip: 0.96600+-0.01093 Training: 2022-01-17 01:22:36,286-[cfp_fp][35000]Accuracy-Highest: 0.96600 Training: 2022-01-17 01:22:57,313-[agedb_30][35000]XNorm: 10.446590 Training: 2022-01-17 01:22:57,313-[agedb_30][35000]Accuracy-Flip: 0.96433+-0.00837 Training: 2022-01-17 01:22:57,314-[agedb_30][35000]Accuracy-Highest: 0.96433 Training: 2022-01-17 01:23:00,703-Speed 585.59 samples/sec Loss 6.1633 LearningRate 0.1664 Epoch: 8 Global Step: 35010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:23:04,065-Speed 12189.96 samples/sec Loss 6.1864 LearningRate 0.1663 Epoch: 8 Global Step: 35020 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:23:07,396-Speed 12297.72 samples/sec Loss 6.1461 LearningRate 0.1662 Epoch: 8 Global Step: 35030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:10,760-Speed 12178.04 samples/sec Loss 6.1886 LearningRate 0.1662 Epoch: 8 Global Step: 35040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:14,286-Speed 11619.86 samples/sec Loss 6.1849 LearningRate 0.1661 Epoch: 8 Global Step: 35050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:17,849-Speed 11498.63 samples/sec Loss 6.1828 LearningRate 0.1660 Epoch: 8 Global Step: 35060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:21,424-Speed 11460.29 samples/sec Loss 6.1783 LearningRate 0.1660 Epoch: 8 Global Step: 35070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:25,054-Speed 11289.19 samples/sec Loss 6.1941 LearningRate 0.1659 Epoch: 8 Global Step: 35080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:29,497-Speed 9220.44 samples/sec Loss 6.1703 LearningRate 0.1658 Epoch: 8 Global Step: 35090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:33,199-Speed 11068.21 samples/sec Loss 6.1251 LearningRate 0.1657 Epoch: 8 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:36,681-Speed 11765.21 samples/sec Loss 6.2139 LearningRate 0.1657 Epoch: 8 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:40,121-Speed 11909.57 samples/sec Loss 6.1929 LearningRate 0.1656 Epoch: 8 Global Step: 35120 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:23:43,536-Speed 11996.56 samples/sec Loss 6.2052 LearningRate 0.1655 Epoch: 8 Global Step: 35130 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:23:46,936-Speed 12052.24 samples/sec Loss 6.1511 LearningRate 0.1655 Epoch: 8 Global Step: 35140 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 
01:23:51,657-Speed 8678.34 samples/sec Loss 6.1712 LearningRate 0.1654 Epoch: 8 Global Step: 35150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:55,587-Speed 10424.34 samples/sec Loss 6.1618 LearningRate 0.1653 Epoch: 8 Global Step: 35160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:23:59,334-Speed 10934.59 samples/sec Loss 6.1376 LearningRate 0.1653 Epoch: 8 Global Step: 35170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:24:02,824-Speed 11741.35 samples/sec Loss 6.1452 LearningRate 0.1652 Epoch: 8 Global Step: 35180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:24:06,441-Speed 11327.04 samples/sec Loss 6.1518 LearningRate 0.1651 Epoch: 8 Global Step: 35190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:24:09,948-Speed 11683.78 samples/sec Loss 6.1262 LearningRate 0.1651 Epoch: 8 Global Step: 35200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:24:13,773-Speed 10710.00 samples/sec Loss 6.1393 LearningRate 0.1650 Epoch: 8 Global Step: 35210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:24:17,434-Speed 11192.40 samples/sec Loss 6.1606 LearningRate 0.1649 Epoch: 8 Global Step: 35220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:24:21,325-Speed 10528.12 samples/sec Loss 6.0922 LearningRate 0.1649 Epoch: 8 Global Step: 35230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:24:25,280-Speed 10360.16 samples/sec Loss 6.0901 LearningRate 0.1648 Epoch: 8 Global Step: 35240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:24:28,705-Speed 11960.02 samples/sec Loss 6.1235 LearningRate 0.1647 Epoch: 8 Global Step: 35250 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:24:32,357-Speed 11221.43 samples/sec Loss 6.1211 LearningRate 0.1646 Epoch: 8 Global Step: 35260 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:24:35,935-Speed 11448.55 samples/sec Loss 
6.0801 LearningRate 0.1646 Epoch: 8 Global Step: 35270 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:24:39,448-Speed 11661.31 samples/sec Loss 6.1775 LearningRate 0.1645 Epoch: 8 Global Step: 35280 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:24:43,618-Speed 9824.77 samples/sec Loss 6.1832 LearningRate 0.1644 Epoch: 8 Global Step: 35290 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:24:47,101-Speed 11765.09 samples/sec Loss 6.1119 LearningRate 0.1644 Epoch: 8 Global Step: 35300 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:24:50,529-Speed 11952.60 samples/sec Loss 6.1484 LearningRate 0.1643 Epoch: 8 Global Step: 35310 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:24:54,818-Speed 9551.57 samples/sec Loss 6.1890 LearningRate 0.1642 Epoch: 8 Global Step: 35320 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:24:58,450-Speed 11281.93 samples/sec Loss 6.1515 LearningRate 0.1642 Epoch: 8 Global Step: 35330 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:25:02,336-Speed 10541.65 samples/sec Loss 6.1495 LearningRate 0.1641 Epoch: 8 Global Step: 35340 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:25:05,950-Speed 11338.78 samples/sec Loss 6.0897 LearningRate 0.1640 Epoch: 8 Global Step: 35350 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 01:25:09,356-Speed 12026.94 samples/sec Loss 6.0802 LearningRate 0.1640 Epoch: 8 Global Step: 35360 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:25:13,236-Speed 10558.70 samples/sec Loss 6.1286 LearningRate 0.1639 Epoch: 8 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:17,192-Speed 10357.23 samples/sec Loss 6.1396 LearningRate 0.1638 Epoch: 8 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:20,619-Speed 11954.84 samples/sec Loss 6.2525 LearningRate 0.1638 Epoch: 8 Global 
Step: 35390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:24,060-Speed 11910.47 samples/sec Loss 6.2162 LearningRate 0.1637 Epoch: 8 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:27,806-Speed 10934.87 samples/sec Loss 6.1160 LearningRate 0.1636 Epoch: 8 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:31,794-Speed 10273.11 samples/sec Loss 6.0947 LearningRate 0.1636 Epoch: 8 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:35,254-Speed 11843.39 samples/sec Loss 6.0980 LearningRate 0.1635 Epoch: 8 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:38,991-Speed 10961.27 samples/sec Loss 6.1848 LearningRate 0.1634 Epoch: 8 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:42,401-Speed 12016.60 samples/sec Loss 6.1809 LearningRate 0.1634 Epoch: 8 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:45,842-Speed 11909.98 samples/sec Loss 6.1535 LearningRate 0.1633 Epoch: 8 Global Step: 35460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:25:49,457-Speed 11332.06 samples/sec Loss 6.1561 LearningRate 0.1632 Epoch: 8 Global Step: 35470 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:25:53,011-Speed 11527.87 samples/sec Loss 6.1482 LearningRate 0.1631 Epoch: 8 Global Step: 35480 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:25:57,475-Speed 9178.24 samples/sec Loss 6.1328 LearningRate 0.1631 Epoch: 8 Global Step: 35490 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:26:00,990-Speed 11653.84 samples/sec Loss 6.1796 LearningRate 0.1630 Epoch: 8 Global Step: 35500 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:26:04,510-Speed 11640.35 samples/sec Loss 6.1816 LearningRate 0.1629 Epoch: 8 Global Step: 35510 Fp16 Grad Scale: 262144 
Required: 5 hours Training: 2022-01-17 01:26:07,999-Speed 11744.16 samples/sec Loss 6.1888 LearningRate 0.1629 Epoch: 8 Global Step: 35520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:11,606-Speed 11358.36 samples/sec Loss 6.1991 LearningRate 0.1628 Epoch: 8 Global Step: 35530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:15,165-Speed 11510.26 samples/sec Loss 6.1657 LearningRate 0.1627 Epoch: 8 Global Step: 35540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:18,854-Speed 11106.82 samples/sec Loss 6.1418 LearningRate 0.1627 Epoch: 8 Global Step: 35550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:22,584-Speed 10983.84 samples/sec Loss 6.1425 LearningRate 0.1626 Epoch: 8 Global Step: 35560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:26,221-Speed 11265.90 samples/sec Loss 6.1641 LearningRate 0.1625 Epoch: 8 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:29,724-Speed 11694.25 samples/sec Loss 6.1580 LearningRate 0.1625 Epoch: 8 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:33,391-Speed 11173.73 samples/sec Loss 6.1143 LearningRate 0.1624 Epoch: 8 Global Step: 35590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:37,128-Speed 10964.70 samples/sec Loss 6.1143 LearningRate 0.1623 Epoch: 8 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:41,204-Speed 10050.38 samples/sec Loss 6.1484 LearningRate 0.1623 Epoch: 8 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:26:45,042-Speed 10675.62 samples/sec Loss 6.2206 LearningRate 0.1622 Epoch: 8 Global Step: 35620 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:26:48,572-Speed 11606.49 samples/sec Loss 6.1492 LearningRate 0.1621 Epoch: 8 Global Step: 35630 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 
01:26:52,314-Speed 10946.97 samples/sec Loss 6.1319 LearningRate 0.1621 Epoch: 8 Global Step: 35640 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:26:55,957-Speed 11248.60 samples/sec Loss 6.1126 LearningRate 0.1620 Epoch: 8 Global Step: 35650 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:26:59,756-Speed 10785.21 samples/sec Loss 6.1672 LearningRate 0.1619 Epoch: 8 Global Step: 35660 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:27:03,444-Speed 11110.03 samples/sec Loss 6.1764 LearningRate 0.1619 Epoch: 8 Global Step: 35670 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:27:07,026-Speed 11434.47 samples/sec Loss 6.1345 LearningRate 0.1618 Epoch: 8 Global Step: 35680 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:27:10,541-Speed 11658.39 samples/sec Loss 6.1233 LearningRate 0.1617 Epoch: 8 Global Step: 35690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:14,144-Speed 11370.66 samples/sec Loss 6.1042 LearningRate 0.1617 Epoch: 8 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:17,726-Speed 11437.91 samples/sec Loss 6.1471 LearningRate 0.1616 Epoch: 8 Global Step: 35710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:21,300-Speed 11463.07 samples/sec Loss 6.1286 LearningRate 0.1615 Epoch: 8 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:24,920-Speed 11318.62 samples/sec Loss 6.1558 LearningRate 0.1615 Epoch: 8 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:28,682-Speed 10890.60 samples/sec Loss 6.1454 LearningRate 0.1614 Epoch: 8 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:32,264-Speed 11436.23 samples/sec Loss 6.1569 LearningRate 0.1613 Epoch: 8 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:36,042-Speed 10847.82 samples/sec 
Loss 6.1655 LearningRate 0.1612 Epoch: 8 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:39,537-Speed 11721.65 samples/sec Loss 6.1736 LearningRate 0.1612 Epoch: 8 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:43,187-Speed 11222.35 samples/sec Loss 6.1598 LearningRate 0.1611 Epoch: 8 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:27:46,867-Speed 11135.12 samples/sec Loss 6.1104 LearningRate 0.1610 Epoch: 8 Global Step: 35790 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:27:50,561-Speed 11090.10 samples/sec Loss 6.2105 LearningRate 0.1610 Epoch: 8 Global Step: 35800 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:27:54,295-Speed 10971.61 samples/sec Loss 6.1790 LearningRate 0.1609 Epoch: 8 Global Step: 35810 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:27:57,724-Speed 11948.02 samples/sec Loss 6.1488 LearningRate 0.1608 Epoch: 8 Global Step: 35820 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:28:01,775-Speed 10114.34 samples/sec Loss 6.1829 LearningRate 0.1608 Epoch: 8 Global Step: 35830 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:28:05,289-Speed 11658.17 samples/sec Loss 6.1346 LearningRate 0.1607 Epoch: 8 Global Step: 35840 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:28:08,915-Speed 11298.11 samples/sec Loss 6.1197 LearningRate 0.1606 Epoch: 8 Global Step: 35850 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:28:12,358-Speed 11901.60 samples/sec Loss 6.1989 LearningRate 0.1606 Epoch: 8 Global Step: 35860 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:28:15,925-Speed 11488.46 samples/sec Loss 6.1615 LearningRate 0.1605 Epoch: 8 Global Step: 35870 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:28:19,570-Speed 11237.75 samples/sec Loss 6.1457 LearningRate 0.1604 Epoch: 8 
Global Step: 35880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:23,017-Speed 11885.09 samples/sec Loss 6.1450 LearningRate 0.1604 Epoch: 8 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:26,782-Speed 10884.41 samples/sec Loss 6.1597 LearningRate 0.1603 Epoch: 8 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:30,360-Speed 11449.11 samples/sec Loss 6.1525 LearningRate 0.1602 Epoch: 8 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:33,952-Speed 11407.25 samples/sec Loss 6.1668 LearningRate 0.1602 Epoch: 8 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:37,549-Speed 11390.50 samples/sec Loss 6.1609 LearningRate 0.1601 Epoch: 8 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:41,022-Speed 11793.60 samples/sec Loss 6.1730 LearningRate 0.1600 Epoch: 8 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:44,883-Speed 10610.70 samples/sec Loss 6.1098 LearningRate 0.1600 Epoch: 8 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:48,544-Speed 11195.04 samples/sec Loss 6.1290 LearningRate 0.1599 Epoch: 8 Global Step: 35960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:52,201-Speed 11201.57 samples/sec Loss 6.1130 LearningRate 0.1598 Epoch: 8 Global Step: 35970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:28:55,661-Speed 11840.71 samples/sec Loss 6.1287 LearningRate 0.1598 Epoch: 8 Global Step: 35980 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:28:59,176-Speed 11655.54 samples/sec Loss 6.1612 LearningRate 0.1597 Epoch: 8 Global Step: 35990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:02,860-Speed 11121.08 samples/sec Loss 6.1866 LearningRate 0.1596 Epoch: 8 Global Step: 36000 Fp16 Grad Scale: 262144 
Required: 5 hours Training: 2022-01-17 01:29:06,333-Speed 11799.91 samples/sec Loss 6.1293 LearningRate 0.1596 Epoch: 8 Global Step: 36010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:10,000-Speed 11170.67 samples/sec Loss 6.1467 LearningRate 0.1595 Epoch: 8 Global Step: 36020 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:13,515-Speed 11655.47 samples/sec Loss 6.1739 LearningRate 0.1594 Epoch: 8 Global Step: 36030 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:17,324-Speed 10755.60 samples/sec Loss 6.1768 LearningRate 0.1594 Epoch: 8 Global Step: 36040 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:20,883-Speed 11512.17 samples/sec Loss 6.1367 LearningRate 0.1593 Epoch: 8 Global Step: 36050 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:25,331-Speed 9212.32 samples/sec Loss 6.1558 LearningRate 0.1592 Epoch: 8 Global Step: 36060 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:28,790-Speed 11861.65 samples/sec Loss 6.1010 LearningRate 0.1592 Epoch: 8 Global Step: 36070 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:32,154-Speed 12178.94 samples/sec Loss 6.1808 LearningRate 0.1591 Epoch: 8 Global Step: 36080 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:35,791-Speed 11265.63 samples/sec Loss 6.1376 LearningRate 0.1590 Epoch: 8 Global Step: 36090 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:39,436-Speed 11240.89 samples/sec Loss 6.1554 LearningRate 0.1590 Epoch: 8 Global Step: 36100 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:43,025-Speed 11413.45 samples/sec Loss 6.1218 LearningRate 0.1589 Epoch: 8 Global Step: 36110 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:46,970-Speed 10387.59 samples/sec Loss 6.1762 LearningRate 0.1588 Epoch: 8 Global Step: 36120 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 
01:29:50,414-Speed 11894.38 samples/sec Loss 6.1547 LearningRate 0.1588 Epoch: 8 Global Step: 36130 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:53,803-Speed 12091.25 samples/sec Loss 6.1587 LearningRate 0.1587 Epoch: 8 Global Step: 36140 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:29:57,250-Speed 11884.10 samples/sec Loss 6.1027 LearningRate 0.1586 Epoch: 8 Global Step: 36150 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:30:00,793-Speed 11563.76 samples/sec Loss 6.1307 LearningRate 0.1586 Epoch: 8 Global Step: 36160 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:30:04,467-Speed 11151.80 samples/sec Loss 6.1468 LearningRate 0.1585 Epoch: 8 Global Step: 36170 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:30:08,381-Speed 10468.48 samples/sec Loss 6.0913 LearningRate 0.1584 Epoch: 8 Global Step: 36180 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:30:12,251-Speed 10585.57 samples/sec Loss 6.1799 LearningRate 0.1584 Epoch: 8 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:15,760-Speed 11675.60 samples/sec Loss 6.1019 LearningRate 0.1583 Epoch: 8 Global Step: 36200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:19,173-Speed 12001.90 samples/sec Loss 6.1420 LearningRate 0.1582 Epoch: 8 Global Step: 36210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:22,682-Speed 11677.78 samples/sec Loss 6.1821 LearningRate 0.1582 Epoch: 8 Global Step: 36220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:26,821-Speed 9898.24 samples/sec Loss 6.1393 LearningRate 0.1581 Epoch: 8 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:30,652-Speed 10694.32 samples/sec Loss 6.1525 LearningRate 0.1580 Epoch: 8 Global Step: 36240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:34,267-Speed 11335.30 samples/sec Loss 
6.0837 LearningRate 0.1580 Epoch: 8 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:37,753-Speed 11752.60 samples/sec Loss 6.1249 LearningRate 0.1579 Epoch: 8 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:41,793-Speed 10141.12 samples/sec Loss 6.1235 LearningRate 0.1578 Epoch: 8 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:45,265-Speed 11799.47 samples/sec Loss 6.2015 LearningRate 0.1578 Epoch: 8 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:48,743-Speed 11781.53 samples/sec Loss 6.0741 LearningRate 0.1577 Epoch: 8 Global Step: 36290 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:30:52,238-Speed 11721.40 samples/sec Loss 6.1443 LearningRate 0.1576 Epoch: 8 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:55,868-Speed 11288.62 samples/sec Loss 6.1466 LearningRate 0.1576 Epoch: 8 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:30:59,563-Speed 11087.34 samples/sec Loss 6.1574 LearningRate 0.1575 Epoch: 8 Global Step: 36320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:02,999-Speed 11925.53 samples/sec Loss 6.1242 LearningRate 0.1574 Epoch: 8 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:06,494-Speed 11723.33 samples/sec Loss 6.1045 LearningRate 0.1574 Epoch: 8 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:10,019-Speed 11621.07 samples/sec Loss 6.0843 LearningRate 0.1573 Epoch: 8 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:13,753-Speed 10972.56 samples/sec Loss 6.1032 LearningRate 0.1572 Epoch: 8 Global Step: 36360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:17,828-Speed 10053.95 samples/sec Loss 6.0907 LearningRate 0.1572 Epoch: 8 Global 
Step: 36370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:21,624-Speed 10794.07 samples/sec Loss 6.1649 LearningRate 0.1571 Epoch: 8 Global Step: 36380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:25,153-Speed 11609.97 samples/sec Loss 6.1449 LearningRate 0.1570 Epoch: 8 Global Step: 36390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:29,864-Speed 8696.62 samples/sec Loss 6.1077 LearningRate 0.1569 Epoch: 8 Global Step: 36400 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:31:33,664-Speed 10779.06 samples/sec Loss 6.1516 LearningRate 0.1569 Epoch: 8 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:37,218-Speed 11530.50 samples/sec Loss 6.1553 LearningRate 0.1568 Epoch: 8 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:40,643-Speed 11963.84 samples/sec Loss 6.0706 LearningRate 0.1567 Epoch: 8 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:44,694-Speed 10112.43 samples/sec Loss 6.1004 LearningRate 0.1567 Epoch: 8 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:48,249-Speed 11522.60 samples/sec Loss 6.1400 LearningRate 0.1566 Epoch: 8 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:51,953-Speed 11064.00 samples/sec Loss 6.0869 LearningRate 0.1565 Epoch: 8 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:55,463-Speed 11671.99 samples/sec Loss 6.1618 LearningRate 0.1565 Epoch: 8 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:31:59,289-Speed 10710.89 samples/sec Loss 6.1264 LearningRate 0.1564 Epoch: 8 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:03,275-Speed 10276.11 samples/sec Loss 6.1543 LearningRate 0.1563 Epoch: 8 Global Step: 36490 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:32:06,985-Speed 11045.79 samples/sec Loss 6.1681 LearningRate 0.1563 Epoch: 8 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:10,393-Speed 12020.78 samples/sec Loss 6.1727 LearningRate 0.1562 Epoch: 8 Global Step: 36510 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:32:13,819-Speed 11959.95 samples/sec Loss 6.1230 LearningRate 0.1562 Epoch: 8 Global Step: 36520 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:32:17,239-Speed 11978.52 samples/sec Loss 6.1533 LearningRate 0.1561 Epoch: 8 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:20,883-Speed 11244.69 samples/sec Loss 6.1627 LearningRate 0.1560 Epoch: 8 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:24,394-Speed 11669.04 samples/sec Loss 6.1596 LearningRate 0.1560 Epoch: 8 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:28,170-Speed 10848.51 samples/sec Loss 6.1581 LearningRate 0.1559 Epoch: 8 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:32,127-Speed 10353.72 samples/sec Loss 6.1795 LearningRate 0.1558 Epoch: 8 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:35,636-Speed 11677.55 samples/sec Loss 6.1183 LearningRate 0.1558 Epoch: 8 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:39,281-Speed 11239.63 samples/sec Loss 6.0545 LearningRate 0.1557 Epoch: 8 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:42,857-Speed 11459.08 samples/sec Loss 6.1878 LearningRate 0.1556 Epoch: 8 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:46,822-Speed 10330.89 samples/sec Loss 6.1444 LearningRate 0.1556 Epoch: 8 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 
01:32:50,343-Speed 11636.24 samples/sec Loss 6.1066 LearningRate 0.1555 Epoch: 8 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:32:53,918-Speed 11461.89 samples/sec Loss 6.1810 LearningRate 0.1554 Epoch: 8 Global Step: 36630 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:32:57,544-Speed 11297.95 samples/sec Loss 6.1595 LearningRate 0.1554 Epoch: 8 Global Step: 36640 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:33:01,006-Speed 11834.48 samples/sec Loss 6.0797 LearningRate 0.1553 Epoch: 8 Global Step: 36650 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:33:05,438-Speed 9244.89 samples/sec Loss 6.1980 LearningRate 0.1552 Epoch: 8 Global Step: 36660 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:33:08,970-Speed 11599.93 samples/sec Loss 6.1630 LearningRate 0.1552 Epoch: 8 Global Step: 36670 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:33:12,517-Speed 11550.42 samples/sec Loss 6.1656 LearningRate 0.1551 Epoch: 8 Global Step: 36680 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:33:16,155-Speed 11264.35 samples/sec Loss 6.0889 LearningRate 0.1550 Epoch: 8 Global Step: 36690 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:33:19,848-Speed 11093.88 samples/sec Loss 6.0885 LearningRate 0.1550 Epoch: 8 Global Step: 36700 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:33:23,498-Speed 11221.41 samples/sec Loss 6.0584 LearningRate 0.1549 Epoch: 8 Global Step: 36710 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:33:27,487-Speed 10271.32 samples/sec Loss 6.1108 LearningRate 0.1548 Epoch: 8 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:31,027-Speed 11574.85 samples/sec Loss 6.1447 LearningRate 0.1548 Epoch: 8 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:34,475-Speed 11883.89 samples/sec Loss 
6.0825 LearningRate 0.1547 Epoch: 8 Global Step: 36740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:38,069-Speed 11398.95 samples/sec Loss 6.1515 LearningRate 0.1546 Epoch: 8 Global Step: 36750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:41,574-Speed 11690.22 samples/sec Loss 6.1331 LearningRate 0.1546 Epoch: 8 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:45,041-Speed 11816.54 samples/sec Loss 6.1391 LearningRate 0.1545 Epoch: 8 Global Step: 36770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:48,535-Speed 11726.79 samples/sec Loss 6.0757 LearningRate 0.1544 Epoch: 8 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:52,211-Speed 11146.16 samples/sec Loss 6.0866 LearningRate 0.1544 Epoch: 8 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:55,831-Speed 11317.85 samples/sec Loss 6.1215 LearningRate 0.1543 Epoch: 8 Global Step: 36800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:33:59,559-Speed 10989.28 samples/sec Loss 6.1465 LearningRate 0.1542 Epoch: 8 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:03,382-Speed 10717.01 samples/sec Loss 6.1421 LearningRate 0.1542 Epoch: 8 Global Step: 36820 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:34:07,738-Speed 9404.27 samples/sec Loss 6.1469 LearningRate 0.1541 Epoch: 8 Global Step: 36830 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:34:11,432-Speed 11091.58 samples/sec Loss 6.0810 LearningRate 0.1540 Epoch: 8 Global Step: 36840 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:34:15,052-Speed 11318.62 samples/sec Loss 6.1496 LearningRate 0.1540 Epoch: 8 Global Step: 36850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:18,493-Speed 11906.88 samples/sec Loss 6.0929 LearningRate 0.1539 Epoch: 8 Global 
Step: 36860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:21,935-Speed 11901.68 samples/sec Loss 6.1017 LearningRate 0.1538 Epoch: 8 Global Step: 36870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:25,753-Speed 10730.01 samples/sec Loss 6.1312 LearningRate 0.1538 Epoch: 8 Global Step: 36880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:29,330-Speed 11453.41 samples/sec Loss 6.1298 LearningRate 0.1537 Epoch: 8 Global Step: 36890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:33,021-Speed 11100.02 samples/sec Loss 6.1230 LearningRate 0.1536 Epoch: 8 Global Step: 36900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:36,498-Speed 11791.06 samples/sec Loss 6.0946 LearningRate 0.1536 Epoch: 8 Global Step: 36910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:39,995-Speed 11713.49 samples/sec Loss 6.0944 LearningRate 0.1535 Epoch: 8 Global Step: 36920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:43,642-Speed 11235.33 samples/sec Loss 6.0917 LearningRate 0.1534 Epoch: 8 Global Step: 36930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:47,331-Speed 11103.85 samples/sec Loss 6.1316 LearningRate 0.1534 Epoch: 8 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:34:51,127-Speed 10791.85 samples/sec Loss 6.0744 LearningRate 0.1533 Epoch: 8 Global Step: 36950 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:34:54,643-Speed 11653.81 samples/sec Loss 6.1051 LearningRate 0.1532 Epoch: 8 Global Step: 36960 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:34:58,067-Speed 11967.08 samples/sec Loss 6.0662 LearningRate 0.1532 Epoch: 8 Global Step: 36970 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:35:02,051-Speed 10283.51 samples/sec Loss 6.0628 LearningRate 0.1531 Epoch: 8 Global Step: 36980 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:35:05,988-Speed 10405.07 samples/sec Loss 6.0596 LearningRate 0.1530 Epoch: 8 Global Step: 36990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:10,556-Speed 8969.70 samples/sec Loss 6.0799 LearningRate 0.1530 Epoch: 8 Global Step: 37000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:14,187-Speed 11284.58 samples/sec Loss 6.0483 LearningRate 0.1529 Epoch: 8 Global Step: 37010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:18,106-Speed 10454.17 samples/sec Loss 6.1174 LearningRate 0.1528 Epoch: 8 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:21,680-Speed 11462.69 samples/sec Loss 6.0727 LearningRate 0.1528 Epoch: 8 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:25,133-Speed 11864.60 samples/sec Loss 6.1032 LearningRate 0.1527 Epoch: 8 Global Step: 37040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:28,530-Speed 12061.06 samples/sec Loss 6.0768 LearningRate 0.1526 Epoch: 8 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:31,995-Speed 11825.41 samples/sec Loss 6.1301 LearningRate 0.1526 Epoch: 8 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:35,542-Speed 11551.73 samples/sec Loss 6.1000 LearningRate 0.1525 Epoch: 8 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:35:39,054-Speed 11665.79 samples/sec Loss 6.1025 LearningRate 0.1524 Epoch: 8 Global Step: 37080 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:35:42,691-Speed 11265.58 samples/sec Loss 6.0471 LearningRate 0.1524 Epoch: 8 Global Step: 37090 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:35:46,304-Speed 11338.80 samples/sec Loss 6.0677 LearningRate 0.1523 Epoch: 8 Global Step: 37100 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 
01:35:49,830-Speed 11620.66 samples/sec Loss 6.1051 LearningRate 0.1522 Epoch: 8 Global Step: 37110 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:35:53,287-Speed 11851.86 samples/sec Loss 6.1038 LearningRate 0.1522 Epoch: 8 Global Step: 37120 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:35:57,271-Speed 10282.31 samples/sec Loss 6.1269 LearningRate 0.1521 Epoch: 8 Global Step: 37130 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:01,040-Speed 10871.54 samples/sec Loss 6.1098 LearningRate 0.1521 Epoch: 8 Global Step: 37140 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:04,704-Speed 11182.07 samples/sec Loss 6.0977 LearningRate 0.1520 Epoch: 8 Global Step: 37150 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:08,097-Speed 12073.08 samples/sec Loss 6.1218 LearningRate 0.1519 Epoch: 8 Global Step: 37160 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:11,504-Speed 12030.49 samples/sec Loss 6.1152 LearningRate 0.1519 Epoch: 8 Global Step: 37170 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:15,246-Speed 10947.33 samples/sec Loss 6.0415 LearningRate 0.1518 Epoch: 8 Global Step: 37180 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 01:36:18,933-Speed 11111.79 samples/sec Loss 6.0835 LearningRate 0.1517 Epoch: 8 Global Step: 37190 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:22,465-Speed 11598.37 samples/sec Loss 6.0849 LearningRate 0.1517 Epoch: 8 Global Step: 37200 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:26,122-Speed 11206.86 samples/sec Loss 6.1380 LearningRate 0.1516 Epoch: 8 Global Step: 37210 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:29,660-Speed 11578.93 samples/sec Loss 6.0943 LearningRate 0.1515 Epoch: 8 Global Step: 37220 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:33,378-Speed 11018.19 samples/sec 
Loss 6.0692 LearningRate 0.1515 Epoch: 8 Global Step: 37230 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:36,953-Speed 11459.78 samples/sec Loss 6.0168 LearningRate 0.1514 Epoch: 8 Global Step: 37240 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:40,637-Speed 11120.61 samples/sec Loss 6.0396 LearningRate 0.1513 Epoch: 8 Global Step: 37250 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:44,432-Speed 10796.31 samples/sec Loss 6.0906 LearningRate 0.1513 Epoch: 8 Global Step: 37260 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:48,399-Speed 10329.54 samples/sec Loss 6.0937 LearningRate 0.1512 Epoch: 8 Global Step: 37270 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:51,860-Speed 11834.30 samples/sec Loss 6.1208 LearningRate 0.1511 Epoch: 8 Global Step: 37280 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:36:55,443-Speed 11434.67 samples/sec Loss 6.0984 LearningRate 0.1511 Epoch: 8 Global Step: 37290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:36:59,101-Speed 11203.63 samples/sec Loss 6.0754 LearningRate 0.1510 Epoch: 8 Global Step: 37300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:02,670-Speed 11479.36 samples/sec Loss 6.0094 LearningRate 0.1509 Epoch: 8 Global Step: 37310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:06,764-Speed 10008.38 samples/sec Loss 6.0839 LearningRate 0.1509 Epoch: 8 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:10,459-Speed 11086.58 samples/sec Loss 6.0880 LearningRate 0.1508 Epoch: 8 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:14,166-Speed 11052.71 samples/sec Loss 6.0925 LearningRate 0.1507 Epoch: 8 Global Step: 37340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:18,542-Speed 9362.42 samples/sec Loss 6.0707 LearningRate 0.1507 Epoch: 8 
Global Step: 37350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:22,035-Speed 11727.09 samples/sec Loss 6.0692 LearningRate 0.1506 Epoch: 8 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:25,646-Speed 11346.69 samples/sec Loss 6.0196 LearningRate 0.1505 Epoch: 8 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:29,176-Speed 11607.08 samples/sec Loss 6.0637 LearningRate 0.1505 Epoch: 8 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:37:32,761-Speed 11430.80 samples/sec Loss 5.9976 LearningRate 0.1504 Epoch: 8 Global Step: 37390 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:37:36,315-Speed 11527.89 samples/sec Loss 6.0603 LearningRate 0.1503 Epoch: 8 Global Step: 37400 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:37:39,888-Speed 11464.80 samples/sec Loss 6.0868 LearningRate 0.1503 Epoch: 8 Global Step: 37410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:37:43,613-Speed 11000.68 samples/sec Loss 6.0725 LearningRate 0.1502 Epoch: 8 Global Step: 37420 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:37:47,137-Speed 11624.82 samples/sec Loss 6.0853 LearningRate 0.1502 Epoch: 8 Global Step: 37430 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:37:50,925-Speed 10814.89 samples/sec Loss 6.0480 LearningRate 0.1501 Epoch: 8 Global Step: 37440 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:37:54,745-Speed 10724.37 samples/sec Loss 6.0788 LearningRate 0.1500 Epoch: 8 Global Step: 37450 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:37:58,367-Speed 11314.04 samples/sec Loss 6.1052 LearningRate 0.1500 Epoch: 8 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:38:01,811-Speed 11895.04 samples/sec Loss 6.0593 LearningRate 0.1499 Epoch: 8 Global Step: 37470 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:38:05,203-Speed 12080.14 samples/sec Loss 6.0522 LearningRate 0.1498 Epoch: 8 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:38:08,637-Speed 11929.95 samples/sec Loss 6.0740 LearningRate 0.1498 Epoch: 8 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:38:11,974-Speed 12277.54 samples/sec Loss 6.0367 LearningRate 0.1497 Epoch: 8 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:38:15,462-Speed 11745.45 samples/sec Loss 6.0609 LearningRate 0.1496 Epoch: 8 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:38:19,486-Speed 10181.79 samples/sec Loss 6.1331 LearningRate 0.1496 Epoch: 8 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:38:22,993-Speed 11683.48 samples/sec Loss 6.0137 LearningRate 0.1495 Epoch: 8 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:38:27,263-Speed 9595.32 samples/sec Loss 6.0791 LearningRate 0.1494 Epoch: 8 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:39:05,764-Speed 1063.90 samples/sec Loss 5.9187 LearningRate 0.1494 Epoch: 9 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:39:10,555-Speed 8551.05 samples/sec Loss 5.2812 LearningRate 0.1493 Epoch: 9 Global Step: 37560 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:14,386-Speed 10694.28 samples/sec Loss 5.3005 LearningRate 0.1492 Epoch: 9 Global Step: 37570 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:18,839-Speed 9199.42 samples/sec Loss 5.2592 LearningRate 0.1492 Epoch: 9 Global Step: 37580 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:22,586-Speed 10934.00 samples/sec Loss 5.2420 LearningRate 0.1491 Epoch: 9 Global Step: 37590 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 
01:39:26,638-Speed 10112.68 samples/sec Loss 5.2506 LearningRate 0.1490 Epoch: 9 Global Step: 37600 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:30,341-Speed 11064.65 samples/sec Loss 5.3633 LearningRate 0.1490 Epoch: 9 Global Step: 37610 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:34,957-Speed 8873.66 samples/sec Loss 5.2573 LearningRate 0.1489 Epoch: 9 Global Step: 37620 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:38,379-Speed 11972.41 samples/sec Loss 5.3041 LearningRate 0.1488 Epoch: 9 Global Step: 37630 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:41,804-Speed 11962.90 samples/sec Loss 5.2796 LearningRate 0.1488 Epoch: 9 Global Step: 37640 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:45,228-Speed 11974.71 samples/sec Loss 5.3646 LearningRate 0.1487 Epoch: 9 Global Step: 37650 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:48,664-Speed 11923.90 samples/sec Loss 5.3668 LearningRate 0.1487 Epoch: 9 Global Step: 37660 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:52,256-Speed 11408.39 samples/sec Loss 5.3221 LearningRate 0.1486 Epoch: 9 Global Step: 37670 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:55,636-Speed 12119.60 samples/sec Loss 5.3734 LearningRate 0.1485 Epoch: 9 Global Step: 37680 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:39:59,158-Speed 11631.79 samples/sec Loss 5.3775 LearningRate 0.1485 Epoch: 9 Global Step: 37690 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:40:02,897-Speed 10956.44 samples/sec Loss 5.4230 LearningRate 0.1484 Epoch: 9 Global Step: 37700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:06,921-Speed 10182.57 samples/sec Loss 5.4300 LearningRate 0.1483 Epoch: 9 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:10,477-Speed 11520.99 samples/sec Loss 
5.4349 LearningRate 0.1483 Epoch: 9 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:13,929-Speed 11869.13 samples/sec Loss 5.3813 LearningRate 0.1482 Epoch: 9 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:17,460-Speed 11604.19 samples/sec Loss 5.4242 LearningRate 0.1481 Epoch: 9 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:21,115-Speed 11209.27 samples/sec Loss 5.4354 LearningRate 0.1481 Epoch: 9 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:24,540-Speed 11962.64 samples/sec Loss 5.4383 LearningRate 0.1480 Epoch: 9 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:28,064-Speed 11626.22 samples/sec Loss 5.4740 LearningRate 0.1479 Epoch: 9 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:31,569-Speed 11687.49 samples/sec Loss 5.4242 LearningRate 0.1479 Epoch: 9 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:35,050-Speed 11773.08 samples/sec Loss 5.4144 LearningRate 0.1478 Epoch: 9 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:40:38,635-Speed 11427.10 samples/sec Loss 5.4861 LearningRate 0.1477 Epoch: 9 Global Step: 37800 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:40:42,279-Speed 11242.26 samples/sec Loss 5.4567 LearningRate 0.1477 Epoch: 9 Global Step: 37810 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:40:45,869-Speed 11416.78 samples/sec Loss 5.4532 LearningRate 0.1476 Epoch: 9 Global Step: 37820 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:40:49,761-Speed 10525.28 samples/sec Loss 5.4832 LearningRate 0.1476 Epoch: 9 Global Step: 37830 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:40:53,412-Speed 11222.57 samples/sec Loss 5.5045 LearningRate 0.1475 Epoch: 9 Global 
Step: 37840 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:40:56,947-Speed 11588.30 samples/sec Loss 5.4397 LearningRate 0.1474 Epoch: 9 Global Step: 37850 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:41:00,779-Speed 10691.96 samples/sec Loss 5.4838 LearningRate 0.1474 Epoch: 9 Global Step: 37860 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:41:04,337-Speed 11515.84 samples/sec Loss 5.4620 LearningRate 0.1473 Epoch: 9 Global Step: 37870 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:41:07,781-Speed 11897.87 samples/sec Loss 5.4517 LearningRate 0.1472 Epoch: 9 Global Step: 37880 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:41:11,291-Speed 11672.69 samples/sec Loss 5.5066 LearningRate 0.1472 Epoch: 9 Global Step: 37890 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:41:15,019-Speed 10989.27 samples/sec Loss 5.4767 LearningRate 0.1471 Epoch: 9 Global Step: 37900 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:41:18,868-Speed 10645.35 samples/sec Loss 5.5582 LearningRate 0.1470 Epoch: 9 Global Step: 37910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:22,661-Speed 10801.39 samples/sec Loss 5.5109 LearningRate 0.1470 Epoch: 9 Global Step: 37920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:26,227-Speed 11488.06 samples/sec Loss 5.5319 LearningRate 0.1469 Epoch: 9 Global Step: 37930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:29,876-Speed 11228.35 samples/sec Loss 5.5187 LearningRate 0.1468 Epoch: 9 Global Step: 37940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:33,534-Speed 11199.61 samples/sec Loss 5.5300 LearningRate 0.1468 Epoch: 9 Global Step: 37950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:37,362-Speed 10705.13 samples/sec Loss 5.5566 LearningRate 0.1467 Epoch: 9 Global Step: 37960 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:41:41,070-Speed 11048.66 samples/sec Loss 5.5154 LearningRate 0.1466 Epoch: 9 Global Step: 37970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:44,528-Speed 11846.87 samples/sec Loss 5.5521 LearningRate 0.1466 Epoch: 9 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:47,989-Speed 11836.43 samples/sec Loss 5.5977 LearningRate 0.1465 Epoch: 9 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:51,526-Speed 11584.84 samples/sec Loss 5.5109 LearningRate 0.1465 Epoch: 9 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:41:55,326-Speed 10782.01 samples/sec Loss 5.5396 LearningRate 0.1464 Epoch: 9 Global Step: 38010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:41:59,161-Speed 10682.90 samples/sec Loss 5.6295 LearningRate 0.1463 Epoch: 9 Global Step: 38020 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:42:02,718-Speed 11519.37 samples/sec Loss 5.5786 LearningRate 0.1463 Epoch: 9 Global Step: 38030 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:42:06,277-Speed 11512.62 samples/sec Loss 5.6221 LearningRate 0.1462 Epoch: 9 Global Step: 38040 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:42:10,059-Speed 10831.90 samples/sec Loss 5.5640 LearningRate 0.1461 Epoch: 9 Global Step: 38050 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:42:13,588-Speed 11611.32 samples/sec Loss 5.6129 LearningRate 0.1461 Epoch: 9 Global Step: 38060 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:42:17,780-Speed 9772.97 samples/sec Loss 5.6279 LearningRate 0.1460 Epoch: 9 Global Step: 38070 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:42:21,470-Speed 11102.67 samples/sec Loss 5.5744 LearningRate 0.1459 Epoch: 9 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 
01:42:25,030-Speed 11508.15 samples/sec Loss 5.5751 LearningRate 0.1459 Epoch: 9 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:28,834-Speed 10769.58 samples/sec Loss 5.6214 LearningRate 0.1458 Epoch: 9 Global Step: 38100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:32,798-Speed 10336.17 samples/sec Loss 5.6368 LearningRate 0.1457 Epoch: 9 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:36,447-Speed 11229.23 samples/sec Loss 5.5888 LearningRate 0.1457 Epoch: 9 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:40,215-Speed 10872.12 samples/sec Loss 5.5743 LearningRate 0.1456 Epoch: 9 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:43,685-Speed 11809.16 samples/sec Loss 5.6193 LearningRate 0.1456 Epoch: 9 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:47,285-Speed 11379.38 samples/sec Loss 5.6275 LearningRate 0.1455 Epoch: 9 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:50,798-Speed 11664.70 samples/sec Loss 5.6026 LearningRate 0.1454 Epoch: 9 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:54,489-Speed 11099.21 samples/sec Loss 5.5897 LearningRate 0.1454 Epoch: 9 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:42:57,935-Speed 11887.74 samples/sec Loss 5.6729 LearningRate 0.1453 Epoch: 9 Global Step: 38180 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:01,425-Speed 11737.92 samples/sec Loss 5.7232 LearningRate 0.1452 Epoch: 9 Global Step: 38190 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:05,100-Speed 11150.03 samples/sec Loss 5.6522 LearningRate 0.1452 Epoch: 9 Global Step: 38200 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:08,639-Speed 11579.46 samples/sec 
Loss 5.6016 LearningRate 0.1451 Epoch: 9 Global Step: 38210 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:12,235-Speed 11391.15 samples/sec Loss 5.6039 LearningRate 0.1450 Epoch: 9 Global Step: 38220 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:15,919-Speed 11122.20 samples/sec Loss 5.6556 LearningRate 0.1450 Epoch: 9 Global Step: 38230 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:19,549-Speed 11286.30 samples/sec Loss 5.6531 LearningRate 0.1449 Epoch: 9 Global Step: 38240 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:23,099-Speed 11541.48 samples/sec Loss 5.6509 LearningRate 0.1448 Epoch: 9 Global Step: 38250 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:26,541-Speed 11902.58 samples/sec Loss 5.6352 LearningRate 0.1448 Epoch: 9 Global Step: 38260 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:30,114-Speed 11466.48 samples/sec Loss 5.6640 LearningRate 0.1447 Epoch: 9 Global Step: 38270 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:34,043-Speed 10428.39 samples/sec Loss 5.6812 LearningRate 0.1447 Epoch: 9 Global Step: 38280 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 01:43:37,547-Speed 11693.09 samples/sec Loss 5.7074 LearningRate 0.1446 Epoch: 9 Global Step: 38290 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:43:40,984-Speed 11919.08 samples/sec Loss 5.7211 LearningRate 0.1445 Epoch: 9 Global Step: 38300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:43:44,488-Speed 11691.19 samples/sec Loss 5.7247 LearningRate 0.1445 Epoch: 9 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:43:48,029-Speed 11572.41 samples/sec Loss 5.6641 LearningRate 0.1444 Epoch: 9 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:43:51,595-Speed 11487.84 samples/sec Loss 5.6874 LearningRate 0.1443 Epoch: 9 
Global Step: 38330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:43:55,192-Speed 11393.70 samples/sec Loss 5.6559 LearningRate 0.1443 Epoch: 9 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:43:58,952-Speed 10894.85 samples/sec Loss 5.6607 LearningRate 0.1442 Epoch: 9 Global Step: 38350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:02,608-Speed 11206.04 samples/sec Loss 5.6813 LearningRate 0.1441 Epoch: 9 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:06,143-Speed 11589.15 samples/sec Loss 5.7216 LearningRate 0.1441 Epoch: 9 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:09,923-Speed 10839.03 samples/sec Loss 5.6702 LearningRate 0.1440 Epoch: 9 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:14,019-Speed 10002.76 samples/sec Loss 5.6993 LearningRate 0.1440 Epoch: 9 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:17,581-Speed 11502.66 samples/sec Loss 5.6880 LearningRate 0.1439 Epoch: 9 Global Step: 38400 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:44:21,062-Speed 11770.47 samples/sec Loss 5.7038 LearningRate 0.1438 Epoch: 9 Global Step: 38410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:44:24,461-Speed 12052.44 samples/sec Loss 5.7340 LearningRate 0.1438 Epoch: 9 Global Step: 38420 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:44:28,287-Speed 10709.57 samples/sec Loss 5.7061 LearningRate 0.1437 Epoch: 9 Global Step: 38430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:32,295-Speed 10220.71 samples/sec Loss 5.6673 LearningRate 0.1436 Epoch: 9 Global Step: 38440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:35,838-Speed 11565.38 samples/sec Loss 5.7048 LearningRate 0.1436 Epoch: 9 Global Step: 38450 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:44:39,543-Speed 11055.91 samples/sec Loss 5.6931 LearningRate 0.1435 Epoch: 9 Global Step: 38460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:43,304-Speed 10894.11 samples/sec Loss 5.7530 LearningRate 0.1434 Epoch: 9 Global Step: 38470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:46,960-Speed 11209.41 samples/sec Loss 5.7184 LearningRate 0.1434 Epoch: 9 Global Step: 38480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:50,398-Speed 11918.14 samples/sec Loss 5.6472 LearningRate 0.1433 Epoch: 9 Global Step: 38490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:53,821-Speed 11968.91 samples/sec Loss 5.7288 LearningRate 0.1432 Epoch: 9 Global Step: 38500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:44:57,626-Speed 10768.33 samples/sec Loss 5.7532 LearningRate 0.1432 Epoch: 9 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:45:01,403-Speed 10848.35 samples/sec Loss 5.6961 LearningRate 0.1431 Epoch: 9 Global Step: 38520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:45:04,933-Speed 11606.31 samples/sec Loss 5.7343 LearningRate 0.1431 Epoch: 9 Global Step: 38530 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:08,429-Speed 11720.73 samples/sec Loss 5.6821 LearningRate 0.1430 Epoch: 9 Global Step: 38540 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:11,960-Speed 11602.26 samples/sec Loss 5.6802 LearningRate 0.1429 Epoch: 9 Global Step: 38550 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:15,514-Speed 11527.14 samples/sec Loss 5.6955 LearningRate 0.1429 Epoch: 9 Global Step: 38560 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:19,377-Speed 10604.83 samples/sec Loss 5.6997 LearningRate 0.1428 Epoch: 9 Global Step: 38570 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 
01:45:23,027-Speed 11226.50 samples/sec Loss 5.7023 LearningRate 0.1427 Epoch: 9 Global Step: 38580 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:26,654-Speed 11295.60 samples/sec Loss 5.7257 LearningRate 0.1427 Epoch: 9 Global Step: 38590 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:30,193-Speed 11575.96 samples/sec Loss 5.7018 LearningRate 0.1426 Epoch: 9 Global Step: 38600 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:33,955-Speed 10890.83 samples/sec Loss 5.7236 LearningRate 0.1425 Epoch: 9 Global Step: 38610 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:37,502-Speed 11550.01 samples/sec Loss 5.7489 LearningRate 0.1425 Epoch: 9 Global Step: 38620 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:45:40,948-Speed 11889.46 samples/sec Loss 5.7165 LearningRate 0.1424 Epoch: 9 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:45:44,377-Speed 11949.96 samples/sec Loss 5.7480 LearningRate 0.1424 Epoch: 9 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:45:48,446-Speed 10068.66 samples/sec Loss 5.7096 LearningRate 0.1423 Epoch: 9 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:45:52,604-Speed 9852.07 samples/sec Loss 5.7713 LearningRate 0.1422 Epoch: 9 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:45:56,192-Speed 11419.92 samples/sec Loss 5.7340 LearningRate 0.1422 Epoch: 9 Global Step: 38670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:45:59,661-Speed 11811.58 samples/sec Loss 5.7183 LearningRate 0.1421 Epoch: 9 Global Step: 38680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:03,316-Speed 11210.10 samples/sec Loss 5.7875 LearningRate 0.1420 Epoch: 9 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:06,928-Speed 11344.71 samples/sec Loss 
5.7686 LearningRate 0.1420 Epoch: 9 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:10,797-Speed 10587.05 samples/sec Loss 5.7788 LearningRate 0.1419 Epoch: 9 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:14,394-Speed 11389.95 samples/sec Loss 5.7571 LearningRate 0.1419 Epoch: 9 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:17,949-Speed 11528.54 samples/sec Loss 5.7353 LearningRate 0.1418 Epoch: 9 Global Step: 38730 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:46:21,485-Speed 11586.39 samples/sec Loss 5.7175 LearningRate 0.1417 Epoch: 9 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:25,008-Speed 11630.21 samples/sec Loss 5.8008 LearningRate 0.1417 Epoch: 9 Global Step: 38750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:28,779-Speed 10861.61 samples/sec Loss 5.7907 LearningRate 0.1416 Epoch: 9 Global Step: 38760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:32,382-Speed 11372.39 samples/sec Loss 5.7551 LearningRate 0.1415 Epoch: 9 Global Step: 38770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:35,901-Speed 11643.77 samples/sec Loss 5.7882 LearningRate 0.1415 Epoch: 9 Global Step: 38780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:39,485-Speed 11434.67 samples/sec Loss 5.7478 LearningRate 0.1414 Epoch: 9 Global Step: 38790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:43,253-Speed 10871.27 samples/sec Loss 5.7580 LearningRate 0.1413 Epoch: 9 Global Step: 38800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:46,778-Speed 11625.03 samples/sec Loss 5.7479 LearningRate 0.1413 Epoch: 9 Global Step: 38810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:50,355-Speed 11452.09 samples/sec Loss 5.7852 LearningRate 0.1412 Epoch: 9 Global 
Step: 38820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:54,991-Speed 8839.41 samples/sec Loss 5.7354 LearningRate 0.1412 Epoch: 9 Global Step: 38830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:46:58,505-Speed 11660.79 samples/sec Loss 5.7644 LearningRate 0.1411 Epoch: 9 Global Step: 38840 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:47:02,290-Speed 10823.36 samples/sec Loss 5.7803 LearningRate 0.1410 Epoch: 9 Global Step: 38850 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:47:05,980-Speed 11104.06 samples/sec Loss 5.7500 LearningRate 0.1410 Epoch: 9 Global Step: 38860 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:47:09,570-Speed 11412.89 samples/sec Loss 5.8014 LearningRate 0.1409 Epoch: 9 Global Step: 38870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:47:13,050-Speed 11773.38 samples/sec Loss 5.7266 LearningRate 0.1408 Epoch: 9 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:47:16,983-Speed 10416.84 samples/sec Loss 5.7348 LearningRate 0.1408 Epoch: 9 Global Step: 38890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:47:20,459-Speed 11786.89 samples/sec Loss 5.7901 LearningRate 0.1407 Epoch: 9 Global Step: 38900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:47:24,199-Speed 10954.82 samples/sec Loss 5.8237 LearningRate 0.1406 Epoch: 9 Global Step: 38910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:47:27,704-Speed 11689.06 samples/sec Loss 5.7660 LearningRate 0.1406 Epoch: 9 Global Step: 38920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:47:31,069-Speed 12177.59 samples/sec Loss 5.7542 LearningRate 0.1405 Epoch: 9 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:47:34,705-Speed 11267.65 samples/sec Loss 5.7585 LearningRate 0.1405 Epoch: 9 Global Step: 38940 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:47:38,231-Speed 11621.62 samples/sec Loss 5.8417 LearningRate 0.1404 Epoch: 9 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:47:41,696-Speed 11822.91 samples/sec Loss 5.7650 LearningRate 0.1403 Epoch: 9 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:47:45,149-Speed 11864.66 samples/sec Loss 5.7562 LearningRate 0.1403 Epoch: 9 Global Step: 38970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:47:48,745-Speed 11393.09 samples/sec Loss 5.7521 LearningRate 0.1402 Epoch: 9 Global Step: 38980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:47:52,798-Speed 10108.36 samples/sec Loss 5.7271 LearningRate 0.1401 Epoch: 9 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:47:56,404-Speed 11362.92 samples/sec Loss 5.7659 LearningRate 0.1401 Epoch: 9 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:48:00,062-Speed 11200.58 samples/sec Loss 5.7775 LearningRate 0.1400 Epoch: 9 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:48:03,522-Speed 11839.17 samples/sec Loss 5.8114 LearningRate 0.1400 Epoch: 9 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:48:06,980-Speed 11848.23 samples/sec Loss 5.7850 LearningRate 0.1399 Epoch: 9 Global Step: 39030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:48:10,355-Speed 12140.68 samples/sec Loss 5.7811 LearningRate 0.1398 Epoch: 9 Global Step: 39040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-17 01:48:14,161-Speed 10762.84 samples/sec Loss 5.8027 LearningRate 0.1398 Epoch: 9 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:18,472-Speed 9504.01 samples/sec Loss 5.7893 LearningRate 0.1397 Epoch: 9 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 
01:48:21,931-Speed 11844.49 samples/sec Loss 5.7833 LearningRate 0.1396 Epoch: 9 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:25,361-Speed 11946.36 samples/sec Loss 5.7844 LearningRate 0.1396 Epoch: 9 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:28,924-Speed 11497.14 samples/sec Loss 5.7770 LearningRate 0.1395 Epoch: 9 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:32,563-Speed 11262.41 samples/sec Loss 5.7934 LearningRate 0.1394 Epoch: 9 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:36,181-Speed 11322.91 samples/sec Loss 5.7255 LearningRate 0.1394 Epoch: 9 Global Step: 39110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:39,924-Speed 10943.94 samples/sec Loss 5.7747 LearningRate 0.1393 Epoch: 9 Global Step: 39120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:43,826-Speed 10499.35 samples/sec Loss 5.8064 LearningRate 0.1393 Epoch: 9 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:47,233-Speed 12027.46 samples/sec Loss 5.7703 LearningRate 0.1392 Epoch: 9 Global Step: 39140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:50,622-Speed 12090.47 samples/sec Loss 5.8075 LearningRate 0.1391 Epoch: 9 Global Step: 39150 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:48:55,111-Speed 9125.81 samples/sec Loss 5.7847 LearningRate 0.1391 Epoch: 9 Global Step: 39160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:48:58,555-Speed 11895.11 samples/sec Loss 5.8365 LearningRate 0.1390 Epoch: 9 Global Step: 39170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:01,927-Speed 12151.70 samples/sec Loss 5.7834 LearningRate 0.1389 Epoch: 9 Global Step: 39180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:05,741-Speed 10741.80 samples/sec Loss 
5.8056 LearningRate 0.1389 Epoch: 9 Global Step: 39190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:09,290-Speed 11544.16 samples/sec Loss 5.8432 LearningRate 0.1388 Epoch: 9 Global Step: 39200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:12,821-Speed 11602.74 samples/sec Loss 5.7469 LearningRate 0.1388 Epoch: 9 Global Step: 39210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:16,276-Speed 11860.78 samples/sec Loss 5.8549 LearningRate 0.1387 Epoch: 9 Global Step: 39220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:19,778-Speed 11698.11 samples/sec Loss 5.7762 LearningRate 0.1386 Epoch: 9 Global Step: 39230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:23,161-Speed 12111.67 samples/sec Loss 5.7374 LearningRate 0.1386 Epoch: 9 Global Step: 39240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:26,810-Speed 11227.43 samples/sec Loss 5.7978 LearningRate 0.1385 Epoch: 9 Global Step: 39250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:49:30,283-Speed 11796.93 samples/sec Loss 5.7709 LearningRate 0.1384 Epoch: 9 Global Step: 39260 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:49:34,243-Speed 10346.89 samples/sec Loss 5.8604 LearningRate 0.1384 Epoch: 9 Global Step: 39270 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:49:37,700-Speed 11849.77 samples/sec Loss 5.8534 LearningRate 0.1383 Epoch: 9 Global Step: 39280 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:49:41,299-Speed 11385.24 samples/sec Loss 5.8341 LearningRate 0.1383 Epoch: 9 Global Step: 39290 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:49:45,036-Speed 10964.02 samples/sec Loss 5.7624 LearningRate 0.1382 Epoch: 9 Global Step: 39300 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:49:48,510-Speed 11792.01 samples/sec Loss 5.8039 LearningRate 0.1381 Epoch: 9 Global 
Step: 39310 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:49:52,742-Speed 9681.90 samples/sec Loss 5.8388 LearningRate 0.1381 Epoch: 9 Global Step: 39320 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:49:56,306-Speed 11493.39 samples/sec Loss 5.7925 LearningRate 0.1380 Epoch: 9 Global Step: 39330 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:49:59,954-Speed 11231.32 samples/sec Loss 5.7718 LearningRate 0.1379 Epoch: 9 Global Step: 39340 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:50:03,380-Speed 11959.77 samples/sec Loss 5.7835 LearningRate 0.1379 Epoch: 9 Global Step: 39350 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:50:06,762-Speed 12114.51 samples/sec Loss 5.7683 LearningRate 0.1378 Epoch: 9 Global Step: 39360 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:50:10,403-Speed 11250.77 samples/sec Loss 5.7802 LearningRate 0.1378 Epoch: 9 Global Step: 39370 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:50:14,010-Speed 11358.52 samples/sec Loss 5.7486 LearningRate 0.1377 Epoch: 9 Global Step: 39380 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:50:17,764-Speed 10914.81 samples/sec Loss 5.8378 LearningRate 0.1376 Epoch: 9 Global Step: 39390 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:50:21,410-Speed 11235.57 samples/sec Loss 5.8472 LearningRate 0.1376 Epoch: 9 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:24,896-Speed 11756.02 samples/sec Loss 5.7982 LearningRate 0.1375 Epoch: 9 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:28,334-Speed 11915.48 samples/sec Loss 5.8021 LearningRate 0.1374 Epoch: 9 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:31,810-Speed 11788.24 samples/sec Loss 5.8378 LearningRate 0.1374 Epoch: 9 Global Step: 39430 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:50:35,363-Speed 11528.44 samples/sec Loss 5.8565 LearningRate 0.1373 Epoch: 9 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:39,082-Speed 11016.33 samples/sec Loss 5.8347 LearningRate 0.1373 Epoch: 9 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:42,862-Speed 10841.26 samples/sec Loss 5.8431 LearningRate 0.1372 Epoch: 9 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:46,350-Speed 11745.44 samples/sec Loss 5.8102 LearningRate 0.1371 Epoch: 9 Global Step: 39470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:49,913-Speed 11498.01 samples/sec Loss 5.8076 LearningRate 0.1371 Epoch: 9 Global Step: 39480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:54,260-Speed 9426.00 samples/sec Loss 5.7695 LearningRate 0.1370 Epoch: 9 Global Step: 39490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:50:57,899-Speed 11256.90 samples/sec Loss 5.8002 LearningRate 0.1369 Epoch: 9 Global Step: 39500 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:51:01,350-Speed 11876.61 samples/sec Loss 5.8099 LearningRate 0.1369 Epoch: 9 Global Step: 39510 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:51:04,765-Speed 11996.87 samples/sec Loss 5.7757 LearningRate 0.1368 Epoch: 9 Global Step: 39520 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:51:08,265-Speed 11705.54 samples/sec Loss 5.7853 LearningRate 0.1368 Epoch: 9 Global Step: 39530 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:51:11,933-Speed 11169.68 samples/sec Loss 5.8778 LearningRate 0.1367 Epoch: 9 Global Step: 39540 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:51:15,466-Speed 11603.30 samples/sec Loss 5.7868 LearningRate 0.1366 Epoch: 9 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 
01:51:19,072-Speed 11364.17 samples/sec Loss 5.8354 LearningRate 0.1366 Epoch: 9 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:22,744-Speed 11157.47 samples/sec Loss 5.8091 LearningRate 0.1365 Epoch: 9 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:26,576-Speed 10692.16 samples/sec Loss 5.8309 LearningRate 0.1364 Epoch: 9 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:30,075-Speed 11709.59 samples/sec Loss 5.7511 LearningRate 0.1364 Epoch: 9 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:33,641-Speed 11487.58 samples/sec Loss 5.8106 LearningRate 0.1363 Epoch: 9 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:37,215-Speed 11465.26 samples/sec Loss 5.8966 LearningRate 0.1363 Epoch: 9 Global Step: 39610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:40,658-Speed 11900.84 samples/sec Loss 5.8797 LearningRate 0.1362 Epoch: 9 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:44,329-Speed 11160.05 samples/sec Loss 5.8301 LearningRate 0.1361 Epoch: 9 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:47,791-Speed 11831.48 samples/sec Loss 5.7768 LearningRate 0.1361 Epoch: 9 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:51,272-Speed 11769.84 samples/sec Loss 5.7688 LearningRate 0.1360 Epoch: 9 Global Step: 39650 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:51:55,996-Speed 8674.16 samples/sec Loss 5.8149 LearningRate 0.1359 Epoch: 9 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:51:59,475-Speed 11779.18 samples/sec Loss 5.8165 LearningRate 0.1359 Epoch: 9 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:02,955-Speed 11771.16 samples/sec Loss 
5.8164 LearningRate 0.1358 Epoch: 9 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:06,499-Speed 11562.60 samples/sec Loss 5.8384 LearningRate 0.1358 Epoch: 9 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:10,056-Speed 11514.85 samples/sec Loss 5.8194 LearningRate 0.1357 Epoch: 9 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:14,784-Speed 8666.64 samples/sec Loss 5.8476 LearningRate 0.1356 Epoch: 9 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:18,543-Speed 10898.57 samples/sec Loss 5.7884 LearningRate 0.1356 Epoch: 9 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:22,488-Speed 10384.02 samples/sec Loss 5.8046 LearningRate 0.1355 Epoch: 9 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:25,917-Speed 11949.31 samples/sec Loss 5.8225 LearningRate 0.1355 Epoch: 9 Global Step: 39740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:29,549-Speed 11281.45 samples/sec Loss 5.7945 LearningRate 0.1354 Epoch: 9 Global Step: 39750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:52:32,998-Speed 11877.52 samples/sec Loss 5.8338 LearningRate 0.1353 Epoch: 9 Global Step: 39760 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:52:36,535-Speed 11584.20 samples/sec Loss 5.8137 LearningRate 0.1353 Epoch: 9 Global Step: 39770 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:52:40,060-Speed 11623.94 samples/sec Loss 5.8107 LearningRate 0.1352 Epoch: 9 Global Step: 39780 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:52:43,561-Speed 11701.09 samples/sec Loss 5.7742 LearningRate 0.1351 Epoch: 9 Global Step: 39790 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:52:47,162-Speed 11376.98 samples/sec Loss 5.8033 LearningRate 0.1351 Epoch: 9 Global 
Step: 39800 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:52:50,637-Speed 11792.85 samples/sec Loss 5.8468 LearningRate 0.1350 Epoch: 9 Global Step: 39810 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:52:54,112-Speed 11787.95 samples/sec Loss 5.8673 LearningRate 0.1350 Epoch: 9 Global Step: 39820 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:52:58,240-Speed 9923.82 samples/sec Loss 5.8500 LearningRate 0.1349 Epoch: 9 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:01,978-Speed 10962.30 samples/sec Loss 5.8407 LearningRate 0.1348 Epoch: 9 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:05,457-Speed 11775.91 samples/sec Loss 5.7889 LearningRate 0.1348 Epoch: 9 Global Step: 39850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:09,231-Speed 10857.42 samples/sec Loss 5.7857 LearningRate 0.1347 Epoch: 9 Global Step: 39860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:12,684-Speed 11865.90 samples/sec Loss 5.8196 LearningRate 0.1346 Epoch: 9 Global Step: 39870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:16,502-Speed 10729.66 samples/sec Loss 5.8103 LearningRate 0.1346 Epoch: 9 Global Step: 39880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:20,233-Speed 10982.00 samples/sec Loss 5.8876 LearningRate 0.1345 Epoch: 9 Global Step: 39890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:23,672-Speed 11911.74 samples/sec Loss 5.8423 LearningRate 0.1345 Epoch: 9 Global Step: 39900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:27,063-Speed 12083.18 samples/sec Loss 5.8142 LearningRate 0.1344 Epoch: 9 Global Step: 39910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:53:30,587-Speed 11626.68 samples/sec Loss 5.7857 LearningRate 0.1343 Epoch: 9 Global Step: 39920 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-01-17 01:53:34,209-Speed 11311.42 samples/sec Loss 5.7904 LearningRate 0.1343 Epoch: 9 Global Step: 39930 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:53:37,588-Speed 12125.58 samples/sec Loss 5.8132 LearningRate 0.1342 Epoch: 9 Global Step: 39940 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:53:40,976-Speed 12092.28 samples/sec Loss 5.8006 LearningRate 0.1342 Epoch: 9 Global Step: 39950 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:53:44,828-Speed 10637.06 samples/sec Loss 5.8286 LearningRate 0.1341 Epoch: 9 Global Step: 39960 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:53:48,650-Speed 10721.52 samples/sec Loss 5.8860 LearningRate 0.1340 Epoch: 9 Global Step: 39970 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:53:52,242-Speed 11402.83 samples/sec Loss 5.7852 LearningRate 0.1340 Epoch: 9 Global Step: 39980 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:53:56,128-Speed 10543.72 samples/sec Loss 5.7930 LearningRate 0.1339 Epoch: 9 Global Step: 39990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:53:59,839-Speed 11039.35 samples/sec Loss 5.8508 LearningRate 0.1338 Epoch: 9 Global Step: 40000 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:54:21,115-[lfw][40000]XNorm: 10.562779 Training: 2022-01-17 01:54:21,116-[lfw][40000]Accuracy-Flip: 0.99600+-0.00367 Training: 2022-01-17 01:54:21,116-[lfw][40000]Accuracy-Highest: 0.99650 Training: 2022-01-17 01:54:45,685-[cfp_fp][40000]XNorm: 8.920896 Training: 2022-01-17 01:54:45,686-[cfp_fp][40000]Accuracy-Flip: 0.96629+-0.00712 Training: 2022-01-17 01:54:45,686-[cfp_fp][40000]Accuracy-Highest: 0.96629 Training: 2022-01-17 01:55:06,876-[agedb_30][40000]XNorm: 10.131247 Training: 2022-01-17 01:55:06,877-[agedb_30][40000]Accuracy-Flip: 0.96467+-0.00745 Training: 2022-01-17 01:55:06,877-[agedb_30][40000]Accuracy-Highest: 0.96467 Training: 
2022-01-17 01:55:10,244-Speed 581.79 samples/sec Loss 5.7992 LearningRate 0.1338 Epoch: 9 Global Step: 40010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:55:13,625-Speed 12115.44 samples/sec Loss 5.8521 LearningRate 0.1337 Epoch: 9 Global Step: 40020 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:55:17,009-Speed 12110.73 samples/sec Loss 5.7834 LearningRate 0.1337 Epoch: 9 Global Step: 40030 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:55:20,431-Speed 11972.16 samples/sec Loss 5.8035 LearningRate 0.1336 Epoch: 9 Global Step: 40040 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:55:24,367-Speed 10409.65 samples/sec Loss 5.8176 LearningRate 0.1335 Epoch: 9 Global Step: 40050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:27,808-Speed 11902.75 samples/sec Loss 5.8801 LearningRate 0.1335 Epoch: 9 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:31,336-Speed 11617.09 samples/sec Loss 5.8050 LearningRate 0.1334 Epoch: 9 Global Step: 40070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:34,784-Speed 11880.54 samples/sec Loss 5.8230 LearningRate 0.1334 Epoch: 9 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:38,578-Speed 10804.28 samples/sec Loss 5.8213 LearningRate 0.1333 Epoch: 9 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:41,968-Speed 12085.13 samples/sec Loss 5.8400 LearningRate 0.1332 Epoch: 9 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:45,511-Speed 11564.84 samples/sec Loss 5.7588 LearningRate 0.1332 Epoch: 9 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:49,330-Speed 10727.06 samples/sec Loss 5.7815 LearningRate 0.1331 Epoch: 9 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:53,086-Speed 10908.87 
samples/sec Loss 5.8059 LearningRate 0.1330 Epoch: 9 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:55:56,850-Speed 10886.36 samples/sec Loss 5.8047 LearningRate 0.1330 Epoch: 9 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:00,458-Speed 11354.55 samples/sec Loss 5.8120 LearningRate 0.1329 Epoch: 9 Global Step: 40150 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:56:04,323-Speed 10600.16 samples/sec Loss 5.8468 LearningRate 0.1329 Epoch: 9 Global Step: 40160 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:56:08,179-Speed 10623.82 samples/sec Loss 5.8043 LearningRate 0.1328 Epoch: 9 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:11,769-Speed 11413.61 samples/sec Loss 5.8691 LearningRate 0.1327 Epoch: 9 Global Step: 40180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:15,150-Speed 12117.86 samples/sec Loss 5.8372 LearningRate 0.1327 Epoch: 9 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:18,695-Speed 11557.78 samples/sec Loss 5.7970 LearningRate 0.1326 Epoch: 9 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:22,379-Speed 11120.23 samples/sec Loss 5.7614 LearningRate 0.1326 Epoch: 9 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:26,482-Speed 9987.33 samples/sec Loss 5.8410 LearningRate 0.1325 Epoch: 9 Global Step: 40220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:30,307-Speed 10710.78 samples/sec Loss 5.7998 LearningRate 0.1324 Epoch: 9 Global Step: 40230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:34,054-Speed 10935.17 samples/sec Loss 5.7910 LearningRate 0.1324 Epoch: 9 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:37,493-Speed 11913.26 samples/sec Loss 5.8339 LearningRate 0.1323 
Epoch: 9 Global Step: 40250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:41,083-Speed 11410.09 samples/sec Loss 5.8233 LearningRate 0.1322 Epoch: 9 Global Step: 40260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:45,168-Speed 10030.44 samples/sec Loss 5.7809 LearningRate 0.1322 Epoch: 9 Global Step: 40270 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:56:48,764-Speed 11393.49 samples/sec Loss 5.8206 LearningRate 0.1321 Epoch: 9 Global Step: 40280 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:56:52,247-Speed 11762.84 samples/sec Loss 5.8690 LearningRate 0.1321 Epoch: 9 Global Step: 40290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:56,093-Speed 10654.37 samples/sec Loss 5.8066 LearningRate 0.1320 Epoch: 9 Global Step: 40300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:56:59,713-Speed 11315.20 samples/sec Loss 5.7765 LearningRate 0.1319 Epoch: 9 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:57:03,178-Speed 11827.94 samples/sec Loss 5.8624 LearningRate 0.1319 Epoch: 9 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:57:06,711-Speed 11596.58 samples/sec Loss 5.8290 LearningRate 0.1318 Epoch: 9 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:57:10,465-Speed 10914.41 samples/sec Loss 5.7908 LearningRate 0.1318 Epoch: 9 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:57:13,949-Speed 11757.67 samples/sec Loss 5.7885 LearningRate 0.1317 Epoch: 9 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:57:17,517-Speed 11482.48 samples/sec Loss 5.8258 LearningRate 0.1316 Epoch: 9 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:57:21,111-Speed 11400.87 samples/sec Loss 5.7786 LearningRate 0.1316 Epoch: 9 Global Step: 40370 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-01-17 01:57:25,090-Speed 10296.06 samples/sec Loss 5.8008 LearningRate 0.1315 Epoch: 9 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:57:28,718-Speed 11294.90 samples/sec Loss 5.8152 LearningRate 0.1315 Epoch: 9 Global Step: 40390 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:57:32,249-Speed 11602.13 samples/sec Loss 5.8067 LearningRate 0.1314 Epoch: 9 Global Step: 40400 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:57:36,005-Speed 10908.38 samples/sec Loss 5.8046 LearningRate 0.1313 Epoch: 9 Global Step: 40410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:57:39,477-Speed 11800.17 samples/sec Loss 5.7634 LearningRate 0.1313 Epoch: 9 Global Step: 40420 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:57:42,973-Speed 11719.97 samples/sec Loss 5.8187 LearningRate 0.1312 Epoch: 9 Global Step: 40430 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:57:46,497-Speed 11626.25 samples/sec Loss 5.7695 LearningRate 0.1311 Epoch: 9 Global Step: 40440 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:57:51,190-Speed 8728.70 samples/sec Loss 5.8127 LearningRate 0.1311 Epoch: 9 Global Step: 40450 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:57:54,644-Speed 11862.06 samples/sec Loss 5.8563 LearningRate 0.1310 Epoch: 9 Global Step: 40460 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:57:58,059-Speed 12000.66 samples/sec Loss 5.8209 LearningRate 0.1310 Epoch: 9 Global Step: 40470 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:01,983-Speed 10441.57 samples/sec Loss 5.8272 LearningRate 0.1309 Epoch: 9 Global Step: 40480 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:05,768-Speed 10822.43 samples/sec Loss 5.8556 LearningRate 0.1308 Epoch: 9 Global Step: 40490 Fp16 Grad Scale: 262144 Required: 5 hours Training: 
2022-01-17 01:58:09,357-Speed 11416.84 samples/sec Loss 5.8301 LearningRate 0.1308 Epoch: 9 Global Step: 40500 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:12,786-Speed 11946.56 samples/sec Loss 5.7710 LearningRate 0.1307 Epoch: 9 Global Step: 40510 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:16,337-Speed 11538.08 samples/sec Loss 5.8270 LearningRate 0.1307 Epoch: 9 Global Step: 40520 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:19,754-Speed 11992.88 samples/sec Loss 5.7441 LearningRate 0.1306 Epoch: 9 Global Step: 40530 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:23,290-Speed 11584.34 samples/sec Loss 5.7741 LearningRate 0.1305 Epoch: 9 Global Step: 40540 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:26,989-Speed 11077.99 samples/sec Loss 5.8141 LearningRate 0.1305 Epoch: 9 Global Step: 40550 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:30,457-Speed 11811.56 samples/sec Loss 5.8273 LearningRate 0.1304 Epoch: 9 Global Step: 40560 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:34,338-Speed 10557.88 samples/sec Loss 5.7720 LearningRate 0.1304 Epoch: 9 Global Step: 40570 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:38,204-Speed 10598.93 samples/sec Loss 5.8139 LearningRate 0.1303 Epoch: 9 Global Step: 40580 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:41,873-Speed 11166.64 samples/sec Loss 5.7708 LearningRate 0.1302 Epoch: 9 Global Step: 40590 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:58:45,703-Speed 10695.88 samples/sec Loss 5.8030 LearningRate 0.1302 Epoch: 9 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:58:49,402-Speed 11076.65 samples/sec Loss 5.8055 LearningRate 0.1301 Epoch: 9 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:58:53,530-Speed 9925.70 
samples/sec Loss 5.8326 LearningRate 0.1301 Epoch: 9 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:58:56,949-Speed 11984.79 samples/sec Loss 5.8185 LearningRate 0.1300 Epoch: 9 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:01,141-Speed 9771.82 samples/sec Loss 5.8218 LearningRate 0.1299 Epoch: 9 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:04,820-Speed 11135.38 samples/sec Loss 5.7461 LearningRate 0.1299 Epoch: 9 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:08,565-Speed 10940.07 samples/sec Loss 5.8403 LearningRate 0.1298 Epoch: 9 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:11,945-Speed 12119.54 samples/sec Loss 5.8399 LearningRate 0.1297 Epoch: 9 Global Step: 40670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:16,146-Speed 9752.62 samples/sec Loss 5.8398 LearningRate 0.1297 Epoch: 9 Global Step: 40680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:19,547-Speed 12049.75 samples/sec Loss 5.7496 LearningRate 0.1296 Epoch: 9 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:22,938-Speed 12079.74 samples/sec Loss 5.8347 LearningRate 0.1296 Epoch: 9 Global Step: 40700 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 01:59:26,561-Speed 11310.67 samples/sec Loss 5.8412 LearningRate 0.1295 Epoch: 9 Global Step: 40710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:30,162-Speed 11378.15 samples/sec Loss 5.8204 LearningRate 0.1294 Epoch: 9 Global Step: 40720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:33,696-Speed 11594.37 samples/sec Loss 5.7943 LearningRate 0.1294 Epoch: 9 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:37,283-Speed 11422.57 samples/sec Loss 5.8015 LearningRate 0.1293 
Epoch: 9 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:40,825-Speed 11566.44 samples/sec Loss 5.7983 LearningRate 0.1293 Epoch: 9 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:44,427-Speed 11371.35 samples/sec Loss 5.7516 LearningRate 0.1292 Epoch: 9 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:48,000-Speed 11468.06 samples/sec Loss 5.7804 LearningRate 0.1291 Epoch: 9 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:52,228-Speed 9689.73 samples/sec Loss 5.8459 LearningRate 0.1291 Epoch: 9 Global Step: 40780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:55,733-Speed 11691.45 samples/sec Loss 5.7571 LearningRate 0.1290 Epoch: 9 Global Step: 40790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 01:59:59,232-Speed 11709.49 samples/sec Loss 5.8384 LearningRate 0.1290 Epoch: 9 Global Step: 40800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:03,044-Speed 10748.00 samples/sec Loss 5.7575 LearningRate 0.1289 Epoch: 9 Global Step: 40810 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:00:06,614-Speed 11475.17 samples/sec Loss 5.8206 LearningRate 0.1288 Epoch: 9 Global Step: 40820 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:00:10,476-Speed 10609.37 samples/sec Loss 5.8108 LearningRate 0.1288 Epoch: 9 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:14,223-Speed 10933.95 samples/sec Loss 5.7482 LearningRate 0.1287 Epoch: 9 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:17,995-Speed 10862.04 samples/sec Loss 5.7871 LearningRate 0.1287 Epoch: 9 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:21,576-Speed 11440.46 samples/sec Loss 5.7882 LearningRate 0.1286 Epoch: 9 Global Step: 40860 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-01-17 02:00:25,171-Speed 11400.00 samples/sec Loss 5.8225 LearningRate 0.1285 Epoch: 9 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:28,645-Speed 11791.43 samples/sec Loss 5.8339 LearningRate 0.1285 Epoch: 9 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:32,242-Speed 11393.15 samples/sec Loss 5.7477 LearningRate 0.1284 Epoch: 9 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:35,945-Speed 11062.07 samples/sec Loss 5.7881 LearningRate 0.1284 Epoch: 9 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:39,715-Speed 10865.57 samples/sec Loss 5.7462 LearningRate 0.1283 Epoch: 9 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:43,539-Speed 10714.65 samples/sec Loss 5.7917 LearningRate 0.1282 Epoch: 9 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:00:47,020-Speed 11770.15 samples/sec Loss 5.7289 LearningRate 0.1282 Epoch: 9 Global Step: 40930 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:00:50,853-Speed 10688.32 samples/sec Loss 5.7294 LearningRate 0.1281 Epoch: 9 Global Step: 40940 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:00:54,742-Speed 10535.35 samples/sec Loss 5.8061 LearningRate 0.1281 Epoch: 9 Global Step: 40950 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:00:58,610-Speed 10591.70 samples/sec Loss 5.7849 LearningRate 0.1280 Epoch: 9 Global Step: 40960 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:02,069-Speed 11843.57 samples/sec Loss 5.7463 LearningRate 0.1279 Epoch: 9 Global Step: 40970 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:05,715-Speed 11239.16 samples/sec Loss 5.7835 LearningRate 0.1279 Epoch: 9 Global Step: 40980 Fp16 Grad Scale: 262144 Required: 5 hours Training: 
2022-01-17 02:01:09,432-Speed 11021.38 samples/sec Loss 5.8121 LearningRate 0.1278 Epoch: 9 Global Step: 40990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:12,929-Speed 11715.94 samples/sec Loss 5.8321 LearningRate 0.1278 Epoch: 9 Global Step: 41000 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:16,871-Speed 10392.43 samples/sec Loss 5.7936 LearningRate 0.1277 Epoch: 9 Global Step: 41010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:20,621-Speed 10924.52 samples/sec Loss 5.7697 LearningRate 0.1276 Epoch: 9 Global Step: 41020 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:24,196-Speed 11460.77 samples/sec Loss 5.7337 LearningRate 0.1276 Epoch: 9 Global Step: 41030 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 02:01:27,623-Speed 11954.69 samples/sec Loss 5.8278 LearningRate 0.1275 Epoch: 9 Global Step: 41040 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:31,225-Speed 11377.04 samples/sec Loss 5.8325 LearningRate 0.1275 Epoch: 9 Global Step: 41050 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:34,795-Speed 11475.07 samples/sec Loss 5.7799 LearningRate 0.1274 Epoch: 9 Global Step: 41060 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:38,310-Speed 11654.49 samples/sec Loss 5.7920 LearningRate 0.1273 Epoch: 9 Global Step: 41070 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:41,867-Speed 11519.14 samples/sec Loss 5.7958 LearningRate 0.1273 Epoch: 9 Global Step: 41080 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:45,346-Speed 11775.40 samples/sec Loss 5.7743 LearningRate 0.1272 Epoch: 9 Global Step: 41090 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:49,002-Speed 11206.05 samples/sec Loss 5.8287 LearningRate 0.1272 Epoch: 9 Global Step: 41100 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:52,548-Speed 11557.71 
samples/sec Loss 5.7989 LearningRate 0.1271 Epoch: 9 Global Step: 41110 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:01:56,546-Speed 10247.01 samples/sec Loss 5.7818 LearningRate 0.1270 Epoch: 9 Global Step: 41120 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:01,149-Speed 8899.15 samples/sec Loss 5.7564 LearningRate 0.1270 Epoch: 9 Global Step: 41130 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:04,663-Speed 11659.45 samples/sec Loss 5.8083 LearningRate 0.1269 Epoch: 9 Global Step: 41140 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 02:02:08,178-Speed 11658.44 samples/sec Loss 5.7464 LearningRate 0.1269 Epoch: 9 Global Step: 41150 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 02:02:11,845-Speed 11173.25 samples/sec Loss 5.7989 LearningRate 0.1268 Epoch: 9 Global Step: 41160 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:15,726-Speed 10555.91 samples/sec Loss 5.7569 LearningRate 0.1267 Epoch: 9 Global Step: 41170 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:19,328-Speed 11375.70 samples/sec Loss 5.7609 LearningRate 0.1267 Epoch: 9 Global Step: 41180 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:23,134-Speed 10764.05 samples/sec Loss 5.7835 LearningRate 0.1266 Epoch: 9 Global Step: 41190 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:26,844-Speed 11041.49 samples/sec Loss 5.7843 LearningRate 0.1266 Epoch: 9 Global Step: 41200 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:30,269-Speed 11963.18 samples/sec Loss 5.7607 LearningRate 0.1265 Epoch: 9 Global Step: 41210 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:33,794-Speed 11626.28 samples/sec Loss 5.7368 LearningRate 0.1264 Epoch: 9 Global Step: 41220 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:37,517-Speed 11002.84 samples/sec Loss 5.7570 LearningRate 0.1264 
Epoch: 9 Global Step: 41230 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:41,028-Speed 11666.88 samples/sec Loss 5.7401 LearningRate 0.1263 Epoch: 9 Global Step: 41240 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:44,694-Speed 11177.89 samples/sec Loss 5.7556 LearningRate 0.1263 Epoch: 9 Global Step: 41250 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:02:48,145-Speed 11871.33 samples/sec Loss 5.7739 LearningRate 0.1262 Epoch: 9 Global Step: 41260 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 02:02:51,559-Speed 12002.49 samples/sec Loss 5.7702 LearningRate 0.1261 Epoch: 9 Global Step: 41270 Fp16 Grad Scale: 524288 Required: 5 hours Training: 2022-01-17 02:02:54,981-Speed 11972.21 samples/sec Loss 5.7570 LearningRate 0.1261 Epoch: 9 Global Step: 41280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:02:58,574-Speed 11403.73 samples/sec Loss 5.7764 LearningRate 0.1260 Epoch: 9 Global Step: 41290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:02,067-Speed 11727.35 samples/sec Loss 5.8084 LearningRate 0.1260 Epoch: 9 Global Step: 41300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:05,573-Speed 11686.38 samples/sec Loss 5.7853 LearningRate 0.1259 Epoch: 9 Global Step: 41310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:09,091-Speed 11646.75 samples/sec Loss 5.7511 LearningRate 0.1258 Epoch: 9 Global Step: 41320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:12,910-Speed 10727.63 samples/sec Loss 5.7493 LearningRate 0.1258 Epoch: 9 Global Step: 41330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:16,422-Speed 11666.03 samples/sec Loss 5.8298 LearningRate 0.1257 Epoch: 9 Global Step: 41340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:20,462-Speed 10138.63 samples/sec Loss 5.7683 LearningRate 0.1257 Epoch: 9 Global Step: 41350 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:24,976-Speed 9075.44 samples/sec Loss 5.6986 LearningRate 0.1256 Epoch: 9 Global Step: 41360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:28,531-Speed 11526.25 samples/sec Loss 5.7283 LearningRate 0.1255 Epoch: 9 Global Step: 41370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:31,989-Speed 11847.91 samples/sec Loss 5.7797 LearningRate 0.1255 Epoch: 9 Global Step: 41380 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:03:35,510-Speed 11639.41 samples/sec Loss 5.7506 LearningRate 0.1254 Epoch: 9 Global Step: 41390 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:03:39,109-Speed 11383.34 samples/sec Loss 5.7809 LearningRate 0.1254 Epoch: 9 Global Step: 41400 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:03:42,548-Speed 11913.78 samples/sec Loss 5.7525 LearningRate 0.1253 Epoch: 9 Global Step: 41410 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:03:46,282-Speed 10968.59 samples/sec Loss 5.7622 LearningRate 0.1252 Epoch: 9 Global Step: 41420 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:03:50,132-Speed 10644.38 samples/sec Loss 5.7397 LearningRate 0.1252 Epoch: 9 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:53,989-Speed 10622.34 samples/sec Loss 5.7400 LearningRate 0.1251 Epoch: 9 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:03:57,900-Speed 10476.00 samples/sec Loss 5.8063 LearningRate 0.1251 Epoch: 9 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:04:01,703-Speed 10771.64 samples/sec Loss 5.7574 LearningRate 0.1250 Epoch: 9 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:04:05,545-Speed 10663.51 samples/sec Loss 5.7122 LearningRate 0.1249 Epoch: 9 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-01-17 02:04:09,339-Speed 10799.82 samples/sec Loss 5.7713 LearningRate 0.1249 Epoch: 9 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:04:12,817-Speed 11782.28 samples/sec Loss 5.7139 LearningRate 0.1248 Epoch: 9 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:04:16,267-Speed 11875.61 samples/sec Loss 5.7643 LearningRate 0.1248 Epoch: 9 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:04:20,146-Speed 10561.19 samples/sec Loss 5.7016 LearningRate 0.1247 Epoch: 9 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:04:23,587-Speed 11906.60 samples/sec Loss 5.7773 LearningRate 0.1246 Epoch: 9 Global Step: 41520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-17 02:04:27,616-Speed 10170.41 samples/sec Loss 5.7181 LearningRate 0.1246 Epoch: 9 Global Step: 41530 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:04:31,067-Speed 11872.99 samples/sec Loss 5.7940 LearningRate 0.1245 Epoch: 9 Global Step: 41540 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-17 02:04:34,480-Speed 12002.94 samples/sec Loss 5.7821 LearningRate 0.1245 Epoch: 9 Global Step: 41550 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:04:37,922-Speed 11903.70 samples/sec Loss 5.7282 LearningRate 0.1244 Epoch: 9 Global Step: 41560 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:04:41,603-Speed 11129.63 samples/sec Loss 5.7426 LearningRate 0.1243 Epoch: 9 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:04:45,242-Speed 11257.26 samples/sec Loss 5.7904 LearningRate 0.1243 Epoch: 9 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:04:48,669-Speed 11955.71 samples/sec Loss 5.7263 LearningRate 0.1242 Epoch: 9 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:04:52,223-Speed 11527.54 
samples/sec Loss 5.7420 LearningRate 0.1242 Epoch: 9 Global Step: 41600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:04:55,637-Speed 12002.52 samples/sec Loss 5.7882 LearningRate 0.1241 Epoch: 9 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:04:59,055-Speed 11984.70 samples/sec Loss 5.7616 LearningRate 0.1240 Epoch: 9 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:05:02,757-Speed 11066.81 samples/sec Loss 5.7461 LearningRate 0.1240 Epoch: 9 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:05:06,317-Speed 11510.05 samples/sec Loss 5.7718 LearningRate 0.1239 Epoch: 9 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:05:10,402-Speed 10029.40 samples/sec Loss 5.7094 LearningRate 0.1239 Epoch: 9 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:05:13,800-Speed 12058.18 samples/sec Loss 5.7192 LearningRate 0.1238 Epoch: 9 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:05:17,305-Speed 11689.30 samples/sec Loss 5.7338 LearningRate 0.1238 Epoch: 9 Global Step: 41670 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:05:20,731-Speed 11959.33 samples/sec Loss 5.7451 LearningRate 0.1237 Epoch: 9 Global Step: 41680 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:05:24,237-Speed 11682.56 samples/sec Loss 5.7535 LearningRate 0.1236 Epoch: 9 Global Step: 41690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:05:27,684-Speed 11888.59 samples/sec Loss 5.6634 LearningRate 0.1236 Epoch: 9 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:05:32,010-Speed 9470.96 samples/sec Loss 5.7474 LearningRate 0.1235 Epoch: 9 Global Step: 41710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:05:35,521-Speed 11665.23 samples/sec Loss 5.7151 LearningRate 0.1235 
Epoch: 9 Global Step: 41720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:06:08,445-Speed 1244.14 samples/sec Loss 4.9366 LearningRate 0.1234 Epoch: 10 Global Step: 41730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:06:12,329-Speed 10550.17 samples/sec Loss 4.9764 LearningRate 0.1233 Epoch: 10 Global Step: 41740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:06:16,738-Speed 9291.93 samples/sec Loss 4.9459 LearningRate 0.1233 Epoch: 10 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:06:20,620-Speed 10553.56 samples/sec Loss 4.9542 LearningRate 0.1232 Epoch: 10 Global Step: 41760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:06:24,517-Speed 10511.88 samples/sec Loss 4.9975 LearningRate 0.1232 Epoch: 10 Global Step: 41770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:06:28,728-Speed 9729.76 samples/sec Loss 4.9813 LearningRate 0.1231 Epoch: 10 Global Step: 41780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:06:32,320-Speed 11406.10 samples/sec Loss 4.9814 LearningRate 0.1230 Epoch: 10 Global Step: 41790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:06:35,896-Speed 11456.89 samples/sec Loss 4.9532 LearningRate 0.1230 Epoch: 10 Global Step: 41800 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:06:39,721-Speed 10709.39 samples/sec Loss 4.9916 LearningRate 0.1229 Epoch: 10 Global Step: 41810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:06:43,452-Speed 10982.69 samples/sec Loss 4.9963 LearningRate 0.1229 Epoch: 10 Global Step: 41820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:06:47,416-Speed 10335.12 samples/sec Loss 5.0400 LearningRate 0.1228 Epoch: 10 Global Step: 41830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:06:51,103-Speed 11113.17 samples/sec Loss 5.0399 LearningRate 0.1227 Epoch: 10 Global Step: 41840 Fp16 
Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:06:54,716-Speed 11339.30 samples/sec Loss 5.0297 LearningRate 0.1227 Epoch: 10 Global Step: 41850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:06:58,526-Speed 10753.27 samples/sec Loss 5.0541 LearningRate 0.1226 Epoch: 10 Global Step: 41860 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:02,332-Speed 10764.50 samples/sec Loss 5.0499 LearningRate 0.1226 Epoch: 10 Global Step: 41870 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:06,049-Speed 11022.21 samples/sec Loss 5.0817 LearningRate 0.1225 Epoch: 10 Global Step: 41880 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:10,711-Speed 8788.41 samples/sec Loss 5.0870 LearningRate 0.1225 Epoch: 10 Global Step: 41890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:14,591-Speed 10557.87 samples/sec Loss 5.0625 LearningRate 0.1224 Epoch: 10 Global Step: 41900 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:18,061-Speed 11808.82 samples/sec Loss 5.1052 LearningRate 0.1223 Epoch: 10 Global Step: 41910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:21,554-Speed 11730.41 samples/sec Loss 5.1620 LearningRate 0.1223 Epoch: 10 Global Step: 41920 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:25,076-Speed 11629.34 samples/sec Loss 5.0468 LearningRate 0.1222 Epoch: 10 Global Step: 41930 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:28,513-Speed 11923.58 samples/sec Loss 5.1435 LearningRate 0.1222 Epoch: 10 Global Step: 41940 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:32,095-Speed 11437.10 samples/sec Loss 5.0828 LearningRate 0.1221 Epoch: 10 Global Step: 41950 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:35,477-Speed 12112.87 samples/sec Loss 5.1544 LearningRate 0.1220 Epoch: 10 Global Step: 41960 Fp16 Grad Scale: 262144 Required: 4 
hours Training: 2022-01-17 02:07:38,940-Speed 11832.24 samples/sec Loss 5.1173 LearningRate 0.1220 Epoch: 10 Global Step: 41970 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:42,394-Speed 11862.68 samples/sec Loss 5.1423 LearningRate 0.1219 Epoch: 10 Global Step: 41980 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:45,822-Speed 11952.12 samples/sec Loss 5.1428 LearningRate 0.1219 Epoch: 10 Global Step: 41990 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:49,504-Speed 11130.06 samples/sec Loss 5.1418 LearningRate 0.1218 Epoch: 10 Global Step: 42000 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:53,217-Speed 11031.24 samples/sec Loss 5.1319 LearningRate 0.1217 Epoch: 10 Global Step: 42010 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:07:56,990-Speed 10859.60 samples/sec Loss 5.1227 LearningRate 0.1217 Epoch: 10 Global Step: 42020 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:08:00,875-Speed 10545.07 samples/sec Loss 5.1271 LearningRate 0.1216 Epoch: 10 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:04,608-Speed 10973.29 samples/sec Loss 5.1122 LearningRate 0.1216 Epoch: 10 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:08,041-Speed 11935.35 samples/sec Loss 5.1459 LearningRate 0.1215 Epoch: 10 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:11,517-Speed 11787.65 samples/sec Loss 5.1420 LearningRate 0.1215 Epoch: 10 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:15,070-Speed 11530.51 samples/sec Loss 5.1342 LearningRate 0.1214 Epoch: 10 Global Step: 42070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:18,874-Speed 10770.47 samples/sec Loss 5.1760 LearningRate 0.1213 Epoch: 10 Global Step: 42080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
02:08:22,491-Speed 11328.70 samples/sec Loss 5.2059 LearningRate 0.1213 Epoch: 10 Global Step: 42090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:25,924-Speed 11932.85 samples/sec Loss 5.1484 LearningRate 0.1212 Epoch: 10 Global Step: 42100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:29,376-Speed 11868.06 samples/sec Loss 5.2147 LearningRate 0.1212 Epoch: 10 Global Step: 42110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:33,057-Speed 11130.71 samples/sec Loss 5.2569 LearningRate 0.1211 Epoch: 10 Global Step: 42120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:08:36,559-Speed 11699.57 samples/sec Loss 5.2685 LearningRate 0.1210 Epoch: 10 Global Step: 42130 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:08:40,038-Speed 11776.13 samples/sec Loss 5.1774 LearningRate 0.1210 Epoch: 10 Global Step: 42140 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:08:43,538-Speed 11706.56 samples/sec Loss 5.2309 LearningRate 0.1209 Epoch: 10 Global Step: 42150 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:08:46,998-Speed 11843.32 samples/sec Loss 5.2168 LearningRate 0.1209 Epoch: 10 Global Step: 42160 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:08:50,540-Speed 11567.24 samples/sec Loss 5.2449 LearningRate 0.1208 Epoch: 10 Global Step: 42170 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:08:54,341-Speed 10776.69 samples/sec Loss 5.2185 LearningRate 0.1207 Epoch: 10 Global Step: 42180 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:08:57,986-Speed 11240.90 samples/sec Loss 5.2992 LearningRate 0.1207 Epoch: 10 Global Step: 42190 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:09:01,629-Speed 11245.55 samples/sec Loss 5.2794 LearningRate 0.1206 Epoch: 10 Global Step: 42200 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:09:05,841-Speed 9727.10 
samples/sec Loss 5.2371 LearningRate 0.1206 Epoch: 10 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:09,238-Speed 12061.93 samples/sec Loss 5.2078 LearningRate 0.1205 Epoch: 10 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:14,123-Speed 8386.50 samples/sec Loss 5.2888 LearningRate 0.1205 Epoch: 10 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:17,570-Speed 11886.21 samples/sec Loss 5.2471 LearningRate 0.1204 Epoch: 10 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:21,010-Speed 11910.65 samples/sec Loss 5.2202 LearningRate 0.1203 Epoch: 10 Global Step: 42250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:24,540-Speed 11605.37 samples/sec Loss 5.2439 LearningRate 0.1203 Epoch: 10 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:28,336-Speed 10793.17 samples/sec Loss 5.2841 LearningRate 0.1202 Epoch: 10 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:31,858-Speed 11632.81 samples/sec Loss 5.3342 LearningRate 0.1202 Epoch: 10 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:35,562-Speed 11061.33 samples/sec Loss 5.2850 LearningRate 0.1201 Epoch: 10 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:38,940-Speed 12127.28 samples/sec Loss 5.2740 LearningRate 0.1200 Epoch: 10 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:09:42,320-Speed 12124.51 samples/sec Loss 5.3335 LearningRate 0.1200 Epoch: 10 Global Step: 42310 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:09:45,735-Speed 11996.30 samples/sec Loss 5.2785 LearningRate 0.1199 Epoch: 10 Global Step: 42320 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:09:49,152-Speed 11993.24 samples/sec Loss 5.3253 
LearningRate 0.1199 Epoch: 10 Global Step: 42330 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:09:52,897-Speed 10937.74 samples/sec Loss 5.3537 LearningRate 0.1198 Epoch: 10 Global Step: 42340 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:09:56,408-Speed 11669.76 samples/sec Loss 5.2756 LearningRate 0.1198 Epoch: 10 Global Step: 42350 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:09:59,795-Speed 12095.83 samples/sec Loss 5.2909 LearningRate 0.1197 Epoch: 10 Global Step: 42360 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:10:04,038-Speed 9657.71 samples/sec Loss 5.2766 LearningRate 0.1196 Epoch: 10 Global Step: 42370 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:10:07,533-Speed 11723.01 samples/sec Loss 5.3057 LearningRate 0.1196 Epoch: 10 Global Step: 42380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:11,161-Speed 11291.27 samples/sec Loss 5.2644 LearningRate 0.1195 Epoch: 10 Global Step: 42390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:14,657-Speed 11720.36 samples/sec Loss 5.3329 LearningRate 0.1195 Epoch: 10 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:18,130-Speed 11796.10 samples/sec Loss 5.2698 LearningRate 0.1194 Epoch: 10 Global Step: 42410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:21,701-Speed 11473.85 samples/sec Loss 5.3354 LearningRate 0.1193 Epoch: 10 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:25,343-Speed 11248.68 samples/sec Loss 5.3013 LearningRate 0.1193 Epoch: 10 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:28,929-Speed 11426.21 samples/sec Loss 5.2782 LearningRate 0.1192 Epoch: 10 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:32,445-Speed 11653.46 samples/sec Loss 5.3251 LearningRate 0.1192 Epoch: 10 
Global Step: 42450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:35,877-Speed 11936.49 samples/sec Loss 5.3401 LearningRate 0.1191 Epoch: 10 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:39,615-Speed 10960.05 samples/sec Loss 5.3751 LearningRate 0.1191 Epoch: 10 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:43,179-Speed 11496.43 samples/sec Loss 5.3638 LearningRate 0.1190 Epoch: 10 Global Step: 42480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:10:46,932-Speed 10918.04 samples/sec Loss 5.3306 LearningRate 0.1189 Epoch: 10 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:50,620-Speed 11110.79 samples/sec Loss 5.3203 LearningRate 0.1189 Epoch: 10 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:54,421-Speed 10777.39 samples/sec Loss 5.4111 LearningRate 0.1188 Epoch: 10 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:10:58,116-Speed 11088.63 samples/sec Loss 5.3584 LearningRate 0.1188 Epoch: 10 Global Step: 42520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:01,984-Speed 10592.82 samples/sec Loss 5.3548 LearningRate 0.1187 Epoch: 10 Global Step: 42530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:05,710-Speed 10997.41 samples/sec Loss 5.3692 LearningRate 0.1187 Epoch: 10 Global Step: 42540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:09,382-Speed 11159.30 samples/sec Loss 5.3294 LearningRate 0.1186 Epoch: 10 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:12,913-Speed 11602.18 samples/sec Loss 5.2935 LearningRate 0.1185 Epoch: 10 Global Step: 42560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:17,091-Speed 9805.40 samples/sec Loss 5.3720 LearningRate 0.1185 Epoch: 10 Global Step: 42570 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:20,513-Speed 11975.66 samples/sec Loss 5.3640 LearningRate 0.1184 Epoch: 10 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:24,000-Speed 11746.19 samples/sec Loss 5.3869 LearningRate 0.1184 Epoch: 10 Global Step: 42590 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:11:27,684-Speed 11120.31 samples/sec Loss 5.3381 LearningRate 0.1183 Epoch: 10 Global Step: 42600 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:11:31,116-Speed 11937.02 samples/sec Loss 5.3393 LearningRate 0.1182 Epoch: 10 Global Step: 42610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:34,515-Speed 12056.89 samples/sec Loss 5.3978 LearningRate 0.1182 Epoch: 10 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:38,074-Speed 11513.60 samples/sec Loss 5.3469 LearningRate 0.1181 Epoch: 10 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:41,601-Speed 11616.48 samples/sec Loss 5.3644 LearningRate 0.1181 Epoch: 10 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:45,262-Speed 11189.75 samples/sec Loss 5.4017 LearningRate 0.1180 Epoch: 10 Global Step: 42650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:48,689-Speed 11955.01 samples/sec Loss 5.4115 LearningRate 0.1180 Epoch: 10 Global Step: 42660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:52,529-Speed 10669.44 samples/sec Loss 5.4321 LearningRate 0.1179 Epoch: 10 Global Step: 42670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:56,043-Speed 11661.25 samples/sec Loss 5.3801 LearningRate 0.1178 Epoch: 10 Global Step: 42680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:11:59,466-Speed 11969.02 samples/sec Loss 5.3717 LearningRate 0.1178 Epoch: 10 Global Step: 42690 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 02:12:02,858-Speed 12075.76 samples/sec Loss 5.3635 LearningRate 0.1177 Epoch: 10 Global Step: 42700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:06,459-Speed 11380.59 samples/sec Loss 5.3547 LearningRate 0.1177 Epoch: 10 Global Step: 42710 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:12:10,333-Speed 10572.24 samples/sec Loss 5.4094 LearningRate 0.1176 Epoch: 10 Global Step: 42720 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:12:14,053-Speed 11015.23 samples/sec Loss 5.4075 LearningRate 0.1176 Epoch: 10 Global Step: 42730 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:12:18,566-Speed 9079.08 samples/sec Loss 5.3973 LearningRate 0.1175 Epoch: 10 Global Step: 42740 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:12:22,108-Speed 11568.44 samples/sec Loss 5.4122 LearningRate 0.1174 Epoch: 10 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:25,742-Speed 11273.59 samples/sec Loss 5.3974 LearningRate 0.1174 Epoch: 10 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:29,178-Speed 11923.32 samples/sec Loss 5.3968 LearningRate 0.1173 Epoch: 10 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:32,884-Speed 11054.82 samples/sec Loss 5.3881 LearningRate 0.1173 Epoch: 10 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:36,805-Speed 10450.29 samples/sec Loss 5.3960 LearningRate 0.1172 Epoch: 10 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:40,365-Speed 11507.97 samples/sec Loss 5.4256 LearningRate 0.1171 Epoch: 10 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:43,938-Speed 11468.69 samples/sec Loss 5.4260 LearningRate 0.1171 Epoch: 10 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
02:12:47,544-Speed 11362.78 samples/sec Loss 5.3949 LearningRate 0.1170 Epoch: 10 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:51,013-Speed 11810.33 samples/sec Loss 5.4227 LearningRate 0.1170 Epoch: 10 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:54,434-Speed 11976.73 samples/sec Loss 5.4009 LearningRate 0.1169 Epoch: 10 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:12:57,994-Speed 11511.03 samples/sec Loss 5.4139 LearningRate 0.1169 Epoch: 10 Global Step: 42850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:13:01,559-Speed 11489.80 samples/sec Loss 5.4346 LearningRate 0.1168 Epoch: 10 Global Step: 42860 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:13:05,023-Speed 11827.56 samples/sec Loss 5.4113 LearningRate 0.1167 Epoch: 10 Global Step: 42870 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:13:08,656-Speed 11279.84 samples/sec Loss 5.3606 LearningRate 0.1167 Epoch: 10 Global Step: 42880 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:13:12,391-Speed 10969.62 samples/sec Loss 5.4003 LearningRate 0.1166 Epoch: 10 Global Step: 42890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:13:16,823-Speed 9242.67 samples/sec Loss 5.3750 LearningRate 0.1166 Epoch: 10 Global Step: 42900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:20,373-Speed 11542.38 samples/sec Loss 5.3853 LearningRate 0.1165 Epoch: 10 Global Step: 42910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:24,104-Speed 10980.04 samples/sec Loss 5.4413 LearningRate 0.1165 Epoch: 10 Global Step: 42920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:27,578-Speed 11794.84 samples/sec Loss 5.4107 LearningRate 0.1164 Epoch: 10 Global Step: 42930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:30,975-Speed 12059.91 
samples/sec Loss 5.4117 LearningRate 0.1163 Epoch: 10 Global Step: 42940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:34,457-Speed 11768.82 samples/sec Loss 5.4507 LearningRate 0.1163 Epoch: 10 Global Step: 42950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:38,250-Speed 10799.95 samples/sec Loss 5.4353 LearningRate 0.1162 Epoch: 10 Global Step: 42960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:41,721-Speed 11802.62 samples/sec Loss 5.4549 LearningRate 0.1162 Epoch: 10 Global Step: 42970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:45,374-Speed 11216.50 samples/sec Loss 5.4934 LearningRate 0.1161 Epoch: 10 Global Step: 42980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:49,085-Speed 11042.12 samples/sec Loss 5.4351 LearningRate 0.1161 Epoch: 10 Global Step: 42990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:13:52,665-Speed 11446.18 samples/sec Loss 5.4804 LearningRate 0.1160 Epoch: 10 Global Step: 43000 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:13:56,245-Speed 11452.91 samples/sec Loss 5.4602 LearningRate 0.1159 Epoch: 10 Global Step: 43010 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:13:59,680-Speed 11927.10 samples/sec Loss 5.4397 LearningRate 0.1159 Epoch: 10 Global Step: 43020 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:03,722-Speed 10138.51 samples/sec Loss 5.4630 LearningRate 0.1158 Epoch: 10 Global Step: 43030 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:07,277-Speed 11523.38 samples/sec Loss 5.3838 LearningRate 0.1158 Epoch: 10 Global Step: 43040 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:10,878-Speed 11375.59 samples/sec Loss 5.4426 LearningRate 0.1157 Epoch: 10 Global Step: 43050 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:14,281-Speed 12039.36 samples/sec Loss 5.4273 
LearningRate 0.1157 Epoch: 10 Global Step: 43060 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:18,545-Speed 9610.23 samples/sec Loss 5.4066 LearningRate 0.1156 Epoch: 10 Global Step: 43070 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:22,021-Speed 11786.43 samples/sec Loss 5.4123 LearningRate 0.1155 Epoch: 10 Global Step: 43080 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:25,460-Speed 11916.55 samples/sec Loss 5.4684 LearningRate 0.1155 Epoch: 10 Global Step: 43090 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:29,113-Speed 11213.85 samples/sec Loss 5.4364 LearningRate 0.1154 Epoch: 10 Global Step: 43100 Fp16 Grad Scale: 524288 Required: 4 hours Training: 2022-01-17 02:14:32,596-Speed 11764.78 samples/sec Loss 5.4387 LearningRate 0.1154 Epoch: 10 Global Step: 43110 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:36,100-Speed 11690.51 samples/sec Loss 5.4289 LearningRate 0.1153 Epoch: 10 Global Step: 43120 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:39,536-Speed 11924.55 samples/sec Loss 5.4266 LearningRate 0.1153 Epoch: 10 Global Step: 43130 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:43,132-Speed 11393.85 samples/sec Loss 5.4061 LearningRate 0.1152 Epoch: 10 Global Step: 43140 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:47,096-Speed 10334.94 samples/sec Loss 5.4530 LearningRate 0.1151 Epoch: 10 Global Step: 43150 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:50,655-Speed 11512.37 samples/sec Loss 5.4684 LearningRate 0.1151 Epoch: 10 Global Step: 43160 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:54,068-Speed 12004.51 samples/sec Loss 5.4414 LearningRate 0.1150 Epoch: 10 Global Step: 43170 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:14:57,742-Speed 11153.70 samples/sec Loss 5.4677 LearningRate 0.1150 Epoch: 10 
Global Step: 43180 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:01,957-Speed 9719.79 samples/sec Loss 5.4323 LearningRate 0.1149 Epoch: 10 Global Step: 43190 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:05,478-Speed 11633.35 samples/sec Loss 5.4777 LearningRate 0.1149 Epoch: 10 Global Step: 43200 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:08,986-Speed 11679.52 samples/sec Loss 5.4529 LearningRate 0.1148 Epoch: 10 Global Step: 43210 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:12,589-Speed 11371.56 samples/sec Loss 5.4790 LearningRate 0.1147 Epoch: 10 Global Step: 43220 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:16,374-Speed 10824.91 samples/sec Loss 5.4510 LearningRate 0.1147 Epoch: 10 Global Step: 43230 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:20,037-Speed 11185.47 samples/sec Loss 5.4451 LearningRate 0.1146 Epoch: 10 Global Step: 43240 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:23,946-Speed 10482.63 samples/sec Loss 5.4499 LearningRate 0.1146 Epoch: 10 Global Step: 43250 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:27,566-Speed 11317.69 samples/sec Loss 5.4215 LearningRate 0.1145 Epoch: 10 Global Step: 43260 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:31,320-Speed 10914.63 samples/sec Loss 5.4909 LearningRate 0.1145 Epoch: 10 Global Step: 43270 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:35,162-Speed 10662.32 samples/sec Loss 5.4926 LearningRate 0.1144 Epoch: 10 Global Step: 43280 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:39,271-Speed 9970.92 samples/sec Loss 5.4493 LearningRate 0.1143 Epoch: 10 Global Step: 43290 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:43,037-Speed 10879.98 samples/sec Loss 5.4612 LearningRate 0.1143 Epoch: 10 Global Step: 43300 Fp16 Grad 
Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:46,518-Speed 11769.84 samples/sec Loss 5.4488 LearningRate 0.1142 Epoch: 10 Global Step: 43310 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:50,121-Speed 11371.97 samples/sec Loss 5.4562 LearningRate 0.1142 Epoch: 10 Global Step: 43320 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:53,789-Speed 11170.68 samples/sec Loss 5.4608 LearningRate 0.1141 Epoch: 10 Global Step: 43330 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:15:57,558-Speed 10870.35 samples/sec Loss 5.4787 LearningRate 0.1141 Epoch: 10 Global Step: 43340 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:16:01,021-Speed 11831.36 samples/sec Loss 5.4523 LearningRate 0.1140 Epoch: 10 Global Step: 43350 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:16:04,738-Speed 11020.95 samples/sec Loss 5.4924 LearningRate 0.1139 Epoch: 10 Global Step: 43360 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:16:08,212-Speed 11792.93 samples/sec Loss 5.4712 LearningRate 0.1139 Epoch: 10 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:11,668-Speed 11855.84 samples/sec Loss 5.4433 LearningRate 0.1138 Epoch: 10 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:15,461-Speed 10801.95 samples/sec Loss 5.4960 LearningRate 0.1138 Epoch: 10 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:19,149-Speed 11108.89 samples/sec Loss 5.4631 LearningRate 0.1137 Epoch: 10 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:22,635-Speed 11750.63 samples/sec Loss 5.4285 LearningRate 0.1137 Epoch: 10 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:26,193-Speed 11515.14 samples/sec Loss 5.4831 LearningRate 0.1136 Epoch: 10 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 02:16:29,783-Speed 11413.34 samples/sec Loss 5.4604 LearningRate 0.1135 Epoch: 10 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:33,263-Speed 11774.14 samples/sec Loss 5.5072 LearningRate 0.1135 Epoch: 10 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:37,509-Speed 9649.27 samples/sec Loss 5.5120 LearningRate 0.1134 Epoch: 10 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:41,083-Speed 11461.78 samples/sec Loss 5.4772 LearningRate 0.1134 Epoch: 10 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:16:44,585-Speed 11698.74 samples/sec Loss 5.4521 LearningRate 0.1133 Epoch: 10 Global Step: 43470 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:16:48,470-Speed 10545.25 samples/sec Loss 5.4946 LearningRate 0.1133 Epoch: 10 Global Step: 43480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:16:52,582-Speed 9962.78 samples/sec Loss 5.4911 LearningRate 0.1132 Epoch: 10 Global Step: 43490 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:16:56,127-Speed 11560.04 samples/sec Loss 5.4678 LearningRate 0.1131 Epoch: 10 Global Step: 43500 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:16:59,550-Speed 11968.25 samples/sec Loss 5.4579 LearningRate 0.1131 Epoch: 10 Global Step: 43510 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:17:02,958-Speed 12023.57 samples/sec Loss 5.4529 LearningRate 0.1130 Epoch: 10 Global Step: 43520 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:17:06,859-Speed 10500.62 samples/sec Loss 5.5248 LearningRate 0.1130 Epoch: 10 Global Step: 43530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:10,332-Speed 11797.54 samples/sec Loss 5.4930 LearningRate 0.1129 Epoch: 10 Global Step: 43540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
02:17:13,730-Speed 12055.93 samples/sec Loss 5.4755 LearningRate 0.1129 Epoch: 10 Global Step: 43550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:17,193-Speed 11831.32 samples/sec Loss 5.5191 LearningRate 0.1128 Epoch: 10 Global Step: 43560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:20,858-Speed 11179.57 samples/sec Loss 5.4712 LearningRate 0.1128 Epoch: 10 Global Step: 43570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:24,335-Speed 11784.33 samples/sec Loss 5.4602 LearningRate 0.1127 Epoch: 10 Global Step: 43580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:27,833-Speed 11713.54 samples/sec Loss 5.4440 LearningRate 0.1126 Epoch: 10 Global Step: 43590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:31,495-Speed 11189.58 samples/sec Loss 5.5067 LearningRate 0.1126 Epoch: 10 Global Step: 43600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:35,029-Speed 11591.73 samples/sec Loss 5.4567 LearningRate 0.1125 Epoch: 10 Global Step: 43610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:39,375-Speed 9427.79 samples/sec Loss 5.5014 LearningRate 0.1125 Epoch: 10 Global Step: 43620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:17:42,861-Speed 11751.72 samples/sec Loss 5.4939 LearningRate 0.1124 Epoch: 10 Global Step: 43630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:17:46,310-Speed 11881.06 samples/sec Loss 5.4764 LearningRate 0.1124 Epoch: 10 Global Step: 43640 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:17:49,769-Speed 11844.65 samples/sec Loss 5.4726 LearningRate 0.1123 Epoch: 10 Global Step: 43650 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:17:53,264-Speed 11724.49 samples/sec Loss 5.4868 LearningRate 0.1122 Epoch: 10 Global Step: 43660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:17:56,693-Speed 11945.35 
samples/sec Loss 5.4014 LearningRate 0.1122 Epoch: 10 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:00,212-Speed 11645.52 samples/sec Loss 5.5419 LearningRate 0.1121 Epoch: 10 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:03,650-Speed 11913.84 samples/sec Loss 5.4977 LearningRate 0.1121 Epoch: 10 Global Step: 43690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:07,063-Speed 12005.42 samples/sec Loss 5.3949 LearningRate 0.1120 Epoch: 10 Global Step: 43700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:11,179-Speed 9955.24 samples/sec Loss 5.4916 LearningRate 0.1120 Epoch: 10 Global Step: 43710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:14,864-Speed 11118.46 samples/sec Loss 5.4535 LearningRate 0.1119 Epoch: 10 Global Step: 43720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:18,410-Speed 11552.33 samples/sec Loss 5.4362 LearningRate 0.1118 Epoch: 10 Global Step: 43730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:21,873-Speed 11828.35 samples/sec Loss 5.4391 LearningRate 0.1118 Epoch: 10 Global Step: 43740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:25,312-Speed 11915.06 samples/sec Loss 5.4879 LearningRate 0.1117 Epoch: 10 Global Step: 43750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:28,794-Speed 11767.88 samples/sec Loss 5.4493 LearningRate 0.1117 Epoch: 10 Global Step: 43760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:18:32,648-Speed 10630.91 samples/sec Loss 5.5305 LearningRate 0.1116 Epoch: 10 Global Step: 43770 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:18:36,367-Speed 11016.71 samples/sec Loss 5.4627 LearningRate 0.1116 Epoch: 10 Global Step: 43780 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:18:39,814-Speed 11883.74 samples/sec Loss 5.5236 
LearningRate 0.1115 Epoch: 10 Global Step: 43790 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:18:43,183-Speed 12164.71 samples/sec Loss 5.4765 LearningRate 0.1115 Epoch: 10 Global Step: 43800 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:18:46,725-Speed 11567.18 samples/sec Loss 5.4770 LearningRate 0.1114 Epoch: 10 Global Step: 43810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:18:50,368-Speed 11247.55 samples/sec Loss 5.4692 LearningRate 0.1113 Epoch: 10 Global Step: 43820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:18:54,055-Speed 11111.74 samples/sec Loss 5.4648 LearningRate 0.1113 Epoch: 10 Global Step: 43830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:18:57,718-Speed 11183.18 samples/sec Loss 5.4465 LearningRate 0.1112 Epoch: 10 Global Step: 43840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:19:01,315-Speed 11394.35 samples/sec Loss 5.5217 LearningRate 0.1112 Epoch: 10 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:05,760-Speed 9215.75 samples/sec Loss 5.4688 LearningRate 0.1111 Epoch: 10 Global Step: 43860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:09,180-Speed 11980.26 samples/sec Loss 5.5386 LearningRate 0.1111 Epoch: 10 Global Step: 43870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:12,820-Speed 11254.79 samples/sec Loss 5.4428 LearningRate 0.1110 Epoch: 10 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:16,564-Speed 10943.61 samples/sec Loss 5.5245 LearningRate 0.1109 Epoch: 10 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:20,071-Speed 11682.70 samples/sec Loss 5.4902 LearningRate 0.1109 Epoch: 10 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:23,592-Speed 11637.61 samples/sec Loss 5.4970 LearningRate 0.1108 Epoch: 10 
Global Step: 43910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:27,369-Speed 10846.60 samples/sec Loss 5.4879 LearningRate 0.1108 Epoch: 10 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:31,070-Speed 11071.14 samples/sec Loss 5.5387 LearningRate 0.1107 Epoch: 10 Global Step: 43930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:34,486-Speed 11994.94 samples/sec Loss 5.5053 LearningRate 0.1107 Epoch: 10 Global Step: 43940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:38,096-Speed 11347.01 samples/sec Loss 5.4716 LearningRate 0.1106 Epoch: 10 Global Step: 43950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:41,537-Speed 11908.57 samples/sec Loss 5.5102 LearningRate 0.1106 Epoch: 10 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:45,143-Speed 11360.64 samples/sec Loss 5.5178 LearningRate 0.1105 Epoch: 10 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:48,569-Speed 11959.05 samples/sec Loss 5.4458 LearningRate 0.1104 Epoch: 10 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:52,030-Speed 11836.04 samples/sec Loss 5.4298 LearningRate 0.1104 Epoch: 10 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:55,668-Speed 11261.98 samples/sec Loss 5.5119 LearningRate 0.1103 Epoch: 10 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:19:59,335-Speed 11173.99 samples/sec Loss 5.4772 LearningRate 0.1103 Epoch: 10 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:20:03,654-Speed 9485.72 samples/sec Loss 5.5484 LearningRate 0.1102 Epoch: 10 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:20:07,135-Speed 11769.37 samples/sec Loss 5.4658 LearningRate 0.1102 Epoch: 10 Global Step: 44030 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 02:20:10,571-Speed 11924.92 samples/sec Loss 5.4971 LearningRate 0.1101 Epoch: 10 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:20:14,112-Speed 11570.03 samples/sec Loss 5.5211 LearningRate 0.1101 Epoch: 10 Global Step: 44050 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:17,602-Speed 11740.09 samples/sec Loss 5.4680 LearningRate 0.1100 Epoch: 10 Global Step: 44060 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:21,036-Speed 11932.63 samples/sec Loss 5.4877 LearningRate 0.1099 Epoch: 10 Global Step: 44070 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:24,574-Speed 11579.75 samples/sec Loss 5.4595 LearningRate 0.1099 Epoch: 10 Global Step: 44080 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:28,177-Speed 11369.50 samples/sec Loss 5.4791 LearningRate 0.1098 Epoch: 10 Global Step: 44090 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:31,784-Speed 11361.36 samples/sec Loss 5.4907 LearningRate 0.1098 Epoch: 10 Global Step: 44100 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:35,521-Speed 10961.57 samples/sec Loss 5.4904 LearningRate 0.1097 Epoch: 10 Global Step: 44110 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:39,400-Speed 10562.94 samples/sec Loss 5.4861 LearningRate 0.1097 Epoch: 10 Global Step: 44120 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:42,868-Speed 11811.85 samples/sec Loss 5.4827 LearningRate 0.1096 Epoch: 10 Global Step: 44130 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:46,401-Speed 11598.02 samples/sec Loss 5.4869 LearningRate 0.1095 Epoch: 10 Global Step: 44140 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:49,916-Speed 11656.57 samples/sec Loss 5.4700 LearningRate 0.1095 Epoch: 10 Global Step: 44150 Fp16 Grad Scale: 262144 Required: 4 hours 
Training: 2022-01-17 02:20:53,570-Speed 11210.94 samples/sec Loss 5.4407 LearningRate 0.1094 Epoch: 10 Global Step: 44160 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:20:57,234-Speed 11184.39 samples/sec Loss 5.4265 LearningRate 0.1094 Epoch: 10 Global Step: 44170 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:21:00,675-Speed 11904.86 samples/sec Loss 5.5065 LearningRate 0.1093 Epoch: 10 Global Step: 44180 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:21:04,102-Speed 11958.61 samples/sec Loss 5.5623 LearningRate 0.1093 Epoch: 10 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:07,835-Speed 10973.31 samples/sec Loss 5.4938 LearningRate 0.1092 Epoch: 10 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:11,339-Speed 11691.97 samples/sec Loss 5.5029 LearningRate 0.1092 Epoch: 10 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:15,261-Speed 10445.60 samples/sec Loss 5.5105 LearningRate 0.1091 Epoch: 10 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:18,878-Speed 11328.03 samples/sec Loss 5.4807 LearningRate 0.1090 Epoch: 10 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:22,990-Speed 9963.32 samples/sec Loss 5.4833 LearningRate 0.1090 Epoch: 10 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:26,425-Speed 11925.53 samples/sec Loss 5.5037 LearningRate 0.1089 Epoch: 10 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:30,319-Speed 10524.43 samples/sec Loss 5.4703 LearningRate 0.1089 Epoch: 10 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:33,886-Speed 11485.02 samples/sec Loss 5.4989 LearningRate 0.1088 Epoch: 10 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
02:21:37,299-Speed 12002.81 samples/sec Loss 5.5130 LearningRate 0.1088 Epoch: 10 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:21:40,735-Speed 11924.16 samples/sec Loss 5.5043 LearningRate 0.1087 Epoch: 10 Global Step: 44290 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:21:44,255-Speed 11639.15 samples/sec Loss 5.5044 LearningRate 0.1087 Epoch: 10 Global Step: 44300 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:21:47,792-Speed 11582.09 samples/sec Loss 5.4435 LearningRate 0.1086 Epoch: 10 Global Step: 44310 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:21:51,261-Speed 11811.60 samples/sec Loss 5.4459 LearningRate 0.1085 Epoch: 10 Global Step: 44320 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:21:54,837-Speed 11459.03 samples/sec Loss 5.5509 LearningRate 0.1085 Epoch: 10 Global Step: 44330 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:21:58,397-Speed 11505.87 samples/sec Loss 5.5365 LearningRate 0.1084 Epoch: 10 Global Step: 44340 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:22:01,989-Speed 11406.73 samples/sec Loss 5.5370 LearningRate 0.1084 Epoch: 10 Global Step: 44350 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:22:05,800-Speed 10749.06 samples/sec Loss 5.5070 LearningRate 0.1083 Epoch: 10 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:09,959-Speed 9851.73 samples/sec Loss 5.4815 LearningRate 0.1083 Epoch: 10 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:13,399-Speed 11911.66 samples/sec Loss 5.5352 LearningRate 0.1082 Epoch: 10 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:16,770-Speed 12154.65 samples/sec Loss 5.4801 LearningRate 0.1082 Epoch: 10 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:20,205-Speed 11925.33 
samples/sec Loss 5.5344 LearningRate 0.1081 Epoch: 10 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:23,675-Speed 11807.53 samples/sec Loss 5.4919 LearningRate 0.1080 Epoch: 10 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:27,858-Speed 9793.99 samples/sec Loss 5.5025 LearningRate 0.1080 Epoch: 10 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:31,252-Speed 12073.35 samples/sec Loss 5.4303 LearningRate 0.1079 Epoch: 10 Global Step: 44430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:34,794-Speed 11569.13 samples/sec Loss 5.4780 LearningRate 0.1079 Epoch: 10 Global Step: 44440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:38,634-Speed 10668.54 samples/sec Loss 5.5515 LearningRate 0.1078 Epoch: 10 Global Step: 44450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:22:42,219-Speed 11426.23 samples/sec Loss 5.4586 LearningRate 0.1078 Epoch: 10 Global Step: 44460 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:22:45,858-Speed 11258.29 samples/sec Loss 5.5318 LearningRate 0.1077 Epoch: 10 Global Step: 44470 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:22:49,459-Speed 11378.28 samples/sec Loss 5.4575 LearningRate 0.1077 Epoch: 10 Global Step: 44480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:22:53,071-Speed 11344.02 samples/sec Loss 5.4396 LearningRate 0.1076 Epoch: 10 Global Step: 44490 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:22:56,616-Speed 11555.07 samples/sec Loss 5.5279 LearningRate 0.1076 Epoch: 10 Global Step: 44500 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:00,363-Speed 10934.35 samples/sec Loss 5.4914 LearningRate 0.1075 Epoch: 10 Global Step: 44510 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:04,057-Speed 11093.20 samples/sec Loss 5.5370 
LearningRate 0.1074 Epoch: 10 Global Step: 44520 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:07,493-Speed 11921.54 samples/sec Loss 5.4557 LearningRate 0.1074 Epoch: 10 Global Step: 44530 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:11,180-Speed 11112.21 samples/sec Loss 5.4502 LearningRate 0.1073 Epoch: 10 Global Step: 44540 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:14,696-Speed 11651.44 samples/sec Loss 5.4573 LearningRate 0.1073 Epoch: 10 Global Step: 44550 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:18,180-Speed 11761.48 samples/sec Loss 5.5068 LearningRate 0.1072 Epoch: 10 Global Step: 44560 Fp16 Grad Scale: 524288 Required: 4 hours Training: 2022-01-17 02:23:21,744-Speed 11493.61 samples/sec Loss 5.5048 LearningRate 0.1072 Epoch: 10 Global Step: 44570 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:25,275-Speed 11605.00 samples/sec Loss 5.4911 LearningRate 0.1071 Epoch: 10 Global Step: 44580 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:29,716-Speed 9224.64 samples/sec Loss 5.4777 LearningRate 0.1071 Epoch: 10 Global Step: 44590 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:33,226-Speed 11673.62 samples/sec Loss 5.5457 LearningRate 0.1070 Epoch: 10 Global Step: 44600 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:36,819-Speed 11401.76 samples/sec Loss 5.4692 LearningRate 0.1069 Epoch: 10 Global Step: 44610 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:40,341-Speed 11631.97 samples/sec Loss 5.4974 LearningRate 0.1069 Epoch: 10 Global Step: 44620 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:44,170-Speed 10703.17 samples/sec Loss 5.4717 LearningRate 0.1068 Epoch: 10 Global Step: 44630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:47,970-Speed 10778.27 samples/sec Loss 5.4677 LearningRate 0.1068 Epoch: 10 
Global Step: 44640 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:51,416-Speed 11891.05 samples/sec Loss 5.5385 LearningRate 0.1067 Epoch: 10 Global Step: 44650 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:55,059-Speed 11244.78 samples/sec Loss 5.5358 LearningRate 0.1067 Epoch: 10 Global Step: 44660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:23:58,537-Speed 11780.64 samples/sec Loss 5.4398 LearningRate 0.1066 Epoch: 10 Global Step: 44670 Fp16 Grad Scale: 524288 Required: 4 hours Training: 2022-01-17 02:24:02,125-Speed 11420.24 samples/sec Loss 5.4974 LearningRate 0.1066 Epoch: 10 Global Step: 44680 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:24:05,803-Speed 11142.11 samples/sec Loss 5.5185 LearningRate 0.1065 Epoch: 10 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:09,295-Speed 11732.26 samples/sec Loss 5.4517 LearningRate 0.1064 Epoch: 10 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:12,829-Speed 11596.81 samples/sec Loss 5.5177 LearningRate 0.1064 Epoch: 10 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:16,754-Speed 10436.44 samples/sec Loss 5.4935 LearningRate 0.1063 Epoch: 10 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:20,230-Speed 11788.31 samples/sec Loss 5.5017 LearningRate 0.1063 Epoch: 10 Global Step: 44730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:24,159-Speed 10426.74 samples/sec Loss 5.4839 LearningRate 0.1062 Epoch: 10 Global Step: 44740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:27,627-Speed 11814.21 samples/sec Loss 5.4835 LearningRate 0.1062 Epoch: 10 Global Step: 44750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:32,286-Speed 8794.19 samples/sec Loss 5.4273 LearningRate 0.1061 Epoch: 10 Global Step: 44760 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:35,768-Speed 11766.51 samples/sec Loss 5.4930 LearningRate 0.1061 Epoch: 10 Global Step: 44770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:39,181-Speed 12003.92 samples/sec Loss 5.4992 LearningRate 0.1060 Epoch: 10 Global Step: 44780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:24:42,658-Speed 11782.93 samples/sec Loss 5.5159 LearningRate 0.1060 Epoch: 10 Global Step: 44790 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:24:46,099-Speed 11906.73 samples/sec Loss 5.5217 LearningRate 0.1059 Epoch: 10 Global Step: 44800 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:24:49,690-Speed 11409.17 samples/sec Loss 5.4481 LearningRate 0.1058 Epoch: 10 Global Step: 44810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:24:53,125-Speed 11930.98 samples/sec Loss 5.4426 LearningRate 0.1058 Epoch: 10 Global Step: 44820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:24:56,609-Speed 11760.96 samples/sec Loss 5.4809 LearningRate 0.1057 Epoch: 10 Global Step: 44830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:25:00,098-Speed 11744.05 samples/sec Loss 5.4709 LearningRate 0.1057 Epoch: 10 Global Step: 44840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:25:03,640-Speed 11564.88 samples/sec Loss 5.5199 LearningRate 0.1056 Epoch: 10 Global Step: 44850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:25:07,260-Speed 11320.42 samples/sec Loss 5.4839 LearningRate 0.1056 Epoch: 10 Global Step: 44860 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:25:10,748-Speed 11744.54 samples/sec Loss 5.4432 LearningRate 0.1055 Epoch: 10 Global Step: 44870 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:25:14,812-Speed 10081.09 samples/sec Loss 5.5040 LearningRate 0.1055 Epoch: 10 Global Step: 44880 Fp16 Grad Scale: 262144 Required: 4 hours 
Training: 2022-01-17 02:25:18,497-Speed 11117.48 samples/sec Loss 5.4334 LearningRate 0.1054 Epoch: 10 Global Step: 44890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:25:22,058-Speed 11504.10 samples/sec Loss 5.4934 LearningRate 0.1054 Epoch: 10 Global Step: 44900 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:25:25,607-Speed 11546.34 samples/sec Loss 5.5092 LearningRate 0.1053 Epoch: 10 Global Step: 44910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:29,072-Speed 11826.46 samples/sec Loss 5.5017 LearningRate 0.1052 Epoch: 10 Global Step: 44920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:32,581-Speed 11673.38 samples/sec Loss 5.5252 LearningRate 0.1052 Epoch: 10 Global Step: 44930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:36,045-Speed 11827.51 samples/sec Loss 5.4961 LearningRate 0.1051 Epoch: 10 Global Step: 44940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:39,567-Speed 11634.39 samples/sec Loss 5.4939 LearningRate 0.1051 Epoch: 10 Global Step: 44950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:43,032-Speed 11824.64 samples/sec Loss 5.5059 LearningRate 0.1050 Epoch: 10 Global Step: 44960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:46,513-Speed 11769.96 samples/sec Loss 5.4812 LearningRate 0.1050 Epoch: 10 Global Step: 44970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:50,215-Speed 11068.46 samples/sec Loss 5.4803 LearningRate 0.1049 Epoch: 10 Global Step: 44980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:53,693-Speed 11777.56 samples/sec Loss 5.5104 LearningRate 0.1049 Epoch: 10 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:25:57,181-Speed 11746.05 samples/sec Loss 5.4893 LearningRate 0.1048 Epoch: 10 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
02:26:18,455-[lfw][45000]XNorm: 9.878356 Training: 2022-01-17 02:26:18,456-[lfw][45000]Accuracy-Flip: 0.99583+-0.00359 Training: 2022-01-17 02:26:18,456-[lfw][45000]Accuracy-Highest: 0.99650 Training: 2022-01-17 02:26:42,953-[cfp_fp][45000]XNorm: 8.377233 Training: 2022-01-17 02:26:42,954-[cfp_fp][45000]Accuracy-Flip: 0.97014+-0.00955 Training: 2022-01-17 02:26:42,954-[cfp_fp][45000]Accuracy-Highest: 0.97014 Training: 2022-01-17 02:27:04,105-[agedb_30][45000]XNorm: 9.457024 Training: 2022-01-17 02:27:04,105-[agedb_30][45000]Accuracy-Flip: 0.96500+-0.00711 Training: 2022-01-17 02:27:04,105-[agedb_30][45000]Accuracy-Highest: 0.96500 Training: 2022-01-17 02:27:07,487-Speed 582.61 samples/sec Loss 5.5259 LearningRate 0.1048 Epoch: 10 Global Step: 45010 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:27:10,862-Speed 12136.67 samples/sec Loss 5.4755 LearningRate 0.1047 Epoch: 10 Global Step: 45020 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:27:14,256-Speed 12074.31 samples/sec Loss 5.4607 LearningRate 0.1046 Epoch: 10 Global Step: 45030 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:27:17,780-Speed 11625.72 samples/sec Loss 5.4767 LearningRate 0.1046 Epoch: 10 Global Step: 45040 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:27:21,319-Speed 11577.47 samples/sec Loss 5.4835 LearningRate 0.1045 Epoch: 10 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:25,006-Speed 11109.60 samples/sec Loss 5.4628 LearningRate 0.1045 Epoch: 10 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:28,525-Speed 11644.24 samples/sec Loss 5.4809 LearningRate 0.1044 Epoch: 10 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:31,994-Speed 11810.13 samples/sec Loss 5.4342 LearningRate 0.1044 Epoch: 10 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:35,866-Speed 10582.19 
samples/sec Loss 5.4366 LearningRate 0.1043 Epoch: 10 Global Step: 45090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:39,693-Speed 10705.32 samples/sec Loss 5.4647 LearningRate 0.1043 Epoch: 10 Global Step: 45100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:43,302-Speed 11351.25 samples/sec Loss 5.4514 LearningRate 0.1042 Epoch: 10 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:46,731-Speed 11947.31 samples/sec Loss 5.5250 LearningRate 0.1042 Epoch: 10 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:50,189-Speed 11850.12 samples/sec Loss 5.4511 LearningRate 0.1041 Epoch: 10 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:53,640-Speed 11871.83 samples/sec Loss 5.5583 LearningRate 0.1040 Epoch: 10 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:27:57,354-Speed 11033.56 samples/sec Loss 5.4248 LearningRate 0.1040 Epoch: 10 Global Step: 45150 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:01,351-Speed 10247.61 samples/sec Loss 5.5046 LearningRate 0.1039 Epoch: 10 Global Step: 45160 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:05,085-Speed 10974.46 samples/sec Loss 5.4489 LearningRate 0.1039 Epoch: 10 Global Step: 45170 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:08,902-Speed 10732.18 samples/sec Loss 5.4846 LearningRate 0.1038 Epoch: 10 Global Step: 45180 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:12,396-Speed 11739.72 samples/sec Loss 5.4388 LearningRate 0.1038 Epoch: 10 Global Step: 45190 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:16,194-Speed 10785.66 samples/sec Loss 5.4439 LearningRate 0.1037 Epoch: 10 Global Step: 45200 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:19,712-Speed 11645.89 samples/sec Loss 5.3961 
LearningRate 0.1037 Epoch: 10 Global Step: 45210 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:24,269-Speed 8989.84 samples/sec Loss 5.4717 LearningRate 0.1036 Epoch: 10 Global Step: 45220 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:27,915-Speed 11235.37 samples/sec Loss 5.5209 LearningRate 0.1036 Epoch: 10 Global Step: 45230 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:31,354-Speed 11915.90 samples/sec Loss 5.4889 LearningRate 0.1035 Epoch: 10 Global Step: 45240 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:34,737-Speed 12111.74 samples/sec Loss 5.4710 LearningRate 0.1034 Epoch: 10 Global Step: 45250 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:38,270-Speed 11596.13 samples/sec Loss 5.4808 LearningRate 0.1034 Epoch: 10 Global Step: 45260 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:42,442-Speed 9820.59 samples/sec Loss 5.4471 LearningRate 0.1033 Epoch: 10 Global Step: 45270 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:46,135-Speed 11092.97 samples/sec Loss 5.4370 LearningRate 0.1033 Epoch: 10 Global Step: 45280 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:49,599-Speed 11826.40 samples/sec Loss 5.4051 LearningRate 0.1032 Epoch: 10 Global Step: 45290 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:53,067-Speed 11815.55 samples/sec Loss 5.5010 LearningRate 0.1032 Epoch: 10 Global Step: 45300 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:28:56,543-Speed 11794.23 samples/sec Loss 5.4834 LearningRate 0.1031 Epoch: 10 Global Step: 45310 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:29:00,237-Speed 11087.98 samples/sec Loss 5.4474 LearningRate 0.1031 Epoch: 10 Global Step: 45320 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:29:03,897-Speed 11192.88 samples/sec Loss 5.4315 LearningRate 0.1030 Epoch: 10 
Global Step: 45330 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:29:07,353-Speed 11856.10 samples/sec Loss 5.4676 LearningRate 0.1030 Epoch: 10 Global Step: 45340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:10,852-Speed 11709.97 samples/sec Loss 5.4014 LearningRate 0.1029 Epoch: 10 Global Step: 45350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:14,358-Speed 11690.40 samples/sec Loss 5.4360 LearningRate 0.1029 Epoch: 10 Global Step: 45360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:17,959-Speed 11378.13 samples/sec Loss 5.4784 LearningRate 0.1028 Epoch: 10 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:21,781-Speed 10718.90 samples/sec Loss 5.4308 LearningRate 0.1027 Epoch: 10 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:25,730-Speed 10373.90 samples/sec Loss 5.4575 LearningRate 0.1027 Epoch: 10 Global Step: 45390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:29,373-Speed 11247.85 samples/sec Loss 5.4279 LearningRate 0.1026 Epoch: 10 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:32,841-Speed 11813.95 samples/sec Loss 5.4277 LearningRate 0.1026 Epoch: 10 Global Step: 45410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:36,235-Speed 12072.89 samples/sec Loss 5.4618 LearningRate 0.1025 Epoch: 10 Global Step: 45420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:39,699-Speed 11827.10 samples/sec Loss 5.4444 LearningRate 0.1025 Epoch: 10 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:44,184-Speed 9135.00 samples/sec Loss 5.4636 LearningRate 0.1024 Epoch: 10 Global Step: 45440 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:29:47,852-Speed 11168.85 samples/sec Loss 5.4082 LearningRate 0.1024 Epoch: 10 Global Step: 45450 Fp16 Grad 
Scale: 262144 Required: 4 hours Training: 2022-01-17 02:29:51,434-Speed 11436.13 samples/sec Loss 5.4394 LearningRate 0.1023 Epoch: 10 Global Step: 45460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:55,120-Speed 11116.50 samples/sec Loss 5.4270 LearningRate 0.1023 Epoch: 10 Global Step: 45470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:29:58,591-Speed 11804.82 samples/sec Loss 5.3965 LearningRate 0.1022 Epoch: 10 Global Step: 45480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:02,101-Speed 11673.46 samples/sec Loss 5.4419 LearningRate 0.1022 Epoch: 10 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:05,569-Speed 11812.50 samples/sec Loss 5.4344 LearningRate 0.1021 Epoch: 10 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:09,048-Speed 11778.32 samples/sec Loss 5.4605 LearningRate 0.1020 Epoch: 10 Global Step: 45510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:12,448-Speed 12050.52 samples/sec Loss 5.4660 LearningRate 0.1020 Epoch: 10 Global Step: 45520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:15,904-Speed 11855.87 samples/sec Loss 5.4630 LearningRate 0.1019 Epoch: 10 Global Step: 45530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:19,706-Speed 10774.71 samples/sec Loss 5.3914 LearningRate 0.1019 Epoch: 10 Global Step: 45540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:23,116-Speed 12012.67 samples/sec Loss 5.4563 LearningRate 0.1018 Epoch: 10 Global Step: 45550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:26,675-Speed 11514.38 samples/sec Loss 5.4715 LearningRate 0.1018 Epoch: 10 Global Step: 45560 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:30:31,427-Speed 8621.19 samples/sec Loss 5.4450 LearningRate 0.1017 Epoch: 10 Global Step: 45570 Fp16 Grad Scale: 262144 Required: 4 hours 
Training: 2022-01-17 02:30:35,156-Speed 10992.84 samples/sec Loss 5.5005 LearningRate 0.1017 Epoch: 10 Global Step: 45580 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:30:39,155-Speed 10246.51 samples/sec Loss 5.4780 LearningRate 0.1016 Epoch: 10 Global Step: 45590 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:30:42,635-Speed 11773.64 samples/sec Loss 5.4299 LearningRate 0.1016 Epoch: 10 Global Step: 45600 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:30:46,105-Speed 11806.53 samples/sec Loss 5.4944 LearningRate 0.1015 Epoch: 10 Global Step: 45610 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:30:49,628-Speed 11628.21 samples/sec Loss 5.3695 LearningRate 0.1015 Epoch: 10 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:53,474-Speed 10652.79 samples/sec Loss 5.4060 LearningRate 0.1014 Epoch: 10 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:30:56,961-Speed 11749.80 samples/sec Loss 5.4217 LearningRate 0.1013 Epoch: 10 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:31:00,460-Speed 11709.68 samples/sec Loss 5.4561 LearningRate 0.1013 Epoch: 10 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:31:03,937-Speed 11782.15 samples/sec Loss 5.4384 LearningRate 0.1012 Epoch: 10 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:31:07,437-Speed 11705.87 samples/sec Loss 5.4319 LearningRate 0.1012 Epoch: 10 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:31:11,105-Speed 11170.00 samples/sec Loss 5.4378 LearningRate 0.1011 Epoch: 10 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:31:14,517-Speed 12009.61 samples/sec Loss 5.4110 LearningRate 0.1011 Epoch: 10 Global Step: 45690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
02:31:17,872-Speed 12209.27 samples/sec Loss 5.4029 LearningRate 0.1010 Epoch: 10 Global Step: 45700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:31:21,261-Speed 12087.90 samples/sec Loss 5.3975 LearningRate 0.1010 Epoch: 10 Global Step: 45710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:31:24,673-Speed 12008.29 samples/sec Loss 5.4766 LearningRate 0.1009 Epoch: 10 Global Step: 45720 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:28,125-Speed 11868.43 samples/sec Loss 5.4023 LearningRate 0.1009 Epoch: 10 Global Step: 45730 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:32,548-Speed 9264.02 samples/sec Loss 5.4214 LearningRate 0.1008 Epoch: 10 Global Step: 45740 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:36,119-Speed 11472.23 samples/sec Loss 5.3976 LearningRate 0.1008 Epoch: 10 Global Step: 45750 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:39,959-Speed 10669.20 samples/sec Loss 5.4232 LearningRate 0.1007 Epoch: 10 Global Step: 45760 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:43,758-Speed 10782.65 samples/sec Loss 5.4140 LearningRate 0.1007 Epoch: 10 Global Step: 45770 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:47,478-Speed 11015.40 samples/sec Loss 5.4416 LearningRate 0.1006 Epoch: 10 Global Step: 45780 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:50,903-Speed 11961.13 samples/sec Loss 5.4470 LearningRate 0.1005 Epoch: 10 Global Step: 45790 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:54,444-Speed 11569.46 samples/sec Loss 5.4191 LearningRate 0.1005 Epoch: 10 Global Step: 45800 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:31:58,040-Speed 11394.96 samples/sec Loss 5.4655 LearningRate 0.1004 Epoch: 10 Global Step: 45810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:32:01,489-Speed 11877.99 
samples/sec Loss 5.4371 LearningRate 0.1004 Epoch: 10 Global Step: 45820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:32:05,409-Speed 10452.49 samples/sec Loss 5.4592 LearningRate 0.1003 Epoch: 10 Global Step: 45830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:32:09,053-Speed 11241.97 samples/sec Loss 5.4128 LearningRate 0.1003 Epoch: 10 Global Step: 45840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:32:13,003-Speed 10371.86 samples/sec Loss 5.4256 LearningRate 0.1002 Epoch: 10 Global Step: 45850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:32:16,867-Speed 10604.57 samples/sec Loss 5.4434 LearningRate 0.1002 Epoch: 10 Global Step: 45860 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:32:20,426-Speed 11510.13 samples/sec Loss 5.4476 LearningRate 0.1001 Epoch: 10 Global Step: 45870 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:32:23,933-Speed 11680.41 samples/sec Loss 5.4053 LearningRate 0.1001 Epoch: 10 Global Step: 45880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:32:27,902-Speed 10323.67 samples/sec Loss 5.3422 LearningRate 0.1000 Epoch: 10 Global Step: 45890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:01,186-Speed 1230.64 samples/sec Loss 4.8316 LearningRate 0.1000 Epoch: 11 Global Step: 45900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:05,226-Speed 10144.36 samples/sec Loss 4.6388 LearningRate 0.0999 Epoch: 11 Global Step: 45910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:09,579-Speed 9411.10 samples/sec Loss 4.6829 LearningRate 0.0999 Epoch: 11 Global Step: 45920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:13,743-Speed 9836.96 samples/sec Loss 4.6797 LearningRate 0.0998 Epoch: 11 Global Step: 45930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:17,296-Speed 11530.96 samples/sec Loss 4.6920 LearningRate 
0.0997 Epoch: 11 Global Step: 45940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:21,486-Speed 9778.49 samples/sec Loss 4.6479 LearningRate 0.0997 Epoch: 11 Global Step: 45950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:25,399-Speed 10468.59 samples/sec Loss 4.6830 LearningRate 0.0996 Epoch: 11 Global Step: 45960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:29,263-Speed 10606.19 samples/sec Loss 4.7229 LearningRate 0.0996 Epoch: 11 Global Step: 45970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:33,123-Speed 10613.66 samples/sec Loss 4.7127 LearningRate 0.0995 Epoch: 11 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-17 02:33:36,615-Speed 11731.06 samples/sec Loss 4.7224 LearningRate 0.0995 Epoch: 11 Global Step: 45990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:33:40,171-Speed 11521.57 samples/sec Loss 4.7248 LearningRate 0.0994 Epoch: 11 Global Step: 46000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:33:43,680-Speed 11680.68 samples/sec Loss 4.8058 LearningRate 0.0994 Epoch: 11 Global Step: 46010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:33:47,177-Speed 11715.83 samples/sec Loss 4.7753 LearningRate 0.0993 Epoch: 11 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:33:51,398-Speed 9704.94 samples/sec Loss 4.7484 LearningRate 0.0993 Epoch: 11 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:33:55,189-Speed 10807.69 samples/sec Loss 4.7802 LearningRate 0.0992 Epoch: 11 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:33:58,588-Speed 12052.30 samples/sec Loss 4.7905 LearningRate 0.0992 Epoch: 11 Global Step: 46050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:02,558-Speed 10320.03 samples/sec Loss 4.7608 LearningRate 0.0991 Epoch: 11 Global Step: 46060 
Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:06,294-Speed 10967.17 samples/sec Loss 4.7730 LearningRate 0.0991 Epoch: 11 Global Step: 46070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:09,758-Speed 11828.85 samples/sec Loss 4.7744 LearningRate 0.0990 Epoch: 11 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:13,139-Speed 12118.60 samples/sec Loss 4.8379 LearningRate 0.0989 Epoch: 11 Global Step: 46090 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:34:16,614-Speed 11787.65 samples/sec Loss 4.8241 LearningRate 0.0989 Epoch: 11 Global Step: 46100 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:34:20,250-Speed 11266.70 samples/sec Loss 4.8583 LearningRate 0.0988 Epoch: 11 Global Step: 46110 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:34:23,631-Speed 12117.77 samples/sec Loss 4.8196 LearningRate 0.0988 Epoch: 11 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:27,078-Speed 11889.46 samples/sec Loss 4.7870 LearningRate 0.0987 Epoch: 11 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:30,673-Speed 11394.91 samples/sec Loss 4.8735 LearningRate 0.0987 Epoch: 11 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:34,171-Speed 11711.28 samples/sec Loss 4.7997 LearningRate 0.0986 Epoch: 11 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:37,661-Speed 11741.31 samples/sec Loss 4.8143 LearningRate 0.0986 Epoch: 11 Global Step: 46160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:41,122-Speed 11839.61 samples/sec Loss 4.7694 LearningRate 0.0985 Epoch: 11 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:44,846-Speed 11000.40 samples/sec Loss 4.8877 LearningRate 0.0985 Epoch: 11 Global Step: 46180 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-01-17 02:34:48,836-Speed 10268.67 samples/sec Loss 4.7907 LearningRate 0.0984 Epoch: 11 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:52,361-Speed 11620.71 samples/sec Loss 4.8435 LearningRate 0.0984 Epoch: 11 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:55,875-Speed 11659.89 samples/sec Loss 4.9038 LearningRate 0.0983 Epoch: 11 Global Step: 46210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:34:59,324-Speed 11882.10 samples/sec Loss 4.8972 LearningRate 0.0983 Epoch: 11 Global Step: 46220 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:35:02,730-Speed 12028.43 samples/sec Loss 4.8977 LearningRate 0.0982 Epoch: 11 Global Step: 46230 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:35:06,156-Speed 11961.58 samples/sec Loss 4.8723 LearningRate 0.0982 Epoch: 11 Global Step: 46240 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:35:09,582-Speed 11958.18 samples/sec Loss 4.8989 LearningRate 0.0981 Epoch: 11 Global Step: 46250 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:35:13,078-Speed 11717.75 samples/sec Loss 4.8817 LearningRate 0.0980 Epoch: 11 Global Step: 46260 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:35:16,725-Speed 11233.00 samples/sec Loss 4.8954 LearningRate 0.0980 Epoch: 11 Global Step: 46270 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:35:20,189-Speed 11830.28 samples/sec Loss 4.8474 LearningRate 0.0979 Epoch: 11 Global Step: 46280 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:35:23,835-Speed 11236.62 samples/sec Loss 4.9201 LearningRate 0.0979 Epoch: 11 Global Step: 46290 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:35:27,228-Speed 12076.15 samples/sec Loss 4.8857 LearningRate 0.0978 Epoch: 11 Global Step: 46300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-01-17 02:35:30,587-Speed 12194.47 samples/sec Loss 4.8875 LearningRate 0.0978 Epoch: 11 Global Step: 46310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:35:34,291-Speed 11060.25 samples/sec Loss 4.9198 LearningRate 0.0977 Epoch: 11 Global Step: 46320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:35:37,771-Speed 11775.63 samples/sec Loss 4.9094 LearningRate 0.0977 Epoch: 11 Global Step: 46330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:35:41,307-Speed 11588.37 samples/sec Loss 4.9141 LearningRate 0.0976 Epoch: 11 Global Step: 46340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:35:44,993-Speed 11114.75 samples/sec Loss 4.8573 LearningRate 0.0976 Epoch: 11 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:35:49,045-Speed 10110.19 samples/sec Loss 4.8992 LearningRate 0.0975 Epoch: 11 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:35:52,537-Speed 11731.23 samples/sec Loss 4.9124 LearningRate 0.0975 Epoch: 11 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:35:55,949-Speed 12008.93 samples/sec Loss 4.9583 LearningRate 0.0974 Epoch: 11 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:35:59,403-Speed 11861.76 samples/sec Loss 4.9256 LearningRate 0.0974 Epoch: 11 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:36:02,796-Speed 12073.09 samples/sec Loss 4.9751 LearningRate 0.0973 Epoch: 11 Global Step: 46400 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:06,566-Speed 10870.44 samples/sec Loss 4.9892 LearningRate 0.0973 Epoch: 11 Global Step: 46410 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:10,060-Speed 11724.45 samples/sec Loss 4.9338 LearningRate 0.0972 Epoch: 11 Global Step: 46420 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:13,837-Speed 
10849.17 samples/sec Loss 4.9814 LearningRate 0.0972 Epoch: 11 Global Step: 46430 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:17,421-Speed 11428.08 samples/sec Loss 4.9183 LearningRate 0.0971 Epoch: 11 Global Step: 46440 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:21,147-Speed 10997.30 samples/sec Loss 4.9751 LearningRate 0.0970 Epoch: 11 Global Step: 46450 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:24,583-Speed 11923.78 samples/sec Loss 4.9908 LearningRate 0.0970 Epoch: 11 Global Step: 46460 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:27,968-Speed 12103.92 samples/sec Loss 5.0146 LearningRate 0.0969 Epoch: 11 Global Step: 46470 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:31,588-Speed 11317.27 samples/sec Loss 4.9788 LearningRate 0.0969 Epoch: 11 Global Step: 46480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:35,336-Speed 10932.81 samples/sec Loss 4.9947 LearningRate 0.0968 Epoch: 11 Global Step: 46490 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:38,885-Speed 11543.17 samples/sec Loss 4.9636 LearningRate 0.0968 Epoch: 11 Global Step: 46500 Fp16 Grad Scale: 524288 Required: 4 hours Training: 2022-01-17 02:36:42,761-Speed 10570.44 samples/sec Loss 5.0132 LearningRate 0.0967 Epoch: 11 Global Step: 46510 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:46,285-Speed 11627.75 samples/sec Loss 4.9775 LearningRate 0.0967 Epoch: 11 Global Step: 46520 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:49,997-Speed 11035.31 samples/sec Loss 4.9996 LearningRate 0.0966 Epoch: 11 Global Step: 46530 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:53,743-Speed 10938.10 samples/sec Loss 4.9745 LearningRate 0.0966 Epoch: 11 Global Step: 46540 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:36:57,223-Speed 11772.71 samples/sec Loss 
4.9496 LearningRate 0.0965 Epoch: 11 Global Step: 46550 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:37:00,817-Speed 11399.84 samples/sec Loss 5.0040 LearningRate 0.0965 Epoch: 11 Global Step: 46560 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:37:05,285-Speed 9170.19 samples/sec Loss 4.9595 LearningRate 0.0964 Epoch: 11 Global Step: 46570 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:37:08,765-Speed 11772.95 samples/sec Loss 5.0145 LearningRate 0.0964 Epoch: 11 Global Step: 46580 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:37:12,288-Speed 11629.77 samples/sec Loss 4.9984 LearningRate 0.0963 Epoch: 11 Global Step: 46590 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:37:15,751-Speed 11834.54 samples/sec Loss 5.0280 LearningRate 0.0963 Epoch: 11 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:19,589-Speed 10673.14 samples/sec Loss 5.0194 LearningRate 0.0962 Epoch: 11 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:23,103-Speed 11661.47 samples/sec Loss 5.0540 LearningRate 0.0962 Epoch: 11 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:26,585-Speed 11763.79 samples/sec Loss 4.9897 LearningRate 0.0961 Epoch: 11 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:30,511-Speed 10436.69 samples/sec Loss 4.9587 LearningRate 0.0961 Epoch: 11 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:34,071-Speed 11506.57 samples/sec Loss 5.0146 LearningRate 0.0960 Epoch: 11 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:37,639-Speed 11483.88 samples/sec Loss 5.0002 LearningRate 0.0960 Epoch: 11 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:41,239-Speed 11381.09 samples/sec Loss 5.0461 LearningRate 0.0959 
Epoch: 11 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:44,975-Speed 10966.56 samples/sec Loss 5.0311 LearningRate 0.0958 Epoch: 11 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:48,355-Speed 12122.09 samples/sec Loss 5.0719 LearningRate 0.0958 Epoch: 11 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:37:51,901-Speed 11554.22 samples/sec Loss 5.0248 LearningRate 0.0957 Epoch: 11 Global Step: 46700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:37:55,485-Speed 11429.13 samples/sec Loss 4.9702 LearningRate 0.0957 Epoch: 11 Global Step: 46710 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:37:59,048-Speed 11502.08 samples/sec Loss 4.9899 LearningRate 0.0956 Epoch: 11 Global Step: 46720 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:02,794-Speed 10935.91 samples/sec Loss 5.0400 LearningRate 0.0956 Epoch: 11 Global Step: 46730 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:06,714-Speed 10450.91 samples/sec Loss 5.0321 LearningRate 0.0955 Epoch: 11 Global Step: 46740 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:10,320-Speed 11360.40 samples/sec Loss 5.0410 LearningRate 0.0955 Epoch: 11 Global Step: 46750 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:13,767-Speed 11887.30 samples/sec Loss 5.0521 LearningRate 0.0954 Epoch: 11 Global Step: 46760 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:17,193-Speed 11958.53 samples/sec Loss 5.0079 LearningRate 0.0954 Epoch: 11 Global Step: 46770 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:21,680-Speed 9130.13 samples/sec Loss 5.0355 LearningRate 0.0953 Epoch: 11 Global Step: 46780 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:25,331-Speed 11221.96 samples/sec Loss 5.0712 LearningRate 0.0953 Epoch: 11 Global Step: 46790 
Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:29,103-Speed 10862.51 samples/sec Loss 5.0294 LearningRate 0.0952 Epoch: 11 Global Step: 46800 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:32,563-Speed 11840.05 samples/sec Loss 5.0443 LearningRate 0.0952 Epoch: 11 Global Step: 46810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:35,908-Speed 12251.14 samples/sec Loss 5.0393 LearningRate 0.0951 Epoch: 11 Global Step: 46820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:39,337-Speed 11947.52 samples/sec Loss 5.0675 LearningRate 0.0951 Epoch: 11 Global Step: 46830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:42,741-Speed 12036.89 samples/sec Loss 5.0776 LearningRate 0.0950 Epoch: 11 Global Step: 46840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:46,777-Speed 10149.67 samples/sec Loss 5.0525 LearningRate 0.0950 Epoch: 11 Global Step: 46850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:38:50,266-Speed 11743.13 samples/sec Loss 5.0150 LearningRate 0.0949 Epoch: 11 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:38:53,866-Speed 11382.07 samples/sec Loss 5.0504 LearningRate 0.0949 Epoch: 11 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:38:57,388-Speed 11631.50 samples/sec Loss 5.1115 LearningRate 0.0948 Epoch: 11 Global Step: 46880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:00,842-Speed 11862.71 samples/sec Loss 5.0576 LearningRate 0.0948 Epoch: 11 Global Step: 46890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:04,410-Speed 11482.20 samples/sec Loss 5.0275 LearningRate 0.0947 Epoch: 11 Global Step: 46900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:08,118-Speed 11048.79 samples/sec Loss 5.0403 LearningRate 0.0947 Epoch: 11 Global Step: 46910 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-01-17 02:39:11,549-Speed 11942.00 samples/sec Loss 5.0377 LearningRate 0.0946 Epoch: 11 Global Step: 46920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:14,919-Speed 12157.48 samples/sec Loss 5.0595 LearningRate 0.0945 Epoch: 11 Global Step: 46930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:19,453-Speed 9035.06 samples/sec Loss 5.1135 LearningRate 0.0945 Epoch: 11 Global Step: 46940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:22,907-Speed 11863.59 samples/sec Loss 5.0823 LearningRate 0.0944 Epoch: 11 Global Step: 46950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:26,597-Speed 11103.17 samples/sec Loss 5.0327 LearningRate 0.0944 Epoch: 11 Global Step: 46960 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:39:30,296-Speed 11075.76 samples/sec Loss 5.1080 LearningRate 0.0943 Epoch: 11 Global Step: 46970 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:39:33,675-Speed 12124.35 samples/sec Loss 5.0574 LearningRate 0.0943 Epoch: 11 Global Step: 46980 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:39:37,176-Speed 11701.43 samples/sec Loss 5.1077 LearningRate 0.0942 Epoch: 11 Global Step: 46990 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:39:40,776-Speed 11380.78 samples/sec Loss 5.0943 LearningRate 0.0942 Epoch: 11 Global Step: 47000 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:39:44,490-Speed 11032.50 samples/sec Loss 5.0884 LearningRate 0.0941 Epoch: 11 Global Step: 47010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:48,118-Speed 11294.09 samples/sec Loss 5.0993 LearningRate 0.0941 Epoch: 11 Global Step: 47020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:51,729-Speed 11344.79 samples/sec Loss 5.0828 LearningRate 0.0940 Epoch: 11 Global Step: 47030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-01-17 02:39:55,228-Speed 11708.84 samples/sec Loss 5.1245 LearningRate 0.0940 Epoch: 11 Global Step: 47040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:39:58,712-Speed 11761.27 samples/sec Loss 5.0433 LearningRate 0.0939 Epoch: 11 Global Step: 47050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:02,371-Speed 11196.58 samples/sec Loss 5.0971 LearningRate 0.0939 Epoch: 11 Global Step: 47060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:05,847-Speed 11788.02 samples/sec Loss 5.1159 LearningRate 0.0938 Epoch: 11 Global Step: 47070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:09,295-Speed 11881.27 samples/sec Loss 5.0915 LearningRate 0.0938 Epoch: 11 Global Step: 47080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:12,737-Speed 11905.85 samples/sec Loss 5.0779 LearningRate 0.0937 Epoch: 11 Global Step: 47090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:16,107-Speed 12154.04 samples/sec Loss 5.0625 LearningRate 0.0937 Epoch: 11 Global Step: 47100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:19,954-Speed 10652.46 samples/sec Loss 5.0423 LearningRate 0.0936 Epoch: 11 Global Step: 47110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:23,412-Speed 11848.26 samples/sec Loss 5.0553 LearningRate 0.0936 Epoch: 11 Global Step: 47120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:27,162-Speed 10924.73 samples/sec Loss 5.1009 LearningRate 0.0935 Epoch: 11 Global Step: 47130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:30,755-Speed 11401.95 samples/sec Loss 5.1115 LearningRate 0.0935 Epoch: 11 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:34,214-Speed 11843.89 samples/sec Loss 5.1350 LearningRate 0.0934 Epoch: 11 Global Step: 47150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:38,044-Speed 
10696.20 samples/sec Loss 5.0586 LearningRate 0.0934 Epoch: 11 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:41,674-Speed 11288.18 samples/sec Loss 5.0505 LearningRate 0.0933 Epoch: 11 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:45,253-Speed 11448.15 samples/sec Loss 5.1254 LearningRate 0.0933 Epoch: 11 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:48,893-Speed 11257.11 samples/sec Loss 5.1252 LearningRate 0.0932 Epoch: 11 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:52,992-Speed 9994.77 samples/sec Loss 5.1189 LearningRate 0.0932 Epoch: 11 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:56,406-Speed 11999.54 samples/sec Loss 5.1333 LearningRate 0.0931 Epoch: 11 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:40:59,940-Speed 11594.25 samples/sec Loss 5.0962 LearningRate 0.0931 Epoch: 11 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:03,663-Speed 11007.69 samples/sec Loss 5.1395 LearningRate 0.0930 Epoch: 11 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:07,264-Speed 11375.24 samples/sec Loss 5.1116 LearningRate 0.0929 Epoch: 11 Global Step: 47240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:10,674-Speed 12016.86 samples/sec Loss 5.1437 LearningRate 0.0929 Epoch: 11 Global Step: 47250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:14,231-Speed 11517.65 samples/sec Loss 5.1515 LearningRate 0.0928 Epoch: 11 Global Step: 47260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:17,666-Speed 11927.20 samples/sec Loss 5.0504 LearningRate 0.0928 Epoch: 11 Global Step: 47270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:21,334-Speed 11171.25 samples/sec Loss 5.1460 
LearningRate 0.0927 Epoch: 11 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:25,038-Speed 11061.16 samples/sec Loss 5.1547 LearningRate 0.0927 Epoch: 11 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:28,586-Speed 11545.79 samples/sec Loss 5.1323 LearningRate 0.0926 Epoch: 11 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:32,006-Speed 11978.97 samples/sec Loss 5.1925 LearningRate 0.0926 Epoch: 11 Global Step: 47310 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:41:35,509-Speed 11697.87 samples/sec Loss 5.1094 LearningRate 0.0925 Epoch: 11 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:39,430-Speed 10450.60 samples/sec Loss 5.1290 LearningRate 0.0925 Epoch: 11 Global Step: 47330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:42,946-Speed 11651.59 samples/sec Loss 5.1155 LearningRate 0.0924 Epoch: 11 Global Step: 47340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:46,354-Speed 12021.58 samples/sec Loss 5.1402 LearningRate 0.0924 Epoch: 11 Global Step: 47350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:50,226-Speed 10580.87 samples/sec Loss 5.1319 LearningRate 0.0923 Epoch: 11 Global Step: 47360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:54,135-Speed 10482.62 samples/sec Loss 5.1681 LearningRate 0.0923 Epoch: 11 Global Step: 47370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:41:57,905-Speed 10865.44 samples/sec Loss 5.1806 LearningRate 0.0922 Epoch: 11 Global Step: 47380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:02,241-Speed 9449.77 samples/sec Loss 5.1195 LearningRate 0.0922 Epoch: 11 Global Step: 47390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:05,716-Speed 11790.77 samples/sec Loss 5.1076 LearningRate 0.0921 Epoch: 11 
Global Step: 47400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:09,451-Speed 10968.54 samples/sec Loss 5.0741 LearningRate 0.0921 Epoch: 11 Global Step: 47410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:12,873-Speed 11975.65 samples/sec Loss 5.1363 LearningRate 0.0920 Epoch: 11 Global Step: 47420 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:42:16,706-Speed 10689.77 samples/sec Loss 5.1114 LearningRate 0.0920 Epoch: 11 Global Step: 47430 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:42:20,243-Speed 11584.06 samples/sec Loss 5.1163 LearningRate 0.0919 Epoch: 11 Global Step: 47440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:23,969-Speed 10995.31 samples/sec Loss 5.1544 LearningRate 0.0919 Epoch: 11 Global Step: 47450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:27,948-Speed 10295.72 samples/sec Loss 5.1205 LearningRate 0.0918 Epoch: 11 Global Step: 47460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:31,434-Speed 11754.63 samples/sec Loss 5.1390 LearningRate 0.0918 Epoch: 11 Global Step: 47470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:34,890-Speed 11855.16 samples/sec Loss 5.1480 LearningRate 0.0917 Epoch: 11 Global Step: 47480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:38,493-Speed 11371.18 samples/sec Loss 5.1228 LearningRate 0.0917 Epoch: 11 Global Step: 47490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:42,172-Speed 11136.78 samples/sec Loss 5.1948 LearningRate 0.0916 Epoch: 11 Global Step: 47500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:45,727-Speed 11523.66 samples/sec Loss 5.1768 LearningRate 0.0916 Epoch: 11 Global Step: 47510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:49,304-Speed 11455.45 samples/sec Loss 5.1602 LearningRate 0.0915 Epoch: 11 Global Step: 47520 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:53,119-Speed 10739.70 samples/sec Loss 5.1439 LearningRate 0.0915 Epoch: 11 Global Step: 47530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:42:56,613-Speed 11723.34 samples/sec Loss 5.1311 LearningRate 0.0914 Epoch: 11 Global Step: 47540 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:43:00,007-Speed 12070.15 samples/sec Loss 5.1815 LearningRate 0.0914 Epoch: 11 Global Step: 47550 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:43:04,399-Speed 9328.95 samples/sec Loss 5.1329 LearningRate 0.0913 Epoch: 11 Global Step: 47560 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:43:08,150-Speed 10922.74 samples/sec Loss 5.1687 LearningRate 0.0913 Epoch: 11 Global Step: 47570 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:43:11,586-Speed 11923.12 samples/sec Loss 5.1740 LearningRate 0.0912 Epoch: 11 Global Step: 47580 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:43:15,182-Speed 11393.32 samples/sec Loss 5.1218 LearningRate 0.0912 Epoch: 11 Global Step: 47590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:18,669-Speed 11750.87 samples/sec Loss 5.1151 LearningRate 0.0911 Epoch: 11 Global Step: 47600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:22,624-Speed 10357.30 samples/sec Loss 5.1085 LearningRate 0.0911 Epoch: 11 Global Step: 47610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:26,177-Speed 11532.92 samples/sec Loss 5.0739 LearningRate 0.0910 Epoch: 11 Global Step: 47620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:29,600-Speed 11970.57 samples/sec Loss 5.1341 LearningRate 0.0910 Epoch: 11 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:33,204-Speed 11367.69 samples/sec Loss 5.0635 LearningRate 0.0909 Epoch: 11 Global Step: 47640 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 02:43:36,657-Speed 11865.33 samples/sec Loss 5.1416 LearningRate 0.0909 Epoch: 11 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:40,330-Speed 11153.34 samples/sec Loss 5.1552 LearningRate 0.0908 Epoch: 11 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:43,726-Speed 12062.53 samples/sec Loss 5.1640 LearningRate 0.0908 Epoch: 11 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:47,450-Speed 11005.40 samples/sec Loss 5.1763 LearningRate 0.0907 Epoch: 11 Global Step: 47680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:43:50,834-Speed 12104.28 samples/sec Loss 5.1564 LearningRate 0.0907 Epoch: 11 Global Step: 47690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:43:54,220-Speed 12102.85 samples/sec Loss 5.1448 LearningRate 0.0906 Epoch: 11 Global Step: 47700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:43:57,717-Speed 11713.48 samples/sec Loss 5.1572 LearningRate 0.0906 Epoch: 11 Global Step: 47710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:01,688-Speed 10316.08 samples/sec Loss 5.1472 LearningRate 0.0905 Epoch: 11 Global Step: 47720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:05,107-Speed 11985.12 samples/sec Loss 5.1640 LearningRate 0.0904 Epoch: 11 Global Step: 47730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:08,540-Speed 11934.55 samples/sec Loss 5.1562 LearningRate 0.0904 Epoch: 11 Global Step: 47740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:12,089-Speed 11543.51 samples/sec Loss 5.1657 LearningRate 0.0903 Epoch: 11 Global Step: 47750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:15,776-Speed 11111.84 samples/sec Loss 5.1105 LearningRate 0.0903 Epoch: 11 Global Step: 47760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
02:44:19,360-Speed 11431.62 samples/sec Loss 5.1744 LearningRate 0.0902 Epoch: 11 Global Step: 47770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:22,932-Speed 11471.87 samples/sec Loss 5.1734 LearningRate 0.0902 Epoch: 11 Global Step: 47780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:26,475-Speed 11563.21 samples/sec Loss 5.1725 LearningRate 0.0901 Epoch: 11 Global Step: 47790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:30,092-Speed 11327.50 samples/sec Loss 5.0990 LearningRate 0.0901 Epoch: 11 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:44:33,583-Speed 11736.74 samples/sec Loss 5.1256 LearningRate 0.0900 Epoch: 11 Global Step: 47810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:44:37,069-Speed 11752.42 samples/sec Loss 5.1618 LearningRate 0.0900 Epoch: 11 Global Step: 47820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:44:40,554-Speed 11757.21 samples/sec Loss 5.1219 LearningRate 0.0899 Epoch: 11 Global Step: 47830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:44:44,075-Speed 11635.71 samples/sec Loss 5.1728 LearningRate 0.0899 Epoch: 11 Global Step: 47840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:44:47,847-Speed 10862.57 samples/sec Loss 5.1885 LearningRate 0.0898 Epoch: 11 Global Step: 47850 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:44:51,466-Speed 11321.10 samples/sec Loss 5.2006 LearningRate 0.0898 Epoch: 11 Global Step: 47860 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:44:54,970-Speed 11693.33 samples/sec Loss 5.1678 LearningRate 0.0897 Epoch: 11 Global Step: 47870 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:44:58,466-Speed 11717.87 samples/sec Loss 5.1798 LearningRate 0.0897 Epoch: 11 Global Step: 47880 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:45:02,473-Speed 10223.90 
samples/sec Loss 5.1963 LearningRate 0.0896 Epoch: 11 Global Step: 47890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:45:06,305-Speed 10691.62 samples/sec Loss 5.1215 LearningRate 0.0896 Epoch: 11 Global Step: 47900 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:45:09,990-Speed 11117.18 samples/sec Loss 5.1397 LearningRate 0.0895 Epoch: 11 Global Step: 47910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:45:13,491-Speed 11702.19 samples/sec Loss 5.1772 LearningRate 0.0895 Epoch: 11 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:16,998-Speed 11683.50 samples/sec Loss 5.1365 LearningRate 0.0894 Epoch: 11 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:20,442-Speed 11897.69 samples/sec Loss 5.1793 LearningRate 0.0894 Epoch: 11 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:23,831-Speed 12091.22 samples/sec Loss 5.1426 LearningRate 0.0893 Epoch: 11 Global Step: 47950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:27,288-Speed 11849.33 samples/sec Loss 5.1770 LearningRate 0.0893 Epoch: 11 Global Step: 47960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:30,725-Speed 11921.89 samples/sec Loss 5.1292 LearningRate 0.0892 Epoch: 11 Global Step: 47970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:34,165-Speed 11910.52 samples/sec Loss 5.1864 LearningRate 0.0892 Epoch: 11 Global Step: 47980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:38,056-Speed 10542.20 samples/sec Loss 5.1481 LearningRate 0.0891 Epoch: 11 Global Step: 47990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:41,564-Speed 11677.63 samples/sec Loss 5.1946 LearningRate 0.0891 Epoch: 11 Global Step: 48000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:45,026-Speed 11834.68 samples/sec Loss 5.1343 
LearningRate 0.0890 Epoch: 11 Global Step: 48010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:48,547-Speed 11637.04 samples/sec Loss 5.1748 LearningRate 0.0890 Epoch: 11 Global Step: 48020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:52,370-Speed 10716.55 samples/sec Loss 5.2079 LearningRate 0.0889 Epoch: 11 Global Step: 48030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:55,766-Speed 12062.53 samples/sec Loss 5.1742 LearningRate 0.0889 Epoch: 11 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:45:59,508-Speed 10949.57 samples/sec Loss 5.2055 LearningRate 0.0888 Epoch: 11 Global Step: 48050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:03,854-Speed 9426.94 samples/sec Loss 5.1836 LearningRate 0.0888 Epoch: 11 Global Step: 48060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:07,441-Speed 11420.88 samples/sec Loss 5.1742 LearningRate 0.0887 Epoch: 11 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:10,906-Speed 11824.25 samples/sec Loss 5.2062 LearningRate 0.0887 Epoch: 11 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:14,342-Speed 11922.95 samples/sec Loss 5.1712 LearningRate 0.0886 Epoch: 11 Global Step: 48090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:17,773-Speed 11943.28 samples/sec Loss 5.1623 LearningRate 0.0886 Epoch: 11 Global Step: 48100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:21,375-Speed 11374.69 samples/sec Loss 5.1675 LearningRate 0.0885 Epoch: 11 Global Step: 48110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:24,965-Speed 11413.33 samples/sec Loss 5.1294 LearningRate 0.0885 Epoch: 11 Global Step: 48120 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:46:28,574-Speed 11349.05 samples/sec Loss 5.1745 LearningRate 0.0884 Epoch: 11 
Global Step: 48130 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:46:32,226-Speed 11219.14 samples/sec Loss 5.1645 LearningRate 0.0884 Epoch: 11 Global Step: 48140 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:46:35,963-Speed 10962.13 samples/sec Loss 5.1942 LearningRate 0.0883 Epoch: 11 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:39,706-Speed 10946.61 samples/sec Loss 5.1737 LearningRate 0.0883 Epoch: 11 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:43,153-Speed 11887.28 samples/sec Loss 5.1546 LearningRate 0.0882 Epoch: 11 Global Step: 48170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:46,823-Speed 11164.09 samples/sec Loss 5.1888 LearningRate 0.0882 Epoch: 11 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:50,341-Speed 11646.06 samples/sec Loss 5.1929 LearningRate 0.0881 Epoch: 11 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:53,743-Speed 12041.94 samples/sec Loss 5.1856 LearningRate 0.0881 Epoch: 11 Global Step: 48200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:46:57,571-Speed 10703.84 samples/sec Loss 5.2029 LearningRate 0.0880 Epoch: 11 Global Step: 48210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:01,085-Speed 11658.29 samples/sec Loss 5.2166 LearningRate 0.0880 Epoch: 11 Global Step: 48220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:05,017-Speed 10417.97 samples/sec Loss 5.1999 LearningRate 0.0879 Epoch: 11 Global Step: 48230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:08,532-Speed 11658.06 samples/sec Loss 5.1329 LearningRate 0.0879 Epoch: 11 Global Step: 48240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:12,068-Speed 11585.78 samples/sec Loss 5.1307 LearningRate 0.0878 Epoch: 11 Global Step: 48250 Fp16 Grad 
Scale: 262144 Required: 4 hours Training: 2022-01-17 02:47:15,630-Speed 11502.22 samples/sec Loss 5.1650 LearningRate 0.0878 Epoch: 11 Global Step: 48260 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:47:19,254-Speed 11306.04 samples/sec Loss 5.1468 LearningRate 0.0877 Epoch: 11 Global Step: 48270 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:47:23,490-Speed 9671.88 samples/sec Loss 5.2038 LearningRate 0.0877 Epoch: 11 Global Step: 48280 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:47:26,889-Speed 12054.31 samples/sec Loss 5.1750 LearningRate 0.0876 Epoch: 11 Global Step: 48290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:30,447-Speed 11515.04 samples/sec Loss 5.1697 LearningRate 0.0876 Epoch: 11 Global Step: 48300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:33,940-Speed 11727.53 samples/sec Loss 5.1489 LearningRate 0.0875 Epoch: 11 Global Step: 48310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:37,349-Speed 12017.24 samples/sec Loss 5.1500 LearningRate 0.0875 Epoch: 11 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:40,871-Speed 11635.47 samples/sec Loss 5.1415 LearningRate 0.0874 Epoch: 11 Global Step: 48330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:44,534-Speed 11181.88 samples/sec Loss 5.1291 LearningRate 0.0874 Epoch: 11 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:48,114-Speed 11446.70 samples/sec Loss 5.1338 LearningRate 0.0873 Epoch: 11 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:51,537-Speed 11969.51 samples/sec Loss 5.1903 LearningRate 0.0873 Epoch: 11 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:47:54,949-Speed 12009.22 samples/sec Loss 5.1752 LearningRate 0.0872 Epoch: 11 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 02:47:59,017-Speed 10068.96 samples/sec Loss 5.1450 LearningRate 0.0872 Epoch: 11 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:48:02,446-Speed 11948.39 samples/sec Loss 5.2272 LearningRate 0.0871 Epoch: 11 Global Step: 48390 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:06,089-Speed 11246.79 samples/sec Loss 5.1740 LearningRate 0.0871 Epoch: 11 Global Step: 48400 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:09,745-Speed 11206.69 samples/sec Loss 5.1867 LearningRate 0.0870 Epoch: 11 Global Step: 48410 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:13,214-Speed 11809.10 samples/sec Loss 5.1757 LearningRate 0.0870 Epoch: 11 Global Step: 48420 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:16,847-Speed 11277.84 samples/sec Loss 5.1621 LearningRate 0.0869 Epoch: 11 Global Step: 48430 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:20,557-Speed 11045.14 samples/sec Loss 5.1371 LearningRate 0.0869 Epoch: 11 Global Step: 48440 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:24,066-Speed 11675.55 samples/sec Loss 5.1705 LearningRate 0.0868 Epoch: 11 Global Step: 48450 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:27,653-Speed 11421.64 samples/sec Loss 5.0886 LearningRate 0.0868 Epoch: 11 Global Step: 48460 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:31,611-Speed 10367.23 samples/sec Loss 5.1496 LearningRate 0.0867 Epoch: 11 Global Step: 48470 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:35,475-Speed 10603.93 samples/sec Loss 5.2060 LearningRate 0.0867 Epoch: 11 Global Step: 48480 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:38,962-Speed 11746.66 samples/sec Loss 5.1785 LearningRate 0.0866 Epoch: 11 Global Step: 48490 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 
02:48:42,480-Speed 11648.04 samples/sec Loss 5.1577 LearningRate 0.0866 Epoch: 11 Global Step: 48500 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:46,497-Speed 10200.03 samples/sec Loss 5.1553 LearningRate 0.0865 Epoch: 11 Global Step: 48510 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:48:49,956-Speed 11842.90 samples/sec Loss 5.1490 LearningRate 0.0865 Epoch: 11 Global Step: 48520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:48:53,431-Speed 11787.94 samples/sec Loss 5.1191 LearningRate 0.0864 Epoch: 11 Global Step: 48530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:48:57,207-Speed 10850.73 samples/sec Loss 5.1894 LearningRate 0.0864 Epoch: 11 Global Step: 48540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:00,658-Speed 11871.25 samples/sec Loss 5.1597 LearningRate 0.0863 Epoch: 11 Global Step: 48550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:04,100-Speed 11907.40 samples/sec Loss 5.1283 LearningRate 0.0863 Epoch: 11 Global Step: 48560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:07,504-Speed 12032.43 samples/sec Loss 5.1729 LearningRate 0.0862 Epoch: 11 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:10,920-Speed 11993.40 samples/sec Loss 5.2022 LearningRate 0.0862 Epoch: 11 Global Step: 48580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:14,518-Speed 11387.34 samples/sec Loss 5.1540 LearningRate 0.0861 Epoch: 11 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:17,983-Speed 11825.09 samples/sec Loss 5.1951 LearningRate 0.0861 Epoch: 11 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:22,388-Speed 9302.78 samples/sec Loss 5.1858 LearningRate 0.0860 Epoch: 11 Global Step: 48610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:26,055-Speed 11172.60 
samples/sec Loss 5.1328 LearningRate 0.0860 Epoch: 11 Global Step: 48620 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:49:29,620-Speed 11491.26 samples/sec Loss 5.2079 LearningRate 0.0859 Epoch: 11 Global Step: 48630 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:49:33,223-Speed 11370.02 samples/sec Loss 5.1889 LearningRate 0.0859 Epoch: 11 Global Step: 48640 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:49:36,659-Speed 11924.39 samples/sec Loss 5.1765 LearningRate 0.0858 Epoch: 11 Global Step: 48650 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:49:40,214-Speed 11526.94 samples/sec Loss 5.1560 LearningRate 0.0858 Epoch: 11 Global Step: 48660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:49:43,866-Speed 11218.37 samples/sec Loss 5.1645 LearningRate 0.0858 Epoch: 11 Global Step: 48670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:47,484-Speed 11321.47 samples/sec Loss 5.1937 LearningRate 0.0857 Epoch: 11 Global Step: 48680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:51,061-Speed 11454.65 samples/sec Loss 5.1967 LearningRate 0.0857 Epoch: 11 Global Step: 48690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:54,840-Speed 10840.35 samples/sec Loss 5.1405 LearningRate 0.0856 Epoch: 11 Global Step: 48700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:49:58,270-Speed 11948.46 samples/sec Loss 5.1799 LearningRate 0.0856 Epoch: 11 Global Step: 48710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:01,704-Speed 11932.94 samples/sec Loss 5.1693 LearningRate 0.0855 Epoch: 11 Global Step: 48720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:05,229-Speed 11622.05 samples/sec Loss 5.1709 LearningRate 0.0855 Epoch: 11 Global Step: 48730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:08,832-Speed 11371.78 samples/sec Loss 5.1606 
LearningRate 0.0854 Epoch: 11 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:12,304-Speed 11800.47 samples/sec Loss 5.1907 LearningRate 0.0854 Epoch: 11 Global Step: 48750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:15,927-Speed 11308.00 samples/sec Loss 5.1730 LearningRate 0.0853 Epoch: 11 Global Step: 48760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:19,605-Speed 11139.98 samples/sec Loss 5.1675 LearningRate 0.0853 Epoch: 11 Global Step: 48770 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:50:23,391-Speed 10820.89 samples/sec Loss 5.1257 LearningRate 0.0852 Epoch: 11 Global Step: 48780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:27,591-Speed 9755.67 samples/sec Loss 5.1705 LearningRate 0.0852 Epoch: 11 Global Step: 48790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:31,247-Speed 11206.76 samples/sec Loss 5.1420 LearningRate 0.0851 Epoch: 11 Global Step: 48800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:34,697-Speed 11876.26 samples/sec Loss 5.1581 LearningRate 0.0851 Epoch: 11 Global Step: 48810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:38,094-Speed 12062.76 samples/sec Loss 5.1796 LearningRate 0.0850 Epoch: 11 Global Step: 48820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:41,549-Speed 11856.37 samples/sec Loss 5.1216 LearningRate 0.0850 Epoch: 11 Global Step: 48830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:45,332-Speed 10830.91 samples/sec Loss 5.1435 LearningRate 0.0849 Epoch: 11 Global Step: 48840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:48,737-Speed 12032.21 samples/sec Loss 5.1629 LearningRate 0.0849 Epoch: 11 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:52,321-Speed 11431.13 samples/sec Loss 5.1406 LearningRate 0.0848 Epoch: 11 
Global Step: 48860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:55,695-Speed 12145.57 samples/sec Loss 5.1556 LearningRate 0.0848 Epoch: 11 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:50:59,160-Speed 11824.74 samples/sec Loss 5.1941 LearningRate 0.0847 Epoch: 11 Global Step: 48880 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:51:02,570-Speed 12014.11 samples/sec Loss 5.1327 LearningRate 0.0847 Epoch: 11 Global Step: 48890 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:51:05,929-Speed 12200.37 samples/sec Loss 5.1401 LearningRate 0.0846 Epoch: 11 Global Step: 48900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:10,259-Speed 9460.21 samples/sec Loss 5.1746 LearningRate 0.0846 Epoch: 11 Global Step: 48910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:13,815-Speed 11522.04 samples/sec Loss 5.1314 LearningRate 0.0845 Epoch: 11 Global Step: 48920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:17,375-Speed 11508.74 samples/sec Loss 5.1576 LearningRate 0.0845 Epoch: 11 Global Step: 48930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:20,848-Speed 11798.98 samples/sec Loss 5.1553 LearningRate 0.0844 Epoch: 11 Global Step: 48940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:24,638-Speed 10808.38 samples/sec Loss 5.0823 LearningRate 0.0844 Epoch: 11 Global Step: 48950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:28,203-Speed 11494.17 samples/sec Loss 5.1159 LearningRate 0.0843 Epoch: 11 Global Step: 48960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:31,796-Speed 11403.96 samples/sec Loss 5.1338 LearningRate 0.0843 Epoch: 11 Global Step: 48970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:35,394-Speed 11387.24 samples/sec Loss 5.1460 LearningRate 0.0842 Epoch: 11 Global Step: 48980 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:38,842-Speed 11881.55 samples/sec Loss 5.1292 LearningRate 0.0842 Epoch: 11 Global Step: 48990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:51:42,409-Speed 11484.32 samples/sec Loss 5.1787 LearningRate 0.0841 Epoch: 11 Global Step: 49000 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:51:45,864-Speed 11859.29 samples/sec Loss 5.1711 LearningRate 0.0841 Epoch: 11 Global Step: 49010 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:51:49,477-Speed 11342.10 samples/sec Loss 5.2065 LearningRate 0.0840 Epoch: 11 Global Step: 49020 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:51:53,016-Speed 11577.61 samples/sec Loss 5.1467 LearningRate 0.0840 Epoch: 11 Global Step: 49030 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:51:56,548-Speed 11598.33 samples/sec Loss 5.1540 LearningRate 0.0839 Epoch: 11 Global Step: 49040 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:52:00,346-Speed 10788.09 samples/sec Loss 5.1648 LearningRate 0.0839 Epoch: 11 Global Step: 49050 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:52:03,761-Speed 11996.84 samples/sec Loss 5.1755 LearningRate 0.0838 Epoch: 11 Global Step: 49060 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:52:07,294-Speed 11594.71 samples/sec Loss 5.1355 LearningRate 0.0838 Epoch: 11 Global Step: 49070 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:52:11,351-Speed 10099.11 samples/sec Loss 5.1188 LearningRate 0.0837 Epoch: 11 Global Step: 49080 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:52:14,816-Speed 11824.27 samples/sec Loss 5.1781 LearningRate 0.0837 Epoch: 11 Global Step: 49090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:18,559-Speed 10945.35 samples/sec Loss 5.1519 LearningRate 0.0836 Epoch: 11 Global Step: 49100 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 02:52:22,155-Speed 11393.73 samples/sec Loss 5.2086 LearningRate 0.0836 Epoch: 11 Global Step: 49110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:26,276-Speed 9942.63 samples/sec Loss 5.1665 LearningRate 0.0835 Epoch: 11 Global Step: 49120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:30,091-Speed 10740.20 samples/sec Loss 5.1131 LearningRate 0.0835 Epoch: 11 Global Step: 49130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:33,733-Speed 11247.82 samples/sec Loss 5.0833 LearningRate 0.0834 Epoch: 11 Global Step: 49140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:37,296-Speed 11500.51 samples/sec Loss 5.1936 LearningRate 0.0834 Epoch: 11 Global Step: 49150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:40,903-Speed 11358.23 samples/sec Loss 5.1666 LearningRate 0.0834 Epoch: 11 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:44,418-Speed 11652.72 samples/sec Loss 5.1458 LearningRate 0.0833 Epoch: 11 Global Step: 49170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:48,218-Speed 10780.62 samples/sec Loss 5.1396 LearningRate 0.0833 Epoch: 11 Global Step: 49180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:52:51,804-Speed 11427.51 samples/sec Loss 5.1653 LearningRate 0.0832 Epoch: 11 Global Step: 49190 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:52:55,369-Speed 11493.98 samples/sec Loss 5.1612 LearningRate 0.0832 Epoch: 11 Global Step: 49200 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:52:59,061-Speed 11097.14 samples/sec Loss 5.1599 LearningRate 0.0831 Epoch: 11 Global Step: 49210 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:53:02,523-Speed 11835.84 samples/sec Loss 5.1948 LearningRate 0.0831 Epoch: 11 Global Step: 49220 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 
02:53:06,362-Speed 10676.09 samples/sec Loss 5.1409 LearningRate 0.0830 Epoch: 11 Global Step: 49230 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:53:09,945-Speed 11432.51 samples/sec Loss 5.1752 LearningRate 0.0830 Epoch: 11 Global Step: 49240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:13,377-Speed 11938.46 samples/sec Loss 5.1346 LearningRate 0.0829 Epoch: 11 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:17,625-Speed 9642.55 samples/sec Loss 5.2296 LearningRate 0.0829 Epoch: 11 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:21,343-Speed 11022.47 samples/sec Loss 5.1891 LearningRate 0.0828 Epoch: 11 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:24,948-Speed 11364.33 samples/sec Loss 5.1291 LearningRate 0.0828 Epoch: 11 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:28,346-Speed 12058.52 samples/sec Loss 5.1556 LearningRate 0.0827 Epoch: 11 Global Step: 49290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:32,836-Speed 9126.49 samples/sec Loss 5.1274 LearningRate 0.0827 Epoch: 11 Global Step: 49300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:36,478-Speed 11248.98 samples/sec Loss 5.1631 LearningRate 0.0826 Epoch: 11 Global Step: 49310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:39,929-Speed 11871.53 samples/sec Loss 5.1949 LearningRate 0.0826 Epoch: 11 Global Step: 49320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:43,579-Speed 11225.92 samples/sec Loss 5.1868 LearningRate 0.0825 Epoch: 11 Global Step: 49330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:53:47,043-Speed 11826.37 samples/sec Loss 5.1441 LearningRate 0.0825 Epoch: 11 Global Step: 49340 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:53:50,498-Speed 11859.27 
samples/sec Loss 5.1370 LearningRate 0.0824 Epoch: 11 Global Step: 49350 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:53:54,133-Speed 11269.27 samples/sec Loss 5.1847 LearningRate 0.0824 Epoch: 11 Global Step: 49360 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:53:58,010-Speed 10568.57 samples/sec Loss 5.1544 LearningRate 0.0823 Epoch: 11 Global Step: 49370 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:54:01,555-Speed 11555.22 samples/sec Loss 5.1839 LearningRate 0.0823 Epoch: 11 Global Step: 49380 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:54:05,690-Speed 9908.79 samples/sec Loss 5.1790 LearningRate 0.0822 Epoch: 11 Global Step: 49390 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:54:09,165-Speed 11789.84 samples/sec Loss 5.1665 LearningRate 0.0822 Epoch: 11 Global Step: 49400 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:54:12,848-Speed 11125.32 samples/sec Loss 5.0960 LearningRate 0.0821 Epoch: 11 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:16,281-Speed 11935.78 samples/sec Loss 5.1277 LearningRate 0.0821 Epoch: 11 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:19,855-Speed 11462.54 samples/sec Loss 5.1838 LearningRate 0.0820 Epoch: 11 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:23,453-Speed 11386.64 samples/sec Loss 5.1804 LearningRate 0.0820 Epoch: 11 Global Step: 49440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:27,070-Speed 11327.54 samples/sec Loss 5.1483 LearningRate 0.0819 Epoch: 11 Global Step: 49450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:30,721-Speed 11222.76 samples/sec Loss 5.1189 LearningRate 0.0819 Epoch: 11 Global Step: 49460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:34,159-Speed 11915.29 samples/sec Loss 5.1822 
LearningRate 0.0818 Epoch: 11 Global Step: 49470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:37,919-Speed 10898.58 samples/sec Loss 5.1729 LearningRate 0.0818 Epoch: 11 Global Step: 49480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:41,566-Speed 11231.07 samples/sec Loss 5.1431 LearningRate 0.0818 Epoch: 11 Global Step: 49490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:45,333-Speed 10876.46 samples/sec Loss 5.1831 LearningRate 0.0817 Epoch: 11 Global Step: 49500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:54:49,038-Speed 11058.03 samples/sec Loss 5.1615 LearningRate 0.0817 Epoch: 11 Global Step: 49510 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:54:52,811-Speed 10858.48 samples/sec Loss 5.1235 LearningRate 0.0816 Epoch: 11 Global Step: 49520 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:54:56,531-Speed 11012.70 samples/sec Loss 5.1495 LearningRate 0.0816 Epoch: 11 Global Step: 49530 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:55:00,096-Speed 11492.49 samples/sec Loss 5.1442 LearningRate 0.0815 Epoch: 11 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:03,551-Speed 11857.59 samples/sec Loss 5.0956 LearningRate 0.0815 Epoch: 11 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:06,980-Speed 11950.92 samples/sec Loss 5.1486 LearningRate 0.0814 Epoch: 11 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:10,400-Speed 11992.07 samples/sec Loss 5.0745 LearningRate 0.0814 Epoch: 11 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:14,182-Speed 10832.07 samples/sec Loss 5.1245 LearningRate 0.0813 Epoch: 11 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:17,710-Speed 11613.50 samples/sec Loss 5.1454 LearningRate 0.0813 Epoch: 11 
Global Step: 49590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:21,600-Speed 10530.94 samples/sec Loss 5.1244 LearningRate 0.0812 Epoch: 11 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:25,124-Speed 11627.51 samples/sec Loss 5.0937 LearningRate 0.0812 Epoch: 11 Global Step: 49610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:28,687-Speed 11500.81 samples/sec Loss 5.1517 LearningRate 0.0811 Epoch: 11 Global Step: 49620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:32,496-Speed 10755.95 samples/sec Loss 5.1203 LearningRate 0.0811 Epoch: 11 Global Step: 49630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:35,934-Speed 11915.57 samples/sec Loss 5.1192 LearningRate 0.0810 Epoch: 11 Global Step: 49640 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:55:39,323-Speed 12089.54 samples/sec Loss 5.0896 LearningRate 0.0810 Epoch: 11 Global Step: 49650 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:55:43,174-Speed 10636.54 samples/sec Loss 5.1892 LearningRate 0.0809 Epoch: 11 Global Step: 49660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:55:46,599-Speed 11964.54 samples/sec Loss 5.1457 LearningRate 0.0809 Epoch: 11 Global Step: 49670 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:55:50,019-Speed 11980.00 samples/sec Loss 5.1638 LearningRate 0.0808 Epoch: 11 Global Step: 49680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:53,681-Speed 11189.64 samples/sec Loss 5.1366 LearningRate 0.0808 Epoch: 11 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:55:57,267-Speed 11428.68 samples/sec Loss 5.1512 LearningRate 0.0807 Epoch: 11 Global Step: 49700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:00,830-Speed 11497.63 samples/sec Loss 5.1894 LearningRate 0.0807 Epoch: 11 Global Step: 49710 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:04,478-Speed 11229.54 samples/sec Loss 5.1519 LearningRate 0.0806 Epoch: 11 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:08,090-Speed 11347.06 samples/sec Loss 5.1500 LearningRate 0.0806 Epoch: 11 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:11,651-Speed 11503.26 samples/sec Loss 5.1038 LearningRate 0.0806 Epoch: 11 Global Step: 49740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:15,454-Speed 10773.58 samples/sec Loss 5.1248 LearningRate 0.0805 Epoch: 11 Global Step: 49750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:19,265-Speed 10749.71 samples/sec Loss 5.0721 LearningRate 0.0805 Epoch: 11 Global Step: 49760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:22,918-Speed 11216.35 samples/sec Loss 5.1328 LearningRate 0.0804 Epoch: 11 Global Step: 49770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:26,400-Speed 11763.21 samples/sec Loss 5.1196 LearningRate 0.0804 Epoch: 11 Global Step: 49780 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:56:29,901-Speed 11705.36 samples/sec Loss 5.1250 LearningRate 0.0803 Epoch: 11 Global Step: 49790 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:56:33,376-Speed 11792.26 samples/sec Loss 5.1468 LearningRate 0.0803 Epoch: 11 Global Step: 49800 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:56:36,863-Speed 11749.70 samples/sec Loss 5.1823 LearningRate 0.0802 Epoch: 11 Global Step: 49810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:40,371-Speed 11679.23 samples/sec Loss 5.1569 LearningRate 0.0802 Epoch: 11 Global Step: 49820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:44,340-Speed 10322.69 samples/sec Loss 5.1024 LearningRate 0.0801 Epoch: 11 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 02:56:47,748-Speed 12023.19 samples/sec Loss 5.1212 LearningRate 0.0801 Epoch: 11 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:51,223-Speed 11790.33 samples/sec Loss 5.1561 LearningRate 0.0800 Epoch: 11 Global Step: 49850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:54,796-Speed 11464.85 samples/sec Loss 5.1462 LearningRate 0.0800 Epoch: 11 Global Step: 49860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:56:58,749-Speed 10364.57 samples/sec Loss 5.1878 LearningRate 0.0799 Epoch: 11 Global Step: 49870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:02,395-Speed 11238.77 samples/sec Loss 5.0835 LearningRate 0.0799 Epoch: 11 Global Step: 49880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:06,055-Speed 11195.54 samples/sec Loss 5.0829 LearningRate 0.0798 Epoch: 11 Global Step: 49890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:09,466-Speed 12009.81 samples/sec Loss 5.1561 LearningRate 0.0798 Epoch: 11 Global Step: 49900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:12,865-Speed 12052.39 samples/sec Loss 5.1512 LearningRate 0.0797 Epoch: 11 Global Step: 49910 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:57:16,631-Speed 10881.48 samples/sec Loss 5.1304 LearningRate 0.0797 Epoch: 11 Global Step: 49920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:20,186-Speed 11523.70 samples/sec Loss 5.1494 LearningRate 0.0796 Epoch: 11 Global Step: 49930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:23,651-Speed 11824.61 samples/sec Loss 5.1567 LearningRate 0.0796 Epoch: 11 Global Step: 49940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:27,046-Speed 12068.57 samples/sec Loss 5.1256 LearningRate 0.0796 Epoch: 11 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
02:57:30,528-Speed 11767.17 samples/sec Loss 5.1396 LearningRate 0.0795 Epoch: 11 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:34,089-Speed 11504.84 samples/sec Loss 5.0395 LearningRate 0.0795 Epoch: 11 Global Step: 49970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:37,636-Speed 11550.90 samples/sec Loss 5.1427 LearningRate 0.0794 Epoch: 11 Global Step: 49980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:41,442-Speed 10764.73 samples/sec Loss 5.1505 LearningRate 0.0794 Epoch: 11 Global Step: 49990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:57:45,126-Speed 11121.30 samples/sec Loss 5.1727 LearningRate 0.0793 Epoch: 11 Global Step: 50000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:58:06,519-[lfw][50000]XNorm: 9.135563 Training: 2022-01-17 02:58:06,519-[lfw][50000]Accuracy-Flip: 0.99667+-0.00325 Training: 2022-01-17 02:58:06,519-[lfw][50000]Accuracy-Highest: 0.99667 Training: 2022-01-17 02:58:31,122-[cfp_fp][50000]XNorm: 7.690396 Training: 2022-01-17 02:58:31,123-[cfp_fp][50000]Accuracy-Flip: 0.96914+-0.00858 Training: 2022-01-17 02:58:31,124-[cfp_fp][50000]Accuracy-Highest: 0.97014 Training: 2022-01-17 02:58:52,391-[agedb_30][50000]XNorm: 8.826701 Training: 2022-01-17 02:58:52,391-[agedb_30][50000]Accuracy-Flip: 0.96717+-0.00895 Training: 2022-01-17 02:58:52,392-[agedb_30][50000]Accuracy-Highest: 0.96717 Training: 2022-01-17 02:58:55,771-Speed 579.81 samples/sec Loss 5.0925 LearningRate 0.0793 Epoch: 11 Global Step: 50010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:58:59,156-Speed 12102.31 samples/sec Loss 5.1289 LearningRate 0.0792 Epoch: 11 Global Step: 50020 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:59:02,542-Speed 12100.27 samples/sec Loss 5.1185 LearningRate 0.0792 Epoch: 11 Global Step: 50030 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:59:05,898-Speed 12208.09 
samples/sec Loss 5.1953 LearningRate 0.0791 Epoch: 11 Global Step: 50040 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:59:09,332-Speed 11930.70 samples/sec Loss 5.1010 LearningRate 0.0791 Epoch: 11 Global Step: 50050 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:59:14,196-Speed 8423.70 samples/sec Loss 5.0971 LearningRate 0.0790 Epoch: 11 Global Step: 50060 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 02:59:47,455-Speed 1231.61 samples/sec Loss 4.6872 LearningRate 0.0790 Epoch: 12 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:59:51,544-Speed 10018.10 samples/sec Loss 4.3835 LearningRate 0.0789 Epoch: 12 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:59:55,812-Speed 9599.32 samples/sec Loss 4.3846 LearningRate 0.0789 Epoch: 12 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 02:59:59,632-Speed 10726.29 samples/sec Loss 4.3668 LearningRate 0.0788 Epoch: 12 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:00:03,196-Speed 11493.38 samples/sec Loss 4.3840 LearningRate 0.0788 Epoch: 12 Global Step: 50110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:00:06,805-Speed 11352.01 samples/sec Loss 4.3804 LearningRate 0.0787 Epoch: 12 Global Step: 50120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:00:10,964-Speed 9860.40 samples/sec Loss 4.4211 LearningRate 0.0787 Epoch: 12 Global Step: 50130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:00:14,811-Speed 10651.09 samples/sec Loss 4.3943 LearningRate 0.0787 Epoch: 12 Global Step: 50140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:00:18,255-Speed 11894.17 samples/sec Loss 4.4215 LearningRate 0.0786 Epoch: 12 Global Step: 50150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:00:21,955-Speed 11076.03 samples/sec Loss 4.3895 
LearningRate 0.0786 Epoch: 12 Global Step: 50160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:00:25,508-Speed 11530.67 samples/sec Loss 4.4377 LearningRate 0.0785 Epoch: 12 Global Step: 50170 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:29,217-Speed 11045.44 samples/sec Loss 4.4714 LearningRate 0.0785 Epoch: 12 Global Step: 50180 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:32,719-Speed 11705.18 samples/sec Loss 4.4087 LearningRate 0.0784 Epoch: 12 Global Step: 50190 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:36,175-Speed 11855.63 samples/sec Loss 4.4355 LearningRate 0.0784 Epoch: 12 Global Step: 50200 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:39,596-Speed 11978.39 samples/sec Loss 4.4174 LearningRate 0.0783 Epoch: 12 Global Step: 50210 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:43,797-Speed 9751.02 samples/sec Loss 4.4853 LearningRate 0.0783 Epoch: 12 Global Step: 50220 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:47,273-Speed 11790.71 samples/sec Loss 4.4695 LearningRate 0.0782 Epoch: 12 Global Step: 50230 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:50,724-Speed 11872.01 samples/sec Loss 4.4679 LearningRate 0.0782 Epoch: 12 Global Step: 50240 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:54,347-Speed 11308.18 samples/sec Loss 4.4578 LearningRate 0.0781 Epoch: 12 Global Step: 50250 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:00:58,081-Speed 10970.21 samples/sec Loss 4.4427 LearningRate 0.0781 Epoch: 12 Global Step: 50260 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:01:01,737-Speed 11206.64 samples/sec Loss 4.4903 LearningRate 0.0780 Epoch: 12 Global Step: 50270 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:01:05,164-Speed 11956.17 samples/sec Loss 4.4814 LearningRate 0.0780 Epoch: 12 
Global Step: 50280 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:01:08,749-Speed 11430.79 samples/sec Loss 4.5472 LearningRate 0.0779 Epoch: 12 Global Step: 50290 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:01:12,324-Speed 11458.77 samples/sec Loss 4.4790 LearningRate 0.0779 Epoch: 12 Global Step: 50300 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:01:15,796-Speed 11804.01 samples/sec Loss 4.5172 LearningRate 0.0779 Epoch: 12 Global Step: 50310 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:01:19,388-Speed 11404.82 samples/sec Loss 4.4775 LearningRate 0.0778 Epoch: 12 Global Step: 50320 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:01:22,955-Speed 11487.61 samples/sec Loss 4.5314 LearningRate 0.0778 Epoch: 12 Global Step: 50330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:26,499-Speed 11559.88 samples/sec Loss 4.5170 LearningRate 0.0777 Epoch: 12 Global Step: 50340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:30,083-Speed 11431.27 samples/sec Loss 4.5492 LearningRate 0.0777 Epoch: 12 Global Step: 50350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:33,550-Speed 11816.62 samples/sec Loss 4.5161 LearningRate 0.0776 Epoch: 12 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:37,030-Speed 11771.82 samples/sec Loss 4.5177 LearningRate 0.0776 Epoch: 12 Global Step: 50370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:40,980-Speed 10371.12 samples/sec Loss 4.5273 LearningRate 0.0775 Epoch: 12 Global Step: 50380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:44,760-Speed 10841.16 samples/sec Loss 4.5210 LearningRate 0.0775 Epoch: 12 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:48,329-Speed 11478.28 samples/sec Loss 4.5446 LearningRate 0.0774 Epoch: 12 Global Step: 50400 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:51,919-Speed 11414.04 samples/sec Loss 4.5310 LearningRate 0.0774 Epoch: 12 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:55,439-Speed 11639.46 samples/sec Loss 4.5224 LearningRate 0.0773 Epoch: 12 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:01:59,055-Speed 11327.07 samples/sec Loss 4.6025 LearningRate 0.0773 Epoch: 12 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:02,511-Speed 11855.91 samples/sec Loss 4.5322 LearningRate 0.0772 Epoch: 12 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:05,943-Speed 11950.70 samples/sec Loss 4.5688 LearningRate 0.0772 Epoch: 12 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:09,403-Speed 11840.19 samples/sec Loss 4.5876 LearningRate 0.0771 Epoch: 12 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:12,821-Speed 11986.91 samples/sec Loss 4.5959 LearningRate 0.0771 Epoch: 12 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:16,207-Speed 12097.61 samples/sec Loss 4.5606 LearningRate 0.0771 Epoch: 12 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:19,700-Speed 11732.86 samples/sec Loss 4.5973 LearningRate 0.0770 Epoch: 12 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:23,203-Speed 11693.89 samples/sec Loss 4.6168 LearningRate 0.0770 Epoch: 12 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:26,637-Speed 11950.18 samples/sec Loss 4.6133 LearningRate 0.0769 Epoch: 12 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:30,260-Speed 11305.32 samples/sec Loss 4.6205 LearningRate 0.0769 Epoch: 12 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-01-17 03:02:33,783-Speed 11629.78 samples/sec Loss 4.6032 LearningRate 0.0768 Epoch: 12 Global Step: 50530 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:02:37,769-Speed 10278.74 samples/sec Loss 4.6731 LearningRate 0.0768 Epoch: 12 Global Step: 50540 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:02:41,628-Speed 10619.29 samples/sec Loss 4.6299 LearningRate 0.0767 Epoch: 12 Global Step: 50550 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:02:45,320-Speed 11098.09 samples/sec Loss 4.6191 LearningRate 0.0767 Epoch: 12 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:48,815-Speed 11719.30 samples/sec Loss 4.6193 LearningRate 0.0766 Epoch: 12 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:52,530-Speed 11028.53 samples/sec Loss 4.5663 LearningRate 0.0766 Epoch: 12 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:02:56,439-Speed 10482.24 samples/sec Loss 4.6447 LearningRate 0.0765 Epoch: 12 Global Step: 50590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:00,066-Speed 11294.21 samples/sec Loss 4.6269 LearningRate 0.0765 Epoch: 12 Global Step: 50600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:03,782-Speed 11025.61 samples/sec Loss 4.6417 LearningRate 0.0764 Epoch: 12 Global Step: 50610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:07,599-Speed 10732.12 samples/sec Loss 4.6059 LearningRate 0.0764 Epoch: 12 Global Step: 50620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:10,997-Speed 12057.92 samples/sec Loss 4.6252 LearningRate 0.0764 Epoch: 12 Global Step: 50630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:14,987-Speed 10268.20 samples/sec Loss 4.6567 LearningRate 0.0763 Epoch: 12 Global Step: 50640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 
03:03:18,701-Speed 11032.71 samples/sec Loss 4.6221 LearningRate 0.0763 Epoch: 12 Global Step: 50650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:22,306-Speed 11364.91 samples/sec Loss 4.6524 LearningRate 0.0762 Epoch: 12 Global Step: 50660 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:03:25,806-Speed 11705.32 samples/sec Loss 4.6963 LearningRate 0.0762 Epoch: 12 Global Step: 50670 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:03:29,538-Speed 10978.28 samples/sec Loss 4.6370 LearningRate 0.0761 Epoch: 12 Global Step: 50680 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:03:32,996-Speed 11847.57 samples/sec Loss 4.6692 LearningRate 0.0761 Epoch: 12 Global Step: 50690 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:03:36,552-Speed 11523.79 samples/sec Loss 4.6189 LearningRate 0.0760 Epoch: 12 Global Step: 50700 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:03:40,382-Speed 10696.57 samples/sec Loss 4.6610 LearningRate 0.0760 Epoch: 12 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:43,998-Speed 11331.87 samples/sec Loss 4.6791 LearningRate 0.0759 Epoch: 12 Global Step: 50720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:47,678-Speed 11132.92 samples/sec Loss 4.7150 LearningRate 0.0759 Epoch: 12 Global Step: 50730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:51,132-Speed 11862.16 samples/sec Loss 4.7126 LearningRate 0.0758 Epoch: 12 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:54,591-Speed 11845.36 samples/sec Loss 4.6768 LearningRate 0.0758 Epoch: 12 Global Step: 50750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:03:58,021-Speed 11944.55 samples/sec Loss 4.6438 LearningRate 0.0758 Epoch: 12 Global Step: 50760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:04:01,515-Speed 11726.32 
samples/sec Loss 4.6652 LearningRate 0.0757 Epoch: 12 Global Step: 50770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:04:05,373-Speed 10620.37 samples/sec Loss 4.6784 LearningRate 0.0757 Epoch: 12 Global Step: 50780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:04:08,907-Speed 11591.51 samples/sec Loss 4.6812 LearningRate 0.0756 Epoch: 12 Global Step: 50790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:04:12,706-Speed 10783.05 samples/sec Loss 4.7379 LearningRate 0.0756 Epoch: 12 Global Step: 50800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:04:16,140-Speed 11930.82 samples/sec Loss 4.7098 LearningRate 0.0755 Epoch: 12 Global Step: 50810 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:04:19,598-Speed 11849.14 samples/sec Loss 4.6742 LearningRate 0.0755 Epoch: 12 Global Step: 50820 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:04:23,021-Speed 11971.22 samples/sec Loss 4.6875 LearningRate 0.0754 Epoch: 12 Global Step: 50830 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:04:26,708-Speed 11109.41 samples/sec Loss 4.7022 LearningRate 0.0754 Epoch: 12 Global Step: 50840 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-01-17 03:04:30,826-Speed 9948.62 samples/sec Loss 4.7243 LearningRate 0.0753 Epoch: 12 Global Step: 50850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-17 03:04:34,461-Speed 11269.58 samples/sec Loss 4.6804 LearningRate 0.0753 Epoch: 12 Global Step: 50860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:04:37,909-Speed 11884.05 samples/sec Loss 4.6723 LearningRate 0.0752 Epoch: 12 Global Step: 50870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:04:41,686-Speed 10847.43 samples/sec Loss 4.7317 LearningRate 0.0752 Epoch: 12 Global Step: 50880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:04:45,207-Speed 11637.73 samples/sec Loss 4.6715 
LearningRate 0.0751 Epoch: 12 Global Step: 50890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:04:48,877-Speed 11160.50 samples/sec Loss 4.7290 LearningRate 0.0751 Epoch: 12 Global Step: 50900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:04:52,531-Speed 11215.22 samples/sec Loss 4.7293 LearningRate 0.0751 Epoch: 12 Global Step: 50910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:04:56,131-Speed 11378.34 samples/sec Loss 4.7192 LearningRate 0.0750 Epoch: 12 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:04:59,619-Speed 11747.35 samples/sec Loss 4.7387 LearningRate 0.0750 Epoch: 12 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:03,125-Speed 11686.79 samples/sec Loss 4.7324 LearningRate 0.0749 Epoch: 12 Global Step: 50940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:06,845-Speed 11011.86 samples/sec Loss 4.7169 LearningRate 0.0749 Epoch: 12 Global Step: 50950 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:05:10,413-Speed 11481.02 samples/sec Loss 4.7317 LearningRate 0.0748 Epoch: 12 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:14,179-Speed 10879.31 samples/sec Loss 4.7260 LearningRate 0.0748 Epoch: 12 Global Step: 50970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:18,021-Speed 10667.01 samples/sec Loss 4.7144 LearningRate 0.0747 Epoch: 12 Global Step: 50980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:21,465-Speed 11896.78 samples/sec Loss 4.7003 LearningRate 0.0747 Epoch: 12 Global Step: 50990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:25,175-Speed 11042.54 samples/sec Loss 4.7435 LearningRate 0.0746 Epoch: 12 Global Step: 51000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:28,864-Speed 11104.49 samples/sec Loss 4.7628 LearningRate 0.0746 Epoch: 12 
Global Step: 51010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:32,302-Speed 11915.89 samples/sec Loss 4.7828 LearningRate 0.0746 Epoch: 12 Global Step: 51020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:35,933-Speed 11286.38 samples/sec Loss 4.7899 LearningRate 0.0745 Epoch: 12 Global Step: 51030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:39,363-Speed 11943.75 samples/sec Loss 4.7281 LearningRate 0.0745 Epoch: 12 Global Step: 51040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:43,056-Speed 11094.64 samples/sec Loss 4.7405 LearningRate 0.0744 Epoch: 12 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:47,289-Speed 9676.78 samples/sec Loss 4.6965 LearningRate 0.0744 Epoch: 12 Global Step: 51060 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:05:50,688-Speed 12058.51 samples/sec Loss 4.8332 LearningRate 0.0743 Epoch: 12 Global Step: 51070 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:05:54,305-Speed 11327.83 samples/sec Loss 4.7770 LearningRate 0.0743 Epoch: 12 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:05:57,901-Speed 11390.49 samples/sec Loss 4.7002 LearningRate 0.0742 Epoch: 12 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:01,427-Speed 11619.75 samples/sec Loss 4.7303 LearningRate 0.0742 Epoch: 12 Global Step: 51100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:04,965-Speed 11581.53 samples/sec Loss 4.7729 LearningRate 0.0741 Epoch: 12 Global Step: 51110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:08,587-Speed 11308.27 samples/sec Loss 4.7787 LearningRate 0.0741 Epoch: 12 Global Step: 51120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:12,066-Speed 11778.99 samples/sec Loss 4.7961 LearningRate 0.0740 Epoch: 12 Global Step: 51130 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:15,777-Speed 11039.41 samples/sec Loss 4.7577 LearningRate 0.0740 Epoch: 12 Global Step: 51140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:19,382-Speed 11365.25 samples/sec Loss 4.7399 LearningRate 0.0740 Epoch: 12 Global Step: 51150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:23,174-Speed 10804.11 samples/sec Loss 4.7874 LearningRate 0.0739 Epoch: 12 Global Step: 51160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:26,655-Speed 11772.06 samples/sec Loss 4.7968 LearningRate 0.0739 Epoch: 12 Global Step: 51170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:06:30,127-Speed 11801.02 samples/sec Loss 4.7534 LearningRate 0.0738 Epoch: 12 Global Step: 51180 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:06:33,692-Speed 11491.86 samples/sec Loss 4.7065 LearningRate 0.0738 Epoch: 12 Global Step: 51190 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:06:37,365-Speed 11154.57 samples/sec Loss 4.7545 LearningRate 0.0737 Epoch: 12 Global Step: 51200 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:06:41,026-Speed 11191.81 samples/sec Loss 4.7692 LearningRate 0.0737 Epoch: 12 Global Step: 51210 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:06:45,248-Speed 9704.56 samples/sec Loss 4.7410 LearningRate 0.0736 Epoch: 12 Global Step: 51220 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:06:48,675-Speed 11957.14 samples/sec Loss 4.7861 LearningRate 0.0736 Epoch: 12 Global Step: 51230 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:06:52,125-Speed 11878.27 samples/sec Loss 4.7738 LearningRate 0.0735 Epoch: 12 Global Step: 51240 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:06:55,568-Speed 11900.12 samples/sec Loss 4.7319 LearningRate 0.0735 Epoch: 12 Global Step: 51250 Fp16 Grad Scale: 262144 Required: 3 hours 
Training: 2022-01-17 03:06:59,147-Speed 11444.94 samples/sec Loss 4.7898 LearningRate 0.0735 Epoch: 12 Global Step: 51260 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:07:02,998-Speed 10641.29 samples/sec Loss 4.7796 LearningRate 0.0734 Epoch: 12 Global Step: 51270 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:07:06,634-Speed 11267.43 samples/sec Loss 4.7868 LearningRate 0.0734 Epoch: 12 Global Step: 51280 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:07:10,408-Speed 10855.56 samples/sec Loss 4.7819 LearningRate 0.0733 Epoch: 12 Global Step: 51290 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:07:13,852-Speed 11894.59 samples/sec Loss 4.8198 LearningRate 0.0733 Epoch: 12 Global Step: 51300 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:07:17,550-Speed 11080.74 samples/sec Loss 4.7845 LearningRate 0.0732 Epoch: 12 Global Step: 51310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:07:21,125-Speed 11457.44 samples/sec Loss 4.7657 LearningRate 0.0732 Epoch: 12 Global Step: 51320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:07:24,572-Speed 11889.27 samples/sec Loss 4.7733 LearningRate 0.0731 Epoch: 12 Global Step: 51330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:07:28,143-Speed 11471.90 samples/sec Loss 4.7781 LearningRate 0.0731 Epoch: 12 Global Step: 51340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:07:31,594-Speed 11875.25 samples/sec Loss 4.7938 LearningRate 0.0730 Epoch: 12 Global Step: 51350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:07:35,213-Speed 11321.73 samples/sec Loss 4.7780 LearningRate 0.0730 Epoch: 12 Global Step: 51360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:07:38,707-Speed 11722.78 samples/sec Loss 4.7791 LearningRate 0.0729 Epoch: 12 Global Step: 51370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 
03:07:42,672-Speed 10331.70 samples/sec Loss 4.8100 LearningRate 0.0729 Epoch: 12 Global Step: 51380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:07:46,143-Speed 11806.24 samples/sec Loss 4.8127 LearningRate 0.0729 Epoch: 12 Global Step: 51390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:07:49,944-Speed 10779.01 samples/sec Loss 4.8083 LearningRate 0.0728 Epoch: 12 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:07:53,462-Speed 11643.71 samples/sec Loss 4.8101 LearningRate 0.0728 Epoch: 12 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:07:56,914-Speed 11868.19 samples/sec Loss 4.8095 LearningRate 0.0727 Epoch: 12 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:08:00,387-Speed 11799.07 samples/sec Loss 4.7791 LearningRate 0.0727 Epoch: 12 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:08:04,010-Speed 11307.47 samples/sec Loss 4.7941 LearningRate 0.0726 Epoch: 12 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:08:07,449-Speed 11915.85 samples/sec Loss 4.8272 LearningRate 0.0726 Epoch: 12 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:08:10,817-Speed 12163.04 samples/sec Loss 4.8055 LearningRate 0.0725 Epoch: 12 Global Step: 51460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:14,267-Speed 11873.54 samples/sec Loss 4.7840 LearningRate 0.0725 Epoch: 12 Global Step: 51470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:17,694-Speed 11957.25 samples/sec Loss 4.8580 LearningRate 0.0725 Epoch: 12 Global Step: 51480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:21,408-Speed 11030.74 samples/sec Loss 4.7580 LearningRate 0.0724 Epoch: 12 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:25,067-Speed 11197.23 samples/sec 
Loss 4.7956 LearningRate 0.0724 Epoch: 12 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:28,660-Speed 11403.51 samples/sec Loss 4.7673 LearningRate 0.0723 Epoch: 12 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:32,176-Speed 11652.98 samples/sec Loss 4.7965 LearningRate 0.0723 Epoch: 12 Global Step: 51520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:36,306-Speed 9919.96 samples/sec Loss 4.8483 LearningRate 0.0722 Epoch: 12 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:39,709-Speed 12040.83 samples/sec Loss 4.7947 LearningRate 0.0722 Epoch: 12 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:44,337-Speed 8851.90 samples/sec Loss 4.7687 LearningRate 0.0721 Epoch: 12 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:47,924-Speed 11422.48 samples/sec Loss 4.8028 LearningRate 0.0721 Epoch: 12 Global Step: 51560 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:08:51,399-Speed 11787.57 samples/sec Loss 4.7349 LearningRate 0.0720 Epoch: 12 Global Step: 51570 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:08:55,023-Speed 11305.75 samples/sec Loss 4.8293 LearningRate 0.0720 Epoch: 12 Global Step: 51580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:08:58,493-Speed 11808.06 samples/sec Loss 4.8030 LearningRate 0.0720 Epoch: 12 Global Step: 51590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:01,946-Speed 11866.14 samples/sec Loss 4.7573 LearningRate 0.0719 Epoch: 12 Global Step: 51600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:05,423-Speed 11785.50 samples/sec Loss 4.7860 LearningRate 0.0719 Epoch: 12 Global Step: 51610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:09,303-Speed 10557.83 samples/sec Loss 4.8001 LearningRate 0.0718 
Epoch: 12 Global Step: 51620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:12,893-Speed 11411.27 samples/sec Loss 4.8463 LearningRate 0.0718 Epoch: 12 Global Step: 51630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:16,326-Speed 11935.75 samples/sec Loss 4.8277 LearningRate 0.0717 Epoch: 12 Global Step: 51640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:19,966-Speed 11255.48 samples/sec Loss 4.8331 LearningRate 0.0717 Epoch: 12 Global Step: 51650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:23,439-Speed 11796.89 samples/sec Loss 4.8417 LearningRate 0.0716 Epoch: 12 Global Step: 51660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:26,795-Speed 12208.04 samples/sec Loss 4.8625 LearningRate 0.0716 Epoch: 12 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:30,203-Speed 12022.37 samples/sec Loss 4.8694 LearningRate 0.0715 Epoch: 12 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:34,146-Speed 10390.05 samples/sec Loss 4.8449 LearningRate 0.0715 Epoch: 12 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:37,948-Speed 10777.95 samples/sec Loss 4.7968 LearningRate 0.0715 Epoch: 12 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:41,799-Speed 10638.58 samples/sec Loss 4.8284 LearningRate 0.0714 Epoch: 12 Global Step: 51710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:45,335-Speed 11585.61 samples/sec Loss 4.8200 LearningRate 0.0714 Epoch: 12 Global Step: 51720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:48,953-Speed 11322.29 samples/sec Loss 4.7924 LearningRate 0.0713 Epoch: 12 Global Step: 51730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:52,430-Speed 11784.12 samples/sec Loss 4.8278 LearningRate 0.0713 Epoch: 12 Global Step: 51740 
Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:09:56,251-Speed 10724.23 samples/sec Loss 4.8670 LearningRate 0.0712 Epoch: 12 Global Step: 51750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:00,065-Speed 10741.15 samples/sec Loss 4.8610 LearningRate 0.0712 Epoch: 12 Global Step: 51760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:03,741-Speed 11146.09 samples/sec Loss 4.8352 LearningRate 0.0711 Epoch: 12 Global Step: 51770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:07,139-Speed 12057.33 samples/sec Loss 4.8039 LearningRate 0.0711 Epoch: 12 Global Step: 51780 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:10:10,890-Speed 10922.31 samples/sec Loss 4.8524 LearningRate 0.0711 Epoch: 12 Global Step: 51790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:14,472-Speed 11438.83 samples/sec Loss 4.7969 LearningRate 0.0710 Epoch: 12 Global Step: 51800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:17,920-Speed 11882.54 samples/sec Loss 4.8264 LearningRate 0.0710 Epoch: 12 Global Step: 51810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:21,412-Speed 11734.28 samples/sec Loss 4.8783 LearningRate 0.0709 Epoch: 12 Global Step: 51820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:25,055-Speed 11246.04 samples/sec Loss 4.7911 LearningRate 0.0709 Epoch: 12 Global Step: 51830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:28,917-Speed 10608.31 samples/sec Loss 4.8370 LearningRate 0.0708 Epoch: 12 Global Step: 51840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:32,371-Speed 11859.06 samples/sec Loss 4.8182 LearningRate 0.0708 Epoch: 12 Global Step: 51850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:36,337-Speed 10332.73 samples/sec Loss 4.8245 LearningRate 0.0707 Epoch: 12 Global Step: 51860 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-01-17 03:10:39,872-Speed 11587.83 samples/sec Loss 4.8096 LearningRate 0.0707 Epoch: 12 Global Step: 51870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:44,436-Speed 8976.76 samples/sec Loss 4.8067 LearningRate 0.0706 Epoch: 12 Global Step: 51880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:47,998-Speed 11504.10 samples/sec Loss 4.8368 LearningRate 0.0706 Epoch: 12 Global Step: 51890 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:10:51,644-Speed 11234.70 samples/sec Loss 4.8194 LearningRate 0.0706 Epoch: 12 Global Step: 51900 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:10:55,070-Speed 11959.99 samples/sec Loss 4.8723 LearningRate 0.0705 Epoch: 12 Global Step: 51910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:10:58,554-Speed 11761.55 samples/sec Loss 4.8601 LearningRate 0.0705 Epoch: 12 Global Step: 51920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:02,145-Speed 11409.41 samples/sec Loss 4.8338 LearningRate 0.0704 Epoch: 12 Global Step: 51930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:05,541-Speed 12063.14 samples/sec Loss 4.8499 LearningRate 0.0704 Epoch: 12 Global Step: 51940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:09,562-Speed 10188.58 samples/sec Loss 4.8531 LearningRate 0.0703 Epoch: 12 Global Step: 51950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:13,300-Speed 10959.95 samples/sec Loss 4.8050 LearningRate 0.0703 Epoch: 12 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:17,246-Speed 10383.87 samples/sec Loss 4.8500 LearningRate 0.0702 Epoch: 12 Global Step: 51970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:20,945-Speed 11074.60 samples/sec Loss 4.8693 LearningRate 0.0702 Epoch: 12 Global Step: 51980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-17 03:11:24,466-Speed 11636.01 samples/sec Loss 4.8101 LearningRate 0.0702 Epoch: 12 Global Step: 51990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:27,988-Speed 11631.73 samples/sec Loss 4.8201 LearningRate 0.0701 Epoch: 12 Global Step: 52000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:31,471-Speed 11765.87 samples/sec Loss 4.8366 LearningRate 0.0701 Epoch: 12 Global Step: 52010 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:11:34,914-Speed 11898.53 samples/sec Loss 4.8408 LearningRate 0.0700 Epoch: 12 Global Step: 52020 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:11:38,755-Speed 10668.52 samples/sec Loss 4.8328 LearningRate 0.0700 Epoch: 12 Global Step: 52030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:42,284-Speed 11609.53 samples/sec Loss 4.8111 LearningRate 0.0699 Epoch: 12 Global Step: 52040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:45,734-Speed 11873.84 samples/sec Loss 4.8444 LearningRate 0.0699 Epoch: 12 Global Step: 52050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:49,176-Speed 11902.31 samples/sec Loss 4.8334 LearningRate 0.0698 Epoch: 12 Global Step: 52060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:52,710-Speed 11595.60 samples/sec Loss 4.8335 LearningRate 0.0698 Epoch: 12 Global Step: 52070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:56,309-Speed 11383.88 samples/sec Loss 4.8384 LearningRate 0.0698 Epoch: 12 Global Step: 52080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:11:59,880-Speed 11474.31 samples/sec Loss 4.8762 LearningRate 0.0697 Epoch: 12 Global Step: 52090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:03,840-Speed 10342.69 samples/sec Loss 4.8787 LearningRate 0.0697 Epoch: 12 Global Step: 52100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:07,644-Speed 
10773.47 samples/sec Loss 4.8530 LearningRate 0.0696 Epoch: 12 Global Step: 52110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:11,161-Speed 11649.11 samples/sec Loss 4.8583 LearningRate 0.0696 Epoch: 12 Global Step: 52120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:14,846-Speed 11117.00 samples/sec Loss 4.7822 LearningRate 0.0695 Epoch: 12 Global Step: 52130 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:12:18,366-Speed 11640.19 samples/sec Loss 4.8584 LearningRate 0.0695 Epoch: 12 Global Step: 52140 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:12:21,915-Speed 11544.12 samples/sec Loss 4.7943 LearningRate 0.0694 Epoch: 12 Global Step: 52150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:25,619-Speed 11057.78 samples/sec Loss 4.8225 LearningRate 0.0694 Epoch: 12 Global Step: 52160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:29,396-Speed 10850.93 samples/sec Loss 4.8782 LearningRate 0.0694 Epoch: 12 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:33,362-Speed 10329.40 samples/sec Loss 4.8259 LearningRate 0.0693 Epoch: 12 Global Step: 52180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:37,117-Speed 10914.01 samples/sec Loss 4.7998 LearningRate 0.0693 Epoch: 12 Global Step: 52190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:40,881-Speed 10882.73 samples/sec Loss 4.8523 LearningRate 0.0692 Epoch: 12 Global Step: 52200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:44,751-Speed 10585.24 samples/sec Loss 4.8378 LearningRate 0.0692 Epoch: 12 Global Step: 52210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:48,315-Speed 11498.66 samples/sec Loss 4.8234 LearningRate 0.0691 Epoch: 12 Global Step: 52220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:51,800-Speed 11756.16 samples/sec Loss 
4.8478 LearningRate 0.0691 Epoch: 12 Global Step: 52230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:55,457-Speed 11201.97 samples/sec Loss 4.8284 LearningRate 0.0690 Epoch: 12 Global Step: 52240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:12:58,937-Speed 11775.40 samples/sec Loss 4.8283 LearningRate 0.0690 Epoch: 12 Global Step: 52250 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:02,798-Speed 10611.91 samples/sec Loss 4.8308 LearningRate 0.0690 Epoch: 12 Global Step: 52260 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:06,394-Speed 11394.78 samples/sec Loss 4.8106 LearningRate 0.0689 Epoch: 12 Global Step: 52270 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:10,359-Speed 10331.87 samples/sec Loss 4.8335 LearningRate 0.0689 Epoch: 12 Global Step: 52280 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:14,364-Speed 10228.39 samples/sec Loss 4.8380 LearningRate 0.0688 Epoch: 12 Global Step: 52290 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:17,803-Speed 11913.36 samples/sec Loss 4.8577 LearningRate 0.0688 Epoch: 12 Global Step: 52300 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:21,379-Speed 11456.41 samples/sec Loss 4.8428 LearningRate 0.0687 Epoch: 12 Global Step: 52310 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:25,099-Speed 11014.93 samples/sec Loss 4.8541 LearningRate 0.0687 Epoch: 12 Global Step: 52320 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:28,727-Speed 11293.33 samples/sec Loss 4.8213 LearningRate 0.0686 Epoch: 12 Global Step: 52330 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:13:32,259-Speed 11597.67 samples/sec Loss 4.8320 LearningRate 0.0686 Epoch: 12 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:13:35,921-Speed 11189.71 samples/sec Loss 4.8557 LearningRate 0.0686 
Epoch: 12 Global Step: 52350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:13:40,148-Speed 9691.57 samples/sec Loss 4.8102 LearningRate 0.0685 Epoch: 12 Global Step: 52360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:13:43,665-Speed 11650.28 samples/sec Loss 4.8579 LearningRate 0.0685 Epoch: 12 Global Step: 52370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:13:47,140-Speed 11789.00 samples/sec Loss 4.8452 LearningRate 0.0684 Epoch: 12 Global Step: 52380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:13:50,978-Speed 10674.68 samples/sec Loss 4.8879 LearningRate 0.0684 Epoch: 12 Global Step: 52390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:13:54,475-Speed 11715.65 samples/sec Loss 4.8890 LearningRate 0.0683 Epoch: 12 Global Step: 52400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:13:58,044-Speed 11477.80 samples/sec Loss 4.8300 LearningRate 0.0683 Epoch: 12 Global Step: 52410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:01,540-Speed 11723.06 samples/sec Loss 4.8146 LearningRate 0.0683 Epoch: 12 Global Step: 52420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:05,459-Speed 10454.23 samples/sec Loss 4.8318 LearningRate 0.0682 Epoch: 12 Global Step: 52430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:09,338-Speed 10561.95 samples/sec Loss 4.8623 LearningRate 0.0682 Epoch: 12 Global Step: 52440 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:14:13,365-Speed 10174.67 samples/sec Loss 4.8220 LearningRate 0.0681 Epoch: 12 Global Step: 52450 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:14:16,939-Speed 11460.84 samples/sec Loss 4.8511 LearningRate 0.0681 Epoch: 12 Global Step: 52460 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:14:20,501-Speed 11502.31 samples/sec Loss 4.8445 LearningRate 0.0680 Epoch: 12 Global Step: 52470 
Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:14:24,102-Speed 11379.37 samples/sec Loss 4.8382 LearningRate 0.0680 Epoch: 12 Global Step: 52480 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:14:27,724-Speed 11311.43 samples/sec Loss 4.8489 LearningRate 0.0679 Epoch: 12 Global Step: 52490 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:14:31,173-Speed 11878.88 samples/sec Loss 4.8474 LearningRate 0.0679 Epoch: 12 Global Step: 52500 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:14:34,896-Speed 11005.01 samples/sec Loss 4.8366 LearningRate 0.0679 Epoch: 12 Global Step: 52510 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:14:38,521-Speed 11302.58 samples/sec Loss 4.8420 LearningRate 0.0678 Epoch: 12 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:42,005-Speed 11761.60 samples/sec Loss 4.7996 LearningRate 0.0678 Epoch: 12 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:45,580-Speed 11459.14 samples/sec Loss 4.8505 LearningRate 0.0677 Epoch: 12 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:49,097-Speed 11646.35 samples/sec Loss 4.8050 LearningRate 0.0677 Epoch: 12 Global Step: 52550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:52,635-Speed 11580.32 samples/sec Loss 4.8531 LearningRate 0.0676 Epoch: 12 Global Step: 52560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:56,086-Speed 11872.46 samples/sec Loss 4.8782 LearningRate 0.0676 Epoch: 12 Global Step: 52570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:14:59,910-Speed 10715.23 samples/sec Loss 4.8413 LearningRate 0.0675 Epoch: 12 Global Step: 52580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:03,627-Speed 11023.29 samples/sec Loss 4.8192 LearningRate 0.0675 Epoch: 12 Global Step: 52590 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-01-17 03:15:07,010-Speed 12109.23 samples/sec Loss 4.8229 LearningRate 0.0675 Epoch: 12 Global Step: 52600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:10,664-Speed 11213.47 samples/sec Loss 4.8459 LearningRate 0.0674 Epoch: 12 Global Step: 52610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:14,306-Speed 11249.12 samples/sec Loss 4.8567 LearningRate 0.0674 Epoch: 12 Global Step: 52620 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:15:18,786-Speed 9143.61 samples/sec Loss 4.8666 LearningRate 0.0673 Epoch: 12 Global Step: 52630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:22,204-Speed 11986.88 samples/sec Loss 4.8832 LearningRate 0.0673 Epoch: 12 Global Step: 52640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:25,633-Speed 11950.77 samples/sec Loss 4.7894 LearningRate 0.0672 Epoch: 12 Global Step: 52650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:29,327-Speed 11091.12 samples/sec Loss 4.8502 LearningRate 0.0672 Epoch: 12 Global Step: 52660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:32,988-Speed 11188.21 samples/sec Loss 4.8294 LearningRate 0.0672 Epoch: 12 Global Step: 52670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:36,592-Speed 11369.54 samples/sec Loss 4.8795 LearningRate 0.0671 Epoch: 12 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:40,751-Speed 9851.47 samples/sec Loss 4.8190 LearningRate 0.0671 Epoch: 12 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:44,397-Speed 11235.48 samples/sec Loss 4.8943 LearningRate 0.0670 Epoch: 12 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:47,868-Speed 11804.16 samples/sec Loss 4.8635 LearningRate 0.0670 Epoch: 12 Global Step: 52710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-17 03:15:51,424-Speed 11521.74 samples/sec Loss 4.8842 LearningRate 0.0669 Epoch: 12 Global Step: 52720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:15:55,193-Speed 10870.03 samples/sec Loss 4.8511 LearningRate 0.0669 Epoch: 12 Global Step: 52730 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:15:58,965-Speed 10863.04 samples/sec Loss 4.8875 LearningRate 0.0669 Epoch: 12 Global Step: 52740 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:16:02,753-Speed 10816.36 samples/sec Loss 4.8440 LearningRate 0.0668 Epoch: 12 Global Step: 52750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:16:06,328-Speed 11457.56 samples/sec Loss 4.8448 LearningRate 0.0668 Epoch: 12 Global Step: 52760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:16:10,210-Speed 10554.68 samples/sec Loss 4.8765 LearningRate 0.0667 Epoch: 12 Global Step: 52770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:16:13,836-Speed 11298.51 samples/sec Loss 4.8184 LearningRate 0.0667 Epoch: 12 Global Step: 52780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:16:17,274-Speed 11918.26 samples/sec Loss 4.8097 LearningRate 0.0666 Epoch: 12 Global Step: 52790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:16:20,982-Speed 11049.24 samples/sec Loss 4.8602 LearningRate 0.0666 Epoch: 12 Global Step: 52800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:16:25,058-Speed 10050.11 samples/sec Loss 4.8368 LearningRate 0.0665 Epoch: 12 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:16:28,596-Speed 11581.52 samples/sec Loss 4.8433 LearningRate 0.0665 Epoch: 12 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:16:32,163-Speed 11486.39 samples/sec Loss 4.8299 LearningRate 0.0665 Epoch: 12 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:16:35,739-Speed 
11455.96 samples/sec Loss 4.8591 LearningRate 0.0664 Epoch: 12 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:16:39,138-Speed 12056.61 samples/sec Loss 4.8428 LearningRate 0.0664 Epoch: 12 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:16:42,701-Speed 11499.94 samples/sec Loss 4.8962 LearningRate 0.0663 Epoch: 12 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:16:46,957-Speed 9627.27 samples/sec Loss 4.8485 LearningRate 0.0663 Epoch: 12 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:16:50,418-Speed 11836.07 samples/sec Loss 4.8500 LearningRate 0.0662 Epoch: 12 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:16:53,947-Speed 11608.32 samples/sec Loss 4.7914 LearningRate 0.0662 Epoch: 12 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:16:57,565-Speed 11323.17 samples/sec Loss 4.8001 LearningRate 0.0662 Epoch: 12 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:17:01,478-Speed 10471.30 samples/sec Loss 4.8539 LearningRate 0.0661 Epoch: 12 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:17:05,111-Speed 11276.44 samples/sec Loss 4.8683 LearningRate 0.0661 Epoch: 12 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:17:08,496-Speed 12106.13 samples/sec Loss 4.8580 LearningRate 0.0660 Epoch: 12 Global Step: 52930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:11,932-Speed 11926.29 samples/sec Loss 4.8522 LearningRate 0.0660 Epoch: 12 Global Step: 52940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:15,378-Speed 11890.10 samples/sec Loss 4.8205 LearningRate 0.0659 Epoch: 12 Global Step: 52950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:18,821-Speed 11900.82 samples/sec Loss 4.8190 
LearningRate 0.0659 Epoch: 12 Global Step: 52960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:22,279-Speed 11846.01 samples/sec Loss 4.8353 LearningRate 0.0659 Epoch: 12 Global Step: 52970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:26,187-Speed 10481.97 samples/sec Loss 4.8558 LearningRate 0.0658 Epoch: 12 Global Step: 52980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:29,744-Speed 11518.76 samples/sec Loss 4.8700 LearningRate 0.0658 Epoch: 12 Global Step: 52990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:33,144-Speed 12051.46 samples/sec Loss 4.8460 LearningRate 0.0657 Epoch: 12 Global Step: 53000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:36,928-Speed 10828.65 samples/sec Loss 4.9255 LearningRate 0.0657 Epoch: 12 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:40,354-Speed 11958.56 samples/sec Loss 4.9073 LearningRate 0.0656 Epoch: 12 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:17:44,029-Speed 11146.23 samples/sec Loss 4.8902 LearningRate 0.0656 Epoch: 12 Global Step: 53030 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:17:48,036-Speed 10225.88 samples/sec Loss 4.8310 LearningRate 0.0656 Epoch: 12 Global Step: 53040 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:17:51,499-Speed 11830.01 samples/sec Loss 4.8883 LearningRate 0.0655 Epoch: 12 Global Step: 53050 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:17:54,945-Speed 11891.87 samples/sec Loss 4.8328 LearningRate 0.0655 Epoch: 12 Global Step: 53060 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:17:58,736-Speed 10804.74 samples/sec Loss 4.8150 LearningRate 0.0654 Epoch: 12 Global Step: 53070 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:18:02,320-Speed 11434.04 samples/sec Loss 4.8437 LearningRate 0.0654 Epoch: 12 
Global Step: 53080 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:18:05,885-Speed 11489.39 samples/sec Loss 4.8831 LearningRate 0.0653 Epoch: 12 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:09,615-Speed 10986.07 samples/sec Loss 4.8545 LearningRate 0.0653 Epoch: 12 Global Step: 53100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:13,396-Speed 10836.19 samples/sec Loss 4.8709 LearningRate 0.0652 Epoch: 12 Global Step: 53110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:17,090-Speed 11090.69 samples/sec Loss 4.8362 LearningRate 0.0652 Epoch: 12 Global Step: 53120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:20,600-Speed 11671.70 samples/sec Loss 4.8847 LearningRate 0.0652 Epoch: 12 Global Step: 53130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:24,043-Speed 11898.06 samples/sec Loss 4.8841 LearningRate 0.0651 Epoch: 12 Global Step: 53140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:27,411-Speed 12164.49 samples/sec Loss 4.7970 LearningRate 0.0651 Epoch: 12 Global Step: 53150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:30,871-Speed 11843.35 samples/sec Loss 4.8302 LearningRate 0.0650 Epoch: 12 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:34,570-Speed 11075.26 samples/sec Loss 4.8610 LearningRate 0.0650 Epoch: 12 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:38,038-Speed 11814.28 samples/sec Loss 4.9173 LearningRate 0.0649 Epoch: 12 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:18:41,791-Speed 10915.56 samples/sec Loss 4.8534 LearningRate 0.0649 Epoch: 12 Global Step: 53190 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:18:45,518-Speed 10995.46 samples/sec Loss 4.8448 LearningRate 0.0649 Epoch: 12 Global Step: 53200 Fp16 Grad 
Scale: 262144 Required: 3 hours Training: 2022-01-17 03:18:49,526-Speed 10224.46 samples/sec Loss 4.8387 LearningRate 0.0648 Epoch: 12 Global Step: 53210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:18:53,020-Speed 11725.35 samples/sec Loss 4.8294 LearningRate 0.0648 Epoch: 12 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:18:56,466-Speed 11886.84 samples/sec Loss 4.8661 LearningRate 0.0647 Epoch: 12 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:18:59,872-Speed 12028.46 samples/sec Loss 4.8810 LearningRate 0.0647 Epoch: 12 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:03,399-Speed 11616.85 samples/sec Loss 4.8510 LearningRate 0.0646 Epoch: 12 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:07,023-Speed 11306.14 samples/sec Loss 4.8524 LearningRate 0.0646 Epoch: 12 Global Step: 53260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:10,809-Speed 10823.43 samples/sec Loss 4.8427 LearningRate 0.0646 Epoch: 12 Global Step: 53270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:14,581-Speed 10858.94 samples/sec Loss 4.8480 LearningRate 0.0645 Epoch: 12 Global Step: 53280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:18,775-Speed 9771.67 samples/sec Loss 4.8349 LearningRate 0.0645 Epoch: 12 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:22,535-Speed 10893.64 samples/sec Loss 4.8021 LearningRate 0.0644 Epoch: 12 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:26,236-Speed 11070.50 samples/sec Loss 4.8147 LearningRate 0.0644 Epoch: 12 Global Step: 53310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:19:29,782-Speed 11555.81 samples/sec Loss 4.7990 LearningRate 0.0643 Epoch: 12 Global Step: 53320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-17 03:19:33,695-Speed 10470.81 samples/sec Loss 4.8306 LearningRate 0.0643 Epoch: 12 Global Step: 53330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:19:37,466-Speed 10863.43 samples/sec Loss 4.7846 LearningRate 0.0643 Epoch: 12 Global Step: 53340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:19:41,690-Speed 9698.95 samples/sec Loss 4.8180 LearningRate 0.0642 Epoch: 12 Global Step: 53350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:19:45,328-Speed 11261.06 samples/sec Loss 4.8614 LearningRate 0.0642 Epoch: 12 Global Step: 53360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:19:48,781-Speed 11867.71 samples/sec Loss 4.8058 LearningRate 0.0641 Epoch: 12 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:52,178-Speed 12058.27 samples/sec Loss 4.7848 LearningRate 0.0641 Epoch: 12 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:55,803-Speed 11301.49 samples/sec Loss 4.8788 LearningRate 0.0640 Epoch: 12 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:19:59,365-Speed 11502.54 samples/sec Loss 4.8251 LearningRate 0.0640 Epoch: 12 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:20:02,868-Speed 11695.82 samples/sec Loss 4.7798 LearningRate 0.0640 Epoch: 12 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:20:06,580-Speed 11038.26 samples/sec Loss 4.8874 LearningRate 0.0639 Epoch: 12 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:20:10,202-Speed 11312.17 samples/sec Loss 4.7517 LearningRate 0.0639 Epoch: 12 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:20:13,789-Speed 11420.23 samples/sec Loss 4.8325 LearningRate 0.0638 Epoch: 12 Global Step: 53440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:20:17,776-Speed 10276.77 
samples/sec Loss 4.8437 LearningRate 0.0638 Epoch: 12 Global Step: 53450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:20:21,448-Speed 11154.57 samples/sec Loss 4.8484 LearningRate 0.0638 Epoch: 12 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:20:25,141-Speed 11095.85 samples/sec Loss 4.7935 LearningRate 0.0637 Epoch: 12 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:28,637-Speed 11719.46 samples/sec Loss 4.8040 LearningRate 0.0637 Epoch: 12 Global Step: 53480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:32,052-Speed 11997.89 samples/sec Loss 4.8552 LearningRate 0.0636 Epoch: 12 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:35,504-Speed 11867.45 samples/sec Loss 4.8462 LearningRate 0.0636 Epoch: 12 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:38,992-Speed 11747.21 samples/sec Loss 4.8162 LearningRate 0.0635 Epoch: 12 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:42,597-Speed 11365.96 samples/sec Loss 4.8347 LearningRate 0.0635 Epoch: 12 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:46,176-Speed 11446.47 samples/sec Loss 4.8105 LearningRate 0.0635 Epoch: 12 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:49,811-Speed 11273.25 samples/sec Loss 4.8735 LearningRate 0.0634 Epoch: 12 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:53,453-Speed 11249.59 samples/sec Loss 4.8273 LearningRate 0.0634 Epoch: 12 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:20:57,063-Speed 11345.39 samples/sec Loss 4.7719 LearningRate 0.0633 Epoch: 12 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:21:00,822-Speed 10899.48 samples/sec Loss 4.8207 
LearningRate 0.0633 Epoch: 12 Global Step: 53570 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:21:05,650-Speed 8485.71 samples/sec Loss 4.8381 LearningRate 0.0632 Epoch: 12 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:09,107-Speed 11853.27 samples/sec Loss 4.8203 LearningRate 0.0632 Epoch: 12 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:12,751-Speed 11243.23 samples/sec Loss 4.8359 LearningRate 0.0632 Epoch: 12 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:16,388-Speed 11262.10 samples/sec Loss 4.8713 LearningRate 0.0631 Epoch: 12 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:19,876-Speed 11747.87 samples/sec Loss 4.8131 LearningRate 0.0631 Epoch: 12 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:23,666-Speed 10812.31 samples/sec Loss 4.8131 LearningRate 0.0630 Epoch: 12 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:27,466-Speed 10779.55 samples/sec Loss 4.8005 LearningRate 0.0630 Epoch: 12 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:30,974-Speed 11679.07 samples/sec Loss 4.8809 LearningRate 0.0629 Epoch: 12 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:34,445-Speed 11804.04 samples/sec Loss 4.8174 LearningRate 0.0629 Epoch: 12 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:38,316-Speed 10583.68 samples/sec Loss 4.8050 LearningRate 0.0629 Epoch: 12 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:21:41,852-Speed 11587.47 samples/sec Loss 4.8465 LearningRate 0.0628 Epoch: 12 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:21:45,633-Speed 10835.59 samples/sec Loss 4.8548 LearningRate 0.0628 Epoch: 12 Global Step: 
53690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:21:49,152-Speed 11644.26 samples/sec Loss 4.8410 LearningRate 0.0627 Epoch: 12 Global Step: 53700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:21:52,549-Speed 12058.42 samples/sec Loss 4.8361 LearningRate 0.0627 Epoch: 12 Global Step: 53710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:21:56,133-Speed 11430.79 samples/sec Loss 4.8632 LearningRate 0.0627 Epoch: 12 Global Step: 53720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:21:59,914-Speed 10836.29 samples/sec Loss 4.7957 LearningRate 0.0626 Epoch: 12 Global Step: 53730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:03,536-Speed 11313.04 samples/sec Loss 4.7908 LearningRate 0.0626 Epoch: 12 Global Step: 53740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:07,264-Speed 10987.39 samples/sec Loss 4.7819 LearningRate 0.0625 Epoch: 12 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:11,830-Speed 8972.93 samples/sec Loss 4.7922 LearningRate 0.0625 Epoch: 12 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:15,416-Speed 11426.34 samples/sec Loss 4.8558 LearningRate 0.0624 Epoch: 12 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:18,984-Speed 11482.12 samples/sec Loss 4.8403 LearningRate 0.0624 Epoch: 12 Global Step: 53780 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:22:22,361-Speed 12134.13 samples/sec Loss 4.7905 LearningRate 0.0624 Epoch: 12 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:26,037-Speed 11143.37 samples/sec Loss 4.7929 LearningRate 0.0623 Epoch: 12 Global Step: 53800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:29,546-Speed 11676.67 samples/sec Loss 4.8344 LearningRate 0.0623 Epoch: 12 Global Step: 53810 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-01-17 03:22:33,344-Speed 10787.11 samples/sec Loss 4.8473 LearningRate 0.0622 Epoch: 12 Global Step: 53820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:36,762-Speed 11985.32 samples/sec Loss 4.7844 LearningRate 0.0622 Epoch: 12 Global Step: 53830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:40,331-Speed 11482.29 samples/sec Loss 4.8326 LearningRate 0.0621 Epoch: 12 Global Step: 53840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:43,751-Speed 11978.05 samples/sec Loss 4.8065 LearningRate 0.0621 Epoch: 12 Global Step: 53850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:47,168-Speed 11989.84 samples/sec Loss 4.8396 LearningRate 0.0621 Epoch: 12 Global Step: 53860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:50,696-Speed 11613.14 samples/sec Loss 4.8302 LearningRate 0.0620 Epoch: 12 Global Step: 53870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:54,234-Speed 11577.99 samples/sec Loss 4.8750 LearningRate 0.0620 Epoch: 12 Global Step: 53880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:22:57,723-Speed 11744.15 samples/sec Loss 4.8637 LearningRate 0.0619 Epoch: 12 Global Step: 53890 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:23:01,385-Speed 11191.36 samples/sec Loss 4.7919 LearningRate 0.0619 Epoch: 12 Global Step: 53900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:04,978-Speed 11402.93 samples/sec Loss 4.8005 LearningRate 0.0619 Epoch: 12 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:08,410-Speed 11937.70 samples/sec Loss 4.8070 LearningRate 0.0618 Epoch: 12 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:12,432-Speed 10186.53 samples/sec Loss 4.7823 LearningRate 0.0618 Epoch: 12 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-17 03:23:16,239-Speed 10762.05 samples/sec Loss 4.7728 LearningRate 0.0617 Epoch: 12 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:20,190-Speed 10367.64 samples/sec Loss 4.8713 LearningRate 0.0617 Epoch: 12 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:23,801-Speed 11346.84 samples/sec Loss 4.8296 LearningRate 0.0616 Epoch: 12 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:27,385-Speed 11431.61 samples/sec Loss 4.8262 LearningRate 0.0616 Epoch: 12 Global Step: 53970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:30,806-Speed 11975.61 samples/sec Loss 4.8436 LearningRate 0.0616 Epoch: 12 Global Step: 53980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:34,343-Speed 11584.40 samples/sec Loss 4.8242 LearningRate 0.0615 Epoch: 12 Global Step: 53990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:37,942-Speed 11382.05 samples/sec Loss 4.8163 LearningRate 0.0615 Epoch: 12 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:23:41,658-Speed 11028.16 samples/sec Loss 4.8573 LearningRate 0.0614 Epoch: 12 Global Step: 54010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:23:45,157-Speed 11708.28 samples/sec Loss 4.8442 LearningRate 0.0614 Epoch: 12 Global Step: 54020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:23:48,800-Speed 11246.85 samples/sec Loss 4.7939 LearningRate 0.0614 Epoch: 12 Global Step: 54030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:23:52,734-Speed 10414.56 samples/sec Loss 4.8419 LearningRate 0.0613 Epoch: 12 Global Step: 54040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:23:56,201-Speed 11815.13 samples/sec Loss 4.8684 LearningRate 0.0613 Epoch: 12 Global Step: 54050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:23:59,799-Speed 
11390.13 samples/sec Loss 4.8615 LearningRate 0.0612 Epoch: 12 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:24:03,397-Speed 11384.70 samples/sec Loss 4.8337 LearningRate 0.0612 Epoch: 12 Global Step: 54070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:24:07,004-Speed 11358.56 samples/sec Loss 4.7588 LearningRate 0.0611 Epoch: 12 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:24:10,402-Speed 12055.90 samples/sec Loss 4.8065 LearningRate 0.0611 Epoch: 12 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:24:14,155-Speed 10918.65 samples/sec Loss 4.8282 LearningRate 0.0611 Epoch: 12 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:24:18,627-Speed 9160.20 samples/sec Loss 4.8094 LearningRate 0.0610 Epoch: 12 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:22,166-Speed 11578.71 samples/sec Loss 4.8193 LearningRate 0.0610 Epoch: 12 Global Step: 54120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:25,806-Speed 11254.46 samples/sec Loss 4.8396 LearningRate 0.0609 Epoch: 12 Global Step: 54130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:29,346-Speed 11573.12 samples/sec Loss 4.8101 LearningRate 0.0609 Epoch: 12 Global Step: 54140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:32,978-Speed 11281.15 samples/sec Loss 4.8585 LearningRate 0.0609 Epoch: 12 Global Step: 54150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:36,562-Speed 11431.42 samples/sec Loss 4.8004 LearningRate 0.0608 Epoch: 12 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:40,053-Speed 11736.53 samples/sec Loss 4.8219 LearningRate 0.0608 Epoch: 12 Global Step: 54170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:43,478-Speed 11962.87 samples/sec Loss 4.7649 
LearningRate 0.0607 Epoch: 12 Global Step: 54180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:47,452-Speed 10309.45 samples/sec Loss 4.7803 LearningRate 0.0607 Epoch: 12 Global Step: 54190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:51,030-Speed 11449.60 samples/sec Loss 4.8322 LearningRate 0.0606 Epoch: 12 Global Step: 54200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:24:54,470-Speed 11907.96 samples/sec Loss 4.8170 LearningRate 0.0606 Epoch: 12 Global Step: 54210 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:24:57,909-Speed 11916.55 samples/sec Loss 4.7454 LearningRate 0.0606 Epoch: 12 Global Step: 54220 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:25:02,158-Speed 9641.13 samples/sec Loss 4.8125 LearningRate 0.0605 Epoch: 12 Global Step: 54230 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:25:34,580-Speed 1263.39 samples/sec Loss 4.5369 LearningRate 0.0605 Epoch: 13 Global Step: 54240 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:25:38,537-Speed 10353.58 samples/sec Loss 4.1084 LearningRate 0.0604 Epoch: 13 Global Step: 54250 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:25:43,766-Speed 7836.26 samples/sec Loss 4.1160 LearningRate 0.0604 Epoch: 13 Global Step: 54260 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:25:47,684-Speed 10456.87 samples/sec Loss 4.1144 LearningRate 0.0604 Epoch: 13 Global Step: 54270 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:25:51,127-Speed 11899.58 samples/sec Loss 4.1038 LearningRate 0.0603 Epoch: 13 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:25:54,634-Speed 11680.39 samples/sec Loss 4.0726 LearningRate 0.0603 Epoch: 13 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:25:58,763-Speed 9921.69 samples/sec Loss 4.1258 LearningRate 0.0602 Epoch: 13 
Global Step: 54300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:02,503-Speed 10956.75 samples/sec Loss 4.1585 LearningRate 0.0602 Epoch: 13 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:05,941-Speed 11916.49 samples/sec Loss 4.1166 LearningRate 0.0601 Epoch: 13 Global Step: 54320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:09,688-Speed 10934.73 samples/sec Loss 4.1255 LearningRate 0.0601 Epoch: 13 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:13,292-Speed 11367.14 samples/sec Loss 4.1453 LearningRate 0.0601 Epoch: 13 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:17,003-Speed 11038.05 samples/sec Loss 4.1195 LearningRate 0.0600 Epoch: 13 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:20,890-Speed 10544.24 samples/sec Loss 4.1032 LearningRate 0.0600 Epoch: 13 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:24,470-Speed 11441.12 samples/sec Loss 4.1519 LearningRate 0.0599 Epoch: 13 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:28,039-Speed 11480.67 samples/sec Loss 4.1520 LearningRate 0.0599 Epoch: 13 Global Step: 54380 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:26:31,834-Speed 10796.83 samples/sec Loss 4.1366 LearningRate 0.0599 Epoch: 13 Global Step: 54390 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:26:35,918-Speed 10030.31 samples/sec Loss 4.1381 LearningRate 0.0598 Epoch: 13 Global Step: 54400 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:26:39,355-Speed 11923.22 samples/sec Loss 4.1958 LearningRate 0.0598 Epoch: 13 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:43,028-Speed 11154.46 samples/sec Loss 4.1990 LearningRate 0.0597 Epoch: 13 Global Step: 54420 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:46,692-Speed 11180.01 samples/sec Loss 4.1847 LearningRate 0.0597 Epoch: 13 Global Step: 54430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:50,503-Speed 10750.72 samples/sec Loss 4.1769 LearningRate 0.0597 Epoch: 13 Global Step: 54440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:53,887-Speed 12106.49 samples/sec Loss 4.2436 LearningRate 0.0596 Epoch: 13 Global Step: 54450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:26:57,224-Speed 12279.38 samples/sec Loss 4.1305 LearningRate 0.0596 Epoch: 13 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:00,623-Speed 12053.53 samples/sec Loss 4.2110 LearningRate 0.0595 Epoch: 13 Global Step: 54470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:04,054-Speed 11948.15 samples/sec Loss 4.1854 LearningRate 0.0595 Epoch: 13 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:07,547-Speed 11729.45 samples/sec Loss 4.2074 LearningRate 0.0594 Epoch: 13 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:10,999-Speed 11866.07 samples/sec Loss 4.2059 LearningRate 0.0594 Epoch: 13 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:14,984-Speed 10279.76 samples/sec Loss 4.2101 LearningRate 0.0594 Epoch: 13 Global Step: 54510 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:27:18,459-Speed 11793.40 samples/sec Loss 4.2258 LearningRate 0.0593 Epoch: 13 Global Step: 54520 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:27:21,879-Speed 11978.79 samples/sec Loss 4.2145 LearningRate 0.0593 Epoch: 13 Global Step: 54530 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:27:25,562-Speed 11123.76 samples/sec Loss 4.2456 LearningRate 0.0592 Epoch: 13 Global Step: 54540 Fp16 Grad Scale: 262144 Required: 3 hours 
Training: 2022-01-17 03:27:29,277-Speed 11027.83 samples/sec Loss 4.2062 LearningRate 0.0592 Epoch: 13 Global Step: 54550 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:27:33,039-Speed 10891.99 samples/sec Loss 4.2329 LearningRate 0.0592 Epoch: 13 Global Step: 54560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:36,448-Speed 12017.02 samples/sec Loss 4.2640 LearningRate 0.0591 Epoch: 13 Global Step: 54570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:39,842-Speed 12076.06 samples/sec Loss 4.2468 LearningRate 0.0591 Epoch: 13 Global Step: 54580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:43,264-Speed 11971.96 samples/sec Loss 4.2261 LearningRate 0.0590 Epoch: 13 Global Step: 54590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:46,796-Speed 11599.19 samples/sec Loss 4.2529 LearningRate 0.0590 Epoch: 13 Global Step: 54600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:50,321-Speed 11622.06 samples/sec Loss 4.2520 LearningRate 0.0590 Epoch: 13 Global Step: 54610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:53,913-Speed 11406.01 samples/sec Loss 4.2453 LearningRate 0.0589 Epoch: 13 Global Step: 54620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:27:57,802-Speed 10534.74 samples/sec Loss 4.2153 LearningRate 0.0589 Epoch: 13 Global Step: 54630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:01,777-Speed 10307.57 samples/sec Loss 4.2555 LearningRate 0.0588 Epoch: 13 Global Step: 54640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:05,621-Speed 10656.85 samples/sec Loss 4.2574 LearningRate 0.0588 Epoch: 13 Global Step: 54650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:09,340-Speed 11018.20 samples/sec Loss 4.3081 LearningRate 0.0588 Epoch: 13 Global Step: 54660 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 
03:28:12,897-Speed 11518.00 samples/sec Loss 4.3237 LearningRate 0.0587 Epoch: 13 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:16,556-Speed 11195.18 samples/sec Loss 4.2859 LearningRate 0.0587 Epoch: 13 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:20,380-Speed 10715.34 samples/sec Loss 4.2660 LearningRate 0.0586 Epoch: 13 Global Step: 54690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:23,927-Speed 11551.95 samples/sec Loss 4.2837 LearningRate 0.0586 Epoch: 13 Global Step: 54700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:27,366-Speed 11912.34 samples/sec Loss 4.2662 LearningRate 0.0585 Epoch: 13 Global Step: 54710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:30,992-Speed 11300.52 samples/sec Loss 4.2735 LearningRate 0.0585 Epoch: 13 Global Step: 54720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:34,921-Speed 10427.09 samples/sec Loss 4.3004 LearningRate 0.0585 Epoch: 13 Global Step: 54730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:38,575-Speed 11222.61 samples/sec Loss 4.3115 LearningRate 0.0584 Epoch: 13 Global Step: 54740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:42,174-Speed 11382.99 samples/sec Loss 4.3071 LearningRate 0.0584 Epoch: 13 Global Step: 54750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:45,822-Speed 11232.11 samples/sec Loss 4.3237 LearningRate 0.0583 Epoch: 13 Global Step: 54760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:28:49,541-Speed 11016.87 samples/sec Loss 4.3098 LearningRate 0.0583 Epoch: 13 Global Step: 54770 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:28:52,930-Speed 12087.13 samples/sec Loss 4.3325 LearningRate 0.0583 Epoch: 13 Global Step: 54780 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:28:56,992-Speed 10086.23 
samples/sec Loss 4.3033 LearningRate 0.0582 Epoch: 13 Global Step: 54790 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:29:00,549-Speed 11520.66 samples/sec Loss 4.2955 LearningRate 0.0582 Epoch: 13 Global Step: 54800 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:29:04,149-Speed 11380.74 samples/sec Loss 4.3139 LearningRate 0.0581 Epoch: 13 Global Step: 54810 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:29:07,827-Speed 11138.24 samples/sec Loss 4.3470 LearningRate 0.0581 Epoch: 13 Global Step: 54820 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:29:11,279-Speed 11871.02 samples/sec Loss 4.3658 LearningRate 0.0581 Epoch: 13 Global Step: 54830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:14,979-Speed 11073.12 samples/sec Loss 4.3481 LearningRate 0.0580 Epoch: 13 Global Step: 54840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:18,624-Speed 11240.10 samples/sec Loss 4.3352 LearningRate 0.0580 Epoch: 13 Global Step: 54850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:22,141-Speed 11652.84 samples/sec Loss 4.3580 LearningRate 0.0579 Epoch: 13 Global Step: 54860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:26,246-Speed 9979.99 samples/sec Loss 4.3465 LearningRate 0.0579 Epoch: 13 Global Step: 54870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:29,745-Speed 11708.61 samples/sec Loss 4.3769 LearningRate 0.0579 Epoch: 13 Global Step: 54880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:33,153-Speed 12022.85 samples/sec Loss 4.3543 LearningRate 0.0578 Epoch: 13 Global Step: 54890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:36,997-Speed 10657.05 samples/sec Loss 4.3622 LearningRate 0.0578 Epoch: 13 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:40,647-Speed 11228.05 samples/sec Loss 4.3718 
LearningRate 0.0577 Epoch: 13 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:44,409-Speed 10888.32 samples/sec Loss 4.3960 LearningRate 0.0577 Epoch: 13 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:29:48,589-Speed 9802.30 samples/sec Loss 4.3591 LearningRate 0.0577 Epoch: 13 Global Step: 54930 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:29:52,145-Speed 11519.81 samples/sec Loss 4.3481 LearningRate 0.0576 Epoch: 13 Global Step: 54940 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:29:55,899-Speed 10913.52 samples/sec Loss 4.3131 LearningRate 0.0576 Epoch: 13 Global Step: 54950 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:29:59,509-Speed 11347.45 samples/sec Loss 4.3750 LearningRate 0.0575 Epoch: 13 Global Step: 54960 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:30:02,914-Speed 12041.43 samples/sec Loss 4.3761 LearningRate 0.0575 Epoch: 13 Global Step: 54970 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:30:06,689-Speed 10851.38 samples/sec Loss 4.3774 LearningRate 0.0575 Epoch: 13 Global Step: 54980 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:30:10,399-Speed 11044.57 samples/sec Loss 4.3031 LearningRate 0.0574 Epoch: 13 Global Step: 54990 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:30:13,860-Speed 11836.59 samples/sec Loss 4.4286 LearningRate 0.0574 Epoch: 13 Global Step: 55000 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:30:35,138-[lfw][55000]XNorm: 8.521146 Training: 2022-01-17 03:30:35,139-[lfw][55000]Accuracy-Flip: 0.99650+-0.00263 Training: 2022-01-17 03:30:35,139-[lfw][55000]Accuracy-Highest: 0.99667 Training: 2022-01-17 03:30:59,624-[cfp_fp][55000]XNorm: 7.246406 Training: 2022-01-17 03:30:59,625-[cfp_fp][55000]Accuracy-Flip: 0.97157+-0.01043 Training: 2022-01-17 03:30:59,626-[cfp_fp][55000]Accuracy-Highest: 0.97157 
Training: 2022-01-17 03:31:20,801-[agedb_30][55000]XNorm: 8.182919 Training: 2022-01-17 03:31:20,801-[agedb_30][55000]Accuracy-Flip: 0.96933+-0.00544 Training: 2022-01-17 03:31:20,802-[agedb_30][55000]Accuracy-Highest: 0.96933 Training: 2022-01-17 03:31:24,165-Speed 582.61 samples/sec Loss 4.4051 LearningRate 0.0573 Epoch: 13 Global Step: 55010 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:31:27,538-Speed 12145.61 samples/sec Loss 4.3782 LearningRate 0.0573 Epoch: 13 Global Step: 55020 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:31:30,863-Speed 12325.76 samples/sec Loss 4.3785 LearningRate 0.0572 Epoch: 13 Global Step: 55030 Fp16 Grad Scale: 524288 Required: 3 hours Training: 2022-01-17 03:31:34,207-Speed 12251.91 samples/sec Loss 4.3813 LearningRate 0.0572 Epoch: 13 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:31:37,593-Speed 12100.82 samples/sec Loss 4.4049 LearningRate 0.0572 Epoch: 13 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:31:40,926-Speed 12292.44 samples/sec Loss 4.4066 LearningRate 0.0571 Epoch: 13 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:31:44,246-Speed 12340.20 samples/sec Loss 4.3839 LearningRate 0.0571 Epoch: 13 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:31:47,605-Speed 12196.00 samples/sec Loss 4.4438 LearningRate 0.0570 Epoch: 13 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:31:50,976-Speed 12156.95 samples/sec Loss 4.3844 LearningRate 0.0570 Epoch: 13 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:31:54,331-Speed 12209.54 samples/sec Loss 4.4386 LearningRate 0.0570 Epoch: 13 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:31:57,920-Speed 11416.68 samples/sec Loss 4.4211 LearningRate 0.0569 Epoch: 13 Global Step: 55110 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-17 03:32:01,418-Speed 11712.90 samples/sec Loss 4.3810 LearningRate 0.0569 Epoch: 13 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:32:05,068-Speed 11222.63 samples/sec Loss 4.4060 LearningRate 0.0568 Epoch: 13 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:32:08,641-Speed 11469.09 samples/sec Loss 4.4165 LearningRate 0.0568 Epoch: 13 Global Step: 55140 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:12,229-Speed 11417.13 samples/sec Loss 4.4162 LearningRate 0.0568 Epoch: 13 Global Step: 55150 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:16,399-Speed 9825.57 samples/sec Loss 4.4006 LearningRate 0.0567 Epoch: 13 Global Step: 55160 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:19,820-Speed 11974.30 samples/sec Loss 4.4220 LearningRate 0.0567 Epoch: 13 Global Step: 55170 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:23,402-Speed 11440.47 samples/sec Loss 4.3857 LearningRate 0.0566 Epoch: 13 Global Step: 55180 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:26,880-Speed 11776.95 samples/sec Loss 4.4102 LearningRate 0.0566 Epoch: 13 Global Step: 55190 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:30,272-Speed 12082.03 samples/sec Loss 4.3949 LearningRate 0.0566 Epoch: 13 Global Step: 55200 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:33,942-Speed 11162.59 samples/sec Loss 4.4084 LearningRate 0.0565 Epoch: 13 Global Step: 55210 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:37,414-Speed 11802.00 samples/sec Loss 4.4265 LearningRate 0.0565 Epoch: 13 Global Step: 55220 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:40,808-Speed 12070.79 samples/sec Loss 4.4403 LearningRate 0.0564 Epoch: 13 Global Step: 55230 Fp16 Grad Scale: 262144 Required: 3 hours 
Training: 2022-01-17 03:32:44,152-Speed 12252.80 samples/sec Loss 4.4180 LearningRate 0.0564 Epoch: 13 Global Step: 55240 Fp16 Grad Scale: 524288 Required: 3 hours Training: 2022-01-17 03:32:47,817-Speed 11176.82 samples/sec Loss 4.4328 LearningRate 0.0564 Epoch: 13 Global Step: 55250 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:51,218-Speed 12048.46 samples/sec Loss 4.4362 LearningRate 0.0563 Epoch: 13 Global Step: 55260 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:54,994-Speed 10850.59 samples/sec Loss 4.4669 LearningRate 0.0563 Epoch: 13 Global Step: 55270 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:32:58,425-Speed 11941.56 samples/sec Loss 4.4499 LearningRate 0.0562 Epoch: 13 Global Step: 55280 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:33:01,854-Speed 11947.90 samples/sec Loss 4.4267 LearningRate 0.0562 Epoch: 13 Global Step: 55290 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:33:05,464-Speed 11349.53 samples/sec Loss 4.4156 LearningRate 0.0562 Epoch: 13 Global Step: 55300 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:33:08,923-Speed 11845.27 samples/sec Loss 4.4497 LearningRate 0.0561 Epoch: 13 Global Step: 55310 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:33:12,406-Speed 11763.44 samples/sec Loss 4.4636 LearningRate 0.0561 Epoch: 13 Global Step: 55320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:15,910-Speed 11690.27 samples/sec Loss 4.4201 LearningRate 0.0560 Epoch: 13 Global Step: 55330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:19,520-Speed 11349.12 samples/sec Loss 4.4427 LearningRate 0.0560 Epoch: 13 Global Step: 55340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:23,912-Speed 9328.85 samples/sec Loss 4.4047 LearningRate 0.0560 Epoch: 13 Global Step: 55350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 
03:33:27,484-Speed 11468.51 samples/sec Loss 4.4457 LearningRate 0.0559 Epoch: 13 Global Step: 55360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:30,895-Speed 12012.10 samples/sec Loss 4.4188 LearningRate 0.0559 Epoch: 13 Global Step: 55370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:34,251-Speed 12208.75 samples/sec Loss 4.4402 LearningRate 0.0558 Epoch: 13 Global Step: 55380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:37,789-Speed 11581.39 samples/sec Loss 4.4470 LearningRate 0.0558 Epoch: 13 Global Step: 55390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:41,309-Speed 11639.03 samples/sec Loss 4.4575 LearningRate 0.0558 Epoch: 13 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:44,846-Speed 11582.65 samples/sec Loss 4.4519 LearningRate 0.0557 Epoch: 13 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:33:48,651-Speed 10767.46 samples/sec Loss 4.4111 LearningRate 0.0557 Epoch: 13 Global Step: 55420 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:33:52,548-Speed 10513.98 samples/sec Loss 4.4490 LearningRate 0.0556 Epoch: 13 Global Step: 55430 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:33:56,614-Speed 10076.67 samples/sec Loss 4.4057 LearningRate 0.0556 Epoch: 13 Global Step: 55440 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:34:00,496-Speed 10553.12 samples/sec Loss 4.4232 LearningRate 0.0556 Epoch: 13 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:03,941-Speed 11890.55 samples/sec Loss 4.4665 LearningRate 0.0555 Epoch: 13 Global Step: 55460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:07,401-Speed 11842.16 samples/sec Loss 4.4736 LearningRate 0.0555 Epoch: 13 Global Step: 55470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:11,203-Speed 10776.42 
samples/sec Loss 4.4640 LearningRate 0.0554 Epoch: 13 Global Step: 55480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:14,993-Speed 10810.40 samples/sec Loss 4.4918 LearningRate 0.0554 Epoch: 13 Global Step: 55490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:18,586-Speed 11404.32 samples/sec Loss 4.4212 LearningRate 0.0554 Epoch: 13 Global Step: 55500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:22,141-Speed 11524.75 samples/sec Loss 4.4733 LearningRate 0.0553 Epoch: 13 Global Step: 55510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:25,570-Speed 11946.66 samples/sec Loss 4.4798 LearningRate 0.0553 Epoch: 13 Global Step: 55520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:29,642-Speed 10063.27 samples/sec Loss 4.4698 LearningRate 0.0553 Epoch: 13 Global Step: 55530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:33,042-Speed 12050.65 samples/sec Loss 4.4488 LearningRate 0.0552 Epoch: 13 Global Step: 55540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:34:37,036-Speed 10257.42 samples/sec Loss 4.4374 LearningRate 0.0552 Epoch: 13 Global Step: 55550 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:34:40,749-Speed 11035.44 samples/sec Loss 4.4289 LearningRate 0.0551 Epoch: 13 Global Step: 55560 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:34:44,425-Speed 11145.42 samples/sec Loss 4.4666 LearningRate 0.0551 Epoch: 13 Global Step: 55570 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:34:47,960-Speed 11590.81 samples/sec Loss 4.4397 LearningRate 0.0551 Epoch: 13 Global Step: 55580 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:34:51,584-Speed 11303.31 samples/sec Loss 4.4730 LearningRate 0.0550 Epoch: 13 Global Step: 55590 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:34:55,601-Speed 10198.04 samples/sec Loss 4.4920 
LearningRate 0.0550 Epoch: 13 Global Step: 55600 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:34:59,309-Speed 11051.89 samples/sec Loss 4.5570 LearningRate 0.0549 Epoch: 13 Global Step: 55610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:03,020-Speed 11037.63 samples/sec Loss 4.4926 LearningRate 0.0549 Epoch: 13 Global Step: 55620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:06,631-Speed 11348.61 samples/sec Loss 4.4795 LearningRate 0.0549 Epoch: 13 Global Step: 55630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:10,063-Speed 11935.44 samples/sec Loss 4.4810 LearningRate 0.0548 Epoch: 13 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:14,023-Speed 10348.41 samples/sec Loss 4.5002 LearningRate 0.0548 Epoch: 13 Global Step: 55650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:17,411-Speed 12090.20 samples/sec Loss 4.5254 LearningRate 0.0547 Epoch: 13 Global Step: 55660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:20,842-Speed 11942.97 samples/sec Loss 4.5301 LearningRate 0.0547 Epoch: 13 Global Step: 55670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:24,456-Speed 11333.69 samples/sec Loss 4.4809 LearningRate 0.0547 Epoch: 13 Global Step: 55680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:27,813-Speed 12208.24 samples/sec Loss 4.4753 LearningRate 0.0546 Epoch: 13 Global Step: 55690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:31,237-Speed 11965.06 samples/sec Loss 4.4826 LearningRate 0.0546 Epoch: 13 Global Step: 55700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:35:34,606-Speed 12160.71 samples/sec Loss 4.4792 LearningRate 0.0545 Epoch: 13 Global Step: 55710 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:35:38,474-Speed 10593.29 samples/sec Loss 4.4623 LearningRate 0.0545 Epoch: 13 
Global Step: 55720 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:35:41,979-Speed 11691.22 samples/sec Loss 4.4529 LearningRate 0.0545 Epoch: 13 Global Step: 55730 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:35:45,608-Speed 11287.56 samples/sec Loss 4.4980 LearningRate 0.0544 Epoch: 13 Global Step: 55740 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:35:49,031-Speed 11972.47 samples/sec Loss 4.5098 LearningRate 0.0544 Epoch: 13 Global Step: 55750 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:35:52,635-Speed 11365.58 samples/sec Loss 4.5180 LearningRate 0.0543 Epoch: 13 Global Step: 55760 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:35:56,128-Speed 11729.30 samples/sec Loss 4.4720 LearningRate 0.0543 Epoch: 13 Global Step: 55770 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:35:59,649-Speed 11639.10 samples/sec Loss 4.4946 LearningRate 0.0543 Epoch: 13 Global Step: 55780 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:36:03,285-Speed 11267.42 samples/sec Loss 4.4998 LearningRate 0.0542 Epoch: 13 Global Step: 55790 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:36:06,707-Speed 11972.11 samples/sec Loss 4.4993 LearningRate 0.0542 Epoch: 13 Global Step: 55800 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:36:10,610-Speed 10499.05 samples/sec Loss 4.4074 LearningRate 0.0541 Epoch: 13 Global Step: 55810 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:36:15,084-Speed 9158.86 samples/sec Loss 4.5291 LearningRate 0.0541 Epoch: 13 Global Step: 55820 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:36:18,489-Speed 12032.86 samples/sec Loss 4.4848 LearningRate 0.0541 Epoch: 13 Global Step: 55830 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:36:22,216-Speed 10990.95 samples/sec Loss 4.5244 LearningRate 0.0540 Epoch: 13 Global Step: 55840 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:25,760-Speed 11558.05 samples/sec Loss 4.4314 LearningRate 0.0540 Epoch: 13 Global Step: 55850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:29,436-Speed 11147.69 samples/sec Loss 4.5037 LearningRate 0.0540 Epoch: 13 Global Step: 55860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:33,360-Speed 10440.85 samples/sec Loss 4.4723 LearningRate 0.0539 Epoch: 13 Global Step: 55870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:36,925-Speed 11493.64 samples/sec Loss 4.4697 LearningRate 0.0539 Epoch: 13 Global Step: 55880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:40,887-Speed 10340.76 samples/sec Loss 4.4519 LearningRate 0.0538 Epoch: 13 Global Step: 55890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:44,713-Speed 10707.73 samples/sec Loss 4.4913 LearningRate 0.0538 Epoch: 13 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:48,485-Speed 10864.16 samples/sec Loss 4.5000 LearningRate 0.0538 Epoch: 13 Global Step: 55910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:52,359-Speed 10573.74 samples/sec Loss 4.5212 LearningRate 0.0537 Epoch: 13 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:55,798-Speed 11912.78 samples/sec Loss 4.5182 LearningRate 0.0537 Epoch: 13 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:36:59,169-Speed 12154.86 samples/sec Loss 4.5182 LearningRate 0.0536 Epoch: 13 Global Step: 55940 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:37:02,558-Speed 12089.66 samples/sec Loss 4.4603 LearningRate 0.0536 Epoch: 13 Global Step: 55950 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:37:05,948-Speed 12087.75 samples/sec Loss 4.5216 LearningRate 0.0536 Epoch: 13 Global Step: 55960 Fp16 Grad Scale: 262144 Required: 3 hours 
Training: 2022-01-17 03:37:09,444-Speed 11717.63 samples/sec Loss 4.5282 LearningRate 0.0535 Epoch: 13 Global Step: 55970 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:37:13,657-Speed 9724.73 samples/sec Loss 4.5186 LearningRate 0.0535 Epoch: 13 Global Step: 55980 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:37:17,059-Speed 12043.93 samples/sec Loss 4.4860 LearningRate 0.0534 Epoch: 13 Global Step: 55990 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:37:20,483-Speed 11965.42 samples/sec Loss 4.5424 LearningRate 0.0534 Epoch: 13 Global Step: 56000 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:37:24,015-Speed 11598.17 samples/sec Loss 4.4870 LearningRate 0.0534 Epoch: 13 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:37:27,478-Speed 11830.82 samples/sec Loss 4.5570 LearningRate 0.0533 Epoch: 13 Global Step: 56020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:37:30,835-Speed 12205.36 samples/sec Loss 4.5459 LearningRate 0.0533 Epoch: 13 Global Step: 56030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:37:34,523-Speed 11108.96 samples/sec Loss 4.4972 LearningRate 0.0533 Epoch: 13 Global Step: 56040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:37:38,038-Speed 11656.89 samples/sec Loss 4.5320 LearningRate 0.0532 Epoch: 13 Global Step: 56050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:37:41,779-Speed 10950.95 samples/sec Loss 4.4738 LearningRate 0.0532 Epoch: 13 Global Step: 56060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:37:45,365-Speed 11426.67 samples/sec Loss 4.4836 LearningRate 0.0531 Epoch: 13 Global Step: 56070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:37:48,819-Speed 11861.46 samples/sec Loss 4.5304 LearningRate 0.0531 Epoch: 13 Global Step: 56080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 
03:37:52,463-Speed 11240.97 samples/sec Loss 4.4865 LearningRate 0.0531 Epoch: 13 Global Step: 56090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:37:56,159-Speed 11083.92 samples/sec Loss 4.5131 LearningRate 0.0530 Epoch: 13 Global Step: 56100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:00,102-Speed 10391.78 samples/sec Loss 4.4918 LearningRate 0.0530 Epoch: 13 Global Step: 56110 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:38:04,219-Speed 9951.28 samples/sec Loss 4.4978 LearningRate 0.0529 Epoch: 13 Global Step: 56120 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:38:07,890-Speed 11161.50 samples/sec Loss 4.5125 LearningRate 0.0529 Epoch: 13 Global Step: 56130 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:38:11,321-Speed 11942.67 samples/sec Loss 4.4624 LearningRate 0.0529 Epoch: 13 Global Step: 56140 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:38:15,039-Speed 11018.94 samples/sec Loss 4.5242 LearningRate 0.0528 Epoch: 13 Global Step: 56150 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:38:18,587-Speed 11547.28 samples/sec Loss 4.5099 LearningRate 0.0528 Epoch: 13 Global Step: 56160 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:38:22,199-Speed 11342.65 samples/sec Loss 4.5107 LearningRate 0.0527 Epoch: 13 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:25,669-Speed 11807.50 samples/sec Loss 4.4829 LearningRate 0.0527 Epoch: 13 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:29,205-Speed 11586.95 samples/sec Loss 4.5087 LearningRate 0.0527 Epoch: 13 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:32,642-Speed 11920.99 samples/sec Loss 4.5047 LearningRate 0.0526 Epoch: 13 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:35,990-Speed 12237.97 
samples/sec Loss 4.5337 LearningRate 0.0526 Epoch: 13 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:39,385-Speed 12067.74 samples/sec Loss 4.4979 LearningRate 0.0526 Epoch: 13 Global Step: 56220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:42,844-Speed 11845.61 samples/sec Loss 4.5096 LearningRate 0.0525 Epoch: 13 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:46,371-Speed 11616.30 samples/sec Loss 4.5170 LearningRate 0.0525 Epoch: 13 Global Step: 56240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:49,901-Speed 11605.31 samples/sec Loss 4.4787 LearningRate 0.0524 Epoch: 13 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:53,317-Speed 11992.79 samples/sec Loss 4.5372 LearningRate 0.0524 Epoch: 13 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:38:56,850-Speed 11596.03 samples/sec Loss 4.5533 LearningRate 0.0524 Epoch: 13 Global Step: 56270 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:39:01,012-Speed 9842.96 samples/sec Loss 4.5497 LearningRate 0.0523 Epoch: 13 Global Step: 56280 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:39:04,841-Speed 10701.66 samples/sec Loss 4.5160 LearningRate 0.0523 Epoch: 13 Global Step: 56290 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:39:08,291-Speed 11875.03 samples/sec Loss 4.4939 LearningRate 0.0522 Epoch: 13 Global Step: 56300 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:39:12,237-Speed 10383.05 samples/sec Loss 4.4685 LearningRate 0.0522 Epoch: 13 Global Step: 56310 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:39:15,842-Speed 11365.73 samples/sec Loss 4.5009 LearningRate 0.0522 Epoch: 13 Global Step: 56320 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:39:19,698-Speed 10626.53 samples/sec Loss 4.5365 
LearningRate 0.0521 Epoch: 13 Global Step: 56330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:23,491-Speed 10799.85 samples/sec Loss 4.4821 LearningRate 0.0521 Epoch: 13 Global Step: 56340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:26,873-Speed 12112.08 samples/sec Loss 4.5593 LearningRate 0.0521 Epoch: 13 Global Step: 56350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:30,571-Speed 11081.91 samples/sec Loss 4.5049 LearningRate 0.0520 Epoch: 13 Global Step: 56360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:34,233-Speed 11185.02 samples/sec Loss 4.5536 LearningRate 0.0520 Epoch: 13 Global Step: 56370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:37,647-Speed 12001.34 samples/sec Loss 4.5056 LearningRate 0.0519 Epoch: 13 Global Step: 56380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:41,201-Speed 11530.76 samples/sec Loss 4.5048 LearningRate 0.0519 Epoch: 13 Global Step: 56390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:44,624-Speed 11971.05 samples/sec Loss 4.5367 LearningRate 0.0519 Epoch: 13 Global Step: 56400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:47,970-Speed 12243.36 samples/sec Loss 4.4755 LearningRate 0.0518 Epoch: 13 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:51,690-Speed 11014.12 samples/sec Loss 4.4979 LearningRate 0.0518 Epoch: 13 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:39:55,607-Speed 10457.84 samples/sec Loss 4.5031 LearningRate 0.0517 Epoch: 13 Global Step: 56430 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:39:59,235-Speed 11293.34 samples/sec Loss 4.5228 LearningRate 0.0517 Epoch: 13 Global Step: 56440 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:40:02,718-Speed 11764.81 samples/sec Loss 4.5524 LearningRate 0.0517 Epoch: 13 
Global Step: 56450 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:40:06,239-Speed 11639.48 samples/sec Loss 4.5210 LearningRate 0.0516 Epoch: 13 Global Step: 56460 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:40:09,937-Speed 11078.11 samples/sec Loss 4.5349 LearningRate 0.0516 Epoch: 13 Global Step: 56470 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:40:13,548-Speed 11344.90 samples/sec Loss 4.4834 LearningRate 0.0516 Epoch: 13 Global Step: 56480 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:40:17,185-Speed 11266.41 samples/sec Loss 4.5523 LearningRate 0.0515 Epoch: 13 Global Step: 56490 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:40:20,581-Speed 12063.34 samples/sec Loss 4.5218 LearningRate 0.0515 Epoch: 13 Global Step: 56500 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:40:24,023-Speed 11902.64 samples/sec Loss 4.5177 LearningRate 0.0514 Epoch: 13 Global Step: 56510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:27,535-Speed 11666.80 samples/sec Loss 4.5100 LearningRate 0.0514 Epoch: 13 Global Step: 56520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:30,985-Speed 11875.65 samples/sec Loss 4.5202 LearningRate 0.0514 Epoch: 13 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:34,389-Speed 12037.01 samples/sec Loss 4.4841 LearningRate 0.0513 Epoch: 13 Global Step: 56540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:38,104-Speed 11028.97 samples/sec Loss 4.5161 LearningRate 0.0513 Epoch: 13 Global Step: 56550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:41,607-Speed 11695.61 samples/sec Loss 4.5816 LearningRate 0.0512 Epoch: 13 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:44,928-Speed 12336.73 samples/sec Loss 4.5352 LearningRate 0.0512 Epoch: 13 Global Step: 56570 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:48,811-Speed 10549.91 samples/sec Loss 4.5015 LearningRate 0.0512 Epoch: 13 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:52,383-Speed 11471.42 samples/sec Loss 4.4897 LearningRate 0.0511 Epoch: 13 Global Step: 56590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:55,872-Speed 11739.59 samples/sec Loss 4.5151 LearningRate 0.0511 Epoch: 13 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:40:59,420-Speed 11548.20 samples/sec Loss 4.4902 LearningRate 0.0511 Epoch: 13 Global Step: 56610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:03,113-Speed 11095.41 samples/sec Loss 4.5019 LearningRate 0.0510 Epoch: 13 Global Step: 56620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:07,386-Speed 9587.83 samples/sec Loss 4.5066 LearningRate 0.0510 Epoch: 13 Global Step: 56630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:10,854-Speed 11814.84 samples/sec Loss 4.5191 LearningRate 0.0509 Epoch: 13 Global Step: 56640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:14,543-Speed 11107.32 samples/sec Loss 4.5269 LearningRate 0.0509 Epoch: 13 Global Step: 56650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:18,781-Speed 9664.69 samples/sec Loss 4.5311 LearningRate 0.0509 Epoch: 13 Global Step: 56660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:22,615-Speed 10687.33 samples/sec Loss 4.5363 LearningRate 0.0508 Epoch: 13 Global Step: 56670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:26,020-Speed 12030.89 samples/sec Loss 4.4701 LearningRate 0.0508 Epoch: 13 Global Step: 56680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:29,733-Speed 11036.37 samples/sec Loss 4.5353 LearningRate 0.0508 Epoch: 13 Global Step: 56690 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-01-17 03:41:33,789-Speed 10099.31 samples/sec Loss 4.5702 LearningRate 0.0507 Epoch: 13 Global Step: 56700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:37,206-Speed 11991.97 samples/sec Loss 4.4739 LearningRate 0.0507 Epoch: 13 Global Step: 56710 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:41:40,537-Speed 12298.53 samples/sec Loss 4.5255 LearningRate 0.0506 Epoch: 13 Global Step: 56720 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:41:43,901-Speed 12182.41 samples/sec Loss 4.5445 LearningRate 0.0506 Epoch: 13 Global Step: 56730 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:41:47,257-Speed 12207.35 samples/sec Loss 4.5680 LearningRate 0.0506 Epoch: 13 Global Step: 56740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:51,259-Speed 10238.12 samples/sec Loss 4.5360 LearningRate 0.0505 Epoch: 13 Global Step: 56750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:54,837-Speed 11450.24 samples/sec Loss 4.5264 LearningRate 0.0505 Epoch: 13 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:41:58,388-Speed 11535.83 samples/sec Loss 4.5044 LearningRate 0.0505 Epoch: 13 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:02,014-Speed 11299.29 samples/sec Loss 4.5643 LearningRate 0.0504 Epoch: 13 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:05,707-Speed 11095.14 samples/sec Loss 4.5263 LearningRate 0.0504 Epoch: 13 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:09,908-Speed 9753.47 samples/sec Loss 4.5182 LearningRate 0.0503 Epoch: 13 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:13,427-Speed 11641.69 samples/sec Loss 4.5312 LearningRate 0.0503 Epoch: 13 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 
03:42:17,440-Speed 10209.10 samples/sec Loss 4.5461 LearningRate 0.0503 Epoch: 13 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:21,199-Speed 10899.89 samples/sec Loss 4.4936 LearningRate 0.0502 Epoch: 13 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:24,597-Speed 12058.05 samples/sec Loss 4.4667 LearningRate 0.0502 Epoch: 13 Global Step: 56840 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:42:28,127-Speed 11603.33 samples/sec Loss 4.5156 LearningRate 0.0501 Epoch: 13 Global Step: 56850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:31,573-Speed 11889.80 samples/sec Loss 4.5040 LearningRate 0.0501 Epoch: 13 Global Step: 56860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:35,312-Speed 10958.79 samples/sec Loss 4.5027 LearningRate 0.0501 Epoch: 13 Global Step: 56870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:38,761-Speed 11877.51 samples/sec Loss 4.5329 LearningRate 0.0500 Epoch: 13 Global Step: 56880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:42,263-Speed 11703.22 samples/sec Loss 4.5257 LearningRate 0.0500 Epoch: 13 Global Step: 56890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:46,279-Speed 10201.87 samples/sec Loss 4.5397 LearningRate 0.0500 Epoch: 13 Global Step: 56900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:49,707-Speed 11951.90 samples/sec Loss 4.5218 LearningRate 0.0499 Epoch: 13 Global Step: 56910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:53,162-Speed 11857.14 samples/sec Loss 4.5436 LearningRate 0.0499 Epoch: 13 Global Step: 56920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:42:56,648-Speed 11754.93 samples/sec Loss 4.5174 LearningRate 0.0498 Epoch: 13 Global Step: 56930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:00,029-Speed 12116.83 
samples/sec Loss 4.5905 LearningRate 0.0498 Epoch: 13 Global Step: 56940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:03,663-Speed 11276.65 samples/sec Loss 4.5623 LearningRate 0.0498 Epoch: 13 Global Step: 56950 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:43:07,589-Speed 10434.95 samples/sec Loss 4.5345 LearningRate 0.0497 Epoch: 13 Global Step: 56960 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:43:11,667-Speed 10047.74 samples/sec Loss 4.5678 LearningRate 0.0497 Epoch: 13 Global Step: 56970 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:43:15,089-Speed 11972.86 samples/sec Loss 4.5360 LearningRate 0.0497 Epoch: 13 Global Step: 56980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:18,531-Speed 11904.93 samples/sec Loss 4.5040 LearningRate 0.0496 Epoch: 13 Global Step: 56990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:21,945-Speed 11998.31 samples/sec Loss 4.5293 LearningRate 0.0496 Epoch: 13 Global Step: 57000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:25,432-Speed 11752.91 samples/sec Loss 4.5466 LearningRate 0.0495 Epoch: 13 Global Step: 57010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:28,942-Speed 11671.36 samples/sec Loss 4.5726 LearningRate 0.0495 Epoch: 13 Global Step: 57020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:32,451-Speed 11676.63 samples/sec Loss 4.5478 LearningRate 0.0495 Epoch: 13 Global Step: 57030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:36,437-Speed 10276.15 samples/sec Loss 4.5447 LearningRate 0.0494 Epoch: 13 Global Step: 57040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:39,880-Speed 11902.56 samples/sec Loss 4.5540 LearningRate 0.0494 Epoch: 13 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:43,301-Speed 11975.33 samples/sec Loss 4.5553 
LearningRate 0.0494 Epoch: 13 Global Step: 57060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:46,674-Speed 12148.17 samples/sec Loss 4.4983 LearningRate 0.0493 Epoch: 13 Global Step: 57070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:50,075-Speed 12047.24 samples/sec Loss 4.5330 LearningRate 0.0493 Epoch: 13 Global Step: 57080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:53,478-Speed 12042.28 samples/sec Loss 4.4997 LearningRate 0.0492 Epoch: 13 Global Step: 57090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:43:56,892-Speed 11998.99 samples/sec Loss 4.5074 LearningRate 0.0492 Epoch: 13 Global Step: 57100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:00,281-Speed 12087.31 samples/sec Loss 4.5010 LearningRate 0.0492 Epoch: 13 Global Step: 57110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:03,718-Speed 11920.86 samples/sec Loss 4.5490 LearningRate 0.0491 Epoch: 13 Global Step: 57120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:07,283-Speed 11495.16 samples/sec Loss 4.5051 LearningRate 0.0491 Epoch: 13 Global Step: 57130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:11,327-Speed 10130.91 samples/sec Loss 4.5230 LearningRate 0.0491 Epoch: 13 Global Step: 57140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:15,098-Speed 10864.98 samples/sec Loss 4.5277 LearningRate 0.0490 Epoch: 13 Global Step: 57150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:19,176-Speed 10044.63 samples/sec Loss 4.4604 LearningRate 0.0490 Epoch: 13 Global Step: 57160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:22,906-Speed 10984.49 samples/sec Loss 4.4859 LearningRate 0.0489 Epoch: 13 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:26,833-Speed 10435.91 samples/sec Loss 4.5385 LearningRate 0.0489 Epoch: 13 
Global Step: 57180 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:44:30,299-Speed 11819.40 samples/sec Loss 4.5092 LearningRate 0.0489 Epoch: 13 Global Step: 57190 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:44:33,704-Speed 12032.68 samples/sec Loss 4.5335 LearningRate 0.0488 Epoch: 13 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:37,241-Speed 11582.83 samples/sec Loss 4.5451 LearningRate 0.0488 Epoch: 13 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:40,821-Speed 11444.54 samples/sec Loss 4.4841 LearningRate 0.0488 Epoch: 13 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:44,299-Speed 11776.40 samples/sec Loss 4.5222 LearningRate 0.0487 Epoch: 13 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:47,735-Speed 11927.84 samples/sec Loss 4.5343 LearningRate 0.0487 Epoch: 13 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:51,440-Speed 11057.93 samples/sec Loss 4.5011 LearningRate 0.0487 Epoch: 13 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:55,060-Speed 11317.81 samples/sec Loss 4.5334 LearningRate 0.0486 Epoch: 13 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:44:58,663-Speed 11369.67 samples/sec Loss 4.5430 LearningRate 0.0486 Epoch: 13 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:02,709-Speed 10126.23 samples/sec Loss 4.5310 LearningRate 0.0485 Epoch: 13 Global Step: 57280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:06,562-Speed 10632.13 samples/sec Loss 4.5400 LearningRate 0.0485 Epoch: 13 Global Step: 57290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:09,962-Speed 12056.63 samples/sec Loss 4.5212 LearningRate 0.0485 Epoch: 13 Global Step: 57300 Fp16 Grad 
Scale: 262144 Required: 3 hours Training: 2022-01-17 03:45:14,021-Speed 10091.29 samples/sec Loss 4.5503 LearningRate 0.0484 Epoch: 13 Global Step: 57310 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:45:17,454-Speed 11938.23 samples/sec Loss 4.4874 LearningRate 0.0484 Epoch: 13 Global Step: 57320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:20,969-Speed 11654.10 samples/sec Loss 4.5339 LearningRate 0.0484 Epoch: 13 Global Step: 57330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:24,814-Speed 10655.87 samples/sec Loss 4.5185 LearningRate 0.0483 Epoch: 13 Global Step: 57340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:28,305-Speed 11736.41 samples/sec Loss 4.5233 LearningRate 0.0483 Epoch: 13 Global Step: 57350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:31,852-Speed 11549.78 samples/sec Loss 4.5151 LearningRate 0.0482 Epoch: 13 Global Step: 57360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:35,268-Speed 11993.27 samples/sec Loss 4.5514 LearningRate 0.0482 Epoch: 13 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:38,632-Speed 12179.25 samples/sec Loss 4.5516 LearningRate 0.0482 Epoch: 13 Global Step: 57380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:42,094-Speed 11834.02 samples/sec Loss 4.5170 LearningRate 0.0481 Epoch: 13 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:45,687-Speed 11402.73 samples/sec Loss 4.4970 LearningRate 0.0481 Epoch: 13 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:49,058-Speed 12156.90 samples/sec Loss 4.5594 LearningRate 0.0481 Epoch: 13 Global Step: 57410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:45:52,624-Speed 11488.18 samples/sec Loss 4.5008 LearningRate 0.0480 Epoch: 13 Global Step: 57420 Fp16 Grad Scale: 262144 Required: 3 hours 
Training: 2022-01-17 03:45:56,029-Speed 12032.63 samples/sec Loss 4.5681 LearningRate 0.0480 Epoch: 13 Global Step: 57430 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:45:59,448-Speed 11983.77 samples/sec Loss 4.5497 LearningRate 0.0479 Epoch: 13 Global Step: 57440 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:46:03,132-Speed 11118.70 samples/sec Loss 4.4913 LearningRate 0.0479 Epoch: 13 Global Step: 57450 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:46:07,269-Speed 9903.25 samples/sec Loss 4.5175 LearningRate 0.0479 Epoch: 13 Global Step: 57460 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:46:10,792-Speed 11632.25 samples/sec Loss 4.5148 LearningRate 0.0478 Epoch: 13 Global Step: 57470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:14,376-Speed 11430.42 samples/sec Loss 4.4764 LearningRate 0.0478 Epoch: 13 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:17,746-Speed 12158.61 samples/sec Loss 4.5165 LearningRate 0.0478 Epoch: 13 Global Step: 57490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:21,797-Speed 10113.12 samples/sec Loss 4.5236 LearningRate 0.0477 Epoch: 13 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:25,895-Speed 9997.70 samples/sec Loss 4.4853 LearningRate 0.0477 Epoch: 13 Global Step: 57510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:29,348-Speed 11865.06 samples/sec Loss 4.5008 LearningRate 0.0477 Epoch: 13 Global Step: 57520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:33,364-Speed 10202.59 samples/sec Loss 4.5228 LearningRate 0.0476 Epoch: 13 Global Step: 57530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:36,775-Speed 12009.14 samples/sec Loss 4.5349 LearningRate 0.0476 Epoch: 13 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 
03:46:40,658-Speed 10550.64 samples/sec Loss 4.4735 LearningRate 0.0475 Epoch: 13 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:44,151-Speed 11730.27 samples/sec Loss 4.5462 LearningRate 0.0475 Epoch: 13 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:46:47,540-Speed 12092.06 samples/sec Loss 4.4935 LearningRate 0.0475 Epoch: 13 Global Step: 57570 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:46:51,097-Speed 11516.35 samples/sec Loss 4.5390 LearningRate 0.0474 Epoch: 13 Global Step: 57580 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:46:54,955-Speed 10620.45 samples/sec Loss 4.5123 LearningRate 0.0474 Epoch: 13 Global Step: 57590 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:46:58,511-Speed 11519.36 samples/sec Loss 4.4803 LearningRate 0.0474 Epoch: 13 Global Step: 57600 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:47:02,480-Speed 10321.48 samples/sec Loss 4.5628 LearningRate 0.0473 Epoch: 13 Global Step: 57610 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:47:05,943-Speed 11831.35 samples/sec Loss 4.5245 LearningRate 0.0473 Epoch: 13 Global Step: 57620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:09,316-Speed 12149.42 samples/sec Loss 4.4830 LearningRate 0.0473 Epoch: 13 Global Step: 57630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:12,740-Speed 11964.45 samples/sec Loss 4.5417 LearningRate 0.0472 Epoch: 13 Global Step: 57640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:16,443-Speed 11063.29 samples/sec Loss 4.4964 LearningRate 0.0472 Epoch: 13 Global Step: 57650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:19,899-Speed 11855.17 samples/sec Loss 4.5104 LearningRate 0.0471 Epoch: 13 Global Step: 57660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:24,268-Speed 9377.48 
samples/sec Loss 4.5432 LearningRate 0.0471 Epoch: 13 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:28,281-Speed 10210.42 samples/sec Loss 4.5353 LearningRate 0.0471 Epoch: 13 Global Step: 57680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:31,700-Speed 11983.19 samples/sec Loss 4.5644 LearningRate 0.0470 Epoch: 13 Global Step: 57690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:35,089-Speed 12087.66 samples/sec Loss 4.5490 LearningRate 0.0470 Epoch: 13 Global Step: 57700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:38,467-Speed 12128.44 samples/sec Loss 4.5169 LearningRate 0.0470 Epoch: 13 Global Step: 57710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:47:41,862-Speed 12067.28 samples/sec Loss 4.5243 LearningRate 0.0469 Epoch: 13 Global Step: 57720 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:47:45,250-Speed 12093.59 samples/sec Loss 4.5731 LearningRate 0.0469 Epoch: 13 Global Step: 57730 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:47:48,759-Speed 11678.64 samples/sec Loss 4.4853 LearningRate 0.0468 Epoch: 13 Global Step: 57740 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:47:52,788-Speed 10169.66 samples/sec Loss 4.5295 LearningRate 0.0468 Epoch: 13 Global Step: 57750 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:47:56,708-Speed 10449.67 samples/sec Loss 4.4920 LearningRate 0.0468 Epoch: 13 Global Step: 57760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:00,507-Speed 10785.58 samples/sec Loss 4.5530 LearningRate 0.0467 Epoch: 13 Global Step: 57770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:04,812-Speed 9515.74 samples/sec Loss 4.4883 LearningRate 0.0467 Epoch: 13 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:08,305-Speed 11727.45 samples/sec Loss 4.5032 
LearningRate 0.0467 Epoch: 13 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:11,677-Speed 12153.79 samples/sec Loss 4.4607 LearningRate 0.0466 Epoch: 13 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:15,464-Speed 10817.54 samples/sec Loss 4.5376 LearningRate 0.0466 Epoch: 13 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:19,501-Speed 10146.43 samples/sec Loss 4.5123 LearningRate 0.0466 Epoch: 13 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:22,926-Speed 11964.71 samples/sec Loss 4.4646 LearningRate 0.0465 Epoch: 13 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:26,295-Speed 12158.58 samples/sec Loss 4.4937 LearningRate 0.0465 Epoch: 13 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:29,909-Speed 11339.09 samples/sec Loss 4.4942 LearningRate 0.0464 Epoch: 13 Global Step: 57850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:33,627-Speed 11019.49 samples/sec Loss 4.5000 LearningRate 0.0464 Epoch: 13 Global Step: 57860 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:48:36,994-Speed 12168.96 samples/sec Loss 4.5316 LearningRate 0.0464 Epoch: 13 Global Step: 57870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:40,393-Speed 12051.16 samples/sec Loss 4.5063 LearningRate 0.0463 Epoch: 13 Global Step: 57880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:44,363-Speed 10319.61 samples/sec Loss 4.4766 LearningRate 0.0463 Epoch: 13 Global Step: 57890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:47,944-Speed 11442.35 samples/sec Loss 4.5120 LearningRate 0.0463 Epoch: 13 Global Step: 57900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:51,662-Speed 11018.96 samples/sec Loss 4.5237 LearningRate 0.0462 Epoch: 13 
Global Step: 57910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:55,197-Speed 11591.23 samples/sec Loss 4.5207 LearningRate 0.0462 Epoch: 13 Global Step: 57920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:48:58,849-Speed 11218.57 samples/sec Loss 4.5324 LearningRate 0.0462 Epoch: 13 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:49:02,493-Speed 11244.65 samples/sec Loss 4.5404 LearningRate 0.0461 Epoch: 13 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:06,469-Speed 10303.56 samples/sec Loss 4.5199 LearningRate 0.0461 Epoch: 13 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:10,027-Speed 11513.46 samples/sec Loss 4.5304 LearningRate 0.0460 Epoch: 13 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:13,459-Speed 11939.55 samples/sec Loss 4.5199 LearningRate 0.0460 Epoch: 13 Global Step: 57970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:16,951-Speed 11730.48 samples/sec Loss 4.4832 LearningRate 0.0460 Epoch: 13 Global Step: 57980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:20,310-Speed 12198.42 samples/sec Loss 4.5253 LearningRate 0.0459 Epoch: 13 Global Step: 57990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:24,451-Speed 9892.30 samples/sec Loss 4.5591 LearningRate 0.0459 Epoch: 13 Global Step: 58000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:28,315-Speed 10604.98 samples/sec Loss 4.4951 LearningRate 0.0459 Epoch: 13 Global Step: 58010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:31,941-Speed 11296.51 samples/sec Loss 4.4969 LearningRate 0.0458 Epoch: 13 Global Step: 58020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:49:35,416-Speed 11796.96 samples/sec Loss 4.5286 LearningRate 0.0458 Epoch: 13 Global Step: 58030 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-17 03:49:38,960-Speed 11559.63 samples/sec Loss 4.5369 LearningRate 0.0458 Epoch: 13 Global Step: 58040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:49:42,536-Speed 11457.30 samples/sec Loss 4.4815 LearningRate 0.0457 Epoch: 13 Global Step: 58050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:49:46,269-Speed 10972.71 samples/sec Loss 4.4798 LearningRate 0.0457 Epoch: 13 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:49:49,816-Speed 11552.74 samples/sec Loss 4.5456 LearningRate 0.0457 Epoch: 13 Global Step: 58070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:49:53,227-Speed 12010.87 samples/sec Loss 4.4831 LearningRate 0.0456 Epoch: 13 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:49:56,831-Speed 11369.04 samples/sec Loss 4.5083 LearningRate 0.0456 Epoch: 13 Global Step: 58090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:00,513-Speed 11126.81 samples/sec Loss 4.5851 LearningRate 0.0455 Epoch: 13 Global Step: 58100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:04,167-Speed 11212.60 samples/sec Loss 4.5327 LearningRate 0.0455 Epoch: 13 Global Step: 58110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:07,673-Speed 11687.33 samples/sec Loss 4.4776 LearningRate 0.0455 Epoch: 13 Global Step: 58120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:11,464-Speed 10805.30 samples/sec Loss 4.4995 LearningRate 0.0454 Epoch: 13 Global Step: 58130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:14,960-Speed 11722.20 samples/sec Loss 4.4812 LearningRate 0.0454 Epoch: 13 Global Step: 58140 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:50:18,647-Speed 11112.00 samples/sec Loss 4.5097 LearningRate 0.0454 Epoch: 13 Global Step: 58150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-17 03:50:22,335-Speed 11107.90 samples/sec Loss 4.5207 LearningRate 0.0453 Epoch: 13 Global Step: 58160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:25,907-Speed 11470.90 samples/sec Loss 4.5129 LearningRate 0.0453 Epoch: 13 Global Step: 58170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:30,011-Speed 9981.48 samples/sec Loss 4.5153 LearningRate 0.0453 Epoch: 13 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:33,745-Speed 10973.44 samples/sec Loss 4.5173 LearningRate 0.0452 Epoch: 13 Global Step: 58190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:38,637-Speed 8375.62 samples/sec Loss 4.4936 LearningRate 0.0452 Epoch: 13 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:42,135-Speed 11711.83 samples/sec Loss 4.5094 LearningRate 0.0452 Epoch: 13 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:45,566-Speed 11937.62 samples/sec Loss 4.5035 LearningRate 0.0451 Epoch: 13 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:49,378-Speed 10748.48 samples/sec Loss 4.4800 LearningRate 0.0451 Epoch: 13 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:52,870-Speed 11735.39 samples/sec Loss 4.4739 LearningRate 0.0450 Epoch: 13 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:50:56,437-Speed 11486.05 samples/sec Loss 4.4984 LearningRate 0.0450 Epoch: 13 Global Step: 58250 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:50:59,975-Speed 11578.43 samples/sec Loss 4.5248 LearningRate 0.0450 Epoch: 13 Global Step: 58260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:03,500-Speed 11623.71 samples/sec Loss 4.5176 LearningRate 0.0449 Epoch: 13 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:07,050-Speed 
11540.42 samples/sec Loss 4.5032 LearningRate 0.0449 Epoch: 13 Global Step: 58280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:10,618-Speed 11484.02 samples/sec Loss 4.4729 LearningRate 0.0449 Epoch: 13 Global Step: 58290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:14,494-Speed 10569.62 samples/sec Loss 4.5538 LearningRate 0.0448 Epoch: 13 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:17,902-Speed 12022.84 samples/sec Loss 4.4745 LearningRate 0.0448 Epoch: 13 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:21,690-Speed 10815.77 samples/sec Loss 4.5019 LearningRate 0.0448 Epoch: 13 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:25,356-Speed 11173.36 samples/sec Loss 4.4852 LearningRate 0.0447 Epoch: 13 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:28,821-Speed 11828.88 samples/sec Loss 4.4600 LearningRate 0.0447 Epoch: 13 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:32,658-Speed 10678.96 samples/sec Loss 4.5305 LearningRate 0.0447 Epoch: 13 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:36,100-Speed 11902.74 samples/sec Loss 4.5021 LearningRate 0.0446 Epoch: 13 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:39,890-Speed 10810.13 samples/sec Loss 4.4952 LearningRate 0.0446 Epoch: 13 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:43,347-Speed 11851.17 samples/sec Loss 4.4968 LearningRate 0.0445 Epoch: 13 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:46,731-Speed 12107.48 samples/sec Loss 4.4943 LearningRate 0.0445 Epoch: 13 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:51:50,513-Speed 10831.49 samples/sec Loss 
4.4589 LearningRate 0.0445 Epoch: 13 Global Step: 58400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:52:29,146-Speed 1060.26 samples/sec Loss 4.3625 LearningRate 0.0444 Epoch: 14 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:52:33,134-Speed 10275.94 samples/sec Loss 3.8091 LearningRate 0.0444 Epoch: 14 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:52:36,922-Speed 10815.81 samples/sec Loss 3.7827 LearningRate 0.0444 Epoch: 14 Global Step: 58430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:52:41,276-Speed 9409.11 samples/sec Loss 3.8629 LearningRate 0.0443 Epoch: 14 Global Step: 58440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:52:45,166-Speed 10531.71 samples/sec Loss 3.8290 LearningRate 0.0443 Epoch: 14 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:52:48,501-Speed 12286.20 samples/sec Loss 3.8357 LearningRate 0.0443 Epoch: 14 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:52:52,783-Speed 9568.85 samples/sec Loss 3.8526 LearningRate 0.0442 Epoch: 14 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:52:56,446-Speed 11185.32 samples/sec Loss 3.8638 LearningRate 0.0442 Epoch: 14 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:00,192-Speed 10934.39 samples/sec Loss 3.8925 LearningRate 0.0442 Epoch: 14 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:03,927-Speed 10971.65 samples/sec Loss 3.8281 LearningRate 0.0441 Epoch: 14 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:07,450-Speed 11627.82 samples/sec Loss 3.8090 LearningRate 0.0441 Epoch: 14 Global Step: 58510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:11,165-Speed 11029.88 samples/sec Loss 3.8593 LearningRate 0.0440 Epoch: 
14 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:14,649-Speed 11760.39 samples/sec Loss 3.8217 LearningRate 0.0440 Epoch: 14 Global Step: 58530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:18,365-Speed 11023.45 samples/sec Loss 3.8449 LearningRate 0.0440 Epoch: 14 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:21,771-Speed 12029.18 samples/sec Loss 3.8884 LearningRate 0.0439 Epoch: 14 Global Step: 58550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:25,993-Speed 9703.95 samples/sec Loss 3.8830 LearningRate 0.0439 Epoch: 14 Global Step: 58560 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:53:30,093-Speed 9994.49 samples/sec Loss 3.9317 LearningRate 0.0439 Epoch: 14 Global Step: 58570 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:53:33,887-Speed 10797.46 samples/sec Loss 3.8523 LearningRate 0.0438 Epoch: 14 Global Step: 58580 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:53:37,420-Speed 11597.35 samples/sec Loss 3.8863 LearningRate 0.0438 Epoch: 14 Global Step: 58590 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:53:41,271-Speed 10639.19 samples/sec Loss 3.8858 LearningRate 0.0438 Epoch: 14 Global Step: 58600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:44,770-Speed 11711.87 samples/sec Loss 3.9203 LearningRate 0.0437 Epoch: 14 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:48,207-Speed 11919.88 samples/sec Loss 3.8682 LearningRate 0.0437 Epoch: 14 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:51,644-Speed 11919.23 samples/sec Loss 3.9173 LearningRate 0.0437 Epoch: 14 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:55,265-Speed 11316.93 samples/sec Loss 3.8733 LearningRate 0.0436 Epoch: 14 Global Step: 58640 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-17 03:53:58,942-Speed 11143.82 samples/sec Loss 3.8477 LearningRate 0.0436 Epoch: 14 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:54:02,412-Speed 11806.17 samples/sec Loss 3.8688 LearningRate 0.0436 Epoch: 14 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:54:05,850-Speed 11918.40 samples/sec Loss 3.9007 LearningRate 0.0435 Epoch: 14 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:54:09,502-Speed 11218.10 samples/sec Loss 3.9189 LearningRate 0.0435 Epoch: 14 Global Step: 58680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:54:12,915-Speed 12006.69 samples/sec Loss 3.9340 LearningRate 0.0434 Epoch: 14 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:54:16,492-Speed 11454.90 samples/sec Loss 3.9713 LearningRate 0.0434 Epoch: 14 Global Step: 58700 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:54:20,366-Speed 10574.58 samples/sec Loss 3.9634 LearningRate 0.0434 Epoch: 14 Global Step: 58710 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:54:23,989-Speed 11306.00 samples/sec Loss 3.8760 LearningRate 0.0433 Epoch: 14 Global Step: 58720 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:54:28,417-Speed 9252.98 samples/sec Loss 3.9201 LearningRate 0.0433 Epoch: 14 Global Step: 58730 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:54:32,173-Speed 10907.05 samples/sec Loss 3.9372 LearningRate 0.0433 Epoch: 14 Global Step: 58740 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:54:35,881-Speed 11049.13 samples/sec Loss 3.9504 LearningRate 0.0432 Epoch: 14 Global Step: 58750 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:54:39,810-Speed 10429.64 samples/sec Loss 3.9095 LearningRate 0.0432 Epoch: 14 Global Step: 58760 Fp16 Grad Scale: 262144 Required: 3 hours 
Training: 2022-01-17 03:54:43,479-Speed 11163.35 samples/sec Loss 3.9462 LearningRate 0.0432 Epoch: 14 Global Step: 58770 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:54:47,062-Speed 11438.81 samples/sec Loss 3.9781 LearningRate 0.0431 Epoch: 14 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:54:50,579-Speed 11647.70 samples/sec Loss 3.8789 LearningRate 0.0431 Epoch: 14 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:54:54,107-Speed 11612.91 samples/sec Loss 3.9265 LearningRate 0.0431 Epoch: 14 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:54:57,847-Speed 10952.65 samples/sec Loss 4.0179 LearningRate 0.0430 Epoch: 14 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:01,685-Speed 10677.52 samples/sec Loss 3.9605 LearningRate 0.0430 Epoch: 14 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:05,333-Speed 11229.68 samples/sec Loss 3.9969 LearningRate 0.0430 Epoch: 14 Global Step: 58830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:08,918-Speed 11429.55 samples/sec Loss 3.9734 LearningRate 0.0429 Epoch: 14 Global Step: 58840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:12,566-Speed 11232.44 samples/sec Loss 3.9847 LearningRate 0.0429 Epoch: 14 Global Step: 58850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:16,006-Speed 11909.14 samples/sec Loss 3.9964 LearningRate 0.0429 Epoch: 14 Global Step: 58860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:19,754-Speed 10931.96 samples/sec Loss 3.9708 LearningRate 0.0428 Epoch: 14 Global Step: 58870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:24,307-Speed 8997.87 samples/sec Loss 3.9698 LearningRate 0.0428 Epoch: 14 Global Step: 58880 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 
03:55:28,198-Speed 10529.32 samples/sec Loss 3.9901 LearningRate 0.0427 Epoch: 14 Global Step: 58890 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:55:31,908-Speed 11044.08 samples/sec Loss 4.0234 LearningRate 0.0427 Epoch: 14 Global Step: 58900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:35,356-Speed 11882.02 samples/sec Loss 4.0072 LearningRate 0.0427 Epoch: 14 Global Step: 58910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:39,349-Speed 10258.60 samples/sec Loss 4.0451 LearningRate 0.0426 Epoch: 14 Global Step: 58920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:43,060-Speed 11042.30 samples/sec Loss 4.0107 LearningRate 0.0426 Epoch: 14 Global Step: 58930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:46,654-Speed 11398.86 samples/sec Loss 4.0609 LearningRate 0.0426 Epoch: 14 Global Step: 58940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:50,337-Speed 11124.68 samples/sec Loss 3.9961 LearningRate 0.0425 Epoch: 14 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:53,927-Speed 11409.31 samples/sec Loss 3.9856 LearningRate 0.0425 Epoch: 14 Global Step: 58960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:55:57,579-Speed 11219.11 samples/sec Loss 4.0329 LearningRate 0.0425 Epoch: 14 Global Step: 58970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:56:01,269-Speed 11104.98 samples/sec Loss 3.9689 LearningRate 0.0424 Epoch: 14 Global Step: 58980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:56:05,208-Speed 10403.17 samples/sec Loss 4.0058 LearningRate 0.0424 Epoch: 14 Global Step: 58990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:56:08,788-Speed 11444.73 samples/sec Loss 4.0093 LearningRate 0.0424 Epoch: 14 Global Step: 59000 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:12,343-Speed 11522.97 
samples/sec Loss 4.0285 LearningRate 0.0423 Epoch: 14 Global Step: 59010 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:15,989-Speed 11236.37 samples/sec Loss 4.0580 LearningRate 0.0423 Epoch: 14 Global Step: 59020 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:19,729-Speed 10955.33 samples/sec Loss 4.0552 LearningRate 0.0423 Epoch: 14 Global Step: 59030 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:23,409-Speed 11132.69 samples/sec Loss 4.0206 LearningRate 0.0422 Epoch: 14 Global Step: 59040 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:27,222-Speed 10745.63 samples/sec Loss 4.0679 LearningRate 0.0422 Epoch: 14 Global Step: 59050 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:30,658-Speed 11923.33 samples/sec Loss 4.0097 LearningRate 0.0422 Epoch: 14 Global Step: 59060 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:34,258-Speed 11378.92 samples/sec Loss 4.0354 LearningRate 0.0421 Epoch: 14 Global Step: 59070 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:38,082-Speed 10714.34 samples/sec Loss 4.0191 LearningRate 0.0421 Epoch: 14 Global Step: 59080 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:42,490-Speed 9295.44 samples/sec Loss 4.0408 LearningRate 0.0421 Epoch: 14 Global Step: 59090 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:46,053-Speed 11499.91 samples/sec Loss 4.0157 LearningRate 0.0420 Epoch: 14 Global Step: 59100 Fp16 Grad Scale: 524288 Required: 3 hours Training: 2022-01-17 03:56:49,746-Speed 11095.50 samples/sec Loss 4.0169 LearningRate 0.0420 Epoch: 14 Global Step: 59110 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:56:53,199-Speed 11863.73 samples/sec Loss 4.0514 LearningRate 0.0420 Epoch: 14 Global Step: 59120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:56:57,031-Speed 10690.18 samples/sec Loss 4.0257 
LearningRate 0.0419 Epoch: 14 Global Step: 59130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:00,641-Speed 11350.27 samples/sec Loss 4.0323 LearningRate 0.0419 Epoch: 14 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:04,360-Speed 11018.09 samples/sec Loss 4.0477 LearningRate 0.0418 Epoch: 14 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:08,233-Speed 10579.99 samples/sec Loss 4.0463 LearningRate 0.0418 Epoch: 14 Global Step: 59160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:11,705-Speed 11798.54 samples/sec Loss 4.0681 LearningRate 0.0418 Epoch: 14 Global Step: 59170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:15,199-Speed 11724.23 samples/sec Loss 4.0544 LearningRate 0.0417 Epoch: 14 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:18,781-Speed 11439.96 samples/sec Loss 4.0501 LearningRate 0.0417 Epoch: 14 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:22,399-Speed 11325.17 samples/sec Loss 4.0543 LearningRate 0.0417 Epoch: 14 Global Step: 59200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:26,204-Speed 10767.02 samples/sec Loss 4.0805 LearningRate 0.0416 Epoch: 14 Global Step: 59210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:57:29,730-Speed 11619.31 samples/sec Loss 4.0652 LearningRate 0.0416 Epoch: 14 Global Step: 59220 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:57:33,351-Speed 11312.51 samples/sec Loss 4.0822 LearningRate 0.0416 Epoch: 14 Global Step: 59230 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:57:37,211-Speed 10616.01 samples/sec Loss 4.0266 LearningRate 0.0415 Epoch: 14 Global Step: 59240 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:57:41,937-Speed 8669.65 samples/sec Loss 4.0560 LearningRate 0.0415 Epoch: 14 
Global Step: 59250 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:57:45,386-Speed 11878.28 samples/sec Loss 4.0256 LearningRate 0.0415 Epoch: 14 Global Step: 59260 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:57:49,027-Speed 11249.95 samples/sec Loss 4.0955 LearningRate 0.0414 Epoch: 14 Global Step: 59270 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:57:52,619-Speed 11405.34 samples/sec Loss 4.1094 LearningRate 0.0414 Epoch: 14 Global Step: 59280 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:57:56,077-Speed 11850.00 samples/sec Loss 4.0855 LearningRate 0.0414 Epoch: 14 Global Step: 59290 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:57:59,817-Speed 10955.29 samples/sec Loss 4.0652 LearningRate 0.0413 Epoch: 14 Global Step: 59300 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 03:58:03,531-Speed 11030.19 samples/sec Loss 4.0563 LearningRate 0.0413 Epoch: 14 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:58:07,096-Speed 11492.81 samples/sec Loss 4.0511 LearningRate 0.0413 Epoch: 14 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:58:10,665-Speed 11479.86 samples/sec Loss 4.1028 LearningRate 0.0412 Epoch: 14 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:58:14,385-Speed 11014.18 samples/sec Loss 4.0789 LearningRate 0.0412 Epoch: 14 Global Step: 59340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:58:18,092-Speed 11053.01 samples/sec Loss 4.1106 LearningRate 0.0412 Epoch: 14 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:58:21,851-Speed 10896.65 samples/sec Loss 4.1045 LearningRate 0.0411 Epoch: 14 Global Step: 59360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:58:25,325-Speed 11797.46 samples/sec Loss 4.1162 LearningRate 0.0411 Epoch: 14 Global Step: 59370 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:29,133-Speed 10756.32 samples/sec Loss 4.1115 LearningRate 0.0411 Epoch: 14 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:32,598-Speed 11823.95 samples/sec Loss 4.0491 LearningRate 0.0410 Epoch: 14 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:36,229-Speed 11286.09 samples/sec Loss 4.1022 LearningRate 0.0410 Epoch: 14 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:40,934-Speed 8707.10 samples/sec Loss 4.0878 LearningRate 0.0410 Epoch: 14 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:44,683-Speed 10927.59 samples/sec Loss 4.0699 LearningRate 0.0409 Epoch: 14 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:48,244-Speed 11505.53 samples/sec Loss 4.0838 LearningRate 0.0409 Epoch: 14 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:51,827-Speed 11437.67 samples/sec Loss 4.1379 LearningRate 0.0409 Epoch: 14 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:55,572-Speed 10939.10 samples/sec Loss 4.1043 LearningRate 0.0408 Epoch: 14 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:58:59,088-Speed 11654.51 samples/sec Loss 4.0897 LearningRate 0.0408 Epoch: 14 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:03,045-Speed 10353.84 samples/sec Loss 4.0812 LearningRate 0.0408 Epoch: 14 Global Step: 59470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:59:06,895-Speed 10643.43 samples/sec Loss 4.0775 LearningRate 0.0407 Epoch: 14 Global Step: 59480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:59:10,883-Speed 10271.88 samples/sec Loss 4.1075 LearningRate 0.0407 Epoch: 14 Global Step: 59490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-01-17 03:59:14,474-Speed 11411.34 samples/sec Loss 4.1247 LearningRate 0.0407 Epoch: 14 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:59:18,887-Speed 9283.01 samples/sec Loss 4.1025 LearningRate 0.0406 Epoch: 14 Global Step: 59510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:59:22,311-Speed 11965.56 samples/sec Loss 4.0894 LearningRate 0.0406 Epoch: 14 Global Step: 59520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 03:59:25,673-Speed 12187.61 samples/sec Loss 4.1464 LearningRate 0.0405 Epoch: 14 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:29,051-Speed 12131.30 samples/sec Loss 4.1101 LearningRate 0.0405 Epoch: 14 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:32,674-Speed 11307.58 samples/sec Loss 4.1681 LearningRate 0.0405 Epoch: 14 Global Step: 59550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:36,222-Speed 11548.10 samples/sec Loss 4.1086 LearningRate 0.0404 Epoch: 14 Global Step: 59560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:40,578-Speed 9405.22 samples/sec Loss 4.0883 LearningRate 0.0404 Epoch: 14 Global Step: 59570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:44,165-Speed 11419.70 samples/sec Loss 4.1467 LearningRate 0.0404 Epoch: 14 Global Step: 59580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:47,632-Speed 11817.53 samples/sec Loss 4.1002 LearningRate 0.0403 Epoch: 14 Global Step: 59590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:51,591-Speed 10348.05 samples/sec Loss 4.1097 LearningRate 0.0403 Epoch: 14 Global Step: 59600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:55,109-Speed 11645.45 samples/sec Loss 4.1182 LearningRate 0.0403 Epoch: 14 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 03:59:58,541-Speed 11939.49 
samples/sec Loss 4.1237 LearningRate 0.0402 Epoch: 14 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-17 04:00:02,465-Speed 10440.55 samples/sec Loss 4.1423 LearningRate 0.0402 Epoch: 14 Global Step: 59630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:05,916-Speed 11869.41 samples/sec Loss 4.1118 LearningRate 0.0402 Epoch: 14 Global Step: 59640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:09,372-Speed 11856.12 samples/sec Loss 4.1018 LearningRate 0.0401 Epoch: 14 Global Step: 59650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:13,021-Speed 11229.66 samples/sec Loss 4.1458 LearningRate 0.0401 Epoch: 14 Global Step: 59660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:16,443-Speed 11972.23 samples/sec Loss 4.1757 LearningRate 0.0401 Epoch: 14 Global Step: 59670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:19,891-Speed 11881.91 samples/sec Loss 4.1288 LearningRate 0.0400 Epoch: 14 Global Step: 59680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:23,452-Speed 11506.62 samples/sec Loss 4.1407 LearningRate 0.0400 Epoch: 14 Global Step: 59690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:27,281-Speed 10700.32 samples/sec Loss 4.1156 LearningRate 0.0400 Epoch: 14 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:31,208-Speed 10433.85 samples/sec Loss 4.1240 LearningRate 0.0399 Epoch: 14 Global Step: 59710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:34,675-Speed 11814.43 samples/sec Loss 4.1404 LearningRate 0.0399 Epoch: 14 Global Step: 59720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:00:38,166-Speed 11734.98 samples/sec Loss 4.1535 LearningRate 0.0399 Epoch: 14 Global Step: 59730 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:00:42,721-Speed 8994.50 samples/sec Loss 4.1528 
LearningRate 0.0398 Epoch: 14 Global Step: 59740 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:00:46,370-Speed 11227.66 samples/sec Loss 4.1826 LearningRate 0.0398 Epoch: 14 Global Step: 59750 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:00:49,921-Speed 11540.57 samples/sec Loss 4.1735 LearningRate 0.0398 Epoch: 14 Global Step: 59760 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:00:53,964-Speed 10132.01 samples/sec Loss 4.1495 LearningRate 0.0397 Epoch: 14 Global Step: 59770 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:00:57,755-Speed 10807.54 samples/sec Loss 4.1454 LearningRate 0.0397 Epoch: 14 Global Step: 59780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:01,519-Speed 10884.38 samples/sec Loss 4.1491 LearningRate 0.0397 Epoch: 14 Global Step: 59790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:05,019-Speed 11706.67 samples/sec Loss 4.1104 LearningRate 0.0396 Epoch: 14 Global Step: 59800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:08,880-Speed 10611.12 samples/sec Loss 4.1539 LearningRate 0.0396 Epoch: 14 Global Step: 59810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:12,539-Speed 11196.64 samples/sec Loss 4.1949 LearningRate 0.0396 Epoch: 14 Global Step: 59820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:16,096-Speed 11516.90 samples/sec Loss 4.1213 LearningRate 0.0395 Epoch: 14 Global Step: 59830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:19,596-Speed 11706.07 samples/sec Loss 4.1610 LearningRate 0.0395 Epoch: 14 Global Step: 59840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:23,231-Speed 11272.45 samples/sec Loss 4.1228 LearningRate 0.0395 Epoch: 14 Global Step: 59850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:26,831-Speed 11382.23 samples/sec Loss 4.1422 LearningRate 0.0394 Epoch: 14 
Global Step: 59860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:30,282-Speed 11870.75 samples/sec Loss 4.1540 LearningRate 0.0394 Epoch: 14 Global Step: 59870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:33,788-Speed 11685.88 samples/sec Loss 4.1782 LearningRate 0.0394 Epoch: 14 Global Step: 59880 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:01:37,447-Speed 11197.08 samples/sec Loss 4.2178 LearningRate 0.0393 Epoch: 14 Global Step: 59890 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:01:41,814-Speed 9382.23 samples/sec Loss 4.1779 LearningRate 0.0393 Epoch: 14 Global Step: 59900 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:01:45,595-Speed 10834.32 samples/sec Loss 4.0786 LearningRate 0.0393 Epoch: 14 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:49,457-Speed 10607.97 samples/sec Loss 4.1632 LearningRate 0.0392 Epoch: 14 Global Step: 59920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:53,001-Speed 11560.26 samples/sec Loss 4.1632 LearningRate 0.0392 Epoch: 14 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:01:56,430-Speed 11949.13 samples/sec Loss 4.1193 LearningRate 0.0392 Epoch: 14 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:02:00,076-Speed 11236.74 samples/sec Loss 4.1484 LearningRate 0.0391 Epoch: 14 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:02:03,610-Speed 11593.02 samples/sec Loss 4.2025 LearningRate 0.0391 Epoch: 14 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:02:06,970-Speed 12194.37 samples/sec Loss 4.1865 LearningRate 0.0391 Epoch: 14 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:02:10,388-Speed 11986.74 samples/sec Loss 4.1415 LearningRate 0.0390 Epoch: 14 Global Step: 59980 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-01-17 04:02:14,309-Speed 10449.27 samples/sec Loss 4.1470 LearningRate 0.0390 Epoch: 14 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:02:17,995-Speed 11116.45 samples/sec Loss 4.1691 LearningRate 0.0390 Epoch: 14 Global Step: 60000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:02:39,370-[lfw][60000]XNorm: 7.930625 Training: 2022-01-17 04:02:39,370-[lfw][60000]Accuracy-Flip: 0.99667+-0.00307 Training: 2022-01-17 04:02:39,371-[lfw][60000]Accuracy-Highest: 0.99667 Training: 2022-01-17 04:03:04,006-[cfp_fp][60000]XNorm: 6.711242 Training: 2022-01-17 04:03:04,006-[cfp_fp][60000]Accuracy-Flip: 0.97171+-0.01061 Training: 2022-01-17 04:03:04,007-[cfp_fp][60000]Accuracy-Highest: 0.97171 Training: 2022-01-17 04:03:25,299-[agedb_30][60000]XNorm: 7.617681 Training: 2022-01-17 04:03:25,300-[agedb_30][60000]Accuracy-Flip: 0.96817+-0.00950 Training: 2022-01-17 04:03:25,300-[agedb_30][60000]Accuracy-Highest: 0.96933 Training: 2022-01-17 04:03:28,661-Speed 579.67 samples/sec Loss 4.1171 LearningRate 0.0389 Epoch: 14 Global Step: 60010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:03:32,053-Speed 12077.32 samples/sec Loss 4.1650 LearningRate 0.0389 Epoch: 14 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:03:35,370-Speed 12354.53 samples/sec Loss 4.1449 LearningRate 0.0389 Epoch: 14 Global Step: 60030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:03:38,777-Speed 12022.82 samples/sec Loss 4.1441 LearningRate 0.0388 Epoch: 14 Global Step: 60040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:03:42,309-Speed 11599.59 samples/sec Loss 4.2137 LearningRate 0.0388 Epoch: 14 Global Step: 60050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:03:45,976-Speed 11175.11 samples/sec Loss 4.1366 LearningRate 0.0388 Epoch: 14 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 3 
hours Training: 2022-01-17 04:03:49,635-Speed 11196.05 samples/sec Loss 4.1651 LearningRate 0.0387 Epoch: 14 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:03:53,251-Speed 11330.64 samples/sec Loss 4.1466 LearningRate 0.0387 Epoch: 14 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:03:56,695-Speed 11893.80 samples/sec Loss 4.1398 LearningRate 0.0387 Epoch: 14 Global Step: 60090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:04:00,331-Speed 11271.57 samples/sec Loss 4.1669 LearningRate 0.0386 Epoch: 14 Global Step: 60100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-17 04:04:03,778-Speed 11902.57 samples/sec Loss 4.1884 LearningRate 0.0386 Epoch: 14 Global Step: 60110 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:04:08,007-Speed 9685.36 samples/sec Loss 4.1784 LearningRate 0.0386 Epoch: 14 Global Step: 60120 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:04:11,436-Speed 11949.22 samples/sec Loss 4.1608 LearningRate 0.0385 Epoch: 14 Global Step: 60130 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:04:15,091-Speed 11209.67 samples/sec Loss 4.1836 LearningRate 0.0385 Epoch: 14 Global Step: 60140 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:04:18,800-Speed 11047.22 samples/sec Loss 4.2253 LearningRate 0.0385 Epoch: 14 Global Step: 60150 Fp16 Grad Scale: 262144 Required: 3 hours Training: 2022-01-17 04:04:22,591-Speed 10807.64 samples/sec Loss 4.1672 LearningRate 0.0384 Epoch: 14 Global Step: 60160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:26,027-Speed 11922.80 samples/sec Loss 4.1198 LearningRate 0.0384 Epoch: 14 Global Step: 60170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:29,566-Speed 11575.01 samples/sec Loss 4.1367 LearningRate 0.0384 Epoch: 14 Global Step: 60180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
04:04:33,250-Speed 11122.09 samples/sec Loss 4.1628 LearningRate 0.0383 Epoch: 14 Global Step: 60190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:36,735-Speed 11755.92 samples/sec Loss 4.1673 LearningRate 0.0383 Epoch: 14 Global Step: 60200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:40,590-Speed 10628.41 samples/sec Loss 4.1877 LearningRate 0.0383 Epoch: 14 Global Step: 60210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:44,047-Speed 11851.60 samples/sec Loss 4.1504 LearningRate 0.0382 Epoch: 14 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:47,631-Speed 11432.10 samples/sec Loss 4.1437 LearningRate 0.0382 Epoch: 14 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:51,431-Speed 10778.83 samples/sec Loss 4.1713 LearningRate 0.0382 Epoch: 14 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:54,972-Speed 11574.02 samples/sec Loss 4.1736 LearningRate 0.0381 Epoch: 14 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:04:58,698-Speed 10995.55 samples/sec Loss 4.1483 LearningRate 0.0381 Epoch: 14 Global Step: 60260 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:05:02,400-Speed 11065.77 samples/sec Loss 4.1307 LearningRate 0.0381 Epoch: 14 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:06,093-Speed 11094.86 samples/sec Loss 4.1302 LearningRate 0.0380 Epoch: 14 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:10,486-Speed 9326.68 samples/sec Loss 4.2029 LearningRate 0.0380 Epoch: 14 Global Step: 60290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:13,987-Speed 11701.14 samples/sec Loss 4.1868 LearningRate 0.0380 Epoch: 14 Global Step: 60300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:17,712-Speed 10998.12 
samples/sec Loss 4.1771 LearningRate 0.0379 Epoch: 14 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:21,673-Speed 10343.13 samples/sec Loss 4.1985 LearningRate 0.0379 Epoch: 14 Global Step: 60320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:25,154-Speed 11781.46 samples/sec Loss 4.1798 LearningRate 0.0379 Epoch: 14 Global Step: 60330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:29,356-Speed 9747.36 samples/sec Loss 4.2041 LearningRate 0.0378 Epoch: 14 Global Step: 60340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:32,800-Speed 11898.37 samples/sec Loss 4.2413 LearningRate 0.0378 Epoch: 14 Global Step: 60350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:36,259-Speed 11846.28 samples/sec Loss 4.1701 LearningRate 0.0378 Epoch: 14 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:05:40,111-Speed 10636.59 samples/sec Loss 4.1739 LearningRate 0.0378 Epoch: 14 Global Step: 60370 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:05:43,622-Speed 11667.34 samples/sec Loss 4.1902 LearningRate 0.0377 Epoch: 14 Global Step: 60380 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:05:47,198-Speed 11457.60 samples/sec Loss 4.1488 LearningRate 0.0377 Epoch: 14 Global Step: 60390 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:05:50,831-Speed 11276.27 samples/sec Loss 4.2024 LearningRate 0.0377 Epoch: 14 Global Step: 60400 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:05:54,613-Speed 10833.62 samples/sec Loss 4.1703 LearningRate 0.0376 Epoch: 14 Global Step: 60410 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:05:58,166-Speed 11529.93 samples/sec Loss 4.2140 LearningRate 0.0376 Epoch: 14 Global Step: 60420 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:06:01,873-Speed 11052.10 samples/sec Loss 4.1818 
LearningRate 0.0376 Epoch: 14 Global Step: 60430 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:06:05,319-Speed 11902.02 samples/sec Loss 4.1820 LearningRate 0.0375 Epoch: 14 Global Step: 60440 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:06:09,065-Speed 10937.20 samples/sec Loss 4.2459 LearningRate 0.0375 Epoch: 14 Global Step: 60450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:13,780-Speed 8690.51 samples/sec Loss 4.2087 LearningRate 0.0375 Epoch: 14 Global Step: 60460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:17,357-Speed 11450.28 samples/sec Loss 4.2086 LearningRate 0.0374 Epoch: 14 Global Step: 60470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:21,036-Speed 11135.34 samples/sec Loss 4.1809 LearningRate 0.0374 Epoch: 14 Global Step: 60480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:24,547-Speed 11669.87 samples/sec Loss 4.1877 LearningRate 0.0374 Epoch: 14 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:28,293-Speed 10938.71 samples/sec Loss 4.1888 LearningRate 0.0373 Epoch: 14 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:32,072-Speed 10841.35 samples/sec Loss 4.1868 LearningRate 0.0373 Epoch: 14 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:35,653-Speed 11443.37 samples/sec Loss 4.1695 LearningRate 0.0373 Epoch: 14 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:39,087-Speed 11928.62 samples/sec Loss 4.2444 LearningRate 0.0372 Epoch: 14 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:42,591-Speed 11694.74 samples/sec Loss 4.2268 LearningRate 0.0372 Epoch: 14 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:46,033-Speed 11902.90 samples/sec Loss 4.1536 LearningRate 0.0372 Epoch: 14 
Global Step: 60550 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:49,844-Speed 10752.04 samples/sec Loss 4.1712 LearningRate 0.0371 Epoch: 14 Global Step: 60560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:53,535-Speed 11100.70 samples/sec Loss 4.1794 LearningRate 0.0371 Epoch: 14 Global Step: 60570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:06:57,203-Speed 11168.68 samples/sec Loss 4.2121 LearningRate 0.0371 Epoch: 14 Global Step: 60580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:07:00,691-Speed 11745.75 samples/sec Loss 4.1887 LearningRate 0.0370 Epoch: 14 Global Step: 60590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:07:04,171-Speed 11773.13 samples/sec Loss 4.1983 LearningRate 0.0370 Epoch: 14 Global Step: 60600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:07:07,943-Speed 10864.39 samples/sec Loss 4.2143 LearningRate 0.0370 Epoch: 14 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:12,027-Speed 10030.99 samples/sec Loss 4.2318 LearningRate 0.0369 Epoch: 14 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:16,486-Speed 9187.71 samples/sec Loss 4.1743 LearningRate 0.0369 Epoch: 14 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:19,934-Speed 11880.40 samples/sec Loss 4.2283 LearningRate 0.0369 Epoch: 14 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:23,400-Speed 11819.95 samples/sec Loss 4.2545 LearningRate 0.0368 Epoch: 14 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:26,805-Speed 12037.27 samples/sec Loss 4.1958 LearningRate 0.0368 Epoch: 14 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:30,215-Speed 12013.05 samples/sec Loss 4.1720 LearningRate 0.0368 Epoch: 14 Global Step: 60670 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-01-17 04:07:33,922-Speed 11055.18 samples/sec Loss 4.1248 LearningRate 0.0367 Epoch: 14 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:37,465-Speed 11563.11 samples/sec Loss 4.1788 LearningRate 0.0367 Epoch: 14 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:41,201-Speed 10966.95 samples/sec Loss 4.2122 LearningRate 0.0367 Epoch: 14 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:07:45,058-Speed 10621.69 samples/sec Loss 4.1603 LearningRate 0.0366 Epoch: 14 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:07:48,820-Speed 10891.62 samples/sec Loss 4.2176 LearningRate 0.0366 Epoch: 14 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:07:52,314-Speed 11723.38 samples/sec Loss 4.2007 LearningRate 0.0366 Epoch: 14 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:07:55,984-Speed 11162.01 samples/sec Loss 4.1680 LearningRate 0.0365 Epoch: 14 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:07:59,551-Speed 11489.27 samples/sec Loss 4.1700 LearningRate 0.0365 Epoch: 14 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:03,023-Speed 11798.69 samples/sec Loss 4.2009 LearningRate 0.0365 Epoch: 14 Global Step: 60760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:06,477-Speed 11863.25 samples/sec Loss 4.2206 LearningRate 0.0365 Epoch: 14 Global Step: 60770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:10,107-Speed 11285.98 samples/sec Loss 4.1663 LearningRate 0.0364 Epoch: 14 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:13,634-Speed 11614.68 samples/sec Loss 4.2030 LearningRate 0.0364 Epoch: 14 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-01-17 04:08:18,672-Speed 8131.32 samples/sec Loss 4.2279 LearningRate 0.0364 Epoch: 14 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:22,238-Speed 11491.54 samples/sec Loss 4.2276 LearningRate 0.0363 Epoch: 14 Global Step: 60810 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:08:25,645-Speed 12025.96 samples/sec Loss 4.1527 LearningRate 0.0363 Epoch: 14 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:29,502-Speed 10621.24 samples/sec Loss 4.1860 LearningRate 0.0363 Epoch: 14 Global Step: 60830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:33,251-Speed 10929.09 samples/sec Loss 4.1338 LearningRate 0.0362 Epoch: 14 Global Step: 60840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:36,849-Speed 11386.83 samples/sec Loss 4.1815 LearningRate 0.0362 Epoch: 14 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:40,605-Speed 10908.29 samples/sec Loss 4.1844 LearningRate 0.0362 Epoch: 14 Global Step: 60860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:44,125-Speed 11639.58 samples/sec Loss 4.2021 LearningRate 0.0361 Epoch: 14 Global Step: 60870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:48,075-Speed 10372.19 samples/sec Loss 4.2174 LearningRate 0.0361 Epoch: 14 Global Step: 60880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:51,816-Speed 10951.69 samples/sec Loss 4.2014 LearningRate 0.0361 Epoch: 14 Global Step: 60890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:55,493-Speed 11140.38 samples/sec Loss 4.2214 LearningRate 0.0360 Epoch: 14 Global Step: 60900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:08:59,191-Speed 11078.39 samples/sec Loss 4.1507 LearningRate 0.0360 Epoch: 14 Global Step: 60910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:02,792-Speed 
11378.84 samples/sec Loss 4.2091 LearningRate 0.0360 Epoch: 14 Global Step: 60920 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:09:06,281-Speed 11742.82 samples/sec Loss 4.1963 LearningRate 0.0359 Epoch: 14 Global Step: 60930 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:09:09,880-Speed 11382.96 samples/sec Loss 4.2276 LearningRate 0.0359 Epoch: 14 Global Step: 60940 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:09:13,464-Speed 11435.53 samples/sec Loss 4.1954 LearningRate 0.0359 Epoch: 14 Global Step: 60950 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:09:16,878-Speed 12000.88 samples/sec Loss 4.2081 LearningRate 0.0358 Epoch: 14 Global Step: 60960 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:09:21,138-Speed 9618.54 samples/sec Loss 4.2013 LearningRate 0.0358 Epoch: 14 Global Step: 60970 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:09:25,247-Speed 9969.80 samples/sec Loss 4.1607 LearningRate 0.0358 Epoch: 14 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:28,674-Speed 11955.16 samples/sec Loss 4.2158 LearningRate 0.0357 Epoch: 14 Global Step: 60990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:32,288-Speed 11333.60 samples/sec Loss 4.1759 LearningRate 0.0357 Epoch: 14 Global Step: 61000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:35,786-Speed 11717.65 samples/sec Loss 4.1751 LearningRate 0.0357 Epoch: 14 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:39,411-Speed 11301.94 samples/sec Loss 4.2157 LearningRate 0.0357 Epoch: 14 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:43,163-Speed 10917.22 samples/sec Loss 4.2277 LearningRate 0.0356 Epoch: 14 Global Step: 61030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:46,859-Speed 11085.47 samples/sec Loss 4.2051 
LearningRate 0.0356 Epoch: 14 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:50,431-Speed 11473.47 samples/sec Loss 4.2412 LearningRate 0.0356 Epoch: 14 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:53,897-Speed 11819.52 samples/sec Loss 4.1533 LearningRate 0.0355 Epoch: 14 Global Step: 61060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:09:57,521-Speed 11306.29 samples/sec Loss 4.2065 LearningRate 0.0355 Epoch: 14 Global Step: 61070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:00,959-Speed 11914.93 samples/sec Loss 4.2015 LearningRate 0.0355 Epoch: 14 Global Step: 61080 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:10:04,613-Speed 11211.95 samples/sec Loss 4.2393 LearningRate 0.0354 Epoch: 14 Global Step: 61090 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:10:08,095-Speed 11766.24 samples/sec Loss 4.2042 LearningRate 0.0354 Epoch: 14 Global Step: 61100 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:10:11,690-Speed 11396.64 samples/sec Loss 4.2232 LearningRate 0.0354 Epoch: 14 Global Step: 61110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:15,636-Speed 10382.72 samples/sec Loss 4.1874 LearningRate 0.0353 Epoch: 14 Global Step: 61120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:19,083-Speed 11886.13 samples/sec Loss 4.1890 LearningRate 0.0353 Epoch: 14 Global Step: 61130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:23,372-Speed 9552.70 samples/sec Loss 4.2185 LearningRate 0.0353 Epoch: 14 Global Step: 61140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:27,048-Speed 11145.52 samples/sec Loss 4.2062 LearningRate 0.0352 Epoch: 14 Global Step: 61150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:30,742-Speed 11091.12 samples/sec Loss 4.1279 LearningRate 0.0352 Epoch: 14 
Global Step: 61160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:34,498-Speed 10907.48 samples/sec Loss 4.2297 LearningRate 0.0352 Epoch: 14 Global Step: 61170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:38,280-Speed 10834.75 samples/sec Loss 4.1924 LearningRate 0.0351 Epoch: 14 Global Step: 61180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:41,750-Speed 11805.55 samples/sec Loss 4.2311 LearningRate 0.0351 Epoch: 14 Global Step: 61190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:45,574-Speed 10712.37 samples/sec Loss 4.2182 LearningRate 0.0351 Epoch: 14 Global Step: 61200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:10:49,136-Speed 11504.33 samples/sec Loss 4.1898 LearningRate 0.0351 Epoch: 14 Global Step: 61210 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:10:52,543-Speed 12027.24 samples/sec Loss 4.2649 LearningRate 0.0350 Epoch: 14 Global Step: 61220 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:10:56,407-Speed 10601.13 samples/sec Loss 4.2029 LearningRate 0.0350 Epoch: 14 Global Step: 61230 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:11:00,184-Speed 10847.43 samples/sec Loss 4.1904 LearningRate 0.0350 Epoch: 14 Global Step: 61240 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:11:03,689-Speed 11686.06 samples/sec Loss 4.2211 LearningRate 0.0349 Epoch: 14 Global Step: 61250 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:11:07,295-Speed 11363.50 samples/sec Loss 4.1725 LearningRate 0.0349 Epoch: 14 Global Step: 61260 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:11:10,653-Speed 12201.79 samples/sec Loss 4.2269 LearningRate 0.0349 Epoch: 14 Global Step: 61270 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:11:14,105-Speed 11868.68 samples/sec Loss 4.2423 LearningRate 0.0348 Epoch: 14 Global Step: 61280 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 04:11:17,619-Speed 11658.77 samples/sec Loss 4.1956 LearningRate 0.0348 Epoch: 14 Global Step: 61290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:11:21,433-Speed 10743.40 samples/sec Loss 4.1907 LearningRate 0.0348 Epoch: 14 Global Step: 61300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:11:25,940-Speed 9089.91 samples/sec Loss 4.2021 LearningRate 0.0347 Epoch: 14 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:29,473-Speed 11597.56 samples/sec Loss 4.1924 LearningRate 0.0347 Epoch: 14 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:33,049-Speed 11455.41 samples/sec Loss 4.2128 LearningRate 0.0347 Epoch: 14 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:36,451-Speed 12045.88 samples/sec Loss 4.2238 LearningRate 0.0346 Epoch: 14 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:39,865-Speed 12002.15 samples/sec Loss 4.2320 LearningRate 0.0346 Epoch: 14 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:43,470-Speed 11365.46 samples/sec Loss 4.2099 LearningRate 0.0346 Epoch: 14 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:47,012-Speed 11570.36 samples/sec Loss 4.2162 LearningRate 0.0345 Epoch: 14 Global Step: 61370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:50,823-Speed 10751.11 samples/sec Loss 4.1964 LearningRate 0.0345 Epoch: 14 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:54,393-Speed 11476.69 samples/sec Loss 4.1975 LearningRate 0.0345 Epoch: 14 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:11:57,968-Speed 11458.77 samples/sec Loss 4.2208 LearningRate 0.0345 Epoch: 14 Global Step: 61400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-01-17 04:12:01,715-Speed 10933.47 samples/sec Loss 4.1632 LearningRate 0.0344 Epoch: 14 Global Step: 61410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:05,479-Speed 10885.50 samples/sec Loss 4.2046 LearningRate 0.0344 Epoch: 14 Global Step: 61420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:09,311-Speed 10693.08 samples/sec Loss 4.1865 LearningRate 0.0344 Epoch: 14 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:12,769-Speed 11849.08 samples/sec Loss 4.1700 LearningRate 0.0343 Epoch: 14 Global Step: 61440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:16,556-Speed 10816.07 samples/sec Loss 4.2140 LearningRate 0.0343 Epoch: 14 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:20,607-Speed 10115.93 samples/sec Loss 4.1623 LearningRate 0.0343 Epoch: 14 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:24,166-Speed 11510.14 samples/sec Loss 4.2442 LearningRate 0.0342 Epoch: 14 Global Step: 61470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:28,458-Speed 9545.69 samples/sec Loss 4.2240 LearningRate 0.0342 Epoch: 14 Global Step: 61480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:32,435-Speed 10300.55 samples/sec Loss 4.2192 LearningRate 0.0342 Epoch: 14 Global Step: 61490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:36,083-Speed 11234.47 samples/sec Loss 4.1394 LearningRate 0.0341 Epoch: 14 Global Step: 61500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:39,851-Speed 10871.36 samples/sec Loss 4.1877 LearningRate 0.0341 Epoch: 14 Global Step: 61510 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:12:43,513-Speed 11189.82 samples/sec Loss 4.1593 LearningRate 0.0341 Epoch: 14 Global Step: 61520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:47,571-Speed 
10094.90 samples/sec Loss 4.2347 LearningRate 0.0340 Epoch: 14 Global Step: 61530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:51,359-Speed 10815.02 samples/sec Loss 4.1981 LearningRate 0.0340 Epoch: 14 Global Step: 61540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:54,978-Speed 11320.64 samples/sec Loss 4.1750 LearningRate 0.0340 Epoch: 14 Global Step: 61550 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:12:58,723-Speed 10942.23 samples/sec Loss 4.2168 LearningRate 0.0340 Epoch: 14 Global Step: 61560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:02,481-Speed 10898.68 samples/sec Loss 4.1844 LearningRate 0.0339 Epoch: 14 Global Step: 61570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:06,209-Speed 10992.30 samples/sec Loss 4.1659 LearningRate 0.0339 Epoch: 14 Global Step: 61580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:09,795-Speed 11426.07 samples/sec Loss 4.2013 LearningRate 0.0339 Epoch: 14 Global Step: 61590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:13,481-Speed 11114.53 samples/sec Loss 4.1988 LearningRate 0.0338 Epoch: 14 Global Step: 61600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:17,206-Speed 10999.73 samples/sec Loss 4.1991 LearningRate 0.0338 Epoch: 14 Global Step: 61610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:20,853-Speed 11237.12 samples/sec Loss 4.1707 LearningRate 0.0338 Epoch: 14 Global Step: 61620 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:13:24,434-Speed 11438.69 samples/sec Loss 4.1962 LearningRate 0.0337 Epoch: 14 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:27,919-Speed 11757.90 samples/sec Loss 4.1647 LearningRate 0.0337 Epoch: 14 Global Step: 61640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:32,253-Speed 9452.81 samples/sec Loss 4.1842 
LearningRate 0.0337 Epoch: 14 Global Step: 61650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:35,922-Speed 11168.11 samples/sec Loss 4.1949 LearningRate 0.0336 Epoch: 14 Global Step: 61660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:39,471-Speed 11541.81 samples/sec Loss 4.1761 LearningRate 0.0336 Epoch: 14 Global Step: 61670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:43,193-Speed 11009.36 samples/sec Loss 4.2026 LearningRate 0.0336 Epoch: 14 Global Step: 61680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:47,235-Speed 10135.24 samples/sec Loss 4.2211 LearningRate 0.0336 Epoch: 14 Global Step: 61690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:50,864-Speed 11290.48 samples/sec Loss 4.2334 LearningRate 0.0335 Epoch: 14 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:54,834-Speed 10320.01 samples/sec Loss 4.2329 LearningRate 0.0335 Epoch: 14 Global Step: 61710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:13:58,299-Speed 11822.58 samples/sec Loss 4.1917 LearningRate 0.0335 Epoch: 14 Global Step: 61720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:01,721-Speed 11973.50 samples/sec Loss 4.2265 LearningRate 0.0334 Epoch: 14 Global Step: 61730 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:14:05,234-Speed 11663.27 samples/sec Loss 4.2064 LearningRate 0.0334 Epoch: 14 Global Step: 61740 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:14:08,789-Speed 11522.64 samples/sec Loss 4.1890 LearningRate 0.0334 Epoch: 14 Global Step: 61750 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:14:12,731-Speed 10394.72 samples/sec Loss 4.1601 LearningRate 0.0333 Epoch: 14 Global Step: 61760 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:14:16,907-Speed 9810.60 samples/sec Loss 4.2388 LearningRate 0.0333 Epoch: 14 
Global Step: 61770 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:14:20,795-Speed 10537.43 samples/sec Loss 4.2151 LearningRate 0.0333 Epoch: 14 Global Step: 61780 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:14:24,516-Speed 11012.37 samples/sec Loss 4.2215 LearningRate 0.0332 Epoch: 14 Global Step: 61790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:28,202-Speed 11113.77 samples/sec Loss 4.2069 LearningRate 0.0332 Epoch: 14 Global Step: 61800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:31,810-Speed 11353.24 samples/sec Loss 4.1845 LearningRate 0.0332 Epoch: 14 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:36,226-Speed 9278.23 samples/sec Loss 4.2063 LearningRate 0.0332 Epoch: 14 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:39,999-Speed 10857.83 samples/sec Loss 4.2273 LearningRate 0.0331 Epoch: 14 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:43,621-Speed 11311.90 samples/sec Loss 4.1717 LearningRate 0.0331 Epoch: 14 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:47,308-Speed 11112.07 samples/sec Loss 4.2021 LearningRate 0.0331 Epoch: 14 Global Step: 61850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:51,358-Speed 10119.25 samples/sec Loss 4.2420 LearningRate 0.0330 Epoch: 14 Global Step: 61860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:55,123-Speed 10880.36 samples/sec Loss 4.1748 LearningRate 0.0330 Epoch: 14 Global Step: 61870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:14:58,803-Speed 11132.82 samples/sec Loss 4.1801 LearningRate 0.0330 Epoch: 14 Global Step: 61880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:02,230-Speed 11953.38 samples/sec Loss 4.1880 LearningRate 0.0329 Epoch: 14 Global Step: 61890 Fp16 Grad 
Scale: 262144 Required: 2 hours Training: 2022-01-17 04:15:06,041-Speed 10750.47 samples/sec Loss 4.1624 LearningRate 0.0329 Epoch: 14 Global Step: 61900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:09,855-Speed 10741.61 samples/sec Loss 4.2588 LearningRate 0.0329 Epoch: 14 Global Step: 61910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:13,539-Speed 11124.53 samples/sec Loss 4.1952 LearningRate 0.0328 Epoch: 14 Global Step: 61920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:17,521-Speed 10287.16 samples/sec Loss 4.1928 LearningRate 0.0328 Epoch: 14 Global Step: 61930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:21,038-Speed 11651.32 samples/sec Loss 4.2244 LearningRate 0.0328 Epoch: 14 Global Step: 61940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:24,555-Speed 11648.18 samples/sec Loss 4.2118 LearningRate 0.0328 Epoch: 14 Global Step: 61950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:28,030-Speed 11789.81 samples/sec Loss 4.1808 LearningRate 0.0327 Epoch: 14 Global Step: 61960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:31,621-Speed 11408.37 samples/sec Loss 4.1635 LearningRate 0.0327 Epoch: 14 Global Step: 61970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:35,352-Speed 10984.20 samples/sec Loss 4.2089 LearningRate 0.0327 Epoch: 14 Global Step: 61980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:39,811-Speed 9187.40 samples/sec Loss 4.1631 LearningRate 0.0326 Epoch: 14 Global Step: 61990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:43,342-Speed 11601.66 samples/sec Loss 4.1794 LearningRate 0.0326 Epoch: 14 Global Step: 62000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:47,229-Speed 10541.86 samples/sec Loss 4.1775 LearningRate 0.0326 Epoch: 14 Global Step: 62010 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-01-17 04:15:50,880-Speed 11222.14 samples/sec Loss 4.1808 LearningRate 0.0325 Epoch: 14 Global Step: 62020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:54,434-Speed 11528.00 samples/sec Loss 4.1921 LearningRate 0.0325 Epoch: 14 Global Step: 62030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:15:58,177-Speed 10945.19 samples/sec Loss 4.2175 LearningRate 0.0325 Epoch: 14 Global Step: 62040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:01,643-Speed 11820.18 samples/sec Loss 4.1657 LearningRate 0.0325 Epoch: 14 Global Step: 62050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:05,108-Speed 11823.59 samples/sec Loss 4.2191 LearningRate 0.0324 Epoch: 14 Global Step: 62060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:08,826-Speed 11020.54 samples/sec Loss 4.1952 LearningRate 0.0324 Epoch: 14 Global Step: 62070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:12,418-Speed 11405.20 samples/sec Loss 4.1800 LearningRate 0.0324 Epoch: 14 Global Step: 62080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:16,107-Speed 11104.75 samples/sec Loss 4.1943 LearningRate 0.0323 Epoch: 14 Global Step: 62090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:19,543-Speed 11923.51 samples/sec Loss 4.1708 LearningRate 0.0323 Epoch: 14 Global Step: 62100 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:16:22,979-Speed 11928.62 samples/sec Loss 4.2056 LearningRate 0.0323 Epoch: 14 Global Step: 62110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:26,993-Speed 10206.27 samples/sec Loss 4.2259 LearningRate 0.0322 Epoch: 14 Global Step: 62120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:30,754-Speed 10892.46 samples/sec Loss 4.2287 LearningRate 0.0322 Epoch: 14 Global Step: 62130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
04:16:34,211-Speed 11849.90 samples/sec Loss 4.1773 LearningRate 0.0322 Epoch: 14 Global Step: 62140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:38,210-Speed 10245.16 samples/sec Loss 4.1757 LearningRate 0.0321 Epoch: 14 Global Step: 62150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:42,516-Speed 9513.48 samples/sec Loss 4.2040 LearningRate 0.0321 Epoch: 14 Global Step: 62160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:45,998-Speed 11768.52 samples/sec Loss 4.1900 LearningRate 0.0321 Epoch: 14 Global Step: 62170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:49,553-Speed 11523.14 samples/sec Loss 4.1818 LearningRate 0.0321 Epoch: 14 Global Step: 62180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:53,094-Speed 11571.62 samples/sec Loss 4.1854 LearningRate 0.0320 Epoch: 14 Global Step: 62190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:16:57,192-Speed 9994.60 samples/sec Loss 4.1910 LearningRate 0.0320 Epoch: 14 Global Step: 62200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:01,167-Speed 10308.02 samples/sec Loss 4.2130 LearningRate 0.0320 Epoch: 14 Global Step: 62210 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:17:04,850-Speed 11123.35 samples/sec Loss 4.2239 LearningRate 0.0319 Epoch: 14 Global Step: 62220 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:17:08,750-Speed 10506.94 samples/sec Loss 4.1805 LearningRate 0.0319 Epoch: 14 Global Step: 62230 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:17:12,273-Speed 11627.05 samples/sec Loss 4.2076 LearningRate 0.0319 Epoch: 14 Global Step: 62240 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:17:15,821-Speed 11548.96 samples/sec Loss 4.1579 LearningRate 0.0318 Epoch: 14 Global Step: 62250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:19,318-Speed 11716.68 
samples/sec Loss 4.1985 LearningRate 0.0318 Epoch: 14 Global Step: 62260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:22,731-Speed 12006.07 samples/sec Loss 4.2138 LearningRate 0.0318 Epoch: 14 Global Step: 62270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:26,405-Speed 11148.94 samples/sec Loss 4.1867 LearningRate 0.0318 Epoch: 14 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:29,973-Speed 11484.79 samples/sec Loss 4.1513 LearningRate 0.0317 Epoch: 14 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:33,455-Speed 11764.83 samples/sec Loss 4.1642 LearningRate 0.0317 Epoch: 14 Global Step: 62300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:37,167-Speed 11046.40 samples/sec Loss 4.2314 LearningRate 0.0317 Epoch: 14 Global Step: 62310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:40,610-Speed 11898.57 samples/sec Loss 4.1714 LearningRate 0.0316 Epoch: 14 Global Step: 62320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:44,536-Speed 10434.66 samples/sec Loss 4.1918 LearningRate 0.0316 Epoch: 14 Global Step: 62330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:48,407-Speed 10583.50 samples/sec Loss 4.2191 LearningRate 0.0316 Epoch: 14 Global Step: 62340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:52,324-Speed 10459.67 samples/sec Loss 4.1995 LearningRate 0.0315 Epoch: 14 Global Step: 62350 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:17:55,903-Speed 11446.49 samples/sec Loss 4.1587 LearningRate 0.0315 Epoch: 14 Global Step: 62360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:17:59,351-Speed 11887.83 samples/sec Loss 4.1608 LearningRate 0.0315 Epoch: 14 Global Step: 62370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:02,749-Speed 12057.22 samples/sec Loss 4.2097 
LearningRate 0.0315 Epoch: 14 Global Step: 62380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:06,181-Speed 11938.80 samples/sec Loss 4.1947 LearningRate 0.0314 Epoch: 14 Global Step: 62390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:09,580-Speed 12053.64 samples/sec Loss 4.2118 LearningRate 0.0314 Epoch: 14 Global Step: 62400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:13,256-Speed 11144.53 samples/sec Loss 4.1603 LearningRate 0.0314 Epoch: 14 Global Step: 62410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:16,818-Speed 11499.39 samples/sec Loss 4.1821 LearningRate 0.0313 Epoch: 14 Global Step: 62420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:20,339-Speed 11638.94 samples/sec Loss 4.1857 LearningRate 0.0313 Epoch: 14 Global Step: 62430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:23,886-Speed 11548.76 samples/sec Loss 4.1866 LearningRate 0.0313 Epoch: 14 Global Step: 62440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:27,486-Speed 11380.41 samples/sec Loss 4.1904 LearningRate 0.0313 Epoch: 14 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:31,216-Speed 10985.51 samples/sec Loss 4.1632 LearningRate 0.0312 Epoch: 14 Global Step: 62460 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:18:34,994-Speed 10842.32 samples/sec Loss 4.1609 LearningRate 0.0312 Epoch: 14 Global Step: 62470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:38,625-Speed 11286.43 samples/sec Loss 4.1650 LearningRate 0.0312 Epoch: 14 Global Step: 62480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:42,057-Speed 11936.29 samples/sec Loss 4.2042 LearningRate 0.0311 Epoch: 14 Global Step: 62490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:45,561-Speed 11693.61 samples/sec Loss 4.1794 LearningRate 0.0311 Epoch: 14 
Global Step: 62500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:49,851-Speed 9548.83 samples/sec Loss 4.1570 LearningRate 0.0311 Epoch: 14 Global Step: 62510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:53,408-Speed 11518.42 samples/sec Loss 4.1665 LearningRate 0.0310 Epoch: 14 Global Step: 62520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:18:56,897-Speed 11742.14 samples/sec Loss 4.2178 LearningRate 0.0310 Epoch: 14 Global Step: 62530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:19:00,505-Speed 11355.46 samples/sec Loss 4.1948 LearningRate 0.0310 Epoch: 14 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:19:04,291-Speed 10822.64 samples/sec Loss 4.1895 LearningRate 0.0310 Epoch: 14 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:19:07,918-Speed 11298.12 samples/sec Loss 4.1342 LearningRate 0.0309 Epoch: 14 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:19:11,854-Speed 10408.37 samples/sec Loss 4.1808 LearningRate 0.0309 Epoch: 14 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:19:15,379-Speed 11619.37 samples/sec Loss 4.1988 LearningRate 0.0309 Epoch: 14 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:19:49,542-Speed 1199.00 samples/sec Loss 3.5246 LearningRate 0.0308 Epoch: 15 Global Step: 62590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:19:53,395-Speed 10635.35 samples/sec Loss 3.5636 LearningRate 0.0308 Epoch: 15 Global Step: 62600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:19:58,065-Speed 8772.57 samples/sec Loss 3.5811 LearningRate 0.0308 Epoch: 15 Global Step: 62610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:20:01,537-Speed 11800.69 samples/sec Loss 3.5107 LearningRate 0.0307 Epoch: 15 Global Step: 62620 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-17 04:20:04,998-Speed 11837.85 samples/sec Loss 3.5391 LearningRate 0.0307 Epoch: 15 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:20:09,234-Speed 9674.00 samples/sec Loss 3.5898 LearningRate 0.0307 Epoch: 15 Global Step: 62640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:12,655-Speed 11975.54 samples/sec Loss 3.5961 LearningRate 0.0307 Epoch: 15 Global Step: 62650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:16,268-Speed 11338.30 samples/sec Loss 3.5588 LearningRate 0.0306 Epoch: 15 Global Step: 62660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:19,734-Speed 11823.67 samples/sec Loss 3.5645 LearningRate 0.0306 Epoch: 15 Global Step: 62670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:23,134-Speed 12051.14 samples/sec Loss 3.5939 LearningRate 0.0306 Epoch: 15 Global Step: 62680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:26,896-Speed 10888.91 samples/sec Loss 3.6189 LearningRate 0.0305 Epoch: 15 Global Step: 62690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:30,566-Speed 11163.52 samples/sec Loss 3.5790 LearningRate 0.0305 Epoch: 15 Global Step: 62700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:34,134-Speed 11485.09 samples/sec Loss 3.5989 LearningRate 0.0305 Epoch: 15 Global Step: 62710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:38,349-Speed 9718.79 samples/sec Loss 3.5925 LearningRate 0.0305 Epoch: 15 Global Step: 62720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:41,768-Speed 11985.05 samples/sec Loss 3.6077 LearningRate 0.0304 Epoch: 15 Global Step: 62730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:45,170-Speed 12041.49 samples/sec Loss 3.5775 LearningRate 0.0304 Epoch: 15 Global Step: 62740 Fp16 Grad Scale: 262144 Required: 2 hours Training: 
2022-01-17 04:20:48,545-Speed 12140.97 samples/sec Loss 3.5875 LearningRate 0.0304 Epoch: 15 Global Step: 62750 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:20:52,024-Speed 11774.53 samples/sec Loss 3.5777 LearningRate 0.0303 Epoch: 15 Global Step: 62760 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:20:55,856-Speed 10692.39 samples/sec Loss 3.6248 LearningRate 0.0303 Epoch: 15 Global Step: 62770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:20:59,635-Speed 10842.32 samples/sec Loss 3.5856 LearningRate 0.0303 Epoch: 15 Global Step: 62780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:03,255-Speed 11317.41 samples/sec Loss 3.6329 LearningRate 0.0302 Epoch: 15 Global Step: 62790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:06,726-Speed 11803.11 samples/sec Loss 3.6157 LearningRate 0.0302 Epoch: 15 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:10,404-Speed 11141.70 samples/sec Loss 3.6032 LearningRate 0.0302 Epoch: 15 Global Step: 62810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:14,141-Speed 10963.61 samples/sec Loss 3.6601 LearningRate 0.0302 Epoch: 15 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:17,802-Speed 11190.34 samples/sec Loss 3.6525 LearningRate 0.0301 Epoch: 15 Global Step: 62830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:21,702-Speed 10505.12 samples/sec Loss 3.6621 LearningRate 0.0301 Epoch: 15 Global Step: 62840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:25,167-Speed 11821.30 samples/sec Loss 3.6244 LearningRate 0.0301 Epoch: 15 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:28,683-Speed 11653.37 samples/sec Loss 3.6882 LearningRate 0.0300 Epoch: 15 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:32,704-Speed 
10188.25 samples/sec Loss 3.6303 LearningRate 0.0300 Epoch: 15 Global Step: 62870 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:21:36,278-Speed 11464.43 samples/sec Loss 3.6768 LearningRate 0.0300 Epoch: 15 Global Step: 62880 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:21:39,971-Speed 11095.99 samples/sec Loss 3.6664 LearningRate 0.0300 Epoch: 15 Global Step: 62890 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:21:43,821-Speed 10640.04 samples/sec Loss 3.6104 LearningRate 0.0299 Epoch: 15 Global Step: 62900 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:21:47,320-Speed 11709.82 samples/sec Loss 3.6432 LearningRate 0.0299 Epoch: 15 Global Step: 62910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:50,776-Speed 11854.85 samples/sec Loss 3.6884 LearningRate 0.0299 Epoch: 15 Global Step: 62920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:54,564-Speed 10816.57 samples/sec Loss 3.6429 LearningRate 0.0298 Epoch: 15 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:21:58,790-Speed 9695.00 samples/sec Loss 3.6482 LearningRate 0.0298 Epoch: 15 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:22:02,422-Speed 11279.98 samples/sec Loss 3.6692 LearningRate 0.0298 Epoch: 15 Global Step: 62950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:22:05,977-Speed 11525.27 samples/sec Loss 3.6437 LearningRate 0.0297 Epoch: 15 Global Step: 62960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:22:09,445-Speed 11814.17 samples/sec Loss 3.6607 LearningRate 0.0297 Epoch: 15 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:22:12,971-Speed 11618.43 samples/sec Loss 3.6978 LearningRate 0.0297 Epoch: 15 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:22:16,499-Speed 11615.05 samples/sec Loss 3.6443 
LearningRate 0.0297 Epoch: 15 Global Step: 62990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:22:20,052-Speed 11530.07 samples/sec Loss 3.6356 LearningRate 0.0296 Epoch: 15 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:22:23,571-Speed 11644.56 samples/sec Loss 3.6678 LearningRate 0.0296 Epoch: 15 Global Step: 63010 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:26,949-Speed 12124.92 samples/sec Loss 3.6967 LearningRate 0.0296 Epoch: 15 Global Step: 63020 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:30,489-Speed 11573.88 samples/sec Loss 3.6784 LearningRate 0.0295 Epoch: 15 Global Step: 63030 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:34,059-Speed 11477.18 samples/sec Loss 3.6576 LearningRate 0.0295 Epoch: 15 Global Step: 63040 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:37,812-Speed 10916.33 samples/sec Loss 3.6914 LearningRate 0.0295 Epoch: 15 Global Step: 63050 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:41,516-Speed 11060.41 samples/sec Loss 3.6729 LearningRate 0.0295 Epoch: 15 Global Step: 63060 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:45,161-Speed 11242.32 samples/sec Loss 3.6805 LearningRate 0.0294 Epoch: 15 Global Step: 63070 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:48,759-Speed 11387.92 samples/sec Loss 3.6806 LearningRate 0.0294 Epoch: 15 Global Step: 63080 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:52,278-Speed 11641.92 samples/sec Loss 3.6379 LearningRate 0.0294 Epoch: 15 Global Step: 63090 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:55,927-Speed 11228.83 samples/sec Loss 3.7110 LearningRate 0.0293 Epoch: 15 Global Step: 63100 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:22:59,707-Speed 10838.80 samples/sec Loss 3.6570 LearningRate 0.0293 Epoch: 15 
Global Step: 63110 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:23:03,219-Speed 11664.10 samples/sec Loss 3.7135 LearningRate 0.0293 Epoch: 15 Global Step: 63120 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:23:06,807-Speed 11419.80 samples/sec Loss 3.7336 LearningRate 0.0293 Epoch: 15 Global Step: 63130 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:23:10,531-Speed 11002.32 samples/sec Loss 3.6721 LearningRate 0.0292 Epoch: 15 Global Step: 63140 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:23:14,111-Speed 11443.44 samples/sec Loss 3.7084 LearningRate 0.0292 Epoch: 15 Global Step: 63150 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:23:17,498-Speed 12096.78 samples/sec Loss 3.7309 LearningRate 0.0292 Epoch: 15 Global Step: 63160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:20,994-Speed 11718.47 samples/sec Loss 3.7395 LearningRate 0.0291 Epoch: 15 Global Step: 63170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:25,524-Speed 9044.71 samples/sec Loss 3.7000 LearningRate 0.0291 Epoch: 15 Global Step: 63180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:29,275-Speed 10921.01 samples/sec Loss 3.7377 LearningRate 0.0291 Epoch: 15 Global Step: 63190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:33,173-Speed 10512.21 samples/sec Loss 3.7055 LearningRate 0.0291 Epoch: 15 Global Step: 63200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:36,832-Speed 11196.98 samples/sec Loss 3.7352 LearningRate 0.0290 Epoch: 15 Global Step: 63210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:40,493-Speed 11188.84 samples/sec Loss 3.7377 LearningRate 0.0290 Epoch: 15 Global Step: 63220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:43,926-Speed 11936.04 samples/sec Loss 3.7213 LearningRate 0.0290 Epoch: 15 Global Step: 63230 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:47,509-Speed 11437.54 samples/sec Loss 3.7280 LearningRate 0.0289 Epoch: 15 Global Step: 63240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:51,273-Speed 10885.23 samples/sec Loss 3.7080 LearningRate 0.0289 Epoch: 15 Global Step: 63250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:23:54,996-Speed 11006.17 samples/sec Loss 3.7383 LearningRate 0.0289 Epoch: 15 Global Step: 63260 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:23:58,713-Speed 11022.32 samples/sec Loss 3.7194 LearningRate 0.0289 Epoch: 15 Global Step: 63270 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:02,532-Speed 10726.21 samples/sec Loss 3.6943 LearningRate 0.0288 Epoch: 15 Global Step: 63280 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:06,299-Speed 10877.20 samples/sec Loss 3.7357 LearningRate 0.0288 Epoch: 15 Global Step: 63290 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:09,993-Speed 11091.72 samples/sec Loss 3.7264 LearningRate 0.0288 Epoch: 15 Global Step: 63300 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:13,392-Speed 12051.18 samples/sec Loss 3.7740 LearningRate 0.0287 Epoch: 15 Global Step: 63310 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:17,056-Speed 11183.73 samples/sec Loss 3.7676 LearningRate 0.0287 Epoch: 15 Global Step: 63320 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:20,641-Speed 11428.26 samples/sec Loss 3.7298 LearningRate 0.0287 Epoch: 15 Global Step: 63330 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:24,340-Speed 11075.21 samples/sec Loss 3.7371 LearningRate 0.0287 Epoch: 15 Global Step: 63340 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:27,890-Speed 11539.57 samples/sec Loss 3.7082 LearningRate 0.0286 Epoch: 15 Global Step: 63350 Fp16 Grad Scale: 262144 Required: 2 hours 
Training: 2022-01-17 04:24:31,641-Speed 10922.65 samples/sec Loss 3.7262 LearningRate 0.0286 Epoch: 15 Global Step: 63360 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:35,159-Speed 11644.83 samples/sec Loss 3.7591 LearningRate 0.0286 Epoch: 15 Global Step: 63370 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:38,663-Speed 11694.45 samples/sec Loss 3.7391 LearningRate 0.0285 Epoch: 15 Global Step: 63380 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:42,618-Speed 10361.18 samples/sec Loss 3.8270 LearningRate 0.0285 Epoch: 15 Global Step: 63390 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:46,214-Speed 11392.95 samples/sec Loss 3.7342 LearningRate 0.0285 Epoch: 15 Global Step: 63400 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:49,829-Speed 11330.65 samples/sec Loss 3.7907 LearningRate 0.0285 Epoch: 15 Global Step: 63410 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:53,327-Speed 11715.69 samples/sec Loss 3.7675 LearningRate 0.0284 Epoch: 15 Global Step: 63420 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:24:56,962-Speed 11271.65 samples/sec Loss 3.7703 LearningRate 0.0284 Epoch: 15 Global Step: 63430 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:25:00,640-Speed 11138.27 samples/sec Loss 3.7307 LearningRate 0.0284 Epoch: 15 Global Step: 63440 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:25:04,149-Speed 11674.87 samples/sec Loss 3.7420 LearningRate 0.0283 Epoch: 15 Global Step: 63450 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:25:07,861-Speed 11037.99 samples/sec Loss 3.7904 LearningRate 0.0283 Epoch: 15 Global Step: 63460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:11,497-Speed 11268.55 samples/sec Loss 3.7169 LearningRate 0.0283 Epoch: 15 Global Step: 63470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
04:25:15,009-Speed 11666.75 samples/sec Loss 3.7792 LearningRate 0.0283 Epoch: 15 Global Step: 63480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:18,476-Speed 11817.32 samples/sec Loss 3.7454 LearningRate 0.0282 Epoch: 15 Global Step: 63490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:22,035-Speed 11509.65 samples/sec Loss 3.7772 LearningRate 0.0282 Epoch: 15 Global Step: 63500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:25,512-Speed 11783.55 samples/sec Loss 3.7768 LearningRate 0.0282 Epoch: 15 Global Step: 63510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:28,932-Speed 11980.46 samples/sec Loss 3.7804 LearningRate 0.0281 Epoch: 15 Global Step: 63520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:32,423-Speed 11734.91 samples/sec Loss 3.7762 LearningRate 0.0281 Epoch: 15 Global Step: 63530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:35,986-Speed 11500.92 samples/sec Loss 3.7964 LearningRate 0.0281 Epoch: 15 Global Step: 63540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:39,636-Speed 11223.47 samples/sec Loss 3.8048 LearningRate 0.0281 Epoch: 15 Global Step: 63550 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:43,933-Speed 9532.93 samples/sec Loss 3.7924 LearningRate 0.0280 Epoch: 15 Global Step: 63560 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:25:47,551-Speed 11326.77 samples/sec Loss 3.7912 LearningRate 0.0280 Epoch: 15 Global Step: 63570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:50,966-Speed 11995.04 samples/sec Loss 3.7532 LearningRate 0.0280 Epoch: 15 Global Step: 63580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:54,491-Speed 11624.19 samples/sec Loss 3.8150 LearningRate 0.0279 Epoch: 15 Global Step: 63590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:25:58,052-Speed 11505.37 
samples/sec Loss 3.7634 LearningRate 0.0279 Epoch: 15 Global Step: 63600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:01,740-Speed 11109.90 samples/sec Loss 3.8148 LearningRate 0.0279 Epoch: 15 Global Step: 63610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:05,242-Speed 11699.57 samples/sec Loss 3.8096 LearningRate 0.0279 Epoch: 15 Global Step: 63620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:08,968-Speed 10996.88 samples/sec Loss 3.8067 LearningRate 0.0278 Epoch: 15 Global Step: 63630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:12,598-Speed 11285.16 samples/sec Loss 3.7560 LearningRate 0.0278 Epoch: 15 Global Step: 63640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:16,234-Speed 11271.00 samples/sec Loss 3.7593 LearningRate 0.0278 Epoch: 15 Global Step: 63650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:19,998-Speed 10883.35 samples/sec Loss 3.8190 LearningRate 0.0278 Epoch: 15 Global Step: 63660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:23,411-Speed 12003.45 samples/sec Loss 3.8200 LearningRate 0.0277 Epoch: 15 Global Step: 63670 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:26:27,179-Speed 10875.56 samples/sec Loss 3.7731 LearningRate 0.0277 Epoch: 15 Global Step: 63680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:30,851-Speed 11154.78 samples/sec Loss 3.8100 LearningRate 0.0277 Epoch: 15 Global Step: 63690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:34,290-Speed 11913.20 samples/sec Loss 3.7845 LearningRate 0.0276 Epoch: 15 Global Step: 63700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:37,764-Speed 11793.51 samples/sec Loss 3.7702 LearningRate 0.0276 Epoch: 15 Global Step: 63710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:42,111-Speed 9425.79 samples/sec Loss 3.8151 
LearningRate 0.0276 Epoch: 15 Global Step: 63720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:45,797-Speed 11116.40 samples/sec Loss 3.8255 LearningRate 0.0276 Epoch: 15 Global Step: 63730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:49,221-Speed 11964.38 samples/sec Loss 3.7896 LearningRate 0.0275 Epoch: 15 Global Step: 63740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:52,779-Speed 11517.49 samples/sec Loss 3.8107 LearningRate 0.0275 Epoch: 15 Global Step: 63750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:56,466-Speed 11108.99 samples/sec Loss 3.7839 LearningRate 0.0275 Epoch: 15 Global Step: 63760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:26:59,948-Speed 11768.12 samples/sec Loss 3.7389 LearningRate 0.0274 Epoch: 15 Global Step: 63770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:03,403-Speed 11858.53 samples/sec Loss 3.8130 LearningRate 0.0274 Epoch: 15 Global Step: 63780 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:27:07,285-Speed 10555.21 samples/sec Loss 3.7871 LearningRate 0.0274 Epoch: 15 Global Step: 63790 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:27:10,827-Speed 11567.32 samples/sec Loss 3.7930 LearningRate 0.0274 Epoch: 15 Global Step: 63800 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:27:14,252-Speed 11962.37 samples/sec Loss 3.7726 LearningRate 0.0273 Epoch: 15 Global Step: 63810 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:27:17,810-Speed 11514.69 samples/sec Loss 3.7465 LearningRate 0.0273 Epoch: 15 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:21,334-Speed 11626.38 samples/sec Loss 3.8330 LearningRate 0.0273 Epoch: 15 Global Step: 63830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:24,766-Speed 11940.44 samples/sec Loss 3.8150 LearningRate 0.0272 Epoch: 15 
Global Step: 63840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:28,180-Speed 12000.64 samples/sec Loss 3.8196 LearningRate 0.0272 Epoch: 15 Global Step: 63850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:31,633-Speed 11865.66 samples/sec Loss 3.8177 LearningRate 0.0272 Epoch: 15 Global Step: 63860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:35,105-Speed 11799.87 samples/sec Loss 3.8188 LearningRate 0.0272 Epoch: 15 Global Step: 63870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:38,617-Speed 11666.30 samples/sec Loss 3.7793 LearningRate 0.0271 Epoch: 15 Global Step: 63880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:42,305-Speed 11106.59 samples/sec Loss 3.8338 LearningRate 0.0271 Epoch: 15 Global Step: 63890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:45,979-Speed 11153.48 samples/sec Loss 3.8151 LearningRate 0.0271 Epoch: 15 Global Step: 63900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:49,414-Speed 11926.15 samples/sec Loss 3.8349 LearningRate 0.0271 Epoch: 15 Global Step: 63910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:27:52,901-Speed 11749.84 samples/sec Loss 3.8299 LearningRate 0.0270 Epoch: 15 Global Step: 63920 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:27:57,152-Speed 9637.55 samples/sec Loss 3.8252 LearningRate 0.0270 Epoch: 15 Global Step: 63930 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:00,878-Speed 10997.46 samples/sec Loss 3.8424 LearningRate 0.0270 Epoch: 15 Global Step: 63940 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:04,353-Speed 11789.51 samples/sec Loss 3.7947 LearningRate 0.0269 Epoch: 15 Global Step: 63950 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:07,756-Speed 12038.04 samples/sec Loss 3.8320 LearningRate 0.0269 Epoch: 15 Global Step: 63960 Fp16 Grad 
Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:11,397-Speed 11251.54 samples/sec Loss 3.8301 LearningRate 0.0269 Epoch: 15 Global Step: 63970 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:14,838-Speed 11908.65 samples/sec Loss 3.7916 LearningRate 0.0269 Epoch: 15 Global Step: 63980 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:18,265-Speed 11958.84 samples/sec Loss 3.8391 LearningRate 0.0268 Epoch: 15 Global Step: 63990 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:21,715-Speed 11877.78 samples/sec Loss 3.8709 LearningRate 0.0268 Epoch: 15 Global Step: 64000 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:25,276-Speed 11503.61 samples/sec Loss 3.8472 LearningRate 0.0268 Epoch: 15 Global Step: 64010 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:29,628-Speed 9414.20 samples/sec Loss 3.8057 LearningRate 0.0268 Epoch: 15 Global Step: 64020 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:33,123-Speed 11720.11 samples/sec Loss 3.8032 LearningRate 0.0267 Epoch: 15 Global Step: 64030 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:36,683-Speed 11510.30 samples/sec Loss 3.8148 LearningRate 0.0267 Epoch: 15 Global Step: 64040 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:28:40,222-Speed 11578.69 samples/sec Loss 3.8760 LearningRate 0.0267 Epoch: 15 Global Step: 64050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:28:43,740-Speed 11646.38 samples/sec Loss 3.9076 LearningRate 0.0266 Epoch: 15 Global Step: 64060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:28:47,143-Speed 12040.41 samples/sec Loss 3.8147 LearningRate 0.0266 Epoch: 15 Global Step: 64070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:28:50,670-Speed 11615.26 samples/sec Loss 3.8113 LearningRate 0.0266 Epoch: 15 Global Step: 64080 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-01-17 04:28:54,390-Speed 11013.18 samples/sec Loss 3.8726 LearningRate 0.0266 Epoch: 15 Global Step: 64090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:28:58,077-Speed 11112.94 samples/sec Loss 3.8095 LearningRate 0.0265 Epoch: 15 Global Step: 64100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:01,546-Speed 11809.38 samples/sec Loss 3.8368 LearningRate 0.0265 Epoch: 15 Global Step: 64110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:05,055-Speed 11676.57 samples/sec Loss 3.8287 LearningRate 0.0265 Epoch: 15 Global Step: 64120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:08,525-Speed 11806.56 samples/sec Loss 3.7938 LearningRate 0.0264 Epoch: 15 Global Step: 64130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:12,115-Speed 11411.65 samples/sec Loss 3.8559 LearningRate 0.0264 Epoch: 15 Global Step: 64140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:15,688-Speed 11468.08 samples/sec Loss 3.8400 LearningRate 0.0264 Epoch: 15 Global Step: 64150 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:29:19,291-Speed 11371.65 samples/sec Loss 3.8218 LearningRate 0.0264 Epoch: 15 Global Step: 64160 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:29:23,008-Speed 11022.60 samples/sec Loss 3.8002 LearningRate 0.0263 Epoch: 15 Global Step: 64170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:26,586-Speed 11449.33 samples/sec Loss 3.8291 LearningRate 0.0263 Epoch: 15 Global Step: 64180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:30,094-Speed 11680.25 samples/sec Loss 3.8229 LearningRate 0.0263 Epoch: 15 Global Step: 64190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:33,671-Speed 11453.86 samples/sec Loss 3.8494 LearningRate 0.0263 Epoch: 15 Global Step: 64200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
04:29:37,115-Speed 11898.73 samples/sec Loss 3.8296 LearningRate 0.0262 Epoch: 15 Global Step: 64210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:40,826-Speed 11039.01 samples/sec Loss 3.8650 LearningRate 0.0262 Epoch: 15 Global Step: 64220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:44,467-Speed 11255.88 samples/sec Loss 3.8251 LearningRate 0.0262 Epoch: 15 Global Step: 64230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:48,107-Speed 11254.35 samples/sec Loss 3.8534 LearningRate 0.0261 Epoch: 15 Global Step: 64240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:51,767-Speed 11194.10 samples/sec Loss 3.8283 LearningRate 0.0261 Epoch: 15 Global Step: 64250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:55,142-Speed 12140.88 samples/sec Loss 3.8187 LearningRate 0.0261 Epoch: 15 Global Step: 64260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:29:59,055-Speed 10470.27 samples/sec Loss 3.8521 LearningRate 0.0261 Epoch: 15 Global Step: 64270 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:30:02,827-Speed 10872.36 samples/sec Loss 3.8421 LearningRate 0.0260 Epoch: 15 Global Step: 64280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:06,364-Speed 11584.12 samples/sec Loss 3.8350 LearningRate 0.0260 Epoch: 15 Global Step: 64290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:10,116-Speed 10916.71 samples/sec Loss 3.8333 LearningRate 0.0260 Epoch: 15 Global Step: 64300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:13,925-Speed 10758.38 samples/sec Loss 3.8529 LearningRate 0.0260 Epoch: 15 Global Step: 64310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:17,635-Speed 11043.68 samples/sec Loss 3.8988 LearningRate 0.0259 Epoch: 15 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:21,246-Speed 11343.91 
samples/sec Loss 3.8319 LearningRate 0.0259 Epoch: 15 Global Step: 64330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:24,902-Speed 11207.93 samples/sec Loss 3.8324 LearningRate 0.0259 Epoch: 15 Global Step: 64340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:28,381-Speed 11776.30 samples/sec Loss 3.8505 LearningRate 0.0258 Epoch: 15 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:32,175-Speed 10800.98 samples/sec Loss 3.8364 LearningRate 0.0258 Epoch: 15 Global Step: 64360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:35,700-Speed 11622.92 samples/sec Loss 3.9034 LearningRate 0.0258 Epoch: 15 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:30:39,111-Speed 12010.26 samples/sec Loss 3.8445 LearningRate 0.0258 Epoch: 15 Global Step: 64380 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:30:42,518-Speed 12023.34 samples/sec Loss 3.8773 LearningRate 0.0257 Epoch: 15 Global Step: 64390 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:30:45,892-Speed 12146.61 samples/sec Loss 3.8376 LearningRate 0.0257 Epoch: 15 Global Step: 64400 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:30:49,285-Speed 12074.44 samples/sec Loss 3.8486 LearningRate 0.0257 Epoch: 15 Global Step: 64410 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:30:52,691-Speed 12028.37 samples/sec Loss 3.8192 LearningRate 0.0257 Epoch: 15 Global Step: 64420 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:30:56,548-Speed 10621.98 samples/sec Loss 3.9159 LearningRate 0.0256 Epoch: 15 Global Step: 64430 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:00,017-Speed 11812.86 samples/sec Loss 3.8559 LearningRate 0.0256 Epoch: 15 Global Step: 64440 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:03,637-Speed 11316.91 samples/sec Loss 3.8824 
LearningRate 0.0256 Epoch: 15 Global Step: 64450 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:07,117-Speed 11775.03 samples/sec Loss 3.8574 LearningRate 0.0256 Epoch: 15 Global Step: 64460 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:10,541-Speed 11967.44 samples/sec Loss 3.8905 LearningRate 0.0255 Epoch: 15 Global Step: 64470 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:14,008-Speed 11815.99 samples/sec Loss 3.8216 LearningRate 0.0255 Epoch: 15 Global Step: 64480 Fp16 Grad Scale: 524288 Required: 2 hours Training: 2022-01-17 04:31:17,411-Speed 12037.69 samples/sec Loss 3.8933 LearningRate 0.0255 Epoch: 15 Global Step: 64490 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:20,876-Speed 11823.27 samples/sec Loss 3.8664 LearningRate 0.0254 Epoch: 15 Global Step: 64500 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:24,448-Speed 11469.14 samples/sec Loss 3.9059 LearningRate 0.0254 Epoch: 15 Global Step: 64510 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:28,006-Speed 11518.50 samples/sec Loss 3.8541 LearningRate 0.0254 Epoch: 15 Global Step: 64520 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:31,431-Speed 11959.98 samples/sec Loss 3.8600 LearningRate 0.0254 Epoch: 15 Global Step: 64530 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:34,799-Speed 12168.16 samples/sec Loss 3.8466 LearningRate 0.0253 Epoch: 15 Global Step: 64540 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:38,334-Speed 11589.25 samples/sec Loss 3.8556 LearningRate 0.0253 Epoch: 15 Global Step: 64550 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:41,809-Speed 11788.10 samples/sec Loss 3.8452 LearningRate 0.0253 Epoch: 15 Global Step: 64560 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:31:45,546-Speed 10964.01 samples/sec Loss 3.8598 LearningRate 0.0253 Epoch: 15 
Global Step: 64570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:31:48,971-Speed 11964.36 samples/sec Loss 3.8561 LearningRate 0.0252 Epoch: 15 Global Step: 64580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:31:52,393-Speed 11970.46 samples/sec Loss 3.8725 LearningRate 0.0252 Epoch: 15 Global Step: 64590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:31:56,807-Speed 9281.76 samples/sec Loss 3.8365 LearningRate 0.0252 Epoch: 15 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:00,516-Speed 11047.33 samples/sec Loss 3.8550 LearningRate 0.0251 Epoch: 15 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:04,035-Speed 11642.93 samples/sec Loss 3.9036 LearningRate 0.0251 Epoch: 15 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:07,630-Speed 11395.29 samples/sec Loss 3.8584 LearningRate 0.0251 Epoch: 15 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:11,368-Speed 10960.58 samples/sec Loss 3.8655 LearningRate 0.0251 Epoch: 15 Global Step: 64640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:15,042-Speed 11151.20 samples/sec Loss 3.8449 LearningRate 0.0250 Epoch: 15 Global Step: 64650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:18,532-Speed 11741.50 samples/sec Loss 3.8636 LearningRate 0.0250 Epoch: 15 Global Step: 64660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:22,084-Speed 11532.10 samples/sec Loss 3.8409 LearningRate 0.0250 Epoch: 15 Global Step: 64670 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:32:25,692-Speed 11357.67 samples/sec Loss 3.8909 LearningRate 0.0250 Epoch: 15 Global Step: 64680 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:32:29,234-Speed 11565.90 samples/sec Loss 3.8706 LearningRate 0.0249 Epoch: 15 Global Step: 64690 Fp16 Grad 
Scale: 262144 Required: 2 hours Training: 2022-01-17 04:32:32,909-Speed 11150.80 samples/sec Loss 3.8562 LearningRate 0.0249 Epoch: 15 Global Step: 64700 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:32:36,636-Speed 10991.11 samples/sec Loss 3.9205 LearningRate 0.0249 Epoch: 15 Global Step: 64710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:40,231-Speed 11397.65 samples/sec Loss 3.8453 LearningRate 0.0249 Epoch: 15 Global Step: 64720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:43,736-Speed 11688.45 samples/sec Loss 3.8289 LearningRate 0.0248 Epoch: 15 Global Step: 64730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:47,172-Speed 11925.18 samples/sec Loss 3.8463 LearningRate 0.0248 Epoch: 15 Global Step: 64740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:50,616-Speed 11895.45 samples/sec Loss 3.8590 LearningRate 0.0248 Epoch: 15 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:54,168-Speed 11534.53 samples/sec Loss 3.8277 LearningRate 0.0248 Epoch: 15 Global Step: 64760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:32:58,041-Speed 10578.63 samples/sec Loss 3.8666 LearningRate 0.0247 Epoch: 15 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:01,444-Speed 12041.70 samples/sec Loss 3.8848 LearningRate 0.0247 Epoch: 15 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:04,843-Speed 12053.95 samples/sec Loss 3.8572 LearningRate 0.0247 Epoch: 15 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:08,520-Speed 11142.35 samples/sec Loss 3.8389 LearningRate 0.0246 Epoch: 15 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:12,184-Speed 11180.61 samples/sec Loss 3.9036 LearningRate 0.0246 Epoch: 15 Global Step: 64810 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-01-17 04:33:15,830-Speed 11236.60 samples/sec Loss 3.8268 LearningRate 0.0246 Epoch: 15 Global Step: 64820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:19,910-Speed 10043.86 samples/sec Loss 3.8531 LearningRate 0.0246 Epoch: 15 Global Step: 64830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:23,309-Speed 12053.01 samples/sec Loss 3.8996 LearningRate 0.0245 Epoch: 15 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:26,793-Speed 11758.87 samples/sec Loss 3.8663 LearningRate 0.0245 Epoch: 15 Global Step: 64850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:30,323-Speed 11605.92 samples/sec Loss 3.8746 LearningRate 0.0245 Epoch: 15 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:34,143-Speed 10726.68 samples/sec Loss 3.8239 LearningRate 0.0245 Epoch: 15 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:37,756-Speed 11338.68 samples/sec Loss 3.9207 LearningRate 0.0244 Epoch: 15 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:41,297-Speed 11571.18 samples/sec Loss 3.8293 LearningRate 0.0244 Epoch: 15 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:44,754-Speed 11851.97 samples/sec Loss 3.8667 LearningRate 0.0244 Epoch: 15 Global Step: 64900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:33:48,326-Speed 11470.16 samples/sec Loss 3.8631 LearningRate 0.0244 Epoch: 15 Global Step: 64910 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:33:51,968-Speed 11250.23 samples/sec Loss 3.8890 LearningRate 0.0243 Epoch: 15 Global Step: 64920 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:33:55,465-Speed 11717.38 samples/sec Loss 3.8844 LearningRate 0.0243 Epoch: 15 Global Step: 64930 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 
04:33:59,278-Speed 10743.13 samples/sec Loss 3.8256 LearningRate 0.0243 Epoch: 15 Global Step: 64940 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:34:02,911-Speed 11279.35 samples/sec Loss 3.8869 LearningRate 0.0242 Epoch: 15 Global Step: 64950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:34:06,397-Speed 11749.46 samples/sec Loss 3.8478 LearningRate 0.0242 Epoch: 15 Global Step: 64960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:34:09,825-Speed 11953.17 samples/sec Loss 3.8681 LearningRate 0.0242 Epoch: 15 Global Step: 64970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:34:13,723-Speed 10510.18 samples/sec Loss 3.9011 LearningRate 0.0242 Epoch: 15 Global Step: 64980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:34:17,171-Speed 11883.02 samples/sec Loss 3.8292 LearningRate 0.0241 Epoch: 15 Global Step: 64990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:34:21,814-Speed 8823.19 samples/sec Loss 3.8886 LearningRate 0.0241 Epoch: 15 Global Step: 65000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:34:42,996-[lfw][65000]XNorm: 7.396431 Training: 2022-01-17 04:34:42,997-[lfw][65000]Accuracy-Flip: 0.99733+-0.00281 Training: 2022-01-17 04:34:42,997-[lfw][65000]Accuracy-Highest: 0.99733 Training: 2022-01-17 04:35:07,445-[cfp_fp][65000]XNorm: 6.308295 Training: 2022-01-17 04:35:07,446-[cfp_fp][65000]Accuracy-Flip: 0.97300+-0.00846 Training: 2022-01-17 04:35:07,446-[cfp_fp][65000]Accuracy-Highest: 0.97300 Training: 2022-01-17 04:35:28,556-[agedb_30][65000]XNorm: 7.055578 Training: 2022-01-17 04:35:28,557-[agedb_30][65000]Accuracy-Flip: 0.96817+-0.00555 Training: 2022-01-17 04:35:28,557-[agedb_30][65000]Accuracy-Highest: 0.96933 Training: 2022-01-17 04:35:31,979-Speed 583.78 samples/sec Loss 3.9028 LearningRate 0.0241 Epoch: 15 Global Step: 65010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:35:35,291-Speed 12372.10 
samples/sec Loss 3.8420 LearningRate 0.0241 Epoch: 15 Global Step: 65020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:35:38,664-Speed 12147.37 samples/sec Loss 3.8781 LearningRate 0.0240 Epoch: 15 Global Step: 65030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:35:42,026-Speed 12185.16 samples/sec Loss 3.8766 LearningRate 0.0240 Epoch: 15 Global Step: 65040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:35:45,494-Speed 11813.66 samples/sec Loss 3.8309 LearningRate 0.0240 Epoch: 15 Global Step: 65050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:35:49,151-Speed 11202.60 samples/sec Loss 3.8519 LearningRate 0.0240 Epoch: 15 Global Step: 65060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:35:52,662-Speed 11668.77 samples/sec Loss 3.8671 LearningRate 0.0239 Epoch: 15 Global Step: 65070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:35:56,218-Speed 11523.89 samples/sec Loss 3.8794 LearningRate 0.0239 Epoch: 15 Global Step: 65080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:35:59,724-Speed 11685.56 samples/sec Loss 3.8763 LearningRate 0.0239 Epoch: 15 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:36:03,211-Speed 11751.16 samples/sec Loss 3.8857 LearningRate 0.0239 Epoch: 15 Global Step: 65100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:36:06,790-Speed 11446.79 samples/sec Loss 3.8867 LearningRate 0.0238 Epoch: 15 Global Step: 65110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:36:10,301-Speed 11666.46 samples/sec Loss 3.8742 LearningRate 0.0238 Epoch: 15 Global Step: 65120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:36:13,839-Speed 11579.28 samples/sec Loss 3.8439 LearningRate 0.0238 Epoch: 15 Global Step: 65130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:17,287-Speed 11885.30 samples/sec Loss 3.8703 
LearningRate 0.0238 Epoch: 15 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:20,754-Speed 11817.47 samples/sec Loss 3.8630 LearningRate 0.0237 Epoch: 15 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:24,417-Speed 11185.94 samples/sec Loss 3.8744 LearningRate 0.0237 Epoch: 15 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:28,465-Speed 10119.43 samples/sec Loss 3.8708 LearningRate 0.0237 Epoch: 15 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:31,911-Speed 11890.70 samples/sec Loss 3.8677 LearningRate 0.0236 Epoch: 15 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:35,438-Speed 11614.47 samples/sec Loss 3.8656 LearningRate 0.0236 Epoch: 15 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:39,014-Speed 11458.20 samples/sec Loss 3.8985 LearningRate 0.0236 Epoch: 15 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:42,691-Speed 11142.69 samples/sec Loss 3.9273 LearningRate 0.0236 Epoch: 15 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:46,337-Speed 11236.71 samples/sec Loss 3.9276 LearningRate 0.0235 Epoch: 15 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-17 04:36:49,794-Speed 11849.75 samples/sec Loss 3.8454 LearningRate 0.0235 Epoch: 15 Global Step: 65230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:36:53,212-Speed 11988.91 samples/sec Loss 3.8307 LearningRate 0.0235 Epoch: 15 Global Step: 65240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:36:56,664-Speed 11865.61 samples/sec Loss 3.9011 LearningRate 0.0235 Epoch: 15 Global Step: 65250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:00,191-Speed 11617.90 samples/sec Loss 3.8862 LearningRate 0.0234 Epoch: 15 Global 
Step: 65260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:03,809-Speed 11324.38 samples/sec Loss 3.8328 LearningRate 0.0234 Epoch: 15 Global Step: 65270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:07,240-Speed 11941.39 samples/sec Loss 3.8272 LearningRate 0.0234 Epoch: 15 Global Step: 65280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:10,777-Speed 11581.51 samples/sec Loss 3.8472 LearningRate 0.0234 Epoch: 15 Global Step: 65290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:14,221-Speed 11896.95 samples/sec Loss 3.8842 LearningRate 0.0233 Epoch: 15 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:18,244-Speed 10186.27 samples/sec Loss 3.8889 LearningRate 0.0233 Epoch: 15 Global Step: 65310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:21,764-Speed 11638.75 samples/sec Loss 3.8675 LearningRate 0.0233 Epoch: 15 Global Step: 65320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:25,295-Speed 11602.50 samples/sec Loss 3.8555 LearningRate 0.0233 Epoch: 15 Global Step: 65330 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:37:29,453-Speed 9851.56 samples/sec Loss 3.8652 LearningRate 0.0232 Epoch: 15 Global Step: 65340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:33,000-Speed 11552.04 samples/sec Loss 3.8748 LearningRate 0.0232 Epoch: 15 Global Step: 65350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:36,636-Speed 11268.42 samples/sec Loss 3.8708 LearningRate 0.0232 Epoch: 15 Global Step: 65360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:40,355-Speed 11015.43 samples/sec Loss 3.8625 LearningRate 0.0232 Epoch: 15 Global Step: 65370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:44,043-Speed 11110.97 samples/sec Loss 3.9014 LearningRate 0.0231 Epoch: 15 Global Step: 65380 Fp16 Grad Scale: 
131072 Required: 2 hours Training: 2022-01-17 04:37:47,593-Speed 11540.48 samples/sec Loss 3.8908 LearningRate 0.0231 Epoch: 15 Global Step: 65390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:51,119-Speed 11621.00 samples/sec Loss 3.8641 LearningRate 0.0231 Epoch: 15 Global Step: 65400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:54,825-Speed 11054.11 samples/sec Loss 3.9214 LearningRate 0.0231 Epoch: 15 Global Step: 65410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:37:58,210-Speed 12106.12 samples/sec Loss 3.8985 LearningRate 0.0230 Epoch: 15 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:01,848-Speed 11259.62 samples/sec Loss 3.8549 LearningRate 0.0230 Epoch: 15 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:05,561-Speed 11035.56 samples/sec Loss 3.8310 LearningRate 0.0230 Epoch: 15 Global Step: 65440 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:38:09,622-Speed 10087.68 samples/sec Loss 3.8706 LearningRate 0.0230 Epoch: 15 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:12,999-Speed 12132.25 samples/sec Loss 3.8688 LearningRate 0.0229 Epoch: 15 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:16,438-Speed 11913.41 samples/sec Loss 3.8492 LearningRate 0.0229 Epoch: 15 Global Step: 65470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:20,005-Speed 11488.41 samples/sec Loss 3.8632 LearningRate 0.0229 Epoch: 15 Global Step: 65480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:23,643-Speed 11260.34 samples/sec Loss 3.8469 LearningRate 0.0229 Epoch: 15 Global Step: 65490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:27,655-Speed 10212.28 samples/sec Loss 3.8756 LearningRate 0.0228 Epoch: 15 Global Step: 65500 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-01-17 04:38:32,019-Speed 9389.32 samples/sec Loss 3.8527 LearningRate 0.0228 Epoch: 15 Global Step: 65510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:35,446-Speed 11954.15 samples/sec Loss 3.8541 LearningRate 0.0228 Epoch: 15 Global Step: 65520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:39,229-Speed 10832.04 samples/sec Loss 3.8636 LearningRate 0.0228 Epoch: 15 Global Step: 65530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:43,224-Speed 10254.65 samples/sec Loss 3.8628 LearningRate 0.0227 Epoch: 15 Global Step: 65540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:38:46,806-Speed 11441.56 samples/sec Loss 3.8577 LearningRate 0.0227 Epoch: 15 Global Step: 65550 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:38:50,203-Speed 12061.23 samples/sec Loss 3.8628 LearningRate 0.0227 Epoch: 15 Global Step: 65560 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:38:53,707-Speed 11693.08 samples/sec Loss 3.8615 LearningRate 0.0227 Epoch: 15 Global Step: 65570 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:38:57,238-Speed 11604.88 samples/sec Loss 3.9048 LearningRate 0.0226 Epoch: 15 Global Step: 65580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:00,789-Speed 11535.62 samples/sec Loss 3.8433 LearningRate 0.0226 Epoch: 15 Global Step: 65590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:04,390-Speed 11378.46 samples/sec Loss 3.8965 LearningRate 0.0226 Epoch: 15 Global Step: 65600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:08,171-Speed 10834.88 samples/sec Loss 3.8734 LearningRate 0.0225 Epoch: 15 Global Step: 65610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:11,691-Speed 11638.14 samples/sec Loss 3.8478 LearningRate 0.0225 Epoch: 15 Global Step: 65620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
04:39:15,412-Speed 11012.36 samples/sec Loss 3.8852 LearningRate 0.0225 Epoch: 15 Global Step: 65630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:18,977-Speed 11493.85 samples/sec Loss 3.8692 LearningRate 0.0225 Epoch: 15 Global Step: 65640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:22,836-Speed 10614.07 samples/sec Loss 3.8627 LearningRate 0.0224 Epoch: 15 Global Step: 65650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:26,316-Speed 11772.75 samples/sec Loss 3.8624 LearningRate 0.0224 Epoch: 15 Global Step: 65660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:29,838-Speed 11633.18 samples/sec Loss 3.9229 LearningRate 0.0224 Epoch: 15 Global Step: 65670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:34,166-Speed 9466.79 samples/sec Loss 3.9148 LearningRate 0.0224 Epoch: 15 Global Step: 65680 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:39:38,289-Speed 9936.98 samples/sec Loss 3.8439 LearningRate 0.0223 Epoch: 15 Global Step: 65690 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:39:41,851-Speed 11501.31 samples/sec Loss 3.8646 LearningRate 0.0223 Epoch: 15 Global Step: 65700 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:39:45,422-Speed 11472.38 samples/sec Loss 3.8496 LearningRate 0.0223 Epoch: 15 Global Step: 65710 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:39:48,954-Speed 11601.27 samples/sec Loss 3.8994 LearningRate 0.0223 Epoch: 15 Global Step: 65720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:52,413-Speed 11844.50 samples/sec Loss 3.8835 LearningRate 0.0222 Epoch: 15 Global Step: 65730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:56,083-Speed 11163.90 samples/sec Loss 3.8563 LearningRate 0.0222 Epoch: 15 Global Step: 65740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:39:59,753-Speed 11162.14 
samples/sec Loss 3.8643 LearningRate 0.0222 Epoch: 15 Global Step: 65750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:03,303-Speed 11540.95 samples/sec Loss 3.8630 LearningRate 0.0222 Epoch: 15 Global Step: 65760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:06,988-Speed 11117.47 samples/sec Loss 3.9032 LearningRate 0.0221 Epoch: 15 Global Step: 65770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:10,553-Speed 11493.34 samples/sec Loss 3.9058 LearningRate 0.0221 Epoch: 15 Global Step: 65780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:14,079-Speed 11619.51 samples/sec Loss 3.9138 LearningRate 0.0221 Epoch: 15 Global Step: 65790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:17,504-Speed 11962.34 samples/sec Loss 3.8538 LearningRate 0.0221 Epoch: 15 Global Step: 65800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:21,407-Speed 10496.91 samples/sec Loss 3.8653 LearningRate 0.0220 Epoch: 15 Global Step: 65810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:24,798-Speed 12080.43 samples/sec Loss 3.8595 LearningRate 0.0220 Epoch: 15 Global Step: 65820 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:40:28,290-Speed 11735.67 samples/sec Loss 3.8704 LearningRate 0.0220 Epoch: 15 Global Step: 65830 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:40:31,807-Speed 11646.43 samples/sec Loss 3.8552 LearningRate 0.0220 Epoch: 15 Global Step: 65840 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:40:36,540-Speed 8659.23 samples/sec Loss 3.8715 LearningRate 0.0219 Epoch: 15 Global Step: 65850 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:40:40,146-Speed 11359.27 samples/sec Loss 3.8968 LearningRate 0.0219 Epoch: 15 Global Step: 65860 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:40:43,861-Speed 11030.01 samples/sec Loss 3.8594 
LearningRate 0.0219 Epoch: 15 Global Step: 65870 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:40:47,372-Speed 11668.83 samples/sec Loss 3.9362 LearningRate 0.0219 Epoch: 15 Global Step: 65880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:51,082-Speed 11041.16 samples/sec Loss 3.8591 LearningRate 0.0218 Epoch: 15 Global Step: 65890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:54,826-Speed 10944.29 samples/sec Loss 3.9085 LearningRate 0.0218 Epoch: 15 Global Step: 65900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:40:58,449-Speed 11308.80 samples/sec Loss 3.8718 LearningRate 0.0218 Epoch: 15 Global Step: 65910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:02,117-Speed 11169.03 samples/sec Loss 3.8535 LearningRate 0.0218 Epoch: 15 Global Step: 65920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:06,008-Speed 10530.87 samples/sec Loss 3.9175 LearningRate 0.0217 Epoch: 15 Global Step: 65930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:09,859-Speed 10636.60 samples/sec Loss 3.8539 LearningRate 0.0217 Epoch: 15 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:13,455-Speed 11461.22 samples/sec Loss 3.8592 LearningRate 0.0217 Epoch: 15 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:16,983-Speed 11614.49 samples/sec Loss 3.8821 LearningRate 0.0217 Epoch: 15 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:20,401-Speed 11985.38 samples/sec Loss 3.8673 LearningRate 0.0216 Epoch: 15 Global Step: 65970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:23,848-Speed 11885.51 samples/sec Loss 3.8489 LearningRate 0.0216 Epoch: 15 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:27,257-Speed 12022.42 samples/sec Loss 3.8812 LearningRate 0.0216 Epoch: 15 
Global Step: 65990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:30,788-Speed 11600.43 samples/sec Loss 3.8676 LearningRate 0.0216 Epoch: 15 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:34,494-Speed 11056.74 samples/sec Loss 3.8570 LearningRate 0.0215 Epoch: 15 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:38,296-Speed 10776.97 samples/sec Loss 3.8667 LearningRate 0.0215 Epoch: 15 Global Step: 66020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:41,895-Speed 11383.10 samples/sec Loss 3.8558 LearningRate 0.0215 Epoch: 15 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:45,559-Speed 11181.05 samples/sec Loss 3.8727 LearningRate 0.0215 Epoch: 15 Global Step: 66040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:49,432-Speed 10577.63 samples/sec Loss 3.8784 LearningRate 0.0214 Epoch: 15 Global Step: 66050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:53,247-Speed 10739.49 samples/sec Loss 3.8846 LearningRate 0.0214 Epoch: 15 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:41:57,302-Speed 10104.58 samples/sec Loss 3.9424 LearningRate 0.0214 Epoch: 15 Global Step: 66070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:00,766-Speed 11826.46 samples/sec Loss 3.8381 LearningRate 0.0214 Epoch: 15 Global Step: 66080 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:42:04,327-Speed 11507.71 samples/sec Loss 3.8363 LearningRate 0.0214 Epoch: 15 Global Step: 66090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:07,746-Speed 11981.45 samples/sec Loss 3.8629 LearningRate 0.0213 Epoch: 15 Global Step: 66100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:11,341-Speed 11394.70 samples/sec Loss 3.8944 LearningRate 0.0213 Epoch: 15 Global Step: 66110 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:14,812-Speed 11804.99 samples/sec Loss 3.9064 LearningRate 0.0213 Epoch: 15 Global Step: 66120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:18,521-Speed 11049.15 samples/sec Loss 3.8311 LearningRate 0.0213 Epoch: 15 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:22,104-Speed 11435.19 samples/sec Loss 3.8533 LearningRate 0.0212 Epoch: 15 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:25,927-Speed 10714.45 samples/sec Loss 3.8551 LearningRate 0.0212 Epoch: 15 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:29,402-Speed 11790.64 samples/sec Loss 3.8762 LearningRate 0.0212 Epoch: 15 Global Step: 66160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:32,854-Speed 11869.14 samples/sec Loss 3.8820 LearningRate 0.0212 Epoch: 15 Global Step: 66170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:36,339-Speed 11756.60 samples/sec Loss 3.8554 LearningRate 0.0211 Epoch: 15 Global Step: 66180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:42:40,253-Speed 10469.28 samples/sec Loss 3.8580 LearningRate 0.0211 Epoch: 15 Global Step: 66190 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:42:43,701-Speed 11881.08 samples/sec Loss 3.8722 LearningRate 0.0211 Epoch: 15 Global Step: 66200 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:42:47,317-Speed 11331.94 samples/sec Loss 3.8650 LearningRate 0.0211 Epoch: 15 Global Step: 66210 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:42:51,020-Speed 11063.07 samples/sec Loss 3.9005 LearningRate 0.0210 Epoch: 15 Global Step: 66220 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:42:54,616-Speed 11393.68 samples/sec Loss 3.8582 LearningRate 0.0210 Epoch: 15 Global Step: 66230 Fp16 Grad Scale: 262144 Required: 2 hours 
Training: 2022-01-17 04:42:58,063-Speed 11888.45 samples/sec Loss 3.8677 LearningRate 0.0210 Epoch: 15 Global Step: 66240 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:43:01,861-Speed 10786.13 samples/sec Loss 3.8583 LearningRate 0.0210 Epoch: 15 Global Step: 66250 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:43:05,719-Speed 10620.46 samples/sec Loss 3.8526 LearningRate 0.0209 Epoch: 15 Global Step: 66260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:09,230-Speed 11669.49 samples/sec Loss 3.8622 LearningRate 0.0209 Epoch: 15 Global Step: 66270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:13,361-Speed 9919.51 samples/sec Loss 3.8661 LearningRate 0.0209 Epoch: 15 Global Step: 66280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:17,290-Speed 10424.85 samples/sec Loss 3.8752 LearningRate 0.0209 Epoch: 15 Global Step: 66290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:20,777-Speed 11751.60 samples/sec Loss 3.8737 LearningRate 0.0208 Epoch: 15 Global Step: 66300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:24,257-Speed 11773.46 samples/sec Loss 3.9171 LearningRate 0.0208 Epoch: 15 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:27,658-Speed 12049.32 samples/sec Loss 3.8916 LearningRate 0.0208 Epoch: 15 Global Step: 66320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:31,122-Speed 11827.93 samples/sec Loss 3.8760 LearningRate 0.0208 Epoch: 15 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:34,652-Speed 11605.37 samples/sec Loss 3.8583 LearningRate 0.0207 Epoch: 15 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:43:38,208-Speed 11520.94 samples/sec Loss 3.8545 LearningRate 0.0207 Epoch: 15 Global Step: 66350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
04:43:42,131-Speed 10442.37 samples/sec Loss 3.8590 LearningRate 0.0207 Epoch: 15 Global Step: 66360 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:43:45,581-Speed 11874.71 samples/sec Loss 3.8376 LearningRate 0.0207 Epoch: 15 Global Step: 66370 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:43:49,358-Speed 10849.48 samples/sec Loss 3.8814 LearningRate 0.0206 Epoch: 15 Global Step: 66380 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:43:52,830-Speed 11800.99 samples/sec Loss 3.8994 LearningRate 0.0206 Epoch: 15 Global Step: 66390 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:43:56,387-Speed 11520.07 samples/sec Loss 3.8701 LearningRate 0.0206 Epoch: 15 Global Step: 66400 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:43:59,805-Speed 11987.06 samples/sec Loss 3.8988 LearningRate 0.0206 Epoch: 15 Global Step: 66410 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:44:03,251-Speed 11891.70 samples/sec Loss 3.8491 LearningRate 0.0205 Epoch: 15 Global Step: 66420 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:44:06,970-Speed 11015.85 samples/sec Loss 3.8317 LearningRate 0.0205 Epoch: 15 Global Step: 66430 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:44:10,779-Speed 10756.49 samples/sec Loss 3.8372 LearningRate 0.0205 Epoch: 15 Global Step: 66440 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:44:14,209-Speed 11943.59 samples/sec Loss 3.8508 LearningRate 0.0205 Epoch: 15 Global Step: 66450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:17,733-Speed 11624.12 samples/sec Loss 3.8972 LearningRate 0.0205 Epoch: 15 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:21,344-Speed 11347.03 samples/sec Loss 3.8719 LearningRate 0.0204 Epoch: 15 Global Step: 66470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:25,022-Speed 11138.08 
samples/sec Loss 3.8829 LearningRate 0.0204 Epoch: 15 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:28,565-Speed 11565.96 samples/sec Loss 3.8585 LearningRate 0.0204 Epoch: 15 Global Step: 66490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:32,175-Speed 11348.37 samples/sec Loss 3.8576 LearningRate 0.0204 Epoch: 15 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:35,740-Speed 11494.59 samples/sec Loss 3.8650 LearningRate 0.0203 Epoch: 15 Global Step: 66510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:39,128-Speed 12096.43 samples/sec Loss 3.8503 LearningRate 0.0203 Epoch: 15 Global Step: 66520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:42,919-Speed 10805.98 samples/sec Loss 3.8005 LearningRate 0.0203 Epoch: 15 Global Step: 66530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:46,937-Speed 10196.55 samples/sec Loss 3.7800 LearningRate 0.0203 Epoch: 15 Global Step: 66540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:44:50,563-Speed 11300.67 samples/sec Loss 3.8589 LearningRate 0.0202 Epoch: 15 Global Step: 66550 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:44:54,077-Speed 11656.72 samples/sec Loss 3.8570 LearningRate 0.0202 Epoch: 15 Global Step: 66560 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:44:57,492-Speed 11997.59 samples/sec Loss 3.8503 LearningRate 0.0202 Epoch: 15 Global Step: 66570 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:45:01,060-Speed 11483.85 samples/sec Loss 3.8411 LearningRate 0.0202 Epoch: 15 Global Step: 66580 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:45:04,449-Speed 12089.45 samples/sec Loss 3.8599 LearningRate 0.0201 Epoch: 15 Global Step: 66590 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:45:07,840-Speed 12081.45 samples/sec Loss 3.8580 
LearningRate 0.0201 Epoch: 15 Global Step: 66600 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:45:11,266-Speed 11959.34 samples/sec Loss 3.8116 LearningRate 0.0201 Epoch: 15 Global Step: 66610 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:45:14,720-Speed 11862.89 samples/sec Loss 3.8756 LearningRate 0.0201 Epoch: 15 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:18,644-Speed 10441.41 samples/sec Loss 3.8360 LearningRate 0.0200 Epoch: 15 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:22,043-Speed 12052.47 samples/sec Loss 3.8522 LearningRate 0.0200 Epoch: 15 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:25,511-Speed 11814.37 samples/sec Loss 3.9026 LearningRate 0.0200 Epoch: 15 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:28,977-Speed 11822.67 samples/sec Loss 3.8886 LearningRate 0.0200 Epoch: 15 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:32,416-Speed 11912.54 samples/sec Loss 3.8638 LearningRate 0.0199 Epoch: 15 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:35,863-Speed 11888.55 samples/sec Loss 3.8512 LearningRate 0.0199 Epoch: 15 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:39,421-Speed 11511.99 samples/sec Loss 3.8982 LearningRate 0.0199 Epoch: 15 Global Step: 66690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:43,310-Speed 10536.96 samples/sec Loss 3.8895 LearningRate 0.0199 Epoch: 15 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:46,791-Speed 11767.37 samples/sec Loss 3.8313 LearningRate 0.0199 Epoch: 15 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:45:50,416-Speed 11306.59 samples/sec Loss 3.8536 LearningRate 0.0198 Epoch: 15 
Global Step: 66720 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:45:54,078-Speed 11188.07 samples/sec Loss 3.8665 LearningRate 0.0198 Epoch: 15 Global Step: 66730 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:45:57,705-Speed 11295.33 samples/sec Loss 3.8581 LearningRate 0.0198 Epoch: 15 Global Step: 66740 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:46:01,928-Speed 9699.45 samples/sec Loss 3.8313 LearningRate 0.0198 Epoch: 15 Global Step: 66750 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:46:37,015-Speed 1167.44 samples/sec Loss 3.4174 LearningRate 0.0197 Epoch: 16 Global Step: 66760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:46:40,833-Speed 10734.11 samples/sec Loss 3.3262 LearningRate 0.0197 Epoch: 16 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:46:45,228-Speed 9320.78 samples/sec Loss 3.3017 LearningRate 0.0197 Epoch: 16 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:46:48,918-Speed 11105.03 samples/sec Loss 3.3063 LearningRate 0.0197 Epoch: 16 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:46:52,903-Speed 10282.12 samples/sec Loss 3.2768 LearningRate 0.0196 Epoch: 16 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:46:56,682-Speed 10839.00 samples/sec Loss 3.2827 LearningRate 0.0196 Epoch: 16 Global Step: 66810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:47:00,335-Speed 11216.14 samples/sec Loss 3.2966 LearningRate 0.0196 Epoch: 16 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:47:03,970-Speed 11271.51 samples/sec Loss 3.3440 LearningRate 0.0196 Epoch: 16 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:47:07,771-Speed 10780.69 samples/sec Loss 3.3155 LearningRate 0.0195 Epoch: 16 Global Step: 66840 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 04:47:11,503-Speed 10978.46 samples/sec Loss 3.3457 LearningRate 0.0195 Epoch: 16 Global Step: 66850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:47:15,230-Speed 10991.66 samples/sec Loss 3.2485 LearningRate 0.0195 Epoch: 16 Global Step: 66860 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:19,108-Speed 10565.47 samples/sec Loss 3.3401 LearningRate 0.0195 Epoch: 16 Global Step: 66870 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:23,340-Speed 9681.06 samples/sec Loss 3.3456 LearningRate 0.0195 Epoch: 16 Global Step: 66880 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:26,812-Speed 11800.04 samples/sec Loss 3.3128 LearningRate 0.0194 Epoch: 16 Global Step: 66890 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:30,310-Speed 11713.90 samples/sec Loss 3.3809 LearningRate 0.0194 Epoch: 16 Global Step: 66900 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:33,916-Speed 11359.78 samples/sec Loss 3.3555 LearningRate 0.0194 Epoch: 16 Global Step: 66910 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:37,696-Speed 10839.03 samples/sec Loss 3.3422 LearningRate 0.0194 Epoch: 16 Global Step: 66920 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:41,266-Speed 11476.93 samples/sec Loss 3.3083 LearningRate 0.0193 Epoch: 16 Global Step: 66930 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:45,051-Speed 10823.48 samples/sec Loss 3.3301 LearningRate 0.0193 Epoch: 16 Global Step: 66940 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:48,592-Speed 11571.90 samples/sec Loss 3.3778 LearningRate 0.0193 Epoch: 16 Global Step: 66950 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:51,994-Speed 12041.97 samples/sec Loss 3.3499 LearningRate 0.0193 Epoch: 16 Global Step: 66960 Fp16 Grad Scale: 524288 Required: 2 hours 
Training: 2022-01-17 04:47:55,530-Speed 11586.83 samples/sec Loss 3.3226 LearningRate 0.0192 Epoch: 16 Global Step: 66970 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:47:59,081-Speed 11539.32 samples/sec Loss 3.3344 LearningRate 0.0192 Epoch: 16 Global Step: 66980 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:48:02,692-Speed 11348.13 samples/sec Loss 3.3614 LearningRate 0.0192 Epoch: 16 Global Step: 66990 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:48:06,170-Speed 11776.23 samples/sec Loss 3.3953 LearningRate 0.0192 Epoch: 16 Global Step: 67000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:09,746-Speed 11458.54 samples/sec Loss 3.3576 LearningRate 0.0191 Epoch: 16 Global Step: 67010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:13,192-Speed 11888.04 samples/sec Loss 3.3536 LearningRate 0.0191 Epoch: 16 Global Step: 67020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:16,932-Speed 10954.67 samples/sec Loss 3.3567 LearningRate 0.0191 Epoch: 16 Global Step: 67030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:20,427-Speed 11727.64 samples/sec Loss 3.3148 LearningRate 0.0191 Epoch: 16 Global Step: 67040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:23,870-Speed 11896.63 samples/sec Loss 3.3519 LearningRate 0.0191 Epoch: 16 Global Step: 67050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:27,562-Speed 11097.96 samples/sec Loss 3.3806 LearningRate 0.0190 Epoch: 16 Global Step: 67060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:31,048-Speed 11752.54 samples/sec Loss 3.3769 LearningRate 0.0190 Epoch: 16 Global Step: 67070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:34,722-Speed 11152.86 samples/sec Loss 3.3905 LearningRate 0.0190 Epoch: 16 Global Step: 67080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
04:48:38,446-Speed 11002.57 samples/sec Loss 3.3416 LearningRate 0.0190 Epoch: 16 Global Step: 67090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:48:42,223-Speed 10848.10 samples/sec Loss 3.3585 LearningRate 0.0189 Epoch: 16 Global Step: 67100 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:48:45,747-Speed 11624.31 samples/sec Loss 3.3970 LearningRate 0.0189 Epoch: 16 Global Step: 67110 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:48:49,230-Speed 11763.39 samples/sec Loss 3.3577 LearningRate 0.0189 Epoch: 16 Global Step: 67120 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:48:52,865-Speed 11272.08 samples/sec Loss 3.3579 LearningRate 0.0189 Epoch: 16 Global Step: 67130 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:48:56,584-Speed 11016.84 samples/sec Loss 3.3666 LearningRate 0.0188 Epoch: 16 Global Step: 67140 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:00,273-Speed 11122.32 samples/sec Loss 3.3830 LearningRate 0.0188 Epoch: 16 Global Step: 67150 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:03,894-Speed 11315.48 samples/sec Loss 3.4121 LearningRate 0.0188 Epoch: 16 Global Step: 67160 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:07,379-Speed 11755.44 samples/sec Loss 3.3650 LearningRate 0.0188 Epoch: 16 Global Step: 67170 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:11,677-Speed 9532.88 samples/sec Loss 3.3566 LearningRate 0.0188 Epoch: 16 Global Step: 67180 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:15,283-Speed 11363.71 samples/sec Loss 3.3924 LearningRate 0.0187 Epoch: 16 Global Step: 67190 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:19,000-Speed 11021.79 samples/sec Loss 3.3659 LearningRate 0.0187 Epoch: 16 Global Step: 67200 Fp16 Grad Scale: 524288 Required: 2 hours Training: 2022-01-17 04:49:22,598-Speed 11386.50 
samples/sec Loss 3.4338 LearningRate 0.0187 Epoch: 16 Global Step: 67210 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:26,198-Speed 11378.84 samples/sec Loss 3.3735 LearningRate 0.0187 Epoch: 16 Global Step: 67220 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:30,338-Speed 9896.34 samples/sec Loss 3.3566 LearningRate 0.0186 Epoch: 16 Global Step: 67230 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:33,902-Speed 11496.76 samples/sec Loss 3.4034 LearningRate 0.0186 Epoch: 16 Global Step: 67240 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:37,607-Speed 11059.12 samples/sec Loss 3.3931 LearningRate 0.0186 Epoch: 16 Global Step: 67250 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:41,313-Speed 11052.63 samples/sec Loss 3.4232 LearningRate 0.0186 Epoch: 16 Global Step: 67260 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:49:44,854-Speed 11572.04 samples/sec Loss 3.3881 LearningRate 0.0185 Epoch: 16 Global Step: 67270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:49:48,748-Speed 10522.31 samples/sec Loss 3.4236 LearningRate 0.0185 Epoch: 16 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:49:52,362-Speed 11336.99 samples/sec Loss 3.4405 LearningRate 0.0185 Epoch: 16 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:49:55,898-Speed 11586.27 samples/sec Loss 3.4155 LearningRate 0.0185 Epoch: 16 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:49:59,678-Speed 10837.48 samples/sec Loss 3.4271 LearningRate 0.0185 Epoch: 16 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:03,373-Speed 11089.49 samples/sec Loss 3.4079 LearningRate 0.0184 Epoch: 16 Global Step: 67320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:06,895-Speed 11633.62 samples/sec Loss 3.4158 
LearningRate 0.0184 Epoch: 16 Global Step: 67330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:10,414-Speed 11644.09 samples/sec Loss 3.4065 LearningRate 0.0184 Epoch: 16 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:14,074-Speed 11194.74 samples/sec Loss 3.4498 LearningRate 0.0184 Epoch: 16 Global Step: 67350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:17,805-Speed 10979.67 samples/sec Loss 3.4319 LearningRate 0.0183 Epoch: 16 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:21,558-Speed 10915.65 samples/sec Loss 3.4625 LearningRate 0.0183 Epoch: 16 Global Step: 67370 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:50:25,137-Speed 11449.33 samples/sec Loss 3.4543 LearningRate 0.0183 Epoch: 16 Global Step: 67380 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:50:28,757-Speed 11315.03 samples/sec Loss 3.4254 LearningRate 0.0183 Epoch: 16 Global Step: 67390 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:50:32,311-Speed 11529.76 samples/sec Loss 3.4620 LearningRate 0.0182 Epoch: 16 Global Step: 67400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:35,737-Speed 11962.13 samples/sec Loss 3.4690 LearningRate 0.0182 Epoch: 16 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:39,233-Speed 11719.53 samples/sec Loss 3.4755 LearningRate 0.0182 Epoch: 16 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:42,883-Speed 11225.14 samples/sec Loss 3.4499 LearningRate 0.0182 Epoch: 16 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:46,366-Speed 11761.02 samples/sec Loss 3.4853 LearningRate 0.0182 Epoch: 16 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:49,966-Speed 11382.28 samples/sec Loss 3.4562 LearningRate 0.0181 Epoch: 16 
Global Step: 67450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:53,403-Speed 11920.61 samples/sec Loss 3.4345 LearningRate 0.0181 Epoch: 16 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:50:56,955-Speed 11533.37 samples/sec Loss 3.4296 LearningRate 0.0181 Epoch: 16 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:00,596-Speed 11253.13 samples/sec Loss 3.4571 LearningRate 0.0181 Epoch: 16 Global Step: 67480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:04,384-Speed 10817.09 samples/sec Loss 3.4805 LearningRate 0.0180 Epoch: 16 Global Step: 67490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:07,925-Speed 11568.29 samples/sec Loss 3.4450 LearningRate 0.0180 Epoch: 16 Global Step: 67500 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:51:11,673-Speed 10933.08 samples/sec Loss 3.4307 LearningRate 0.0180 Epoch: 16 Global Step: 67510 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:51:15,111-Speed 11916.85 samples/sec Loss 3.4231 LearningRate 0.0180 Epoch: 16 Global Step: 67520 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:51:18,758-Speed 11233.86 samples/sec Loss 3.4419 LearningRate 0.0180 Epoch: 16 Global Step: 67530 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:51:22,367-Speed 11350.68 samples/sec Loss 3.4386 LearningRate 0.0179 Epoch: 16 Global Step: 67540 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:51:25,750-Speed 12110.55 samples/sec Loss 3.5104 LearningRate 0.0179 Epoch: 16 Global Step: 67550 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:51:29,376-Speed 11300.67 samples/sec Loss 3.4183 LearningRate 0.0179 Epoch: 16 Global Step: 67560 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:51:32,967-Speed 11408.96 samples/sec Loss 3.4183 LearningRate 0.0179 Epoch: 16 Global Step: 67570 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:36,906-Speed 10400.36 samples/sec Loss 3.3945 LearningRate 0.0178 Epoch: 16 Global Step: 67580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:40,549-Speed 11244.70 samples/sec Loss 3.4953 LearningRate 0.0178 Epoch: 16 Global Step: 67590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:44,101-Speed 11535.38 samples/sec Loss 3.4619 LearningRate 0.0178 Epoch: 16 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:47,613-Speed 11666.49 samples/sec Loss 3.4609 LearningRate 0.0178 Epoch: 16 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:51,475-Speed 10609.97 samples/sec Loss 3.4858 LearningRate 0.0178 Epoch: 16 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:55,169-Speed 11091.46 samples/sec Loss 3.4401 LearningRate 0.0177 Epoch: 16 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:51:58,678-Speed 11673.88 samples/sec Loss 3.4637 LearningRate 0.0177 Epoch: 16 Global Step: 67640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:52:02,242-Speed 11496.84 samples/sec Loss 3.4479 LearningRate 0.0177 Epoch: 16 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:52:06,032-Speed 10810.76 samples/sec Loss 3.4289 LearningRate 0.0177 Epoch: 16 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:52:09,581-Speed 11542.83 samples/sec Loss 3.4491 LearningRate 0.0176 Epoch: 16 Global Step: 67670 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:13,031-Speed 11876.35 samples/sec Loss 3.4656 LearningRate 0.0176 Epoch: 16 Global Step: 67680 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:16,641-Speed 11348.46 samples/sec Loss 3.4723 LearningRate 0.0176 Epoch: 16 Global Step: 67690 Fp16 Grad Scale: 262144 Required: 2 hours 
Training: 2022-01-17 04:52:20,113-Speed 11802.70 samples/sec Loss 3.4980 LearningRate 0.0176 Epoch: 16 Global Step: 67700 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:23,771-Speed 11198.45 samples/sec Loss 3.4250 LearningRate 0.0176 Epoch: 16 Global Step: 67710 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:27,491-Speed 11015.24 samples/sec Loss 3.4609 LearningRate 0.0175 Epoch: 16 Global Step: 67720 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:31,250-Speed 10898.69 samples/sec Loss 3.4711 LearningRate 0.0175 Epoch: 16 Global Step: 67730 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:34,883-Speed 11278.08 samples/sec Loss 3.4430 LearningRate 0.0175 Epoch: 16 Global Step: 67740 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:38,659-Speed 10849.97 samples/sec Loss 3.5072 LearningRate 0.0175 Epoch: 16 Global Step: 67750 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:42,451-Speed 10806.05 samples/sec Loss 3.4796 LearningRate 0.0174 Epoch: 16 Global Step: 67760 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:45,954-Speed 11695.94 samples/sec Loss 3.4684 LearningRate 0.0174 Epoch: 16 Global Step: 67770 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:50,125-Speed 9821.47 samples/sec Loss 3.5056 LearningRate 0.0174 Epoch: 16 Global Step: 67780 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:53,608-Speed 11762.17 samples/sec Loss 3.5078 LearningRate 0.0174 Epoch: 16 Global Step: 67790 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:52:57,229-Speed 11315.58 samples/sec Loss 3.5077 LearningRate 0.0174 Epoch: 16 Global Step: 67800 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:53:00,885-Speed 11204.54 samples/sec Loss 3.4703 LearningRate 0.0173 Epoch: 16 Global Step: 67810 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 
04:53:04,363-Speed 11782.34 samples/sec Loss 3.4683 LearningRate 0.0173 Epoch: 16 Global Step: 67820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:07,915-Speed 11533.91 samples/sec Loss 3.4821 LearningRate 0.0173 Epoch: 16 Global Step: 67830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:11,373-Speed 11848.08 samples/sec Loss 3.4788 LearningRate 0.0173 Epoch: 16 Global Step: 67840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:14,982-Speed 11352.16 samples/sec Loss 3.5132 LearningRate 0.0172 Epoch: 16 Global Step: 67850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:18,640-Speed 11201.48 samples/sec Loss 3.4381 LearningRate 0.0172 Epoch: 16 Global Step: 67860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:22,177-Speed 11583.83 samples/sec Loss 3.4885 LearningRate 0.0172 Epoch: 16 Global Step: 67870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:25,861-Speed 11120.70 samples/sec Loss 3.4735 LearningRate 0.0172 Epoch: 16 Global Step: 67880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:29,401-Speed 11572.73 samples/sec Loss 3.5038 LearningRate 0.0172 Epoch: 16 Global Step: 67890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:33,254-Speed 10632.76 samples/sec Loss 3.5027 LearningRate 0.0171 Epoch: 16 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:36,717-Speed 11830.04 samples/sec Loss 3.5017 LearningRate 0.0171 Epoch: 16 Global Step: 67910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:53:40,183-Speed 11821.68 samples/sec Loss 3.4892 LearningRate 0.0171 Epoch: 16 Global Step: 67920 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:53:43,770-Speed 11423.94 samples/sec Loss 3.4947 LearningRate 0.0171 Epoch: 16 Global Step: 67930 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:53:48,053-Speed 9565.24 
samples/sec Loss 3.5426 LearningRate 0.0170 Epoch: 16 Global Step: 67940 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:53:51,585-Speed 11599.86 samples/sec Loss 3.5348 LearningRate 0.0170 Epoch: 16 Global Step: 67950 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:53:55,235-Speed 11224.76 samples/sec Loss 3.5109 LearningRate 0.0170 Epoch: 16 Global Step: 67960 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:53:59,014-Speed 10841.69 samples/sec Loss 3.4811 LearningRate 0.0170 Epoch: 16 Global Step: 67970 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:54:02,471-Speed 11850.77 samples/sec Loss 3.5130 LearningRate 0.0170 Epoch: 16 Global Step: 67980 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:54:06,053-Speed 11439.00 samples/sec Loss 3.4641 LearningRate 0.0169 Epoch: 16 Global Step: 67990 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:54:09,696-Speed 11245.68 samples/sec Loss 3.4928 LearningRate 0.0169 Epoch: 16 Global Step: 68000 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:54:13,138-Speed 11903.80 samples/sec Loss 3.4905 LearningRate 0.0169 Epoch: 16 Global Step: 68010 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:54:16,659-Speed 11635.47 samples/sec Loss 3.5147 LearningRate 0.0169 Epoch: 16 Global Step: 68020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:20,351-Speed 11099.97 samples/sec Loss 3.4876 LearningRate 0.0168 Epoch: 16 Global Step: 68030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:24,138-Speed 10818.12 samples/sec Loss 3.5211 LearningRate 0.0168 Epoch: 16 Global Step: 68040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:27,706-Speed 11480.99 samples/sec Loss 3.5517 LearningRate 0.0168 Epoch: 16 Global Step: 68050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:31,388-Speed 11124.99 samples/sec Loss 3.5190 
LearningRate 0.0168 Epoch: 16 Global Step: 68060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:34,802-Speed 12003.40 samples/sec Loss 3.4910 LearningRate 0.0168 Epoch: 16 Global Step: 68070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:38,403-Speed 11375.99 samples/sec Loss 3.4641 LearningRate 0.0167 Epoch: 16 Global Step: 68080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:42,222-Speed 10728.26 samples/sec Loss 3.5123 LearningRate 0.0167 Epoch: 16 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:45,930-Speed 11050.60 samples/sec Loss 3.5131 LearningRate 0.0167 Epoch: 16 Global Step: 68100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:50,967-Speed 8134.36 samples/sec Loss 3.4523 LearningRate 0.0167 Epoch: 16 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:54:54,715-Speed 10931.93 samples/sec Loss 3.4923 LearningRate 0.0166 Epoch: 16 Global Step: 68120 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:54:58,199-Speed 11757.38 samples/sec Loss 3.5477 LearningRate 0.0166 Epoch: 16 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:01,782-Speed 11437.87 samples/sec Loss 3.5095 LearningRate 0.0166 Epoch: 16 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:05,478-Speed 11082.31 samples/sec Loss 3.5161 LearningRate 0.0166 Epoch: 16 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:09,228-Speed 10923.81 samples/sec Loss 3.5416 LearningRate 0.0166 Epoch: 16 Global Step: 68160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:12,862-Speed 11277.07 samples/sec Loss 3.5023 LearningRate 0.0165 Epoch: 16 Global Step: 68170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:16,418-Speed 11522.37 samples/sec Loss 3.4881 LearningRate 0.0165 Epoch: 16 
Global Step: 68180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:19,912-Speed 11728.98 samples/sec Loss 3.5128 LearningRate 0.0165 Epoch: 16 Global Step: 68190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:23,423-Speed 11667.66 samples/sec Loss 3.5652 LearningRate 0.0165 Epoch: 16 Global Step: 68200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:26,892-Speed 11810.01 samples/sec Loss 3.4756 LearningRate 0.0165 Epoch: 16 Global Step: 68210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:30,539-Speed 11233.35 samples/sec Loss 3.5131 LearningRate 0.0164 Epoch: 16 Global Step: 68220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:34,311-Speed 10864.75 samples/sec Loss 3.5232 LearningRate 0.0164 Epoch: 16 Global Step: 68230 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:55:37,936-Speed 11301.03 samples/sec Loss 3.5257 LearningRate 0.0164 Epoch: 16 Global Step: 68240 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:55:41,543-Speed 11359.29 samples/sec Loss 3.5238 LearningRate 0.0164 Epoch: 16 Global Step: 68250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:45,162-Speed 11355.97 samples/sec Loss 3.5045 LearningRate 0.0163 Epoch: 16 Global Step: 68260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:49,514-Speed 9413.92 samples/sec Loss 3.4993 LearningRate 0.0163 Epoch: 16 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:53,062-Speed 11549.65 samples/sec Loss 3.4972 LearningRate 0.0163 Epoch: 16 Global Step: 68280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:55:56,592-Speed 11605.31 samples/sec Loss 3.5086 LearningRate 0.0163 Epoch: 16 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:00,141-Speed 11542.29 samples/sec Loss 3.5181 LearningRate 0.0163 Epoch: 16 Global Step: 68300 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:03,727-Speed 11426.71 samples/sec Loss 3.5276 LearningRate 0.0162 Epoch: 16 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:07,510-Speed 10831.30 samples/sec Loss 3.5105 LearningRate 0.0162 Epoch: 16 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:11,416-Speed 10488.38 samples/sec Loss 3.5180 LearningRate 0.0162 Epoch: 16 Global Step: 68330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:15,109-Speed 11093.58 samples/sec Loss 3.5545 LearningRate 0.0162 Epoch: 16 Global Step: 68340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:18,563-Speed 11860.68 samples/sec Loss 3.5027 LearningRate 0.0162 Epoch: 16 Global Step: 68350 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:56:22,006-Speed 11901.23 samples/sec Loss 3.5496 LearningRate 0.0161 Epoch: 16 Global Step: 68360 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:56:25,828-Speed 10719.23 samples/sec Loss 3.5141 LearningRate 0.0161 Epoch: 16 Global Step: 68370 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:56:29,439-Speed 11346.68 samples/sec Loss 3.5290 LearningRate 0.0161 Epoch: 16 Global Step: 68380 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:56:32,862-Speed 11969.97 samples/sec Loss 3.4990 LearningRate 0.0161 Epoch: 16 Global Step: 68390 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:56:36,487-Speed 11301.05 samples/sec Loss 3.4839 LearningRate 0.0160 Epoch: 16 Global Step: 68400 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:56:40,094-Speed 11358.06 samples/sec Loss 3.5358 LearningRate 0.0160 Epoch: 16 Global Step: 68410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:43,735-Speed 11254.52 samples/sec Loss 3.5091 LearningRate 0.0160 Epoch: 16 Global Step: 68420 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-01-17 04:56:47,606-Speed 10584.98 samples/sec Loss 3.5547 LearningRate 0.0160 Epoch: 16 Global Step: 68430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:51,198-Speed 11406.94 samples/sec Loss 3.4937 LearningRate 0.0160 Epoch: 16 Global Step: 68440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:54,724-Speed 11620.26 samples/sec Loss 3.4861 LearningRate 0.0159 Epoch: 16 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:56:58,642-Speed 10456.12 samples/sec Loss 3.5022 LearningRate 0.0159 Epoch: 16 Global Step: 68460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:02,521-Speed 10561.17 samples/sec Loss 3.5463 LearningRate 0.0159 Epoch: 16 Global Step: 68470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:05,980-Speed 11845.38 samples/sec Loss 3.4869 LearningRate 0.0159 Epoch: 16 Global Step: 68480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:10,554-Speed 8957.24 samples/sec Loss 3.5435 LearningRate 0.0159 Epoch: 16 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:13,914-Speed 12191.54 samples/sec Loss 3.5457 LearningRate 0.0158 Epoch: 16 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:17,584-Speed 11163.45 samples/sec Loss 3.5459 LearningRate 0.0158 Epoch: 16 Global Step: 68510 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:57:21,050-Speed 11821.85 samples/sec Loss 3.5346 LearningRate 0.0158 Epoch: 16 Global Step: 68520 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:57:24,532-Speed 11769.47 samples/sec Loss 3.5511 LearningRate 0.0158 Epoch: 16 Global Step: 68530 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:57:28,369-Speed 10676.20 samples/sec Loss 3.5306 LearningRate 0.0157 Epoch: 16 Global Step: 68540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
04:57:32,179-Speed 10753.60 samples/sec Loss 3.5558 LearningRate 0.0157 Epoch: 16 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:36,029-Speed 10640.68 samples/sec Loss 3.5247 LearningRate 0.0157 Epoch: 16 Global Step: 68560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:39,556-Speed 11617.72 samples/sec Loss 3.5439 LearningRate 0.0157 Epoch: 16 Global Step: 68570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:43,143-Speed 11423.36 samples/sec Loss 3.5431 LearningRate 0.0157 Epoch: 16 Global Step: 68580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:47,082-Speed 10398.90 samples/sec Loss 3.5662 LearningRate 0.0156 Epoch: 16 Global Step: 68590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:50,758-Speed 11146.11 samples/sec Loss 3.5630 LearningRate 0.0156 Epoch: 16 Global Step: 68600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:54,290-Speed 11601.48 samples/sec Loss 3.5519 LearningRate 0.0156 Epoch: 16 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:57:57,733-Speed 11896.56 samples/sec Loss 3.5201 LearningRate 0.0156 Epoch: 16 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:01,621-Speed 10540.65 samples/sec Loss 3.5244 LearningRate 0.0156 Epoch: 16 Global Step: 68630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:05,364-Speed 10944.35 samples/sec Loss 3.5498 LearningRate 0.0155 Epoch: 16 Global Step: 68640 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:58:09,327-Speed 10338.02 samples/sec Loss 3.5153 LearningRate 0.0155 Epoch: 16 Global Step: 68650 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:58:14,002-Speed 8763.10 samples/sec Loss 3.5259 LearningRate 0.0155 Epoch: 16 Global Step: 68660 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:58:17,662-Speed 11193.89 
samples/sec Loss 3.5102 LearningRate 0.0155 Epoch: 16 Global Step: 68670 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:58:21,136-Speed 11794.22 samples/sec Loss 3.5131 LearningRate 0.0155 Epoch: 16 Global Step: 68680 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:58:24,863-Speed 10992.81 samples/sec Loss 3.5014 LearningRate 0.0154 Epoch: 16 Global Step: 68690 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:58:28,380-Speed 11648.33 samples/sec Loss 3.5776 LearningRate 0.0154 Epoch: 16 Global Step: 68700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:32,018-Speed 11261.77 samples/sec Loss 3.5162 LearningRate 0.0154 Epoch: 16 Global Step: 68710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:35,588-Speed 11483.71 samples/sec Loss 3.5149 LearningRate 0.0154 Epoch: 16 Global Step: 68720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:39,033-Speed 11897.33 samples/sec Loss 3.5616 LearningRate 0.0153 Epoch: 16 Global Step: 68730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:42,561-Speed 11611.77 samples/sec Loss 3.5546 LearningRate 0.0153 Epoch: 16 Global Step: 68740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:46,152-Speed 11410.37 samples/sec Loss 3.5691 LearningRate 0.0153 Epoch: 16 Global Step: 68750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:49,764-Speed 11343.17 samples/sec Loss 3.5516 LearningRate 0.0153 Epoch: 16 Global Step: 68760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:53,215-Speed 11869.79 samples/sec Loss 3.5375 LearningRate 0.0153 Epoch: 16 Global Step: 68770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:58:56,619-Speed 12035.63 samples/sec Loss 3.5309 LearningRate 0.0152 Epoch: 16 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:00,079-Speed 11843.74 samples/sec Loss 3.5195 
LearningRate 0.0152 Epoch: 16 Global Step: 68790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:03,954-Speed 10572.34 samples/sec Loss 3.4954 LearningRate 0.0152 Epoch: 16 Global Step: 68800 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:59:07,797-Speed 10662.28 samples/sec Loss 3.5300 LearningRate 0.0152 Epoch: 16 Global Step: 68810 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:59:11,530-Speed 10972.97 samples/sec Loss 3.5511 LearningRate 0.0152 Epoch: 16 Global Step: 68820 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:59:15,226-Speed 11084.63 samples/sec Loss 3.5053 LearningRate 0.0151 Epoch: 16 Global Step: 68830 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:59:18,608-Speed 12116.26 samples/sec Loss 3.5219 LearningRate 0.0151 Epoch: 16 Global Step: 68840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:22,029-Speed 11976.80 samples/sec Loss 3.5439 LearningRate 0.0151 Epoch: 16 Global Step: 68850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:25,471-Speed 11901.33 samples/sec Loss 3.5579 LearningRate 0.0151 Epoch: 16 Global Step: 68860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:28,920-Speed 11880.15 samples/sec Loss 3.5530 LearningRate 0.0151 Epoch: 16 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:32,255-Speed 12283.58 samples/sec Loss 3.5521 LearningRate 0.0150 Epoch: 16 Global Step: 68880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:35,683-Speed 11953.65 samples/sec Loss 3.5486 LearningRate 0.0150 Epoch: 16 Global Step: 68890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:39,109-Speed 11959.79 samples/sec Loss 3.5868 LearningRate 0.0150 Epoch: 16 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:42,794-Speed 11116.97 samples/sec Loss 3.5731 LearningRate 0.0150 Epoch: 16 
Global Step: 68910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:46,369-Speed 11458.87 samples/sec Loss 3.5705 LearningRate 0.0150 Epoch: 16 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:49,835-Speed 11822.86 samples/sec Loss 3.5296 LearningRate 0.0149 Epoch: 16 Global Step: 68930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 04:59:53,631-Speed 10791.51 samples/sec Loss 3.5761 LearningRate 0.0149 Epoch: 16 Global Step: 68940 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 04:59:57,236-Speed 11365.41 samples/sec Loss 3.5896 LearningRate 0.0149 Epoch: 16 Global Step: 68950 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:00:00,979-Speed 10947.54 samples/sec Loss 3.5387 LearningRate 0.0149 Epoch: 16 Global Step: 68960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:04,766-Speed 10817.44 samples/sec Loss 3.5344 LearningRate 0.0149 Epoch: 16 Global Step: 68970 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:08,354-Speed 11418.95 samples/sec Loss 3.5189 LearningRate 0.0148 Epoch: 16 Global Step: 68980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:12,084-Speed 10985.75 samples/sec Loss 3.5554 LearningRate 0.0148 Epoch: 16 Global Step: 68990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:16,243-Speed 9850.17 samples/sec Loss 3.5545 LearningRate 0.0148 Epoch: 16 Global Step: 69000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:20,024-Speed 10833.45 samples/sec Loss 3.5123 LearningRate 0.0148 Epoch: 16 Global Step: 69010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:23,683-Speed 11200.22 samples/sec Loss 3.5572 LearningRate 0.0147 Epoch: 16 Global Step: 69020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:27,388-Speed 11056.75 samples/sec Loss 3.5712 LearningRate 0.0147 Epoch: 16 Global Step: 69030 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:30,995-Speed 11361.22 samples/sec Loss 3.5394 LearningRate 0.0147 Epoch: 16 Global Step: 69040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:34,433-Speed 11915.50 samples/sec Loss 3.5361 LearningRate 0.0147 Epoch: 16 Global Step: 69050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:38,037-Speed 11367.61 samples/sec Loss 3.5737 LearningRate 0.0147 Epoch: 16 Global Step: 69060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:41,890-Speed 10634.59 samples/sec Loss 3.5619 LearningRate 0.0146 Epoch: 16 Global Step: 69070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:45,669-Speed 10840.30 samples/sec Loss 3.5632 LearningRate 0.0146 Epoch: 16 Global Step: 69080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:49,302-Speed 11277.20 samples/sec Loss 3.5575 LearningRate 0.0146 Epoch: 16 Global Step: 69090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:52,736-Speed 11931.84 samples/sec Loss 3.5360 LearningRate 0.0146 Epoch: 16 Global Step: 69100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:56,159-Speed 11967.97 samples/sec Loss 3.5531 LearningRate 0.0146 Epoch: 16 Global Step: 69110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:00:59,721-Speed 11503.63 samples/sec Loss 3.5166 LearningRate 0.0145 Epoch: 16 Global Step: 69120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:03,539-Speed 10729.40 samples/sec Loss 3.5789 LearningRate 0.0145 Epoch: 16 Global Step: 69130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:07,169-Speed 11286.35 samples/sec Loss 3.5325 LearningRate 0.0145 Epoch: 16 Global Step: 69140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:10,924-Speed 10910.97 samples/sec Loss 3.5835 LearningRate 0.0145 Epoch: 16 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-01-17 05:01:14,637-Speed 11035.12 samples/sec Loss 3.5526 LearningRate 0.0145 Epoch: 16 Global Step: 69160 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:01:18,803-Speed 9833.05 samples/sec Loss 3.5636 LearningRate 0.0144 Epoch: 16 Global Step: 69170 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:01:22,226-Speed 11969.81 samples/sec Loss 3.6140 LearningRate 0.0144 Epoch: 16 Global Step: 69180 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:01:26,029-Speed 10772.72 samples/sec Loss 3.5674 LearningRate 0.0144 Epoch: 16 Global Step: 69190 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:01:29,475-Speed 11892.13 samples/sec Loss 3.5587 LearningRate 0.0144 Epoch: 16 Global Step: 69200 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:01:32,920-Speed 11891.09 samples/sec Loss 3.5846 LearningRate 0.0144 Epoch: 16 Global Step: 69210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:36,401-Speed 11770.94 samples/sec Loss 3.5515 LearningRate 0.0143 Epoch: 16 Global Step: 69220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:39,971-Speed 11475.90 samples/sec Loss 3.5664 LearningRate 0.0143 Epoch: 16 Global Step: 69230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:43,515-Speed 11561.23 samples/sec Loss 3.4794 LearningRate 0.0143 Epoch: 16 Global Step: 69240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:47,047-Speed 11598.44 samples/sec Loss 3.5790 LearningRate 0.0143 Epoch: 16 Global Step: 69250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:50,719-Speed 11160.19 samples/sec Loss 3.5516 LearningRate 0.0143 Epoch: 16 Global Step: 69260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:01:54,400-Speed 11129.50 samples/sec Loss 3.5380 LearningRate 0.0142 Epoch: 16 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 
05:01:58,335-Speed 10413.33 samples/sec Loss 3.5345 LearningRate 0.0142 Epoch: 16 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:02,117-Speed 10831.36 samples/sec Loss 3.5596 LearningRate 0.0142 Epoch: 16 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:05,607-Speed 11740.87 samples/sec Loss 3.5479 LearningRate 0.0142 Epoch: 16 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:09,099-Speed 11734.33 samples/sec Loss 3.5207 LearningRate 0.0142 Epoch: 16 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:12,849-Speed 10925.15 samples/sec Loss 3.5598 LearningRate 0.0141 Epoch: 16 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:16,553-Speed 11061.01 samples/sec Loss 3.5573 LearningRate 0.0141 Epoch: 16 Global Step: 69330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:20,083-Speed 11607.26 samples/sec Loss 3.5993 LearningRate 0.0141 Epoch: 16 Global Step: 69340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:23,566-Speed 11763.08 samples/sec Loss 3.5470 LearningRate 0.0141 Epoch: 16 Global Step: 69350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:27,095-Speed 11607.64 samples/sec Loss 3.5288 LearningRate 0.0141 Epoch: 16 Global Step: 69360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:30,675-Speed 11446.30 samples/sec Loss 3.5955 LearningRate 0.0140 Epoch: 16 Global Step: 69370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:34,131-Speed 11855.43 samples/sec Loss 3.5396 LearningRate 0.0140 Epoch: 16 Global Step: 69380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:37,740-Speed 11351.92 samples/sec Loss 3.5486 LearningRate 0.0140 Epoch: 16 Global Step: 69390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:41,829-Speed 10019.66 
samples/sec Loss 3.5794 LearningRate 0.0140 Epoch: 16 Global Step: 69400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-17 05:02:45,278-Speed 11878.35 samples/sec Loss 3.5349 LearningRate 0.0140 Epoch: 16 Global Step: 69410 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:02:49,052-Speed 10856.15 samples/sec Loss 3.5570 LearningRate 0.0139 Epoch: 16 Global Step: 69420 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:02:52,880-Speed 10701.12 samples/sec Loss 3.5749 LearningRate 0.0139 Epoch: 16 Global Step: 69430 Fp16 Grad Scale: 262144 Required: 2 hours Training: 2022-01-17 05:02:57,198-Speed 9488.19 samples/sec Loss 3.5528 LearningRate 0.0139 Epoch: 16 Global Step: 69440 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:00,916-Speed 11020.64 samples/sec Loss 3.5864 LearningRate 0.0139 Epoch: 16 Global Step: 69450 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:04,532-Speed 11328.86 samples/sec Loss 3.5887 LearningRate 0.0139 Epoch: 16 Global Step: 69460 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:08,346-Speed 10743.79 samples/sec Loss 3.5445 LearningRate 0.0138 Epoch: 16 Global Step: 69470 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:11,971-Speed 11299.63 samples/sec Loss 3.5894 LearningRate 0.0138 Epoch: 16 Global Step: 69480 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:15,712-Speed 10950.22 samples/sec Loss 3.5384 LearningRate 0.0138 Epoch: 16 Global Step: 69490 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:19,254-Speed 11567.13 samples/sec Loss 3.5792 LearningRate 0.0138 Epoch: 16 Global Step: 69500 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:22,809-Speed 11524.19 samples/sec Loss 3.6147 LearningRate 0.0138 Epoch: 16 Global Step: 69510 Fp16 Grad Scale: 524288 Required: 1 hours Training: 2022-01-17 05:03:26,266-Speed 11854.35 samples/sec Loss 3.5751 
LearningRate 0.0137 Epoch: 16 Global Step: 69520 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:29,744-Speed 11780.47 samples/sec Loss 3.5852 LearningRate 0.0137 Epoch: 16 Global Step: 69530 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:33,494-Speed 10924.19 samples/sec Loss 3.5651 LearningRate 0.0137 Epoch: 16 Global Step: 69540 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:37,292-Speed 10788.84 samples/sec Loss 3.5370 LearningRate 0.0137 Epoch: 16 Global Step: 69550 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:40,740-Speed 11883.32 samples/sec Loss 3.5445 LearningRate 0.0137 Epoch: 16 Global Step: 69560 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:03:44,134-Speed 12073.26 samples/sec Loss 3.5260 LearningRate 0.0136 Epoch: 16 Global Step: 69570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:03:47,600-Speed 11819.33 samples/sec Loss 3.5643 LearningRate 0.0136 Epoch: 16 Global Step: 69580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:03:51,100-Speed 11705.40 samples/sec Loss 3.5266 LearningRate 0.0136 Epoch: 16 Global Step: 69590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:03:55,442-Speed 9436.45 samples/sec Loss 3.5506 LearningRate 0.0136 Epoch: 16 Global Step: 69600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:03:59,585-Speed 9888.35 samples/sec Loss 3.5692 LearningRate 0.0136 Epoch: 16 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:03,219-Speed 11275.41 samples/sec Loss 3.5589 LearningRate 0.0135 Epoch: 16 Global Step: 69620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:06,554-Speed 12287.38 samples/sec Loss 3.5212 LearningRate 0.0135 Epoch: 16 Global Step: 69630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:10,072-Speed 11645.22 samples/sec Loss 3.5796 LearningRate 0.0135 Epoch: 16 
Global Step: 69640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:13,545-Speed 11795.56 samples/sec Loss 3.5740 LearningRate 0.0135 Epoch: 16 Global Step: 69650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:17,435-Speed 10531.12 samples/sec Loss 3.5772 LearningRate 0.0135 Epoch: 16 Global Step: 69660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:21,080-Speed 11242.63 samples/sec Loss 3.5300 LearningRate 0.0134 Epoch: 16 Global Step: 69670 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:04:24,717-Speed 11264.42 samples/sec Loss 3.5586 LearningRate 0.0134 Epoch: 16 Global Step: 69680 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:04:28,278-Speed 11505.94 samples/sec Loss 3.5760 LearningRate 0.0134 Epoch: 16 Global Step: 69690 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:04:31,903-Speed 11302.66 samples/sec Loss 3.5537 LearningRate 0.0134 Epoch: 16 Global Step: 69700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:35,816-Speed 10471.57 samples/sec Loss 3.5304 LearningRate 0.0134 Epoch: 16 Global Step: 69710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:39,744-Speed 10429.61 samples/sec Loss 3.5608 LearningRate 0.0134 Epoch: 16 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:43,408-Speed 11181.10 samples/sec Loss 3.5626 LearningRate 0.0133 Epoch: 16 Global Step: 69730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:47,515-Speed 9974.33 samples/sec Loss 3.5799 LearningRate 0.0133 Epoch: 16 Global Step: 69740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:51,689-Speed 9815.83 samples/sec Loss 3.5888 LearningRate 0.0133 Epoch: 16 Global Step: 69750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:55,786-Speed 9999.82 samples/sec Loss 3.5428 LearningRate 0.0133 Epoch: 16 Global Step: 69760 Fp16 Grad 
Scale: 131072 Required: 1 hours Training: 2022-01-17 05:04:59,611-Speed 10712.63 samples/sec Loss 3.5600 LearningRate 0.0133 Epoch: 16 Global Step: 69770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:03,080-Speed 11810.19 samples/sec Loss 3.5816 LearningRate 0.0132 Epoch: 16 Global Step: 69780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:06,805-Speed 10999.67 samples/sec Loss 3.4980 LearningRate 0.0132 Epoch: 16 Global Step: 69790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:10,811-Speed 10227.17 samples/sec Loss 3.5782 LearningRate 0.0132 Epoch: 16 Global Step: 69800 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:05:14,492-Speed 11131.95 samples/sec Loss 3.5743 LearningRate 0.0132 Epoch: 16 Global Step: 69810 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:05:18,596-Speed 9980.48 samples/sec Loss 3.5561 LearningRate 0.0132 Epoch: 16 Global Step: 69820 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:05:22,049-Speed 11868.38 samples/sec Loss 3.5656 LearningRate 0.0131 Epoch: 16 Global Step: 69830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:26,010-Speed 10341.79 samples/sec Loss 3.5464 LearningRate 0.0131 Epoch: 16 Global Step: 69840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:29,924-Speed 10469.42 samples/sec Loss 3.5539 LearningRate 0.0131 Epoch: 16 Global Step: 69850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:33,346-Speed 11972.57 samples/sec Loss 3.5664 LearningRate 0.0131 Epoch: 16 Global Step: 69860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:36,764-Speed 11984.47 samples/sec Loss 3.6001 LearningRate 0.0131 Epoch: 16 Global Step: 69870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:41,039-Speed 9585.06 samples/sec Loss 3.5345 LearningRate 0.0130 Epoch: 16 Global Step: 69880 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:05:44,481-Speed 11899.71 samples/sec Loss 3.5415 LearningRate 0.0130 Epoch: 16 Global Step: 69890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:47,888-Speed 12027.16 samples/sec Loss 3.5870 LearningRate 0.0130 Epoch: 16 Global Step: 69900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:51,461-Speed 11470.35 samples/sec Loss 3.5560 LearningRate 0.0130 Epoch: 16 Global Step: 69910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:55,614-Speed 9863.23 samples/sec Loss 3.5551 LearningRate 0.0130 Epoch: 16 Global Step: 69920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:05:59,058-Speed 11899.71 samples/sec Loss 3.5756 LearningRate 0.0129 Epoch: 16 Global Step: 69930 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:06:02,864-Speed 10762.66 samples/sec Loss 3.5318 LearningRate 0.0129 Epoch: 16 Global Step: 69940 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:06:06,478-Speed 11337.27 samples/sec Loss 3.5992 LearningRate 0.0129 Epoch: 16 Global Step: 69950 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:06:09,907-Speed 11949.76 samples/sec Loss 3.6015 LearningRate 0.0129 Epoch: 16 Global Step: 69960 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:06:13,336-Speed 11951.99 samples/sec Loss 3.5474 LearningRate 0.0129 Epoch: 16 Global Step: 69970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:06:16,751-Speed 11997.81 samples/sec Loss 3.5757 LearningRate 0.0129 Epoch: 16 Global Step: 69980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:06:20,381-Speed 11285.92 samples/sec Loss 3.5353 LearningRate 0.0128 Epoch: 16 Global Step: 69990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:06:23,799-Speed 11984.86 samples/sec Loss 3.5881 LearningRate 0.0128 Epoch: 16 Global Step: 70000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 
05:06:45,224-[lfw][70000]XNorm: 7.067692 Training: 2022-01-17 05:06:45,225-[lfw][70000]Accuracy-Flip: 0.99617+-0.00279 Training: 2022-01-17 05:06:45,225-[lfw][70000]Accuracy-Highest: 0.99733 Training: 2022-01-17 05:07:09,836-[cfp_fp][70000]XNorm: 6.014103 Training: 2022-01-17 05:07:09,837-[cfp_fp][70000]Accuracy-Flip: 0.97300+-0.00931 Training: 2022-01-17 05:07:09,837-[cfp_fp][70000]Accuracy-Highest: 0.97300 Training: 2022-01-17 05:07:31,087-[agedb_30][70000]XNorm: 6.772341 Training: 2022-01-17 05:07:31,087-[agedb_30][70000]Accuracy-Flip: 0.97200+-0.00627 Training: 2022-01-17 05:07:31,088-[agedb_30][70000]Accuracy-Highest: 0.97200 Training: 2022-01-17 05:07:34,482-Speed 579.50 samples/sec Loss 3.5643 LearningRate 0.0128 Epoch: 16 Global Step: 70010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:07:37,837-Speed 12212.18 samples/sec Loss 3.5616 LearningRate 0.0128 Epoch: 16 Global Step: 70020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:07:41,204-Speed 12171.20 samples/sec Loss 3.5397 LearningRate 0.0128 Epoch: 16 Global Step: 70030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:07:44,548-Speed 12251.42 samples/sec Loss 3.5498 LearningRate 0.0127 Epoch: 16 Global Step: 70040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:07:47,900-Speed 12225.91 samples/sec Loss 3.5299 LearningRate 0.0127 Epoch: 16 Global Step: 70050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:07:51,298-Speed 12054.94 samples/sec Loss 3.5625 LearningRate 0.0127 Epoch: 16 Global Step: 70060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:07:54,667-Speed 12161.34 samples/sec Loss 3.5323 LearningRate 0.0127 Epoch: 16 Global Step: 70070 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:07:58,048-Speed 12118.08 samples/sec Loss 3.5981 LearningRate 0.0127 Epoch: 16 Global Step: 70080 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:08:01,444-Speed 12066.54 
samples/sec Loss 3.5739 LearningRate 0.0126 Epoch: 16 Global Step: 70090 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:08:04,846-Speed 12040.89 samples/sec Loss 3.5687 LearningRate 0.0126 Epoch: 16 Global Step: 70100 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:08:08,214-Speed 12164.98 samples/sec Loss 3.5318 LearningRate 0.0126 Epoch: 16 Global Step: 70110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:12,279-Speed 10078.60 samples/sec Loss 3.5739 LearningRate 0.0126 Epoch: 16 Global Step: 70120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:15,903-Speed 11307.70 samples/sec Loss 3.5450 LearningRate 0.0126 Epoch: 16 Global Step: 70130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:19,321-Speed 11986.18 samples/sec Loss 3.5886 LearningRate 0.0125 Epoch: 16 Global Step: 70140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:22,718-Speed 12059.99 samples/sec Loss 3.5243 LearningRate 0.0125 Epoch: 16 Global Step: 70150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:26,282-Speed 11495.36 samples/sec Loss 3.5294 LearningRate 0.0125 Epoch: 16 Global Step: 70160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:29,768-Speed 11754.94 samples/sec Loss 3.5685 LearningRate 0.0125 Epoch: 16 Global Step: 70170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:33,494-Speed 10996.22 samples/sec Loss 3.5576 LearningRate 0.0125 Epoch: 16 Global Step: 70180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:36,981-Speed 11746.00 samples/sec Loss 3.5347 LearningRate 0.0125 Epoch: 16 Global Step: 70190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:40,664-Speed 11123.34 samples/sec Loss 3.5143 LearningRate 0.0124 Epoch: 16 Global Step: 70200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:44,155-Speed 11738.95 samples/sec Loss 3.5181 
LearningRate 0.0124 Epoch: 16 Global Step: 70210 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:08:47,803-Speed 11229.36 samples/sec Loss 3.5952 LearningRate 0.0124 Epoch: 16 Global Step: 70220 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:08:51,810-Speed 10226.74 samples/sec Loss 3.5708 LearningRate 0.0124 Epoch: 16 Global Step: 70230 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:08:55,242-Speed 11937.68 samples/sec Loss 3.5614 LearningRate 0.0124 Epoch: 16 Global Step: 70240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:08:58,986-Speed 10943.09 samples/sec Loss 3.5457 LearningRate 0.0123 Epoch: 16 Global Step: 70250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:09:03,133-Speed 9878.31 samples/sec Loss 3.5911 LearningRate 0.0123 Epoch: 16 Global Step: 70260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:09:06,587-Speed 11862.29 samples/sec Loss 3.5685 LearningRate 0.0123 Epoch: 16 Global Step: 70270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:09:10,119-Speed 11600.42 samples/sec Loss 3.5597 LearningRate 0.0123 Epoch: 16 Global Step: 70280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:09:13,583-Speed 11827.32 samples/sec Loss 3.5235 LearningRate 0.0123 Epoch: 16 Global Step: 70290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:09:17,046-Speed 11830.26 samples/sec Loss 3.5771 LearningRate 0.0122 Epoch: 16 Global Step: 70300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:09:20,490-Speed 11896.66 samples/sec Loss 3.5651 LearningRate 0.0122 Epoch: 16 Global Step: 70310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:09:24,668-Speed 9806.02 samples/sec Loss 3.5593 LearningRate 0.0122 Epoch: 16 Global Step: 70320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:09:28,310-Speed 11251.03 samples/sec Loss 3.5454 LearningRate 0.0122 Epoch: 16 
Global Step: 70330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:09:31,824-Speed 11659.83 samples/sec Loss 3.5693 LearningRate 0.0122 Epoch: 16 Global Step: 70340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:09:35,548-Speed 11000.23 samples/sec Loss 3.5666 LearningRate 0.0122 Epoch: 16 Global Step: 70350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:09:39,118-Speed 11498.16 samples/sec Loss 3.5654 LearningRate 0.0121 Epoch: 16 Global Step: 70360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:09:42,559-Speed 11905.76 samples/sec Loss 3.5792 LearningRate 0.0121 Epoch: 16 Global Step: 70370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:09:45,992-Speed 11932.38 samples/sec Loss 3.5311 LearningRate 0.0121 Epoch: 16 Global Step: 70380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:09:49,421-Speed 11951.82 samples/sec Loss 3.5524 LearningRate 0.0121 Epoch: 16 Global Step: 70390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:09:52,877-Speed 11855.86 samples/sec Loss 3.5470 LearningRate 0.0121 Epoch: 16 Global Step: 70400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:09:56,660-Speed 10828.99 samples/sec Loss 3.6002 LearningRate 0.0120 Epoch: 16 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:10:00,097-Speed 11918.17 samples/sec Loss 3.5610 LearningRate 0.0120 Epoch: 16 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:10:03,965-Speed 10593.43 samples/sec Loss 3.5565 LearningRate 0.0120 Epoch: 16 Global Step: 70430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:07,388-Speed 11970.17 samples/sec Loss 3.5628 LearningRate 0.0120 Epoch: 16 Global Step: 70440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:10,943-Speed 11526.87 samples/sec Loss 3.5682 LearningRate 0.0120 Epoch: 16 Global Step: 70450 Fp16 Grad Scale: 
131072 Required: 1 hours Training: 2022-01-17 05:10:14,440-Speed 11712.01 samples/sec Loss 3.5443 LearningRate 0.0120 Epoch: 16 Global Step: 70460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:18,058-Speed 11325.14 samples/sec Loss 3.5563 LearningRate 0.0119 Epoch: 16 Global Step: 70470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:21,544-Speed 11752.91 samples/sec Loss 3.5444 LearningRate 0.0119 Epoch: 16 Global Step: 70480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:25,137-Speed 11402.30 samples/sec Loss 3.5693 LearningRate 0.0119 Epoch: 16 Global Step: 70490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:28,831-Speed 11090.91 samples/sec Loss 3.5347 LearningRate 0.0119 Epoch: 16 Global Step: 70500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:32,537-Speed 11057.24 samples/sec Loss 3.5787 LearningRate 0.0119 Epoch: 16 Global Step: 70510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:36,193-Speed 11206.34 samples/sec Loss 3.5494 LearningRate 0.0118 Epoch: 16 Global Step: 70520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:40,020-Speed 10704.21 samples/sec Loss 3.5611 LearningRate 0.0118 Epoch: 16 Global Step: 70530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:43,428-Speed 12022.67 samples/sec Loss 3.5310 LearningRate 0.0118 Epoch: 16 Global Step: 70540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:46,804-Speed 12137.59 samples/sec Loss 3.5789 LearningRate 0.0118 Epoch: 16 Global Step: 70550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:50,267-Speed 11830.77 samples/sec Loss 3.5293 LearningRate 0.0118 Epoch: 16 Global Step: 70560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:10:54,084-Speed 10731.03 samples/sec Loss 3.6072 LearningRate 0.0117 Epoch: 16 Global Step: 70570 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:10:58,358-Speed 9586.10 samples/sec Loss 3.5593 LearningRate 0.0117 Epoch: 16 Global Step: 70580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:11:02,389-Speed 10164.20 samples/sec Loss 3.5630 LearningRate 0.0117 Epoch: 16 Global Step: 70590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:11:06,042-Speed 11215.90 samples/sec Loss 3.5509 LearningRate 0.0117 Epoch: 16 Global Step: 70600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:11:09,514-Speed 11800.39 samples/sec Loss 3.5310 LearningRate 0.0117 Epoch: 16 Global Step: 70610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:11:12,930-Speed 11993.84 samples/sec Loss 3.5415 LearningRate 0.0117 Epoch: 16 Global Step: 70620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:11:16,415-Speed 11756.68 samples/sec Loss 3.5722 LearningRate 0.0116 Epoch: 16 Global Step: 70630 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:20,347-Speed 10417.79 samples/sec Loss 3.5671 LearningRate 0.0116 Epoch: 16 Global Step: 70640 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:24,055-Speed 11049.70 samples/sec Loss 3.5526 LearningRate 0.0116 Epoch: 16 Global Step: 70650 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:27,850-Speed 10795.19 samples/sec Loss 3.5667 LearningRate 0.0116 Epoch: 16 Global Step: 70660 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:31,489-Speed 11259.38 samples/sec Loss 3.5202 LearningRate 0.0116 Epoch: 16 Global Step: 70670 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:34,908-Speed 11982.15 samples/sec Loss 3.5706 LearningRate 0.0115 Epoch: 16 Global Step: 70680 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:38,342-Speed 11933.10 samples/sec Loss 3.5632 LearningRate 0.0115 Epoch: 16 Global Step: 70690 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 
05:11:41,853-Speed 11670.13 samples/sec Loss 3.5411 LearningRate 0.0115 Epoch: 16 Global Step: 70700 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:45,567-Speed 11031.49 samples/sec Loss 3.5729 LearningRate 0.0115 Epoch: 16 Global Step: 70710 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:49,477-Speed 10476.77 samples/sec Loss 3.5312 LearningRate 0.0115 Epoch: 16 Global Step: 70720 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:11:53,203-Speed 10995.03 samples/sec Loss 3.5636 LearningRate 0.0115 Epoch: 16 Global Step: 70730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:11:57,171-Speed 10326.32 samples/sec Loss 3.5428 LearningRate 0.0114 Epoch: 16 Global Step: 70740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:00,601-Speed 11943.25 samples/sec Loss 3.5591 LearningRate 0.0114 Epoch: 16 Global Step: 70750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:04,698-Speed 10001.00 samples/sec Loss 3.5474 LearningRate 0.0114 Epoch: 16 Global Step: 70760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:08,402-Speed 11060.90 samples/sec Loss 3.5355 LearningRate 0.0114 Epoch: 16 Global Step: 70770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:11,924-Speed 11634.49 samples/sec Loss 3.5856 LearningRate 0.0114 Epoch: 16 Global Step: 70780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:15,323-Speed 12053.91 samples/sec Loss 3.5165 LearningRate 0.0114 Epoch: 16 Global Step: 70790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:18,781-Speed 11846.37 samples/sec Loss 3.5575 LearningRate 0.0113 Epoch: 16 Global Step: 70800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:22,217-Speed 11923.63 samples/sec Loss 3.5463 LearningRate 0.0113 Epoch: 16 Global Step: 70810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:25,812-Speed 11398.05 
samples/sec Loss 3.5528 LearningRate 0.0113 Epoch: 16 Global Step: 70820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:12:29,539-Speed 10993.31 samples/sec Loss 3.5854 LearningRate 0.0113 Epoch: 16 Global Step: 70830 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:12:33,376-Speed 10674.85 samples/sec Loss 3.6000 LearningRate 0.0113 Epoch: 16 Global Step: 70840 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:12:36,884-Speed 11677.84 samples/sec Loss 3.5415 LearningRate 0.0112 Epoch: 16 Global Step: 70850 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:12:40,493-Speed 11354.89 samples/sec Loss 3.5639 LearningRate 0.0112 Epoch: 16 Global Step: 70860 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:12:44,229-Speed 10965.29 samples/sec Loss 3.5611 LearningRate 0.0112 Epoch: 16 Global Step: 70870 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:12:47,749-Speed 11641.39 samples/sec Loss 3.5938 LearningRate 0.0112 Epoch: 16 Global Step: 70880 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:12:51,276-Speed 11615.93 samples/sec Loss 3.5528 LearningRate 0.0112 Epoch: 16 Global Step: 70890 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:12:54,760-Speed 11756.76 samples/sec Loss 3.5321 LearningRate 0.0112 Epoch: 16 Global Step: 70900 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:12:58,409-Speed 11226.61 samples/sec Loss 3.5500 LearningRate 0.0111 Epoch: 16 Global Step: 70910 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:13:02,764-Speed 9410.17 samples/sec Loss 3.5288 LearningRate 0.0111 Epoch: 16 Global Step: 70920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:13:45,787-Speed 952.07 samples/sec Loss 3.3022 LearningRate 0.0111 Epoch: 17 Global Step: 70930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:13:49,482-Speed 11088.59 samples/sec Loss 3.0771 
LearningRate 0.0111 Epoch: 17 Global Step: 70940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:13:53,619-Speed 9903.27 samples/sec Loss 3.1139 LearningRate 0.0111 Epoch: 17 Global Step: 70950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:13:57,032-Speed 12004.95 samples/sec Loss 3.1414 LearningRate 0.0110 Epoch: 17 Global Step: 70960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:00,471-Speed 11912.53 samples/sec Loss 3.1562 LearningRate 0.0110 Epoch: 17 Global Step: 70970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:04,806-Speed 9451.78 samples/sec Loss 3.0798 LearningRate 0.0110 Epoch: 17 Global Step: 70980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:08,538-Speed 10979.36 samples/sec Loss 3.0956 LearningRate 0.0110 Epoch: 17 Global Step: 70990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:12,183-Speed 11238.62 samples/sec Loss 3.1139 LearningRate 0.0110 Epoch: 17 Global Step: 71000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:15,602-Speed 11984.05 samples/sec Loss 3.1356 LearningRate 0.0110 Epoch: 17 Global Step: 71010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:19,252-Speed 11225.33 samples/sec Loss 3.1313 LearningRate 0.0109 Epoch: 17 Global Step: 71020 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:14:22,994-Speed 10947.68 samples/sec Loss 3.1386 LearningRate 0.0109 Epoch: 17 Global Step: 71030 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:14:26,689-Speed 11086.15 samples/sec Loss 3.1025 LearningRate 0.0109 Epoch: 17 Global Step: 71040 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:14:31,051-Speed 9392.73 samples/sec Loss 3.1656 LearningRate 0.0109 Epoch: 17 Global Step: 71050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:34,695-Speed 11245.16 samples/sec Loss 3.1450 LearningRate 0.0109 Epoch: 17 
Global Step: 71060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:38,211-Speed 11652.22 samples/sec Loss 3.1424 LearningRate 0.0109 Epoch: 17 Global Step: 71070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:41,670-Speed 11843.05 samples/sec Loss 3.1837 LearningRate 0.0108 Epoch: 17 Global Step: 71080 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:45,072-Speed 12046.98 samples/sec Loss 3.0840 LearningRate 0.0108 Epoch: 17 Global Step: 71090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:48,443-Speed 12153.25 samples/sec Loss 3.1453 LearningRate 0.0108 Epoch: 17 Global Step: 71100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:51,982-Speed 11575.29 samples/sec Loss 3.1066 LearningRate 0.0108 Epoch: 17 Global Step: 71110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:55,671-Speed 11103.52 samples/sec Loss 3.1419 LearningRate 0.0108 Epoch: 17 Global Step: 71120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:14:59,167-Speed 11721.11 samples/sec Loss 3.1520 LearningRate 0.0107 Epoch: 17 Global Step: 71130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:02,769-Speed 11375.89 samples/sec Loss 3.1068 LearningRate 0.0107 Epoch: 17 Global Step: 71140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:06,437-Speed 11169.39 samples/sec Loss 3.1479 LearningRate 0.0107 Epoch: 17 Global Step: 71150 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:15:10,048-Speed 11346.80 samples/sec Loss 3.1318 LearningRate 0.0107 Epoch: 17 Global Step: 71160 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:15:13,830-Speed 10832.94 samples/sec Loss 3.1503 LearningRate 0.0107 Epoch: 17 Global Step: 71170 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:15:17,354-Speed 11625.51 samples/sec Loss 3.1524 LearningRate 0.0107 Epoch: 17 Global Step: 71180 Fp16 Grad 
Scale: 262144 Required: 1 hours Training: 2022-01-17 05:15:21,019-Speed 11178.90 samples/sec Loss 3.1075 LearningRate 0.0106 Epoch: 17 Global Step: 71190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:24,469-Speed 11876.10 samples/sec Loss 3.1503 LearningRate 0.0106 Epoch: 17 Global Step: 71200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:27,965-Speed 11717.52 samples/sec Loss 3.1109 LearningRate 0.0106 Epoch: 17 Global Step: 71210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:31,629-Speed 11182.17 samples/sec Loss 3.1406 LearningRate 0.0106 Epoch: 17 Global Step: 71220 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:35,270-Speed 11254.04 samples/sec Loss 3.1769 LearningRate 0.0106 Epoch: 17 Global Step: 71230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:39,555-Speed 9562.05 samples/sec Loss 3.1673 LearningRate 0.0106 Epoch: 17 Global Step: 71240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:43,353-Speed 10785.48 samples/sec Loss 3.1432 LearningRate 0.0105 Epoch: 17 Global Step: 71250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:46,975-Speed 11314.32 samples/sec Loss 3.1591 LearningRate 0.0105 Epoch: 17 Global Step: 71260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:50,471-Speed 11719.20 samples/sec Loss 3.1689 LearningRate 0.0105 Epoch: 17 Global Step: 71270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:54,062-Speed 11406.43 samples/sec Loss 3.1781 LearningRate 0.0105 Epoch: 17 Global Step: 71280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:15:57,786-Speed 11001.98 samples/sec Loss 3.1648 LearningRate 0.0105 Epoch: 17 Global Step: 71290 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:16:01,296-Speed 11675.10 samples/sec Loss 3.1419 LearningRate 0.0105 Epoch: 17 Global Step: 71300 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:16:04,885-Speed 11414.84 samples/sec Loss 3.1884 LearningRate 0.0104 Epoch: 17 Global Step: 71310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:08,443-Speed 11513.07 samples/sec Loss 3.1901 LearningRate 0.0104 Epoch: 17 Global Step: 71320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:11,958-Speed 11655.83 samples/sec Loss 3.1723 LearningRate 0.0104 Epoch: 17 Global Step: 71330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:15,556-Speed 11390.37 samples/sec Loss 3.1101 LearningRate 0.0104 Epoch: 17 Global Step: 71340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:19,094-Speed 11579.52 samples/sec Loss 3.1889 LearningRate 0.0104 Epoch: 17 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:22,754-Speed 11196.31 samples/sec Loss 3.1638 LearningRate 0.0104 Epoch: 17 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:26,664-Speed 10477.63 samples/sec Loss 3.1562 LearningRate 0.0103 Epoch: 17 Global Step: 71370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:30,304-Speed 11254.90 samples/sec Loss 3.1685 LearningRate 0.0103 Epoch: 17 Global Step: 71380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:34,067-Speed 10887.67 samples/sec Loss 3.1482 LearningRate 0.0103 Epoch: 17 Global Step: 71390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:16:37,712-Speed 11239.58 samples/sec Loss 3.1736 LearningRate 0.0103 Epoch: 17 Global Step: 71400 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:16:41,196-Speed 11759.11 samples/sec Loss 3.1693 LearningRate 0.0103 Epoch: 17 Global Step: 71410 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:16:44,768-Speed 11467.82 samples/sec Loss 3.1994 LearningRate 0.0102 Epoch: 17 Global Step: 71420 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 
05:16:48,365-Speed 11391.49 samples/sec Loss 3.1934 LearningRate 0.0102 Epoch: 17 Global Step: 71430 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:16:51,859-Speed 11726.69 samples/sec Loss 3.1588 LearningRate 0.0102 Epoch: 17 Global Step: 71440 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:16:55,375-Speed 11653.18 samples/sec Loss 3.1609 LearningRate 0.0102 Epoch: 17 Global Step: 71450 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:16:59,023-Speed 11228.66 samples/sec Loss 3.2138 LearningRate 0.0102 Epoch: 17 Global Step: 71460 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:17:03,392-Speed 9376.64 samples/sec Loss 3.2072 LearningRate 0.0102 Epoch: 17 Global Step: 71470 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:17:07,044-Speed 11217.96 samples/sec Loss 3.1873 LearningRate 0.0101 Epoch: 17 Global Step: 71480 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:17:10,554-Speed 11672.36 samples/sec Loss 3.1813 LearningRate 0.0101 Epoch: 17 Global Step: 71490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:14,094-Speed 11577.13 samples/sec Loss 3.1850 LearningRate 0.0101 Epoch: 17 Global Step: 71500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:17,531-Speed 11920.87 samples/sec Loss 3.2049 LearningRate 0.0101 Epoch: 17 Global Step: 71510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:21,158-Speed 11296.44 samples/sec Loss 3.1759 LearningRate 0.0101 Epoch: 17 Global Step: 71520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:24,687-Speed 11609.02 samples/sec Loss 3.1858 LearningRate 0.0101 Epoch: 17 Global Step: 71530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:28,319-Speed 11280.10 samples/sec Loss 3.1609 LearningRate 0.0100 Epoch: 17 Global Step: 71540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:31,739-Speed 11981.08 
samples/sec Loss 3.2016 LearningRate 0.0100 Epoch: 17 Global Step: 71550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:35,209-Speed 11808.21 samples/sec Loss 3.2115 LearningRate 0.0100 Epoch: 17 Global Step: 71560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:38,662-Speed 11862.77 samples/sec Loss 3.1694 LearningRate 0.0100 Epoch: 17 Global Step: 71570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:42,117-Speed 11858.67 samples/sec Loss 3.2096 LearningRate 0.0100 Epoch: 17 Global Step: 71580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:17:45,809-Speed 11096.66 samples/sec Loss 3.1702 LearningRate 0.0100 Epoch: 17 Global Step: 71590 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:17:49,388-Speed 11447.49 samples/sec Loss 3.2254 LearningRate 0.0099 Epoch: 17 Global Step: 71600 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:17:53,263-Speed 10573.79 samples/sec Loss 3.2038 LearningRate 0.0099 Epoch: 17 Global Step: 71610 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:17:56,778-Speed 11660.19 samples/sec Loss 3.1688 LearningRate 0.0099 Epoch: 17 Global Step: 71620 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:18:00,420-Speed 11249.42 samples/sec Loss 3.2080 LearningRate 0.0099 Epoch: 17 Global Step: 71630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:04,191-Speed 10866.10 samples/sec Loss 3.1900 LearningRate 0.0099 Epoch: 17 Global Step: 71640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:07,861-Speed 11164.13 samples/sec Loss 3.1896 LearningRate 0.0099 Epoch: 17 Global Step: 71650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:11,340-Speed 11775.94 samples/sec Loss 3.1352 LearningRate 0.0098 Epoch: 17 Global Step: 71660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:14,842-Speed 11696.75 samples/sec Loss 3.2125 
LearningRate 0.0098 Epoch: 17 Global Step: 71670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:18,670-Speed 10705.32 samples/sec Loss 3.1859 LearningRate 0.0098 Epoch: 17 Global Step: 71680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:22,226-Speed 11520.19 samples/sec Loss 3.2111 LearningRate 0.0098 Epoch: 17 Global Step: 71690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:25,792-Speed 11490.39 samples/sec Loss 3.2124 LearningRate 0.0098 Epoch: 17 Global Step: 71700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:29,210-Speed 11987.63 samples/sec Loss 3.1844 LearningRate 0.0098 Epoch: 17 Global Step: 71710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:32,719-Speed 11673.60 samples/sec Loss 3.1646 LearningRate 0.0097 Epoch: 17 Global Step: 71720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:36,229-Speed 11671.95 samples/sec Loss 3.1843 LearningRate 0.0097 Epoch: 17 Global Step: 71730 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:18:39,943-Speed 11033.51 samples/sec Loss 3.2299 LearningRate 0.0097 Epoch: 17 Global Step: 71740 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:18:43,438-Speed 11721.44 samples/sec Loss 3.1949 LearningRate 0.0097 Epoch: 17 Global Step: 71750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:46,997-Speed 11513.33 samples/sec Loss 3.2234 LearningRate 0.0097 Epoch: 17 Global Step: 71760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:50,640-Speed 11247.05 samples/sec Loss 3.2160 LearningRate 0.0097 Epoch: 17 Global Step: 71770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:54,598-Speed 10348.92 samples/sec Loss 3.2036 LearningRate 0.0096 Epoch: 17 Global Step: 71780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:18:58,539-Speed 10396.36 samples/sec Loss 3.2443 LearningRate 0.0096 Epoch: 17 
Global Step: 71790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:19:02,418-Speed 10562.95 samples/sec Loss 3.2268 LearningRate 0.0096 Epoch: 17 Global Step: 71800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:19:06,122-Speed 11059.29 samples/sec Loss 3.2273 LearningRate 0.0096 Epoch: 17 Global Step: 71810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:19:09,626-Speed 11691.57 samples/sec Loss 3.1587 LearningRate 0.0096 Epoch: 17 Global Step: 71820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:19:13,263-Speed 11265.31 samples/sec Loss 3.2161 LearningRate 0.0096 Epoch: 17 Global Step: 71830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:19:17,074-Speed 10751.41 samples/sec Loss 3.1995 LearningRate 0.0095 Epoch: 17 Global Step: 71840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:19:21,226-Speed 9868.24 samples/sec Loss 3.1905 LearningRate 0.0095 Epoch: 17 Global Step: 71850 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:24,788-Speed 11518.09 samples/sec Loss 3.2054 LearningRate 0.0095 Epoch: 17 Global Step: 71860 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:28,340-Speed 11536.84 samples/sec Loss 3.1979 LearningRate 0.0095 Epoch: 17 Global Step: 71870 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:31,905-Speed 11491.71 samples/sec Loss 3.2145 LearningRate 0.0095 Epoch: 17 Global Step: 71880 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:35,422-Speed 11648.46 samples/sec Loss 3.1841 LearningRate 0.0095 Epoch: 17 Global Step: 71890 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:39,082-Speed 11192.23 samples/sec Loss 3.1860 LearningRate 0.0094 Epoch: 17 Global Step: 71900 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:42,719-Speed 11267.12 samples/sec Loss 3.2240 LearningRate 0.0094 Epoch: 17 Global Step: 71910 Fp16 Grad 
Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:46,317-Speed 11385.80 samples/sec Loss 3.2643 LearningRate 0.0094 Epoch: 17 Global Step: 71920 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:49,937-Speed 11319.55 samples/sec Loss 3.2174 LearningRate 0.0094 Epoch: 17 Global Step: 71930 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:19:53,817-Speed 10557.36 samples/sec Loss 3.2280 LearningRate 0.0094 Epoch: 17 Global Step: 71940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:19:57,705-Speed 10538.86 samples/sec Loss 3.2282 LearningRate 0.0094 Epoch: 17 Global Step: 71950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:01,419-Speed 11034.70 samples/sec Loss 3.2222 LearningRate 0.0093 Epoch: 17 Global Step: 71960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:04,949-Speed 11605.64 samples/sec Loss 3.2314 LearningRate 0.0093 Epoch: 17 Global Step: 71970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:08,546-Speed 11387.96 samples/sec Loss 3.1852 LearningRate 0.0093 Epoch: 17 Global Step: 71980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:12,253-Speed 11052.28 samples/sec Loss 3.2158 LearningRate 0.0093 Epoch: 17 Global Step: 71990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:16,161-Speed 10485.66 samples/sec Loss 3.1925 LearningRate 0.0093 Epoch: 17 Global Step: 72000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:20,417-Speed 9627.52 samples/sec Loss 3.2345 LearningRate 0.0093 Epoch: 17 Global Step: 72010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:24,132-Speed 11026.01 samples/sec Loss 3.2212 LearningRate 0.0093 Epoch: 17 Global Step: 72020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:27,706-Speed 11463.54 samples/sec Loss 3.2260 LearningRate 0.0092 Epoch: 17 Global Step: 72030 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:20:31,274-Speed 11484.80 samples/sec Loss 3.1974 LearningRate 0.0092 Epoch: 17 Global Step: 72040 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:20:35,045-Speed 10864.45 samples/sec Loss 3.2122 LearningRate 0.0092 Epoch: 17 Global Step: 72050 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:20:38,668-Speed 11308.19 samples/sec Loss 3.2515 LearningRate 0.0092 Epoch: 17 Global Step: 72060 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:20:42,284-Speed 11331.74 samples/sec Loss 3.1569 LearningRate 0.0092 Epoch: 17 Global Step: 72070 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:20:46,160-Speed 10570.31 samples/sec Loss 3.2355 LearningRate 0.0092 Epoch: 17 Global Step: 72080 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:20:49,881-Speed 11024.93 samples/sec Loss 3.2387 LearningRate 0.0091 Epoch: 17 Global Step: 72090 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:20:53,399-Speed 11645.49 samples/sec Loss 3.2391 LearningRate 0.0091 Epoch: 17 Global Step: 72100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:20:57,155-Speed 10910.97 samples/sec Loss 3.2389 LearningRate 0.0091 Epoch: 17 Global Step: 72110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:00,794-Speed 11255.49 samples/sec Loss 3.2096 LearningRate 0.0091 Epoch: 17 Global Step: 72120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:04,437-Speed 11244.97 samples/sec Loss 3.2294 LearningRate 0.0091 Epoch: 17 Global Step: 72130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:07,872-Speed 11927.71 samples/sec Loss 3.2579 LearningRate 0.0091 Epoch: 17 Global Step: 72140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:11,519-Speed 11236.97 samples/sec Loss 3.2156 LearningRate 0.0090 Epoch: 17 Global Step: 72150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 
05:21:15,251-Speed 10976.70 samples/sec Loss 3.2150 LearningRate 0.0090 Epoch: 17 Global Step: 72160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:19,443-Speed 9773.34 samples/sec Loss 3.2201 LearningRate 0.0090 Epoch: 17 Global Step: 72170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:23,043-Speed 11381.98 samples/sec Loss 3.2351 LearningRate 0.0090 Epoch: 17 Global Step: 72180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:26,665-Speed 11314.19 samples/sec Loss 3.2103 LearningRate 0.0090 Epoch: 17 Global Step: 72190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:30,389-Speed 11003.15 samples/sec Loss 3.2347 LearningRate 0.0090 Epoch: 17 Global Step: 72200 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:21:34,050-Speed 11189.70 samples/sec Loss 3.2145 LearningRate 0.0089 Epoch: 17 Global Step: 72210 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:21:37,538-Speed 11746.66 samples/sec Loss 3.2024 LearningRate 0.0089 Epoch: 17 Global Step: 72220 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:21:41,256-Speed 11019.49 samples/sec Loss 3.2319 LearningRate 0.0089 Epoch: 17 Global Step: 72230 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:21:44,978-Speed 11007.01 samples/sec Loss 3.2448 LearningRate 0.0089 Epoch: 17 Global Step: 72240 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:21:48,593-Speed 11332.27 samples/sec Loss 3.1992 LearningRate 0.0089 Epoch: 17 Global Step: 72250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:52,276-Speed 11123.97 samples/sec Loss 3.2904 LearningRate 0.0089 Epoch: 17 Global Step: 72260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:55,909-Speed 11279.07 samples/sec Loss 3.2199 LearningRate 0.0088 Epoch: 17 Global Step: 72270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:21:59,424-Speed 11654.93 
samples/sec Loss 3.2196 LearningRate 0.0088 Epoch: 17 Global Step: 72280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:02,938-Speed 11659.50 samples/sec Loss 3.2090 LearningRate 0.0088 Epoch: 17 Global Step: 72290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:06,772-Speed 10687.83 samples/sec Loss 3.2289 LearningRate 0.0088 Epoch: 17 Global Step: 72300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:10,339-Speed 11484.41 samples/sec Loss 3.2163 LearningRate 0.0088 Epoch: 17 Global Step: 72310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:14,018-Speed 11137.37 samples/sec Loss 3.2311 LearningRate 0.0088 Epoch: 17 Global Step: 72320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:17,647-Speed 11291.11 samples/sec Loss 3.2415 LearningRate 0.0088 Epoch: 17 Global Step: 72330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:22,026-Speed 9356.91 samples/sec Loss 3.2705 LearningRate 0.0087 Epoch: 17 Global Step: 72340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:25,620-Speed 11396.46 samples/sec Loss 3.2593 LearningRate 0.0087 Epoch: 17 Global Step: 72350 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:22:29,419-Speed 10786.08 samples/sec Loss 3.2048 LearningRate 0.0087 Epoch: 17 Global Step: 72360 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:22:33,414-Speed 10255.94 samples/sec Loss 3.2405 LearningRate 0.0087 Epoch: 17 Global Step: 72370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:37,077-Speed 11183.60 samples/sec Loss 3.2430 LearningRate 0.0087 Epoch: 17 Global Step: 72380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:40,882-Speed 10768.00 samples/sec Loss 3.2165 LearningRate 0.0087 Epoch: 17 Global Step: 72390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:44,598-Speed 11025.99 samples/sec Loss 3.2438 
LearningRate 0.0086 Epoch: 17 Global Step: 72400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:48,220-Speed 11311.87 samples/sec Loss 3.2608 LearningRate 0.0086 Epoch: 17 Global Step: 72410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:51,762-Speed 11567.65 samples/sec Loss 3.2384 LearningRate 0.0086 Epoch: 17 Global Step: 72420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:55,309-Speed 11549.96 samples/sec Loss 3.2809 LearningRate 0.0086 Epoch: 17 Global Step: 72430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:22:59,111-Speed 10775.78 samples/sec Loss 3.2478 LearningRate 0.0086 Epoch: 17 Global Step: 72440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:02,756-Speed 11239.33 samples/sec Loss 3.2297 LearningRate 0.0086 Epoch: 17 Global Step: 72450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:06,438-Speed 11128.55 samples/sec Loss 3.2172 LearningRate 0.0086 Epoch: 17 Global Step: 72460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:10,143-Speed 11059.08 samples/sec Loss 3.2368 LearningRate 0.0085 Epoch: 17 Global Step: 72470 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:23:14,345-Speed 9749.68 samples/sec Loss 3.2501 LearningRate 0.0085 Epoch: 17 Global Step: 72480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:18,115-Speed 10866.77 samples/sec Loss 3.2522 LearningRate 0.0085 Epoch: 17 Global Step: 72490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:22,686-Speed 8963.36 samples/sec Loss 3.2574 LearningRate 0.0085 Epoch: 17 Global Step: 72500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:26,316-Speed 11286.28 samples/sec Loss 3.2343 LearningRate 0.0085 Epoch: 17 Global Step: 72510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:30,209-Speed 10522.95 samples/sec Loss 3.2495 LearningRate 0.0085 Epoch: 17 
Global Step: 72520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:33,871-Speed 11190.07 samples/sec Loss 3.2701 LearningRate 0.0084 Epoch: 17 Global Step: 72530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:37,574-Speed 11063.63 samples/sec Loss 3.2459 LearningRate 0.0084 Epoch: 17 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:41,327-Speed 10913.68 samples/sec Loss 3.2973 LearningRate 0.0084 Epoch: 17 Global Step: 72550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:45,091-Speed 10885.48 samples/sec Loss 3.2377 LearningRate 0.0084 Epoch: 17 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:48,695-Speed 11368.04 samples/sec Loss 3.2433 LearningRate 0.0084 Epoch: 17 Global Step: 72570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:23:52,220-Speed 11625.27 samples/sec Loss 3.2479 LearningRate 0.0084 Epoch: 17 Global Step: 72580 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:23:55,817-Speed 11394.51 samples/sec Loss 3.2376 LearningRate 0.0083 Epoch: 17 Global Step: 72590 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:23:59,539-Speed 11006.96 samples/sec Loss 3.2604 LearningRate 0.0083 Epoch: 17 Global Step: 72600 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:03,174-Speed 11271.38 samples/sec Loss 3.2289 LearningRate 0.0083 Epoch: 17 Global Step: 72610 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:06,763-Speed 11417.57 samples/sec Loss 3.2829 LearningRate 0.0083 Epoch: 17 Global Step: 72620 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:10,274-Speed 11671.22 samples/sec Loss 3.2669 LearningRate 0.0083 Epoch: 17 Global Step: 72630 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:14,203-Speed 10426.59 samples/sec Loss 3.2422 LearningRate 0.0083 Epoch: 17 Global Step: 72640 Fp16 Grad 
Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:18,258-Speed 10103.52 samples/sec Loss 3.2475 LearningRate 0.0083 Epoch: 17 Global Step: 72650 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:22,101-Speed 10657.99 samples/sec Loss 3.1866 LearningRate 0.0082 Epoch: 17 Global Step: 72660 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:25,702-Speed 11378.90 samples/sec Loss 3.2485 LearningRate 0.0082 Epoch: 17 Global Step: 72670 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:29,454-Speed 10920.14 samples/sec Loss 3.2763 LearningRate 0.0082 Epoch: 17 Global Step: 72680 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:33,340-Speed 10543.62 samples/sec Loss 3.2547 LearningRate 0.0082 Epoch: 17 Global Step: 72690 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:36,865-Speed 11620.95 samples/sec Loss 3.2873 LearningRate 0.0082 Epoch: 17 Global Step: 72700 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:40,592-Speed 10991.51 samples/sec Loss 3.2045 LearningRate 0.0082 Epoch: 17 Global Step: 72710 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:24:45,020-Speed 9252.60 samples/sec Loss 3.2794 LearningRate 0.0082 Epoch: 17 Global Step: 72720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:24:48,834-Speed 10741.80 samples/sec Loss 3.2262 LearningRate 0.0081 Epoch: 17 Global Step: 72730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:24:52,490-Speed 11206.23 samples/sec Loss 3.2134 LearningRate 0.0081 Epoch: 17 Global Step: 72740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:24:56,400-Speed 10476.79 samples/sec Loss 3.2382 LearningRate 0.0081 Epoch: 17 Global Step: 72750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:24:59,913-Speed 11664.73 samples/sec Loss 3.2511 LearningRate 0.0081 Epoch: 17 Global Step: 72760 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:25:03,525-Speed 11342.40 samples/sec Loss 3.2041 LearningRate 0.0081 Epoch: 17 Global Step: 72770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:25:07,489-Speed 10336.21 samples/sec Loss 3.2406 LearningRate 0.0081 Epoch: 17 Global Step: 72780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:25:11,191-Speed 11068.31 samples/sec Loss 3.2498 LearningRate 0.0080 Epoch: 17 Global Step: 72790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:25:14,990-Speed 10782.49 samples/sec Loss 3.2590 LearningRate 0.0080 Epoch: 17 Global Step: 72800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:25:18,719-Speed 10986.08 samples/sec Loss 3.2666 LearningRate 0.0080 Epoch: 17 Global Step: 72810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:25:22,395-Speed 11148.89 samples/sec Loss 3.2268 LearningRate 0.0080 Epoch: 17 Global Step: 72820 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:25:25,874-Speed 11773.84 samples/sec Loss 3.2166 LearningRate 0.0080 Epoch: 17 Global Step: 72830 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:25:29,298-Speed 11965.84 samples/sec Loss 3.2417 LearningRate 0.0080 Epoch: 17 Global Step: 72840 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:25:32,747-Speed 11877.73 samples/sec Loss 3.2407 LearningRate 0.0080 Epoch: 17 Global Step: 72850 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:25:36,564-Speed 10735.92 samples/sec Loss 3.2600 LearningRate 0.0079 Epoch: 17 Global Step: 72860 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:25:40,445-Speed 10556.63 samples/sec Loss 3.2396 LearningRate 0.0079 Epoch: 17 Global Step: 72870 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:25:43,999-Speed 11528.22 samples/sec Loss 3.2913 LearningRate 0.0079 Epoch: 17 Global Step: 72880 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 
05:25:48,421-Speed 9264.79 samples/sec Loss 3.2683 LearningRate 0.0079 Epoch: 17 Global Step: 72890 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:25:51,882-Speed 11836.32 samples/sec Loss 3.2503 LearningRate 0.0079 Epoch: 17 Global Step: 72900 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:25:55,412-Speed 11607.25 samples/sec Loss 3.2581 LearningRate 0.0079 Epoch: 17 Global Step: 72910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:25:59,028-Speed 11330.42 samples/sec Loss 3.2935 LearningRate 0.0078 Epoch: 17 Global Step: 72920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:02,590-Speed 11501.09 samples/sec Loss 3.2334 LearningRate 0.0078 Epoch: 17 Global Step: 72930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:06,120-Speed 11606.77 samples/sec Loss 3.2325 LearningRate 0.0078 Epoch: 17 Global Step: 72940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:09,818-Speed 11079.79 samples/sec Loss 3.2645 LearningRate 0.0078 Epoch: 17 Global Step: 72950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:13,525-Speed 11052.89 samples/sec Loss 3.2844 LearningRate 0.0078 Epoch: 17 Global Step: 72960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:17,220-Speed 11085.63 samples/sec Loss 3.2463 LearningRate 0.0078 Epoch: 17 Global Step: 72970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:20,857-Speed 11265.48 samples/sec Loss 3.2063 LearningRate 0.0078 Epoch: 17 Global Step: 72980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:24,691-Speed 10685.10 samples/sec Loss 3.2314 LearningRate 0.0077 Epoch: 17 Global Step: 72990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:28,365-Speed 11152.81 samples/sec Loss 3.2791 LearningRate 0.0077 Epoch: 17 Global Step: 73000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:26:32,046-Speed 11131.11 
samples/sec Loss 3.2948 LearningRate 0.0077 Epoch: 17 Global Step: 73010 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:26:35,605-Speed 11511.60 samples/sec Loss 3.2436 LearningRate 0.0077 Epoch: 17 Global Step: 73020 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:26:39,230-Speed 11303.65 samples/sec Loss 3.3085 LearningRate 0.0077 Epoch: 17 Global Step: 73030 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:26:42,977-Speed 10933.25 samples/sec Loss 3.2472 LearningRate 0.0077 Epoch: 17 Global Step: 73040 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:26:46,525-Speed 11545.52 samples/sec Loss 3.2657 LearningRate 0.0077 Epoch: 17 Global Step: 73050 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:26:50,351-Speed 10710.72 samples/sec Loss 3.2990 LearningRate 0.0076 Epoch: 17 Global Step: 73060 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:26:54,232-Speed 10555.46 samples/sec Loss 3.2691 LearningRate 0.0076 Epoch: 17 Global Step: 73070 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:26:57,846-Speed 11336.81 samples/sec Loss 3.2554 LearningRate 0.0076 Epoch: 17 Global Step: 73080 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:01,593-Speed 10933.66 samples/sec Loss 3.2308 LearningRate 0.0076 Epoch: 17 Global Step: 73090 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:05,128-Speed 11590.27 samples/sec Loss 3.2345 LearningRate 0.0076 Epoch: 17 Global Step: 73100 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:08,616-Speed 11746.16 samples/sec Loss 3.2937 LearningRate 0.0076 Epoch: 17 Global Step: 73110 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:12,735-Speed 9949.14 samples/sec Loss 3.2362 LearningRate 0.0076 Epoch: 17 Global Step: 73120 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:16,487-Speed 10916.23 samples/sec Loss 3.2833 
LearningRate 0.0075 Epoch: 17 Global Step: 73130 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:19,973-Speed 11754.53 samples/sec Loss 3.3001 LearningRate 0.0075 Epoch: 17 Global Step: 73140 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:23,759-Speed 10820.92 samples/sec Loss 3.2691 LearningRate 0.0075 Epoch: 17 Global Step: 73150 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:27,671-Speed 10475.20 samples/sec Loss 3.2666 LearningRate 0.0075 Epoch: 17 Global Step: 73160 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:31,416-Speed 10939.18 samples/sec Loss 3.2570 LearningRate 0.0075 Epoch: 17 Global Step: 73170 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:34,946-Speed 11609.74 samples/sec Loss 3.2529 LearningRate 0.0075 Epoch: 17 Global Step: 73180 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:38,761-Speed 10736.80 samples/sec Loss 3.2307 LearningRate 0.0075 Epoch: 17 Global Step: 73190 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:42,312-Speed 11540.09 samples/sec Loss 3.2925 LearningRate 0.0074 Epoch: 17 Global Step: 73200 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:45,972-Speed 11191.68 samples/sec Loss 3.2496 LearningRate 0.0074 Epoch: 17 Global Step: 73210 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:49,704-Speed 10980.03 samples/sec Loss 3.2796 LearningRate 0.0074 Epoch: 17 Global Step: 73220 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:53,190-Speed 11751.84 samples/sec Loss 3.3189 LearningRate 0.0074 Epoch: 17 Global Step: 73230 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:27:57,038-Speed 10646.64 samples/sec Loss 3.2219 LearningRate 0.0074 Epoch: 17 Global Step: 73240 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:28:00,673-Speed 11270.07 samples/sec Loss 3.2617 LearningRate 0.0074 Epoch: 17 
Global Step: 73250 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:28:04,304-Speed 11286.77 samples/sec Loss 3.2507 LearningRate 0.0074 Epoch: 17 Global Step: 73260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:07,926-Speed 11309.05 samples/sec Loss 3.2783 LearningRate 0.0073 Epoch: 17 Global Step: 73270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:11,773-Speed 10651.74 samples/sec Loss 3.2740 LearningRate 0.0073 Epoch: 17 Global Step: 73280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:15,837-Speed 10079.38 samples/sec Loss 3.2818 LearningRate 0.0073 Epoch: 17 Global Step: 73290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:20,055-Speed 9711.99 samples/sec Loss 3.2286 LearningRate 0.0073 Epoch: 17 Global Step: 73300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:23,543-Speed 11750.16 samples/sec Loss 3.2781 LearningRate 0.0073 Epoch: 17 Global Step: 73310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:27,417-Speed 10573.86 samples/sec Loss 3.2530 LearningRate 0.0073 Epoch: 17 Global Step: 73320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:31,083-Speed 11176.35 samples/sec Loss 3.2814 LearningRate 0.0072 Epoch: 17 Global Step: 73330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:34,565-Speed 11766.16 samples/sec Loss 3.2733 LearningRate 0.0072 Epoch: 17 Global Step: 73340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:38,002-Speed 11919.57 samples/sec Loss 3.2794 LearningRate 0.0072 Epoch: 17 Global Step: 73350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:28:41,601-Speed 11384.80 samples/sec Loss 3.2672 LearningRate 0.0072 Epoch: 17 Global Step: 73360 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:28:45,146-Speed 11557.03 samples/sec Loss 3.2879 LearningRate 0.0072 Epoch: 17 Global Step: 73370 Fp16 Grad 
Scale: 262144 Required: 1 hours Training: 2022-01-17 05:28:49,818-Speed 8768.33 samples/sec Loss 3.2903 LearningRate 0.0072 Epoch: 17 Global Step: 73380 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:28:53,265-Speed 11883.92 samples/sec Loss 3.2432 LearningRate 0.0072 Epoch: 17 Global Step: 73390 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:28:56,735-Speed 11809.48 samples/sec Loss 3.3188 LearningRate 0.0071 Epoch: 17 Global Step: 73400 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:29:00,177-Speed 11903.88 samples/sec Loss 3.2306 LearningRate 0.0071 Epoch: 17 Global Step: 73410 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:29:03,658-Speed 11769.66 samples/sec Loss 3.2793 LearningRate 0.0071 Epoch: 17 Global Step: 73420 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:29:07,327-Speed 11165.04 samples/sec Loss 3.2418 LearningRate 0.0071 Epoch: 17 Global Step: 73430 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:29:11,161-Speed 10687.14 samples/sec Loss 3.2679 LearningRate 0.0071 Epoch: 17 Global Step: 73440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:14,971-Speed 10752.28 samples/sec Loss 3.2236 LearningRate 0.0071 Epoch: 17 Global Step: 73450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:19,158-Speed 9785.55 samples/sec Loss 3.2816 LearningRate 0.0071 Epoch: 17 Global Step: 73460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:22,874-Speed 11026.46 samples/sec Loss 3.2873 LearningRate 0.0071 Epoch: 17 Global Step: 73470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:26,330-Speed 11853.42 samples/sec Loss 3.2563 LearningRate 0.0070 Epoch: 17 Global Step: 73480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:30,526-Speed 9763.48 samples/sec Loss 3.2275 LearningRate 0.0070 Epoch: 17 Global Step: 73490 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:29:34,234-Speed 11050.44 samples/sec Loss 3.2418 LearningRate 0.0070 Epoch: 17 Global Step: 73500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:38,200-Speed 10329.70 samples/sec Loss 3.2718 LearningRate 0.0070 Epoch: 17 Global Step: 73510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:41,797-Speed 11393.42 samples/sec Loss 3.2750 LearningRate 0.0070 Epoch: 17 Global Step: 73520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:45,238-Speed 11905.03 samples/sec Loss 3.2771 LearningRate 0.0070 Epoch: 17 Global Step: 73530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:29:48,933-Speed 11087.10 samples/sec Loss 3.2772 LearningRate 0.0070 Epoch: 17 Global Step: 73540 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:29:52,695-Speed 10891.62 samples/sec Loss 3.2964 LearningRate 0.0069 Epoch: 17 Global Step: 73550 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:29:56,651-Speed 10357.26 samples/sec Loss 3.2951 LearningRate 0.0069 Epoch: 17 Global Step: 73560 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:30:01,059-Speed 9293.54 samples/sec Loss 3.2588 LearningRate 0.0069 Epoch: 17 Global Step: 73570 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:30:04,767-Speed 11047.84 samples/sec Loss 3.2776 LearningRate 0.0069 Epoch: 17 Global Step: 73580 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:30:08,383-Speed 11332.28 samples/sec Loss 3.2923 LearningRate 0.0069 Epoch: 17 Global Step: 73590 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:30:12,146-Speed 10889.44 samples/sec Loss 3.2536 LearningRate 0.0069 Epoch: 17 Global Step: 73600 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:30:15,771-Speed 11300.81 samples/sec Loss 3.2838 LearningRate 0.0069 Epoch: 17 Global Step: 73610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 
05:30:19,371-Speed 11380.19 samples/sec Loss 3.2772 LearningRate 0.0068 Epoch: 17 Global Step: 73620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:23,740-Speed 9377.60 samples/sec Loss 3.2428 LearningRate 0.0068 Epoch: 17 Global Step: 73630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:27,361-Speed 11315.16 samples/sec Loss 3.2651 LearningRate 0.0068 Epoch: 17 Global Step: 73640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:31,065-Speed 11061.64 samples/sec Loss 3.2250 LearningRate 0.0068 Epoch: 17 Global Step: 73650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:34,522-Speed 11852.22 samples/sec Loss 3.2786 LearningRate 0.0068 Epoch: 17 Global Step: 73660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:38,021-Speed 11707.72 samples/sec Loss 3.2290 LearningRate 0.0068 Epoch: 17 Global Step: 73670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:41,694-Speed 11155.33 samples/sec Loss 3.2956 LearningRate 0.0068 Epoch: 17 Global Step: 73680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:45,187-Speed 11730.63 samples/sec Loss 3.2612 LearningRate 0.0067 Epoch: 17 Global Step: 73690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:48,836-Speed 11226.57 samples/sec Loss 3.2588 LearningRate 0.0067 Epoch: 17 Global Step: 73700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:30:52,757-Speed 10449.10 samples/sec Loss 3.2705 LearningRate 0.0067 Epoch: 17 Global Step: 73710 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:30:56,443-Speed 11117.51 samples/sec Loss 3.2521 LearningRate 0.0067 Epoch: 17 Global Step: 73720 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:00,091-Speed 11228.88 samples/sec Loss 3.2811 LearningRate 0.0067 Epoch: 17 Global Step: 73730 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:03,658-Speed 11485.93 
samples/sec Loss 3.2578 LearningRate 0.0067 Epoch: 17 Global Step: 73740 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:07,417-Speed 10901.52 samples/sec Loss 3.2889 LearningRate 0.0067 Epoch: 17 Global Step: 73750 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:11,273-Speed 10625.35 samples/sec Loss 3.2547 LearningRate 0.0066 Epoch: 17 Global Step: 73760 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:14,939-Speed 11175.84 samples/sec Loss 3.2546 LearningRate 0.0066 Epoch: 17 Global Step: 73770 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:18,656-Speed 11019.54 samples/sec Loss 3.2183 LearningRate 0.0066 Epoch: 17 Global Step: 73780 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:22,199-Speed 11566.46 samples/sec Loss 3.3058 LearningRate 0.0066 Epoch: 17 Global Step: 73790 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:26,371-Speed 9819.63 samples/sec Loss 3.2863 LearningRate 0.0066 Epoch: 17 Global Step: 73800 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:29,903-Speed 11598.44 samples/sec Loss 3.2714 LearningRate 0.0066 Epoch: 17 Global Step: 73810 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:31:33,462-Speed 11512.55 samples/sec Loss 3.2594 LearningRate 0.0066 Epoch: 17 Global Step: 73820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:31:37,134-Speed 11156.66 samples/sec Loss 3.2881 LearningRate 0.0066 Epoch: 17 Global Step: 73830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:31:40,689-Speed 11524.58 samples/sec Loss 3.2698 LearningRate 0.0065 Epoch: 17 Global Step: 73840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:31:44,163-Speed 11793.56 samples/sec Loss 3.2703 LearningRate 0.0065 Epoch: 17 Global Step: 73850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:31:47,955-Speed 10803.71 samples/sec Loss 3.2632 
LearningRate 0.0065 Epoch: 17 Global Step: 73860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:31:51,625-Speed 11165.87 samples/sec Loss 3.2529 LearningRate 0.0065 Epoch: 17 Global Step: 73870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:31:55,302-Speed 11141.20 samples/sec Loss 3.2716 LearningRate 0.0065 Epoch: 17 Global Step: 73880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:31:58,904-Speed 11374.71 samples/sec Loss 3.2595 LearningRate 0.0065 Epoch: 17 Global Step: 73890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:02,625-Speed 11008.55 samples/sec Loss 3.2850 LearningRate 0.0065 Epoch: 17 Global Step: 73900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:06,358-Speed 10978.06 samples/sec Loss 3.2812 LearningRate 0.0064 Epoch: 17 Global Step: 73910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:09,985-Speed 11293.07 samples/sec Loss 3.2920 LearningRate 0.0064 Epoch: 17 Global Step: 73920 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:32:13,573-Speed 11419.89 samples/sec Loss 3.2570 LearningRate 0.0064 Epoch: 17 Global Step: 73930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:17,578-Speed 10229.46 samples/sec Loss 3.2383 LearningRate 0.0064 Epoch: 17 Global Step: 73940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:21,368-Speed 10810.03 samples/sec Loss 3.2793 LearningRate 0.0064 Epoch: 17 Global Step: 73950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:24,993-Speed 11302.88 samples/sec Loss 3.2613 LearningRate 0.0064 Epoch: 17 Global Step: 73960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:29,221-Speed 9689.28 samples/sec Loss 3.2601 LearningRate 0.0064 Epoch: 17 Global Step: 73970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:32,794-Speed 11468.65 samples/sec Loss 3.2853 LearningRate 0.0063 Epoch: 17 
Global Step: 73980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:36,446-Speed 11216.99 samples/sec Loss 3.2769 LearningRate 0.0063 Epoch: 17 Global Step: 73990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:40,075-Speed 11291.01 samples/sec Loss 3.2867 LearningRate 0.0063 Epoch: 17 Global Step: 74000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:43,676-Speed 11376.78 samples/sec Loss 3.2340 LearningRate 0.0063 Epoch: 17 Global Step: 74010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:47,280-Speed 11368.82 samples/sec Loss 3.3021 LearningRate 0.0063 Epoch: 17 Global Step: 74020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:50,941-Speed 11189.78 samples/sec Loss 3.2767 LearningRate 0.0063 Epoch: 17 Global Step: 74030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:54,574-Speed 11277.84 samples/sec Loss 3.2161 LearningRate 0.0063 Epoch: 17 Global Step: 74040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:32:58,300-Speed 10997.47 samples/sec Loss 3.2631 LearningRate 0.0063 Epoch: 17 Global Step: 74050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:01,795-Speed 11725.02 samples/sec Loss 3.3033 LearningRate 0.0062 Epoch: 17 Global Step: 74060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:05,467-Speed 11157.43 samples/sec Loss 3.2812 LearningRate 0.0062 Epoch: 17 Global Step: 74070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:09,057-Speed 11409.04 samples/sec Loss 3.2413 LearningRate 0.0062 Epoch: 17 Global Step: 74080 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:12,948-Speed 10530.03 samples/sec Loss 3.2883 LearningRate 0.0062 Epoch: 17 Global Step: 74090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:16,870-Speed 10447.94 samples/sec Loss 3.2455 LearningRate 0.0062 Epoch: 17 Global Step: 74100 Fp16 Grad 
Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:20,498-Speed 11293.79 samples/sec Loss 3.2975 LearningRate 0.0062 Epoch: 17 Global Step: 74110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:23,979-Speed 11769.33 samples/sec Loss 3.2682 LearningRate 0.0062 Epoch: 17 Global Step: 74120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:27,772-Speed 10800.28 samples/sec Loss 3.2518 LearningRate 0.0061 Epoch: 17 Global Step: 74130 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:33:32,195-Speed 9262.40 samples/sec Loss 3.2672 LearningRate 0.0061 Epoch: 17 Global Step: 74140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:36,084-Speed 10535.37 samples/sec Loss 3.2981 LearningRate 0.0061 Epoch: 17 Global Step: 74150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:39,785-Speed 11072.61 samples/sec Loss 3.2936 LearningRate 0.0061 Epoch: 17 Global Step: 74160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:43,471-Speed 11113.85 samples/sec Loss 3.3198 LearningRate 0.0061 Epoch: 17 Global Step: 74170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:47,419-Speed 10376.64 samples/sec Loss 3.2914 LearningRate 0.0061 Epoch: 17 Global Step: 74180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:51,085-Speed 11175.98 samples/sec Loss 3.2986 LearningRate 0.0061 Epoch: 17 Global Step: 74190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:54,678-Speed 11402.43 samples/sec Loss 3.2830 LearningRate 0.0061 Epoch: 17 Global Step: 74200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:33:58,250-Speed 11470.28 samples/sec Loss 3.2805 LearningRate 0.0060 Epoch: 17 Global Step: 74210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:34:01,737-Speed 11750.32 samples/sec Loss 3.2702 LearningRate 0.0060 Epoch: 17 Global Step: 74220 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:34:05,572-Speed 10682.23 samples/sec Loss 3.2853 LearningRate 0.0060 Epoch: 17 Global Step: 74230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:34:09,353-Speed 10834.26 samples/sec Loss 3.2750 LearningRate 0.0060 Epoch: 17 Global Step: 74240 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:34:12,912-Speed 11513.32 samples/sec Loss 3.3028 LearningRate 0.0060 Epoch: 17 Global Step: 74250 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:34:16,764-Speed 10636.32 samples/sec Loss 3.2752 LearningRate 0.0060 Epoch: 17 Global Step: 74260 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:34:20,337-Speed 11466.30 samples/sec Loss 3.2871 LearningRate 0.0060 Epoch: 17 Global Step: 74270 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:34:23,836-Speed 11714.61 samples/sec Loss 3.2796 LearningRate 0.0060 Epoch: 17 Global Step: 74280 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:34:27,691-Speed 10628.04 samples/sec Loss 3.2765 LearningRate 0.0059 Epoch: 17 Global Step: 74290 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:34:31,580-Speed 10532.05 samples/sec Loss 3.2933 LearningRate 0.0059 Epoch: 17 Global Step: 74300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:34:35,846-Speed 9604.69 samples/sec Loss 3.2832 LearningRate 0.0059 Epoch: 17 Global Step: 74310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:34:39,561-Speed 11027.64 samples/sec Loss 3.2640 LearningRate 0.0059 Epoch: 17 Global Step: 74320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:34:43,180-Speed 11323.25 samples/sec Loss 3.3058 LearningRate 0.0059 Epoch: 17 Global Step: 74330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:34:46,740-Speed 11507.43 samples/sec Loss 3.2513 LearningRate 0.0059 Epoch: 17 Global Step: 74340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 
05:34:50,199-Speed 11842.83 samples/sec Loss 3.2935 LearningRate 0.0059 Epoch: 17 Global Step: 74350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:34:53,718-Speed 11646.21 samples/sec Loss 3.2854 LearningRate 0.0058 Epoch: 17 Global Step: 74360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:34:57,347-Speed 11289.50 samples/sec Loss 3.2365 LearningRate 0.0058 Epoch: 17 Global Step: 74370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:00,988-Speed 11249.94 samples/sec Loss 3.2745 LearningRate 0.0058 Epoch: 17 Global Step: 74380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:04,501-Speed 11663.50 samples/sec Loss 3.2737 LearningRate 0.0058 Epoch: 17 Global Step: 74390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:07,953-Speed 11867.71 samples/sec Loss 3.2920 LearningRate 0.0058 Epoch: 17 Global Step: 74400 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:35:11,553-Speed 11380.93 samples/sec Loss 3.3172 LearningRate 0.0058 Epoch: 17 Global Step: 74410 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:35:15,356-Speed 10772.94 samples/sec Loss 3.2252 LearningRate 0.0058 Epoch: 17 Global Step: 74420 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:35:18,968-Speed 11344.52 samples/sec Loss 3.2924 LearningRate 0.0058 Epoch: 17 Global Step: 74430 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:35:22,442-Speed 11793.34 samples/sec Loss 3.2761 LearningRate 0.0057 Epoch: 17 Global Step: 74440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:26,223-Speed 10835.28 samples/sec Loss 3.2740 LearningRate 0.0057 Epoch: 17 Global Step: 74450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:29,853-Speed 11286.42 samples/sec Loss 3.2922 LearningRate 0.0057 Epoch: 17 Global Step: 74460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:33,382-Speed 11610.68 
samples/sec Loss 3.2645 LearningRate 0.0057 Epoch: 17 Global Step: 74470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:38,211-Speed 8483.73 samples/sec Loss 3.3263 LearningRate 0.0057 Epoch: 17 Global Step: 74480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:41,648-Speed 11920.27 samples/sec Loss 3.3269 LearningRate 0.0057 Epoch: 17 Global Step: 74490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:45,178-Speed 11606.06 samples/sec Loss 3.2798 LearningRate 0.0057 Epoch: 17 Global Step: 74500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:48,720-Speed 11579.43 samples/sec Loss 3.2557 LearningRate 0.0057 Epoch: 17 Global Step: 74510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:52,508-Speed 10817.38 samples/sec Loss 3.2756 LearningRate 0.0056 Epoch: 17 Global Step: 74520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:56,227-Speed 11016.38 samples/sec Loss 3.2640 LearningRate 0.0056 Epoch: 17 Global Step: 74530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:35:59,871-Speed 11242.91 samples/sec Loss 3.2754 LearningRate 0.0056 Epoch: 17 Global Step: 74540 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:36:03,612-Speed 10949.76 samples/sec Loss 3.2749 LearningRate 0.0056 Epoch: 17 Global Step: 74550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:07,017-Speed 12033.48 samples/sec Loss 3.2486 LearningRate 0.0056 Epoch: 17 Global Step: 74560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:10,691-Speed 11151.51 samples/sec Loss 3.2906 LearningRate 0.0056 Epoch: 17 Global Step: 74570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:14,451-Speed 10896.74 samples/sec Loss 3.3293 LearningRate 0.0056 Epoch: 17 Global Step: 74580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:18,231-Speed 10840.21 samples/sec Loss 3.2461 
LearningRate 0.0056 Epoch: 17 Global Step: 74590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:22,004-Speed 10856.40 samples/sec Loss 3.2976 LearningRate 0.0055 Epoch: 17 Global Step: 74600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:25,784-Speed 10838.91 samples/sec Loss 3.2263 LearningRate 0.0055 Epoch: 17 Global Step: 74610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:29,681-Speed 10514.91 samples/sec Loss 3.2915 LearningRate 0.0055 Epoch: 17 Global Step: 74620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:33,231-Speed 11540.67 samples/sec Loss 3.2586 LearningRate 0.0055 Epoch: 17 Global Step: 74630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:36,668-Speed 11919.92 samples/sec Loss 3.2884 LearningRate 0.0055 Epoch: 17 Global Step: 74640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:40,740-Speed 10061.64 samples/sec Loss 3.2726 LearningRate 0.0055 Epoch: 17 Global Step: 74650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:44,157-Speed 11987.87 samples/sec Loss 3.2850 LearningRate 0.0055 Epoch: 17 Global Step: 74660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:47,648-Speed 11737.64 samples/sec Loss 3.2523 LearningRate 0.0055 Epoch: 17 Global Step: 74670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:51,726-Speed 10047.61 samples/sec Loss 3.2965 LearningRate 0.0054 Epoch: 17 Global Step: 74680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:55,209-Speed 11762.36 samples/sec Loss 3.2914 LearningRate 0.0054 Epoch: 17 Global Step: 74690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:36:58,555-Speed 12243.51 samples/sec Loss 3.2576 LearningRate 0.0054 Epoch: 17 Global Step: 74700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:01,943-Speed 12092.51 samples/sec Loss 3.2748 LearningRate 0.0054 Epoch: 17 
Global Step: 74710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:05,407-Speed 11827.38 samples/sec Loss 3.2794 LearningRate 0.0054 Epoch: 17 Global Step: 74720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:08,846-Speed 11915.50 samples/sec Loss 3.2705 LearningRate 0.0054 Epoch: 17 Global Step: 74730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:12,321-Speed 11788.16 samples/sec Loss 3.2714 LearningRate 0.0054 Epoch: 17 Global Step: 74740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:15,801-Speed 11776.16 samples/sec Loss 3.2821 LearningRate 0.0054 Epoch: 17 Global Step: 74750 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:37:19,689-Speed 10535.77 samples/sec Loss 3.2256 LearningRate 0.0053 Epoch: 17 Global Step: 74760 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:37:23,115-Speed 11960.14 samples/sec Loss 3.2467 LearningRate 0.0053 Epoch: 17 Global Step: 74770 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:37:26,757-Speed 11247.67 samples/sec Loss 3.3333 LearningRate 0.0053 Epoch: 17 Global Step: 74780 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:37:30,213-Speed 11856.92 samples/sec Loss 3.2532 LearningRate 0.0053 Epoch: 17 Global Step: 74790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:33,886-Speed 11155.02 samples/sec Loss 3.2864 LearningRate 0.0053 Epoch: 17 Global Step: 74800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:37,698-Speed 10751.62 samples/sec Loss 3.2578 LearningRate 0.0053 Epoch: 17 Global Step: 74810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:41,127-Speed 11949.66 samples/sec Loss 3.2733 LearningRate 0.0053 Epoch: 17 Global Step: 74820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:44,950-Speed 10715.12 samples/sec Loss 3.2822 LearningRate 0.0053 Epoch: 17 Global Step: 74830 Fp16 Grad 
Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:48,439-Speed 11742.48 samples/sec Loss 3.2964 LearningRate 0.0052 Epoch: 17 Global Step: 74840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:51,920-Speed 11772.14 samples/sec Loss 3.2919 LearningRate 0.0052 Epoch: 17 Global Step: 74850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:55,450-Speed 11607.34 samples/sec Loss 3.3008 LearningRate 0.0052 Epoch: 17 Global Step: 74860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:37:58,891-Speed 11906.96 samples/sec Loss 3.2932 LearningRate 0.0052 Epoch: 17 Global Step: 74870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:02,607-Speed 11025.04 samples/sec Loss 3.2760 LearningRate 0.0052 Epoch: 17 Global Step: 74880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:06,048-Speed 11904.68 samples/sec Loss 3.2297 LearningRate 0.0052 Epoch: 17 Global Step: 74890 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:38:09,616-Speed 11482.64 samples/sec Loss 3.3082 LearningRate 0.0052 Epoch: 17 Global Step: 74900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:13,097-Speed 11772.37 samples/sec Loss 3.3089 LearningRate 0.0052 Epoch: 17 Global Step: 74910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:16,846-Speed 10929.11 samples/sec Loss 3.2825 LearningRate 0.0051 Epoch: 17 Global Step: 74920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:20,241-Speed 12065.56 samples/sec Loss 3.2797 LearningRate 0.0051 Epoch: 17 Global Step: 74930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:23,595-Speed 12217.28 samples/sec Loss 3.2911 LearningRate 0.0051 Epoch: 17 Global Step: 74940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:27,114-Speed 11640.60 samples/sec Loss 3.2534 LearningRate 0.0051 Epoch: 17 Global Step: 74950 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:38:30,529-Speed 11998.84 samples/sec Loss 3.2777 LearningRate 0.0051 Epoch: 17 Global Step: 74960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:34,259-Speed 10982.25 samples/sec Loss 3.2983 LearningRate 0.0051 Epoch: 17 Global Step: 74970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:37,683-Speed 11966.99 samples/sec Loss 3.2678 LearningRate 0.0051 Epoch: 17 Global Step: 74980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:41,295-Speed 11343.93 samples/sec Loss 3.2631 LearningRate 0.0051 Epoch: 17 Global Step: 74990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:38:45,732-Speed 9232.30 samples/sec Loss 3.2835 LearningRate 0.0051 Epoch: 17 Global Step: 75000 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:39:07,324-[lfw][75000]XNorm: 6.767198 Training: 2022-01-17 05:39:07,325-[lfw][75000]Accuracy-Flip: 0.99700+-0.00306 Training: 2022-01-17 05:39:07,325-[lfw][75000]Accuracy-Highest: 0.99733 Training: 2022-01-17 05:39:31,797-[cfp_fp][75000]XNorm: 5.759158 Training: 2022-01-17 05:39:31,798-[cfp_fp][75000]Accuracy-Flip: 0.97457+-0.00714 Training: 2022-01-17 05:39:31,798-[cfp_fp][75000]Accuracy-Highest: 0.97457 Training: 2022-01-17 05:39:52,865-[agedb_30][75000]XNorm: 6.480174 Training: 2022-01-17 05:39:52,865-[agedb_30][75000]Accuracy-Flip: 0.97317+-0.00689 Training: 2022-01-17 05:39:52,866-[agedb_30][75000]Accuracy-Highest: 0.97317 Training: 2022-01-17 05:39:56,259-Speed 580.78 samples/sec Loss 3.3034 LearningRate 0.0050 Epoch: 17 Global Step: 75010 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:39:59,648-Speed 12088.86 samples/sec Loss 3.2988 LearningRate 0.0050 Epoch: 17 Global Step: 75020 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:40:02,998-Speed 12228.61 samples/sec Loss 3.2618 LearningRate 0.0050 Epoch: 17 Global Step: 75030 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 
05:40:06,372-Speed 12144.37 samples/sec Loss 3.2869 LearningRate 0.0050 Epoch: 17 Global Step: 75040 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:40:09,778-Speed 12026.61 samples/sec Loss 3.2925 LearningRate 0.0050 Epoch: 17 Global Step: 75050 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:40:13,128-Speed 12229.64 samples/sec Loss 3.3036 LearningRate 0.0050 Epoch: 17 Global Step: 75060 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:40:16,615-Speed 11749.72 samples/sec Loss 3.3014 LearningRate 0.0050 Epoch: 17 Global Step: 75070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:40:20,148-Speed 11599.08 samples/sec Loss 3.2391 LearningRate 0.0050 Epoch: 17 Global Step: 75080 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:40:24,915-Speed 8593.04 samples/sec Loss 3.2391 LearningRate 0.0049 Epoch: 17 Global Step: 75090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:40:57,624-Speed 1252.29 samples/sec Loss 3.1689 LearningRate 0.0049 Epoch: 18 Global Step: 75100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:01,446-Speed 10723.13 samples/sec Loss 2.9970 LearningRate 0.0049 Epoch: 18 Global Step: 75110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:06,132-Speed 8743.24 samples/sec Loss 3.0083 LearningRate 0.0049 Epoch: 18 Global Step: 75120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:10,212-Speed 10041.88 samples/sec Loss 2.9901 LearningRate 0.0049 Epoch: 18 Global Step: 75130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:14,823-Speed 8884.39 samples/sec Loss 2.9830 LearningRate 0.0049 Epoch: 18 Global Step: 75140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:18,500-Speed 11143.69 samples/sec Loss 3.0134 LearningRate 0.0049 Epoch: 18 Global Step: 75150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:22,260-Speed 10898.06 
samples/sec Loss 2.9607 LearningRate 0.0049 Epoch: 18 Global Step: 75160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:26,031-Speed 10862.80 samples/sec Loss 3.0019 LearningRate 0.0049 Epoch: 18 Global Step: 75170 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:41:29,693-Speed 11187.16 samples/sec Loss 2.9661 LearningRate 0.0048 Epoch: 18 Global Step: 75180 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:41:33,482-Speed 10813.81 samples/sec Loss 2.9989 LearningRate 0.0048 Epoch: 18 Global Step: 75190 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:41:37,633-Speed 9871.51 samples/sec Loss 3.0236 LearningRate 0.0048 Epoch: 18 Global Step: 75200 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:41:41,593-Speed 10347.73 samples/sec Loss 2.9900 LearningRate 0.0048 Epoch: 18 Global Step: 75210 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:41:45,316-Speed 11003.50 samples/sec Loss 3.0174 LearningRate 0.0048 Epoch: 18 Global Step: 75220 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:41:48,912-Speed 11396.41 samples/sec Loss 2.9760 LearningRate 0.0048 Epoch: 18 Global Step: 75230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:52,480-Speed 11484.87 samples/sec Loss 2.9924 LearningRate 0.0048 Epoch: 18 Global Step: 75240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:56,145-Speed 11180.96 samples/sec Loss 2.9505 LearningRate 0.0048 Epoch: 18 Global Step: 75250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:41:59,920-Speed 10852.18 samples/sec Loss 2.9992 LearningRate 0.0047 Epoch: 18 Global Step: 75260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:03,772-Speed 10636.53 samples/sec Loss 2.9814 LearningRate 0.0047 Epoch: 18 Global Step: 75270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:07,382-Speed 11349.57 samples/sec Loss 2.9675 
LearningRate 0.0047 Epoch: 18 Global Step: 75280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:11,086-Speed 11061.71 samples/sec Loss 2.9951 LearningRate 0.0047 Epoch: 18 Global Step: 75290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:14,562-Speed 11787.34 samples/sec Loss 2.9722 LearningRate 0.0047 Epoch: 18 Global Step: 75300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:18,124-Speed 11503.40 samples/sec Loss 2.9951 LearningRate 0.0047 Epoch: 18 Global Step: 75310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:21,724-Speed 11381.74 samples/sec Loss 3.0030 LearningRate 0.0047 Epoch: 18 Global Step: 75320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:25,810-Speed 10027.16 samples/sec Loss 2.9789 LearningRate 0.0047 Epoch: 18 Global Step: 75330 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:42:29,418-Speed 11355.75 samples/sec Loss 2.9818 LearningRate 0.0047 Epoch: 18 Global Step: 75340 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:42:33,165-Speed 10935.41 samples/sec Loss 3.0379 LearningRate 0.0046 Epoch: 18 Global Step: 75350 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:42:36,578-Speed 12006.31 samples/sec Loss 2.9749 LearningRate 0.0046 Epoch: 18 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:40,046-Speed 11813.70 samples/sec Loss 2.9760 LearningRate 0.0046 Epoch: 18 Global Step: 75370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:43,723-Speed 11141.67 samples/sec Loss 2.9889 LearningRate 0.0046 Epoch: 18 Global Step: 75380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:47,277-Speed 11528.45 samples/sec Loss 3.0452 LearningRate 0.0046 Epoch: 18 Global Step: 75390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:51,833-Speed 8992.49 samples/sec Loss 2.9951 LearningRate 0.0046 Epoch: 18 
Global Step: 75400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:55,697-Speed 10603.68 samples/sec Loss 2.9874 LearningRate 0.0046 Epoch: 18 Global Step: 75410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:42:59,117-Speed 11979.52 samples/sec Loss 2.9818 LearningRate 0.0046 Epoch: 18 Global Step: 75420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:43:02,585-Speed 11817.95 samples/sec Loss 3.0198 LearningRate 0.0046 Epoch: 18 Global Step: 75430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:43:06,480-Speed 10517.84 samples/sec Loss 3.0163 LearningRate 0.0045 Epoch: 18 Global Step: 75440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:43:10,011-Speed 11609.14 samples/sec Loss 3.0337 LearningRate 0.0045 Epoch: 18 Global Step: 75450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:43:14,063-Speed 10113.91 samples/sec Loss 2.9881 LearningRate 0.0045 Epoch: 18 Global Step: 75460 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:17,521-Speed 11848.97 samples/sec Loss 2.9985 LearningRate 0.0045 Epoch: 18 Global Step: 75470 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:21,422-Speed 10500.70 samples/sec Loss 3.0227 LearningRate 0.0045 Epoch: 18 Global Step: 75480 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:24,884-Speed 11836.79 samples/sec Loss 2.9947 LearningRate 0.0045 Epoch: 18 Global Step: 75490 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:28,801-Speed 10459.61 samples/sec Loss 2.9966 LearningRate 0.0045 Epoch: 18 Global Step: 75500 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:32,435-Speed 11277.59 samples/sec Loss 3.0012 LearningRate 0.0045 Epoch: 18 Global Step: 75510 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:36,136-Speed 11067.18 samples/sec Loss 2.9663 LearningRate 0.0044 Epoch: 18 Global Step: 75520 Fp16 Grad 
Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:39,856-Speed 11016.54 samples/sec Loss 3.0024 LearningRate 0.0044 Epoch: 18 Global Step: 75530 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:43,585-Speed 10989.34 samples/sec Loss 3.0228 LearningRate 0.0044 Epoch: 18 Global Step: 75540 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:47,102-Speed 11650.09 samples/sec Loss 2.9577 LearningRate 0.0044 Epoch: 18 Global Step: 75550 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:50,630-Speed 11610.72 samples/sec Loss 3.0167 LearningRate 0.0044 Epoch: 18 Global Step: 75560 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:54,382-Speed 10921.46 samples/sec Loss 2.9913 LearningRate 0.0044 Epoch: 18 Global Step: 75570 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:43:57,952-Speed 11475.92 samples/sec Loss 3.0040 LearningRate 0.0044 Epoch: 18 Global Step: 75580 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:44:01,776-Speed 10714.28 samples/sec Loss 3.0093 LearningRate 0.0044 Epoch: 18 Global Step: 75590 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:44:05,695-Speed 10457.32 samples/sec Loss 3.0085 LearningRate 0.0044 Epoch: 18 Global Step: 75600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:09,768-Speed 10057.81 samples/sec Loss 3.0383 LearningRate 0.0043 Epoch: 18 Global Step: 75610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:13,433-Speed 11178.72 samples/sec Loss 3.0110 LearningRate 0.0043 Epoch: 18 Global Step: 75620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:17,748-Speed 9493.13 samples/sec Loss 3.0025 LearningRate 0.0043 Epoch: 18 Global Step: 75630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:21,677-Speed 10429.18 samples/sec Loss 3.0000 LearningRate 0.0043 Epoch: 18 Global Step: 75640 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 05:44:25,368-Speed 11100.04 samples/sec Loss 2.9973 LearningRate 0.0043 Epoch: 18 Global Step: 75650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:28,845-Speed 11784.63 samples/sec Loss 3.0000 LearningRate 0.0043 Epoch: 18 Global Step: 75660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:32,385-Speed 11574.20 samples/sec Loss 2.9930 LearningRate 0.0043 Epoch: 18 Global Step: 75670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:35,832-Speed 11888.33 samples/sec Loss 3.0054 LearningRate 0.0043 Epoch: 18 Global Step: 75680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:39,642-Speed 10751.87 samples/sec Loss 3.0106 LearningRate 0.0043 Epoch: 18 Global Step: 75690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:44:44,174-Speed 9040.44 samples/sec Loss 3.0038 LearningRate 0.0042 Epoch: 18 Global Step: 75700 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:44:47,811-Speed 11264.43 samples/sec Loss 3.0311 LearningRate 0.0042 Epoch: 18 Global Step: 75710 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:44:51,304-Speed 11728.11 samples/sec Loss 3.0017 LearningRate 0.0042 Epoch: 18 Global Step: 75720 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:44:54,826-Speed 11633.41 samples/sec Loss 2.9526 LearningRate 0.0042 Epoch: 18 Global Step: 75730 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:44:58,429-Speed 11372.69 samples/sec Loss 3.0040 LearningRate 0.0042 Epoch: 18 Global Step: 75740 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:45:01,902-Speed 11797.45 samples/sec Loss 2.9950 LearningRate 0.0042 Epoch: 18 Global Step: 75750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:05,594-Speed 11099.00 samples/sec Loss 3.0206 LearningRate 0.0042 Epoch: 18 Global Step: 75760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 
05:45:09,365-Speed 10865.61 samples/sec Loss 3.0164 LearningRate 0.0042 Epoch: 18 Global Step: 75770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:13,161-Speed 10791.23 samples/sec Loss 2.9751 LearningRate 0.0042 Epoch: 18 Global Step: 75780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:16,878-Speed 11025.75 samples/sec Loss 2.9956 LearningRate 0.0042 Epoch: 18 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:20,270-Speed 12076.88 samples/sec Loss 2.9752 LearningRate 0.0041 Epoch: 18 Global Step: 75800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:24,063-Speed 10801.71 samples/sec Loss 3.0435 LearningRate 0.0041 Epoch: 18 Global Step: 75810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:27,740-Speed 11143.49 samples/sec Loss 3.0066 LearningRate 0.0041 Epoch: 18 Global Step: 75820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:31,424-Speed 11121.63 samples/sec Loss 2.9986 LearningRate 0.0041 Epoch: 18 Global Step: 75830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:35,738-Speed 9496.38 samples/sec Loss 3.0048 LearningRate 0.0041 Epoch: 18 Global Step: 75840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:45:39,661-Speed 10443.50 samples/sec Loss 2.9849 LearningRate 0.0041 Epoch: 18 Global Step: 75850 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:45:43,126-Speed 11825.02 samples/sec Loss 3.0087 LearningRate 0.0041 Epoch: 18 Global Step: 75860 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:45:46,731-Speed 11369.60 samples/sec Loss 3.0249 LearningRate 0.0041 Epoch: 18 Global Step: 75870 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:45:50,697-Speed 10329.34 samples/sec Loss 3.0134 LearningRate 0.0041 Epoch: 18 Global Step: 75880 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:45:54,250-Speed 11532.91 
samples/sec Loss 3.0000 LearningRate 0.0040 Epoch: 18 Global Step: 75890 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:45:57,992-Speed 10950.72 samples/sec Loss 3.0085 LearningRate 0.0040 Epoch: 18 Global Step: 75900 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:46:01,540-Speed 11548.31 samples/sec Loss 3.0596 LearningRate 0.0040 Epoch: 18 Global Step: 75910 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:46:05,071-Speed 11602.38 samples/sec Loss 3.0311 LearningRate 0.0040 Epoch: 18 Global Step: 75920 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:46:08,783-Speed 11039.21 samples/sec Loss 3.0342 LearningRate 0.0040 Epoch: 18 Global Step: 75930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:12,453-Speed 11163.59 samples/sec Loss 3.0646 LearningRate 0.0040 Epoch: 18 Global Step: 75940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:16,128-Speed 11150.02 samples/sec Loss 2.9991 LearningRate 0.0040 Epoch: 18 Global Step: 75950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:20,081-Speed 10363.12 samples/sec Loss 3.0356 LearningRate 0.0040 Epoch: 18 Global Step: 75960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:24,114-Speed 10159.14 samples/sec Loss 3.0385 LearningRate 0.0040 Epoch: 18 Global Step: 75970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:27,671-Speed 11519.61 samples/sec Loss 3.0092 LearningRate 0.0039 Epoch: 18 Global Step: 75980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:31,402-Speed 10980.87 samples/sec Loss 3.0106 LearningRate 0.0039 Epoch: 18 Global Step: 75990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:35,651-Speed 9643.06 samples/sec Loss 3.0125 LearningRate 0.0039 Epoch: 18 Global Step: 76000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:39,437-Speed 10820.65 samples/sec Loss 3.0147 
LearningRate 0.0039 Epoch: 18 Global Step: 76010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:42,957-Speed 11640.13 samples/sec Loss 3.0279 LearningRate 0.0039 Epoch: 18 Global Step: 76020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:46:46,810-Speed 10636.73 samples/sec Loss 3.0374 LearningRate 0.0039 Epoch: 18 Global Step: 76030 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:46:50,308-Speed 11710.92 samples/sec Loss 3.0055 LearningRate 0.0039 Epoch: 18 Global Step: 76040 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:46:53,891-Speed 11437.75 samples/sec Loss 2.9830 LearningRate 0.0039 Epoch: 18 Global Step: 76050 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:46:58,218-Speed 9468.02 samples/sec Loss 3.0378 LearningRate 0.0039 Epoch: 18 Global Step: 76060 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:01,652-Speed 11933.65 samples/sec Loss 3.0141 LearningRate 0.0039 Epoch: 18 Global Step: 76070 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:05,045-Speed 12076.17 samples/sec Loss 3.0179 LearningRate 0.0038 Epoch: 18 Global Step: 76080 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:08,470-Speed 11962.80 samples/sec Loss 3.0293 LearningRate 0.0038 Epoch: 18 Global Step: 76090 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:11,964-Speed 11729.15 samples/sec Loss 3.0138 LearningRate 0.0038 Epoch: 18 Global Step: 76100 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:15,727-Speed 10887.36 samples/sec Loss 2.9960 LearningRate 0.0038 Epoch: 18 Global Step: 76110 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:19,476-Speed 10925.99 samples/sec Loss 3.0348 LearningRate 0.0038 Epoch: 18 Global Step: 76120 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:23,179-Speed 11066.90 samples/sec Loss 3.0326 LearningRate 0.0038 Epoch: 18 
Global Step: 76130 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:26,749-Speed 11475.58 samples/sec Loss 3.0272 LearningRate 0.0038 Epoch: 18 Global Step: 76140 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:30,355-Speed 11361.58 samples/sec Loss 3.0314 LearningRate 0.0038 Epoch: 18 Global Step: 76150 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:33,808-Speed 11862.46 samples/sec Loss 3.0070 LearningRate 0.0038 Epoch: 18 Global Step: 76160 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:37,473-Speed 11180.16 samples/sec Loss 3.0179 LearningRate 0.0037 Epoch: 18 Global Step: 76170 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:40,992-Speed 11642.54 samples/sec Loss 3.0202 LearningRate 0.0037 Epoch: 18 Global Step: 76180 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:44,450-Speed 11848.09 samples/sec Loss 3.0605 LearningRate 0.0037 Epoch: 18 Global Step: 76190 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:48,183-Speed 10975.69 samples/sec Loss 3.0461 LearningRate 0.0037 Epoch: 18 Global Step: 76200 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:51,709-Speed 11617.41 samples/sec Loss 3.0322 LearningRate 0.0037 Epoch: 18 Global Step: 76210 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:56,014-Speed 9517.64 samples/sec Loss 3.0274 LearningRate 0.0037 Epoch: 18 Global Step: 76220 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:47:59,801-Speed 10818.97 samples/sec Loss 3.0324 LearningRate 0.0037 Epoch: 18 Global Step: 76230 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:48:03,561-Speed 10896.55 samples/sec Loss 3.0431 LearningRate 0.0037 Epoch: 18 Global Step: 76240 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:48:07,421-Speed 10617.47 samples/sec Loss 2.9954 LearningRate 0.0037 Epoch: 18 Global Step: 76250 Fp16 Grad 
Scale: 262144 Required: 1 hours Training: 2022-01-17 05:48:11,035-Speed 11334.48 samples/sec Loss 3.0459 LearningRate 0.0037 Epoch: 18 Global Step: 76260 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:48:14,473-Speed 11921.18 samples/sec Loss 3.0545 LearningRate 0.0036 Epoch: 18 Global Step: 76270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:17,958-Speed 11758.64 samples/sec Loss 3.0386 LearningRate 0.0036 Epoch: 18 Global Step: 76280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:21,936-Speed 10298.30 samples/sec Loss 3.0567 LearningRate 0.0036 Epoch: 18 Global Step: 76290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:25,730-Speed 10799.95 samples/sec Loss 3.0035 LearningRate 0.0036 Epoch: 18 Global Step: 76300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:29,274-Speed 11563.40 samples/sec Loss 3.0278 LearningRate 0.0036 Epoch: 18 Global Step: 76310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:33,003-Speed 10984.71 samples/sec Loss 2.9719 LearningRate 0.0036 Epoch: 18 Global Step: 76320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:36,599-Speed 11394.11 samples/sec Loss 3.0184 LearningRate 0.0036 Epoch: 18 Global Step: 76330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:40,115-Speed 11653.86 samples/sec Loss 3.0191 LearningRate 0.0036 Epoch: 18 Global Step: 76340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:43,678-Speed 11500.29 samples/sec Loss 3.0282 LearningRate 0.0036 Epoch: 18 Global Step: 76350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:47,461-Speed 10830.50 samples/sec Loss 3.0144 LearningRate 0.0036 Epoch: 18 Global Step: 76360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:51,134-Speed 11156.40 samples/sec Loss 3.0573 LearningRate 0.0035 Epoch: 18 Global Step: 76370 Fp16 Grad Scale: 262144 Required: 1 hours 
Training: 2022-01-17 05:48:54,748-Speed 11337.43 samples/sec Loss 3.0170 LearningRate 0.0035 Epoch: 18 Global Step: 76380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:48:58,618-Speed 10586.02 samples/sec Loss 3.0277 LearningRate 0.0035 Epoch: 18 Global Step: 76390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:49:02,538-Speed 10453.50 samples/sec Loss 3.0726 LearningRate 0.0035 Epoch: 18 Global Step: 76400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:49:06,043-Speed 11690.11 samples/sec Loss 3.0197 LearningRate 0.0035 Epoch: 18 Global Step: 76410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:49:09,731-Speed 11108.58 samples/sec Loss 3.0152 LearningRate 0.0035 Epoch: 18 Global Step: 76420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:49:13,275-Speed 11562.38 samples/sec Loss 3.0752 LearningRate 0.0035 Epoch: 18 Global Step: 76430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:49:17,697-Speed 9264.32 samples/sec Loss 3.0591 LearningRate 0.0035 Epoch: 18 Global Step: 76440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:49:21,244-Speed 11549.96 samples/sec Loss 3.0263 LearningRate 0.0035 Epoch: 18 Global Step: 76450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:49:25,142-Speed 10512.92 samples/sec Loss 2.9984 LearningRate 0.0035 Epoch: 18 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:49:29,016-Speed 10574.47 samples/sec Loss 3.0572 LearningRate 0.0034 Epoch: 18 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:49:32,489-Speed 11797.36 samples/sec Loss 3.0446 LearningRate 0.0034 Epoch: 18 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:49:36,609-Speed 9945.30 samples/sec Loss 3.0271 LearningRate 0.0034 Epoch: 18 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 
05:49:40,244-Speed 11272.66 samples/sec Loss 3.0448 LearningRate 0.0034 Epoch: 18 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:49:44,035-Speed 10815.12 samples/sec Loss 3.0099 LearningRate 0.0034 Epoch: 18 Global Step: 76510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:49:47,608-Speed 11466.30 samples/sec Loss 3.0373 LearningRate 0.0034 Epoch: 18 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:49:51,195-Speed 11425.47 samples/sec Loss 3.0266 LearningRate 0.0034 Epoch: 18 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:49:54,728-Speed 11594.29 samples/sec Loss 3.0391 LearningRate 0.0034 Epoch: 18 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:49:58,388-Speed 11196.08 samples/sec Loss 3.0464 LearningRate 0.0034 Epoch: 18 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-17 05:50:01,935-Speed 11550.68 samples/sec Loss 3.0209 LearningRate 0.0034 Epoch: 18 Global Step: 76560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:05,582-Speed 11236.66 samples/sec Loss 3.0202 LearningRate 0.0033 Epoch: 18 Global Step: 76570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:09,840-Speed 9621.21 samples/sec Loss 3.0705 LearningRate 0.0033 Epoch: 18 Global Step: 76580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:13,682-Speed 10666.72 samples/sec Loss 3.0573 LearningRate 0.0033 Epoch: 18 Global Step: 76590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:17,435-Speed 10915.71 samples/sec Loss 3.0456 LearningRate 0.0033 Epoch: 18 Global Step: 76600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:21,589-Speed 9863.57 samples/sec Loss 3.0142 LearningRate 0.0033 Epoch: 18 Global Step: 76610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:25,170-Speed 11439.47 samples/sec 
Loss 3.0294 LearningRate 0.0033 Epoch: 18 Global Step: 76620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:28,710-Speed 11573.71 samples/sec Loss 3.0205 LearningRate 0.0033 Epoch: 18 Global Step: 76630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:32,356-Speed 11240.20 samples/sec Loss 3.0294 LearningRate 0.0033 Epoch: 18 Global Step: 76640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:35,850-Speed 11726.71 samples/sec Loss 3.0193 LearningRate 0.0033 Epoch: 18 Global Step: 76650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:39,504-Speed 11212.80 samples/sec Loss 3.0096 LearningRate 0.0033 Epoch: 18 Global Step: 76660 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:50:43,228-Speed 11003.62 samples/sec Loss 3.0507 LearningRate 0.0033 Epoch: 18 Global Step: 76670 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:50:46,855-Speed 11296.13 samples/sec Loss 3.0147 LearningRate 0.0032 Epoch: 18 Global Step: 76680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:50,603-Speed 10930.04 samples/sec Loss 3.0645 LearningRate 0.0032 Epoch: 18 Global Step: 76690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:54,292-Speed 11104.53 samples/sec Loss 3.0069 LearningRate 0.0032 Epoch: 18 Global Step: 76700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:50:57,862-Speed 11477.45 samples/sec Loss 3.0849 LearningRate 0.0032 Epoch: 18 Global Step: 76710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:01,361-Speed 11709.23 samples/sec Loss 3.0510 LearningRate 0.0032 Epoch: 18 Global Step: 76720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:05,453-Speed 10011.69 samples/sec Loss 3.0368 LearningRate 0.0032 Epoch: 18 Global Step: 76730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:08,935-Speed 11767.87 samples/sec Loss 3.0436 LearningRate 0.0032 
Epoch: 18 Global Step: 76740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:12,595-Speed 11195.97 samples/sec Loss 3.0549 LearningRate 0.0032 Epoch: 18 Global Step: 76750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:16,359-Speed 10885.17 samples/sec Loss 3.0582 LearningRate 0.0032 Epoch: 18 Global Step: 76760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:19,916-Speed 11515.86 samples/sec Loss 3.0167 LearningRate 0.0032 Epoch: 18 Global Step: 76770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:24,500-Speed 8937.92 samples/sec Loss 3.0236 LearningRate 0.0031 Epoch: 18 Global Step: 76780 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:51:28,386-Speed 10542.82 samples/sec Loss 3.0563 LearningRate 0.0031 Epoch: 18 Global Step: 76790 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:51:32,001-Speed 11335.00 samples/sec Loss 3.0645 LearningRate 0.0031 Epoch: 18 Global Step: 76800 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:51:35,565-Speed 11494.88 samples/sec Loss 3.0437 LearningRate 0.0031 Epoch: 18 Global Step: 76810 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:51:39,038-Speed 11799.91 samples/sec Loss 3.0803 LearningRate 0.0031 Epoch: 18 Global Step: 76820 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:51:43,096-Speed 10094.98 samples/sec Loss 3.0499 LearningRate 0.0031 Epoch: 18 Global Step: 76830 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:51:46,600-Speed 11693.35 samples/sec Loss 2.9943 LearningRate 0.0031 Epoch: 18 Global Step: 76840 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:51:50,322-Speed 11007.23 samples/sec Loss 3.0155 LearningRate 0.0031 Epoch: 18 Global Step: 76850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:54,086-Speed 10885.07 samples/sec Loss 3.0340 LearningRate 0.0031 Epoch: 18 Global Step: 76860 
Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:51:57,912-Speed 10709.04 samples/sec Loss 3.0822 LearningRate 0.0031 Epoch: 18 Global Step: 76870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:01,542-Speed 11286.54 samples/sec Loss 3.0221 LearningRate 0.0031 Epoch: 18 Global Step: 76880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:05,100-Speed 11515.56 samples/sec Loss 3.0984 LearningRate 0.0030 Epoch: 18 Global Step: 76890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:08,933-Speed 10688.29 samples/sec Loss 3.0441 LearningRate 0.0030 Epoch: 18 Global Step: 76900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:12,400-Speed 11818.28 samples/sec Loss 3.0233 LearningRate 0.0030 Epoch: 18 Global Step: 76910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:15,964-Speed 11497.15 samples/sec Loss 3.0294 LearningRate 0.0030 Epoch: 18 Global Step: 76920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:19,761-Speed 10793.10 samples/sec Loss 3.0777 LearningRate 0.0030 Epoch: 18 Global Step: 76930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:23,789-Speed 10171.85 samples/sec Loss 3.0315 LearningRate 0.0030 Epoch: 18 Global Step: 76940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:27,448-Speed 11195.16 samples/sec Loss 3.0361 LearningRate 0.0030 Epoch: 18 Global Step: 76950 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:52:31,035-Speed 11423.82 samples/sec Loss 3.0592 LearningRate 0.0030 Epoch: 18 Global Step: 76960 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:52:34,497-Speed 11832.66 samples/sec Loss 3.0674 LearningRate 0.0030 Epoch: 18 Global Step: 76970 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:52:38,198-Speed 11071.56 samples/sec Loss 3.0810 LearningRate 0.0030 Epoch: 18 Global Step: 76980 Fp16 Grad Scale: 262144 
Required: 1 hours Training: 2022-01-17 05:52:41,736-Speed 11577.82 samples/sec Loss 3.0525 LearningRate 0.0030 Epoch: 18 Global Step: 76990 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:52:45,950-Speed 9723.20 samples/sec Loss 3.0398 LearningRate 0.0029 Epoch: 18 Global Step: 77000 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:52:49,466-Speed 11650.91 samples/sec Loss 3.0177 LearningRate 0.0029 Epoch: 18 Global Step: 77010 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:52:52,959-Speed 11732.25 samples/sec Loss 3.0261 LearningRate 0.0029 Epoch: 18 Global Step: 77020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:52:56,859-Speed 10504.88 samples/sec Loss 3.0466 LearningRate 0.0029 Epoch: 18 Global Step: 77030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:00,484-Speed 11304.24 samples/sec Loss 3.0174 LearningRate 0.0029 Epoch: 18 Global Step: 77040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:04,302-Speed 10730.11 samples/sec Loss 3.0492 LearningRate 0.0029 Epoch: 18 Global Step: 77050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:07,931-Speed 11288.59 samples/sec Loss 3.0249 LearningRate 0.0029 Epoch: 18 Global Step: 77060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:11,573-Speed 11250.77 samples/sec Loss 3.0248 LearningRate 0.0029 Epoch: 18 Global Step: 77070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:15,299-Speed 10996.77 samples/sec Loss 3.0765 LearningRate 0.0029 Epoch: 18 Global Step: 77080 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:19,229-Speed 10426.62 samples/sec Loss 3.0535 LearningRate 0.0029 Epoch: 18 Global Step: 77090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:22,850-Speed 11313.33 samples/sec Loss 3.0213 LearningRate 0.0029 Epoch: 18 Global Step: 77100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 
2022-01-17 05:53:26,893-Speed 10134.37 samples/sec Loss 3.0462 LearningRate 0.0028 Epoch: 18 Global Step: 77110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:30,666-Speed 10857.79 samples/sec Loss 3.0576 LearningRate 0.0028 Epoch: 18 Global Step: 77120 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:53:34,292-Speed 11300.27 samples/sec Loss 3.0525 LearningRate 0.0028 Epoch: 18 Global Step: 77130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:37,866-Speed 11465.17 samples/sec Loss 3.0394 LearningRate 0.0028 Epoch: 18 Global Step: 77140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:41,453-Speed 11423.45 samples/sec Loss 3.0399 LearningRate 0.0028 Epoch: 18 Global Step: 77150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:45,261-Speed 10759.81 samples/sec Loss 3.0434 LearningRate 0.0028 Epoch: 18 Global Step: 77160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:49,105-Speed 10658.78 samples/sec Loss 3.0018 LearningRate 0.0028 Epoch: 18 Global Step: 77170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:52,592-Speed 11751.55 samples/sec Loss 3.0453 LearningRate 0.0028 Epoch: 18 Global Step: 77180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:53:56,418-Speed 10707.23 samples/sec Loss 2.9978 LearningRate 0.0028 Epoch: 18 Global Step: 77190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:00,394-Speed 10305.53 samples/sec Loss 3.0180 LearningRate 0.0028 Epoch: 18 Global Step: 77200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:04,112-Speed 11019.25 samples/sec Loss 3.0426 LearningRate 0.0028 Epoch: 18 Global Step: 77210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:08,165-Speed 10107.91 samples/sec Loss 3.0224 LearningRate 0.0027 Epoch: 18 Global Step: 77220 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:11,622-Speed 
11851.25 samples/sec Loss 3.0054 LearningRate 0.0027 Epoch: 18 Global Step: 77230 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:54:15,335-Speed 11039.38 samples/sec Loss 3.0462 LearningRate 0.0027 Epoch: 18 Global Step: 77240 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:54:19,154-Speed 10727.19 samples/sec Loss 3.0622 LearningRate 0.0027 Epoch: 18 Global Step: 77250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:22,971-Speed 10733.42 samples/sec Loss 3.0294 LearningRate 0.0027 Epoch: 18 Global Step: 77260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:26,412-Speed 11907.86 samples/sec Loss 3.0745 LearningRate 0.0027 Epoch: 18 Global Step: 77270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:30,797-Speed 9343.21 samples/sec Loss 3.0112 LearningRate 0.0027 Epoch: 18 Global Step: 77280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:34,351-Speed 11528.90 samples/sec Loss 3.0223 LearningRate 0.0027 Epoch: 18 Global Step: 77290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:38,264-Speed 10469.95 samples/sec Loss 3.0574 LearningRate 0.0027 Epoch: 18 Global Step: 77300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:42,068-Speed 10771.29 samples/sec Loss 3.0356 LearningRate 0.0027 Epoch: 18 Global Step: 77310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:45,979-Speed 10476.24 samples/sec Loss 3.0467 LearningRate 0.0027 Epoch: 18 Global Step: 77320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:49,883-Speed 10493.18 samples/sec Loss 3.0920 LearningRate 0.0026 Epoch: 18 Global Step: 77330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:53,636-Speed 10915.67 samples/sec Loss 3.0386 LearningRate 0.0026 Epoch: 18 Global Step: 77340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:54:57,355-Speed 11018.64 samples/sec Loss 3.0038 
LearningRate 0.0026 Epoch: 18 Global Step: 77350 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:55:01,577-Speed 9703.41 samples/sec Loss 3.0337 LearningRate 0.0026 Epoch: 18 Global Step: 77360 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:55:05,315-Speed 10959.22 samples/sec Loss 3.0520 LearningRate 0.0026 Epoch: 18 Global Step: 77370 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:09,225-Speed 10479.43 samples/sec Loss 3.0511 LearningRate 0.0026 Epoch: 18 Global Step: 77380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:13,032-Speed 10763.07 samples/sec Loss 3.0381 LearningRate 0.0026 Epoch: 18 Global Step: 77390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:16,761-Speed 10989.46 samples/sec Loss 3.0560 LearningRate 0.0026 Epoch: 18 Global Step: 77400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:20,662-Speed 10500.11 samples/sec Loss 3.0013 LearningRate 0.0026 Epoch: 18 Global Step: 77410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:24,289-Speed 11300.07 samples/sec Loss 3.0567 LearningRate 0.0026 Epoch: 18 Global Step: 77420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:28,128-Speed 10673.80 samples/sec Loss 3.0624 LearningRate 0.0026 Epoch: 18 Global Step: 77430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:31,975-Speed 10651.07 samples/sec Loss 3.0540 LearningRate 0.0026 Epoch: 18 Global Step: 77440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:37,002-Speed 8148.47 samples/sec Loss 3.0387 LearningRate 0.0025 Epoch: 18 Global Step: 77450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:40,565-Speed 11501.71 samples/sec Loss 3.0191 LearningRate 0.0025 Epoch: 18 Global Step: 77460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:55:44,218-Speed 11213.90 samples/sec Loss 3.0373 LearningRate 0.0025 Epoch: 18 
Global Step: 77470 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:55:47,887-Speed 11169.53 samples/sec Loss 3.0714 LearningRate 0.0025 Epoch: 18 Global Step: 77480 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:55:51,445-Speed 11515.15 samples/sec Loss 3.0203 LearningRate 0.0025 Epoch: 18 Global Step: 77490 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:55:55,130-Speed 11118.99 samples/sec Loss 3.0627 LearningRate 0.0025 Epoch: 18 Global Step: 77500 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:55:59,077-Speed 10380.85 samples/sec Loss 3.0566 LearningRate 0.0025 Epoch: 18 Global Step: 77510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:02,561-Speed 11758.61 samples/sec Loss 3.0519 LearningRate 0.0025 Epoch: 18 Global Step: 77520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:06,194-Speed 11279.49 samples/sec Loss 3.0103 LearningRate 0.0025 Epoch: 18 Global Step: 77530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:09,769-Speed 11461.06 samples/sec Loss 3.0672 LearningRate 0.0025 Epoch: 18 Global Step: 77540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:13,513-Speed 10943.36 samples/sec Loss 3.0730 LearningRate 0.0025 Epoch: 18 Global Step: 77550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:17,091-Speed 11454.17 samples/sec Loss 3.0541 LearningRate 0.0025 Epoch: 18 Global Step: 77560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:20,702-Speed 11346.19 samples/sec Loss 3.0217 LearningRate 0.0024 Epoch: 18 Global Step: 77570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:24,475-Speed 10858.43 samples/sec Loss 3.0242 LearningRate 0.0024 Epoch: 18 Global Step: 77580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:28,205-Speed 10985.05 samples/sec Loss 3.0128 LearningRate 0.0024 Epoch: 18 Global Step: 77590 Fp16 Grad 
Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:31,737-Speed 11631.32 samples/sec Loss 3.0417 LearningRate 0.0024 Epoch: 18 Global Step: 77600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:35,249-Speed 11664.44 samples/sec Loss 3.0753 LearningRate 0.0024 Epoch: 18 Global Step: 77610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:39,860-Speed 8885.49 samples/sec Loss 3.0668 LearningRate 0.0024 Epoch: 18 Global Step: 77620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:43,417-Speed 11517.84 samples/sec Loss 3.1060 LearningRate 0.0024 Epoch: 18 Global Step: 77630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:47,133-Speed 11027.40 samples/sec Loss 3.0366 LearningRate 0.0024 Epoch: 18 Global Step: 77640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:50,988-Speed 10628.61 samples/sec Loss 3.0639 LearningRate 0.0024 Epoch: 18 Global Step: 77650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:55,091-Speed 9987.93 samples/sec Loss 3.0517 LearningRate 0.0024 Epoch: 18 Global Step: 77660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:56:58,703-Speed 11342.02 samples/sec Loss 3.0565 LearningRate 0.0024 Epoch: 18 Global Step: 77670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:57:02,155-Speed 11867.52 samples/sec Loss 3.0568 LearningRate 0.0024 Epoch: 18 Global Step: 77680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:57:05,812-Speed 11203.42 samples/sec Loss 3.0956 LearningRate 0.0023 Epoch: 18 Global Step: 77690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:57:09,433-Speed 11314.10 samples/sec Loss 3.0779 LearningRate 0.0023 Epoch: 18 Global Step: 77700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:57:13,012-Speed 11448.09 samples/sec Loss 3.0569 LearningRate 0.0023 Epoch: 18 Global Step: 77710 Fp16 Grad Scale: 262144 Required: 1 hours 
Training: 2022-01-17 05:57:16,736-Speed 11003.75 samples/sec Loss 3.0445 LearningRate 0.0023 Epoch: 18 Global Step: 77720 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:20,747-Speed 10212.49 samples/sec Loss 3.0458 LearningRate 0.0023 Epoch: 18 Global Step: 77730 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:24,435-Speed 11109.69 samples/sec Loss 3.0551 LearningRate 0.0023 Epoch: 18 Global Step: 77740 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:28,366-Speed 10420.93 samples/sec Loss 3.0489 LearningRate 0.0023 Epoch: 18 Global Step: 77750 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:31,833-Speed 11818.47 samples/sec Loss 3.0410 LearningRate 0.0023 Epoch: 18 Global Step: 77760 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:35,463-Speed 11288.09 samples/sec Loss 3.0578 LearningRate 0.0023 Epoch: 18 Global Step: 77770 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:39,032-Speed 11478.20 samples/sec Loss 3.0666 LearningRate 0.0023 Epoch: 18 Global Step: 77780 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:43,159-Speed 9926.28 samples/sec Loss 3.0485 LearningRate 0.0023 Epoch: 18 Global Step: 77790 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:47,108-Speed 10376.45 samples/sec Loss 3.0484 LearningRate 0.0023 Epoch: 18 Global Step: 77800 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:50,763-Speed 11210.03 samples/sec Loss 3.0716 LearningRate 0.0022 Epoch: 18 Global Step: 77810 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:54,290-Speed 11616.17 samples/sec Loss 3.0163 LearningRate 0.0022 Epoch: 18 Global Step: 77820 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:57:58,026-Speed 10966.09 samples/sec Loss 3.0509 LearningRate 0.0022 Epoch: 18 Global Step: 77830 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 
05:58:01,609-Speed 11437.85 samples/sec Loss 3.0593 LearningRate 0.0022 Epoch: 18 Global Step: 77840 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:58:05,156-Speed 11550.03 samples/sec Loss 3.0830 LearningRate 0.0022 Epoch: 18 Global Step: 77850 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:58:08,820-Speed 11181.61 samples/sec Loss 3.0494 LearningRate 0.0022 Epoch: 18 Global Step: 77860 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:58:12,458-Speed 11263.10 samples/sec Loss 3.0206 LearningRate 0.0022 Epoch: 18 Global Step: 77870 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:58:16,202-Speed 10944.70 samples/sec Loss 3.0748 LearningRate 0.0022 Epoch: 18 Global Step: 77880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:20,131-Speed 10427.46 samples/sec Loss 3.0545 LearningRate 0.0022 Epoch: 18 Global Step: 77890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:23,570-Speed 11914.19 samples/sec Loss 3.0324 LearningRate 0.0022 Epoch: 18 Global Step: 77900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:27,696-Speed 9931.23 samples/sec Loss 3.0468 LearningRate 0.0022 Epoch: 18 Global Step: 77910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:31,468-Speed 10866.32 samples/sec Loss 3.0173 LearningRate 0.0022 Epoch: 18 Global Step: 77920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:35,772-Speed 9517.61 samples/sec Loss 3.0269 LearningRate 0.0022 Epoch: 18 Global Step: 77930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:39,309-Speed 11582.52 samples/sec Loss 3.0767 LearningRate 0.0021 Epoch: 18 Global Step: 77940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:42,837-Speed 11615.42 samples/sec Loss 3.0405 LearningRate 0.0021 Epoch: 18 Global Step: 77950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:46,917-Speed 10042.52 
samples/sec Loss 3.0777 LearningRate 0.0021 Epoch: 18 Global Step: 77960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:50,421-Speed 11693.31 samples/sec Loss 3.0403 LearningRate 0.0021 Epoch: 18 Global Step: 77970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:58:54,043-Speed 11316.29 samples/sec Loss 3.0737 LearningRate 0.0021 Epoch: 18 Global Step: 77980 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:58:57,777-Speed 10969.78 samples/sec Loss 3.0518 LearningRate 0.0021 Epoch: 18 Global Step: 77990 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:59:01,543-Speed 10882.86 samples/sec Loss 3.0407 LearningRate 0.0021 Epoch: 18 Global Step: 78000 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:59:05,078-Speed 11590.30 samples/sec Loss 3.0640 LearningRate 0.0021 Epoch: 18 Global Step: 78010 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:59:08,899-Speed 10721.48 samples/sec Loss 3.0460 LearningRate 0.0021 Epoch: 18 Global Step: 78020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:12,914-Speed 10205.61 samples/sec Loss 2.9973 LearningRate 0.0021 Epoch: 18 Global Step: 78030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:16,386-Speed 11800.42 samples/sec Loss 3.0619 LearningRate 0.0021 Epoch: 18 Global Step: 78040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:20,125-Speed 10959.18 samples/sec Loss 3.0139 LearningRate 0.0021 Epoch: 18 Global Step: 78050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:24,225-Speed 10002.43 samples/sec Loss 3.0208 LearningRate 0.0021 Epoch: 18 Global Step: 78060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:27,733-Speed 11677.32 samples/sec Loss 3.0389 LearningRate 0.0020 Epoch: 18 Global Step: 78070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:31,214-Speed 11776.67 samples/sec Loss 3.0559 
LearningRate 0.0020 Epoch: 18 Global Step: 78080 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:34,857-Speed 11244.85 samples/sec Loss 3.0165 LearningRate 0.0020 Epoch: 18 Global Step: 78090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:38,329-Speed 11801.13 samples/sec Loss 3.0455 LearningRate 0.0020 Epoch: 18 Global Step: 78100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:41,911-Speed 11437.82 samples/sec Loss 3.0538 LearningRate 0.0020 Epoch: 18 Global Step: 78110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:45,546-Speed 11271.02 samples/sec Loss 3.0718 LearningRate 0.0020 Epoch: 18 Global Step: 78120 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:59:49,970-Speed 9260.69 samples/sec Loss 3.0672 LearningRate 0.0020 Epoch: 18 Global Step: 78130 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 05:59:53,648-Speed 11138.73 samples/sec Loss 3.0464 LearningRate 0.0020 Epoch: 18 Global Step: 78140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 05:59:57,828-Speed 9801.20 samples/sec Loss 3.0162 LearningRate 0.0020 Epoch: 18 Global Step: 78150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:01,502-Speed 11152.44 samples/sec Loss 3.0483 LearningRate 0.0020 Epoch: 18 Global Step: 78160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:05,059-Speed 11518.62 samples/sec Loss 3.0773 LearningRate 0.0020 Epoch: 18 Global Step: 78170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:08,826-Speed 10877.14 samples/sec Loss 3.0602 LearningRate 0.0020 Epoch: 18 Global Step: 78180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:13,151-Speed 9473.07 samples/sec Loss 3.0414 LearningRate 0.0020 Epoch: 18 Global Step: 78190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:16,778-Speed 11294.43 samples/sec Loss 3.0725 LearningRate 0.0019 Epoch: 18 
Global Step: 78200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:20,395-Speed 11327.83 samples/sec Loss 3.0291 LearningRate 0.0019 Epoch: 18 Global Step: 78210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:24,336-Speed 10396.00 samples/sec Loss 3.0548 LearningRate 0.0019 Epoch: 18 Global Step: 78220 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:28,068-Speed 10979.47 samples/sec Loss 3.0505 LearningRate 0.0019 Epoch: 18 Global Step: 78230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:00:31,842-Speed 10855.25 samples/sec Loss 3.0536 LearningRate 0.0019 Epoch: 18 Global Step: 78240 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:00:35,513-Speed 11160.33 samples/sec Loss 3.0433 LearningRate 0.0019 Epoch: 18 Global Step: 78250 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:00:39,165-Speed 11220.11 samples/sec Loss 3.0762 LearningRate 0.0019 Epoch: 18 Global Step: 78260 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:00:42,869-Speed 11062.15 samples/sec Loss 3.0247 LearningRate 0.0019 Epoch: 18 Global Step: 78270 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:00:46,998-Speed 9920.37 samples/sec Loss 3.0223 LearningRate 0.0019 Epoch: 18 Global Step: 78280 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:00:51,132-Speed 9910.54 samples/sec Loss 3.0822 LearningRate 0.0019 Epoch: 18 Global Step: 78290 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:00:55,485-Speed 9413.44 samples/sec Loss 3.0297 LearningRate 0.0019 Epoch: 18 Global Step: 78300 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:00:59,200-Speed 11027.64 samples/sec Loss 3.0249 LearningRate 0.0019 Epoch: 18 Global Step: 78310 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:01:03,271-Speed 10063.14 samples/sec Loss 3.0651 LearningRate 0.0019 Epoch: 18 Global Step: 78320 Fp16 Grad 
Scale: 262144 Required: 1 hours Training: 2022-01-17 06:01:06,750-Speed 11775.88 samples/sec Loss 3.0074 LearningRate 0.0019 Epoch: 18 Global Step: 78330 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:01:10,568-Speed 10732.83 samples/sec Loss 3.0703 LearningRate 0.0018 Epoch: 18 Global Step: 78340 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:01:14,575-Speed 10225.63 samples/sec Loss 3.0401 LearningRate 0.0018 Epoch: 18 Global Step: 78350 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:01:18,528-Speed 10365.23 samples/sec Loss 3.0410 LearningRate 0.0018 Epoch: 18 Global Step: 78360 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:01:22,110-Speed 11436.82 samples/sec Loss 3.0207 LearningRate 0.0018 Epoch: 18 Global Step: 78370 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:01:25,820-Speed 11045.60 samples/sec Loss 3.0654 LearningRate 0.0018 Epoch: 18 Global Step: 78380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:01:29,662-Speed 10660.72 samples/sec Loss 3.0730 LearningRate 0.0018 Epoch: 18 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:01:33,163-Speed 11705.19 samples/sec Loss 3.0762 LearningRate 0.0018 Epoch: 18 Global Step: 78400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:01:36,636-Speed 11800.44 samples/sec Loss 3.0669 LearningRate 0.0018 Epoch: 18 Global Step: 78410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:01:40,167-Speed 11601.92 samples/sec Loss 3.0503 LearningRate 0.0018 Epoch: 18 Global Step: 78420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:01:43,800-Speed 11277.88 samples/sec Loss 3.0781 LearningRate 0.0018 Epoch: 18 Global Step: 78430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:01:47,401-Speed 11376.00 samples/sec Loss 3.0695 LearningRate 0.0018 Epoch: 18 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-01-17 06:01:51,023-Speed 11311.89 samples/sec Loss 3.0791 LearningRate 0.0018 Epoch: 18 Global Step: 78450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:01:54,617-Speed 11400.19 samples/sec Loss 3.0592 LearningRate 0.0018 Epoch: 18 Global Step: 78460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:01:59,317-Speed 8718.55 samples/sec Loss 3.0101 LearningRate 0.0018 Epoch: 18 Global Step: 78470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:02:02,888-Speed 11473.52 samples/sec Loss 3.0437 LearningRate 0.0017 Epoch: 18 Global Step: 78480 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:06,342-Speed 11861.81 samples/sec Loss 3.0760 LearningRate 0.0017 Epoch: 18 Global Step: 78490 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:09,874-Speed 11600.89 samples/sec Loss 3.0239 LearningRate 0.0017 Epoch: 18 Global Step: 78500 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:13,569-Speed 11086.74 samples/sec Loss 3.0672 LearningRate 0.0017 Epoch: 18 Global Step: 78510 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:17,665-Speed 10008.10 samples/sec Loss 3.0412 LearningRate 0.0017 Epoch: 18 Global Step: 78520 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:21,395-Speed 10983.91 samples/sec Loss 3.0739 LearningRate 0.0017 Epoch: 18 Global Step: 78530 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:25,249-Speed 10630.50 samples/sec Loss 3.0780 LearningRate 0.0017 Epoch: 18 Global Step: 78540 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:28,720-Speed 11806.08 samples/sec Loss 3.0580 LearningRate 0.0017 Epoch: 18 Global Step: 78550 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:32,356-Speed 11266.61 samples/sec Loss 3.0373 LearningRate 0.0017 Epoch: 18 Global Step: 78560 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 
06:02:36,069-Speed 11035.08 samples/sec Loss 3.0622 LearningRate 0.0017 Epoch: 18 Global Step: 78570 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:39,795-Speed 10996.48 samples/sec Loss 3.0713 LearningRate 0.0017 Epoch: 18 Global Step: 78580 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:43,489-Speed 11093.37 samples/sec Loss 3.0600 LearningRate 0.0017 Epoch: 18 Global Step: 78590 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:46,977-Speed 11747.00 samples/sec Loss 3.0458 LearningRate 0.0017 Epoch: 18 Global Step: 78600 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:50,752-Speed 10854.10 samples/sec Loss 3.0651 LearningRate 0.0017 Epoch: 18 Global Step: 78610 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:54,513-Speed 10892.82 samples/sec Loss 3.0278 LearningRate 0.0016 Epoch: 18 Global Step: 78620 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:02:58,350-Speed 10677.23 samples/sec Loss 3.0665 LearningRate 0.0016 Epoch: 18 Global Step: 78630 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:03:01,757-Speed 12027.84 samples/sec Loss 3.0345 LearningRate 0.0016 Epoch: 18 Global Step: 78640 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:03:05,830-Speed 10059.07 samples/sec Loss 3.0587 LearningRate 0.0016 Epoch: 18 Global Step: 78650 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:03:09,318-Speed 11747.94 samples/sec Loss 3.0331 LearningRate 0.0016 Epoch: 18 Global Step: 78660 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:03:12,738-Speed 11982.19 samples/sec Loss 3.0673 LearningRate 0.0016 Epoch: 18 Global Step: 78670 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:03:16,547-Speed 10757.82 samples/sec Loss 3.0610 LearningRate 0.0016 Epoch: 18 Global Step: 78680 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:03:19,973-Speed 11957.35 
samples/sec Loss 3.0622 LearningRate 0.0016 Epoch: 18 Global Step: 78690 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-01-17 06:03:24,131-Speed 9853.14 samples/sec Loss 3.0619 LearningRate 0.0016 Epoch: 18 Global Step: 78700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:03:27,696-Speed 11491.66 samples/sec Loss 3.0777 LearningRate 0.0016 Epoch: 18 Global Step: 78710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:03:31,399-Speed 11066.99 samples/sec Loss 3.0383 LearningRate 0.0016 Epoch: 18 Global Step: 78720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:03:34,845-Speed 11888.80 samples/sec Loss 3.0539 LearningRate 0.0016 Epoch: 18 Global Step: 78730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:03:38,295-Speed 11876.99 samples/sec Loss 3.0638 LearningRate 0.0016 Epoch: 18 Global Step: 78740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:03:41,912-Speed 11326.11 samples/sec Loss 3.0429 LearningRate 0.0016 Epoch: 18 Global Step: 78750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:03:45,423-Speed 11674.00 samples/sec Loss 3.0175 LearningRate 0.0016 Epoch: 18 Global Step: 78760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:03:48,929-Speed 11686.22 samples/sec Loss 3.0695 LearningRate 0.0015 Epoch: 18 Global Step: 78770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-17 06:03:52,944-Speed 10203.44 samples/sec Loss 3.0495 LearningRate 0.0015 Epoch: 18 Global Step: 78780 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:03:56,692-Speed 10929.62 samples/sec Loss 3.0742 LearningRate 0.0015 Epoch: 18 Global Step: 78790 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:00,176-Speed 11761.84 samples/sec Loss 3.0663 LearningRate 0.0015 Epoch: 18 Global Step: 78800 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:04:03,650-Speed 11792.61 samples/sec Loss 3.0614 
LearningRate 0.0015 Epoch: 18 Global Step: 78810 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:07,887-Speed 9678.02 samples/sec Loss 3.0355 LearningRate 0.0015 Epoch: 18 Global Step: 78820 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:11,573-Speed 11115.28 samples/sec Loss 3.0629 LearningRate 0.0015 Epoch: 18 Global Step: 78830 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:15,189-Speed 11329.32 samples/sec Loss 3.0339 LearningRate 0.0015 Epoch: 18 Global Step: 78840 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:18,983-Speed 10799.26 samples/sec Loss 3.0621 LearningRate 0.0015 Epoch: 18 Global Step: 78850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:22,742-Speed 10898.25 samples/sec Loss 3.0631 LearningRate 0.0015 Epoch: 18 Global Step: 78860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:26,555-Speed 10745.70 samples/sec Loss 3.0567 LearningRate 0.0015 Epoch: 18 Global Step: 78870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:31,154-Speed 8909.17 samples/sec Loss 3.0729 LearningRate 0.0015 Epoch: 18 Global Step: 78880 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:34,824-Speed 11163.03 samples/sec Loss 3.0253 LearningRate 0.0015 Epoch: 18 Global Step: 78890 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:38,287-Speed 11832.51 samples/sec Loss 3.0720 LearningRate 0.0015 Epoch: 18 Global Step: 78900 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:04:41,870-Speed 11436.70 samples/sec Loss 3.0780 LearningRate 0.0015 Epoch: 18 Global Step: 78910 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:04:45,313-Speed 11902.02 samples/sec Loss 3.0536 LearningRate 0.0014 Epoch: 18 Global Step: 78920 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:04:49,042-Speed 10988.30 samples/sec Loss 3.0733 LearningRate 0.0014 Epoch: 18 
Global Step: 78930 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:04:53,023-Speed 10290.65 samples/sec Loss 3.0753 LearningRate 0.0014 Epoch: 18 Global Step: 78940 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:04:56,730-Speed 11054.28 samples/sec Loss 3.0457 LearningRate 0.0014 Epoch: 18 Global Step: 78950 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:00,375-Speed 11240.78 samples/sec Loss 3.0201 LearningRate 0.0014 Epoch: 18 Global Step: 78960 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:03,999-Speed 11307.57 samples/sec Loss 3.0394 LearningRate 0.0014 Epoch: 18 Global Step: 78970 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:07,548-Speed 11543.78 samples/sec Loss 3.0709 LearningRate 0.0014 Epoch: 18 Global Step: 78980 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:11,813-Speed 9607.39 samples/sec Loss 3.0657 LearningRate 0.0014 Epoch: 18 Global Step: 78990 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:15,330-Speed 11650.19 samples/sec Loss 3.0160 LearningRate 0.0014 Epoch: 18 Global Step: 79000 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:18,837-Speed 11685.24 samples/sec Loss 3.0821 LearningRate 0.0014 Epoch: 18 Global Step: 79010 Fp16 Grad Scale: 524288 Required: 0 hours Training: 2022-01-17 06:05:22,415-Speed 11449.19 samples/sec Loss 3.0465 LearningRate 0.0014 Epoch: 18 Global Step: 79020 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:26,623-Speed 9736.50 samples/sec Loss 3.0636 LearningRate 0.0014 Epoch: 18 Global Step: 79030 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:30,121-Speed 11714.65 samples/sec Loss 3.0796 LearningRate 0.0014 Epoch: 18 Global Step: 79040 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:33,938-Speed 10735.47 samples/sec Loss 2.9900 LearningRate 0.0014 Epoch: 18 Global Step: 79050 Fp16 Grad 
Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:37,570-Speed 11282.50 samples/sec Loss 3.0487 LearningRate 0.0014 Epoch: 18 Global Step: 79060 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:41,207-Speed 11262.35 samples/sec Loss 3.0307 LearningRate 0.0014 Epoch: 18 Global Step: 79070 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:44,716-Speed 11676.01 samples/sec Loss 3.0329 LearningRate 0.0013 Epoch: 18 Global Step: 79080 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:48,448-Speed 10978.64 samples/sec Loss 3.0787 LearningRate 0.0013 Epoch: 18 Global Step: 79090 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:51,991-Speed 11563.91 samples/sec Loss 3.0610 LearningRate 0.0013 Epoch: 18 Global Step: 79100 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:55,710-Speed 11017.40 samples/sec Loss 3.0782 LearningRate 0.0013 Epoch: 18 Global Step: 79110 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:05:59,349-Speed 11257.88 samples/sec Loss 3.0466 LearningRate 0.0013 Epoch: 18 Global Step: 79120 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:06:03,130-Speed 10833.87 samples/sec Loss 3.0446 LearningRate 0.0013 Epoch: 18 Global Step: 79130 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:06:06,830-Speed 11075.71 samples/sec Loss 3.0749 LearningRate 0.0013 Epoch: 18 Global Step: 79140 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:06:10,481-Speed 11221.82 samples/sec Loss 3.0835 LearningRate 0.0013 Epoch: 18 Global Step: 79150 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:06:13,966-Speed 11754.39 samples/sec Loss 3.0303 LearningRate 0.0013 Epoch: 18 Global Step: 79160 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:06:18,259-Speed 9542.93 samples/sec Loss 3.0195 LearningRate 0.0013 Epoch: 18 Global Step: 79170 Fp16 Grad Scale: 131072 Required: 0 hours 
Training: 2022-01-17 06:06:21,928-Speed 11167.00 samples/sec Loss 3.0454 LearningRate 0.0013 Epoch: 18 Global Step: 79180 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:06:25,491-Speed 11499.01 samples/sec Loss 3.0273 LearningRate 0.0013 Epoch: 18 Global Step: 79190 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:06:28,994-Speed 11715.72 samples/sec Loss 3.0438 LearningRate 0.0013 Epoch: 18 Global Step: 79200 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:06:32,691-Speed 11086.07 samples/sec Loss 3.0907 LearningRate 0.0013 Epoch: 18 Global Step: 79210 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:06:36,246-Speed 11523.64 samples/sec Loss 3.0577 LearningRate 0.0013 Epoch: 18 Global Step: 79220 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:06:39,743-Speed 11717.97 samples/sec Loss 3.0351 LearningRate 0.0013 Epoch: 18 Global Step: 79230 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:06:43,562-Speed 10729.31 samples/sec Loss 3.0528 LearningRate 0.0013 Epoch: 18 Global Step: 79240 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:06:47,062-Speed 11707.89 samples/sec Loss 3.0610 LearningRate 0.0012 Epoch: 18 Global Step: 79250 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:06:50,722-Speed 11195.22 samples/sec Loss 3.0579 LearningRate 0.0012 Epoch: 18 Global Step: 79260 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:07:39,215-Speed 844.70 samples/sec Loss 2.9985 LearningRate 0.0012 Epoch: 19 Global Step: 79270 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:07:42,868-Speed 11215.75 samples/sec Loss 2.9447 LearningRate 0.0012 Epoch: 19 Global Step: 79280 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:07:47,011-Speed 9889.80 samples/sec Loss 2.9022 LearningRate 0.0012 Epoch: 19 Global Step: 79290 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 
06:07:50,815-Speed 10769.97 samples/sec Loss 2.8992 LearningRate 0.0012 Epoch: 19 Global Step: 79300 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:07:54,390-Speed 11461.42 samples/sec Loss 2.9283 LearningRate 0.0012 Epoch: 19 Global Step: 79310 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:07:58,330-Speed 10399.93 samples/sec Loss 2.9094 LearningRate 0.0012 Epoch: 19 Global Step: 79320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:01,928-Speed 11385.36 samples/sec Loss 2.9029 LearningRate 0.0012 Epoch: 19 Global Step: 79330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:05,678-Speed 10924.99 samples/sec Loss 2.9164 LearningRate 0.0012 Epoch: 19 Global Step: 79340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:09,281-Speed 11373.31 samples/sec Loss 2.9390 LearningRate 0.0012 Epoch: 19 Global Step: 79350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:13,018-Speed 10963.33 samples/sec Loss 2.8942 LearningRate 0.0012 Epoch: 19 Global Step: 79360 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:17,298-Speed 9571.99 samples/sec Loss 2.9060 LearningRate 0.0012 Epoch: 19 Global Step: 79370 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:20,822-Speed 11626.25 samples/sec Loss 2.9701 LearningRate 0.0012 Epoch: 19 Global Step: 79380 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:24,447-Speed 11305.71 samples/sec Loss 2.9171 LearningRate 0.0012 Epoch: 19 Global Step: 79390 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:27,971-Speed 11624.77 samples/sec Loss 2.9352 LearningRate 0.0012 Epoch: 19 Global Step: 79400 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:31,534-Speed 11497.93 samples/sec Loss 2.8822 LearningRate 0.0012 Epoch: 19 Global Step: 79410 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:08:35,016-Speed 11764.89 
samples/sec Loss 2.9142 LearningRate 0.0011 Epoch: 19 Global Step: 79420 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:38,812-Speed 10794.06 samples/sec Loss 2.9390 LearningRate 0.0011 Epoch: 19 Global Step: 79430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:42,739-Speed 10432.74 samples/sec Loss 2.9107 LearningRate 0.0011 Epoch: 19 Global Step: 79440 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:46,395-Speed 11209.81 samples/sec Loss 2.9623 LearningRate 0.0011 Epoch: 19 Global Step: 79450 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:50,137-Speed 10947.96 samples/sec Loss 2.9247 LearningRate 0.0011 Epoch: 19 Global Step: 79460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:53,948-Speed 10748.89 samples/sec Loss 2.9014 LearningRate 0.0011 Epoch: 19 Global Step: 79470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:08:57,646-Speed 11081.02 samples/sec Loss 2.9244 LearningRate 0.0011 Epoch: 19 Global Step: 79480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:01,213-Speed 11484.54 samples/sec Loss 2.9357 LearningRate 0.0011 Epoch: 19 Global Step: 79490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:04,952-Speed 10958.69 samples/sec Loss 2.9412 LearningRate 0.0011 Epoch: 19 Global Step: 79500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:08,750-Speed 10786.36 samples/sec Loss 2.9386 LearningRate 0.0011 Epoch: 19 Global Step: 79510 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:12,533-Speed 10829.72 samples/sec Loss 2.8973 LearningRate 0.0011 Epoch: 19 Global Step: 79520 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:09:16,579-Speed 10128.62 samples/sec Loss 2.9004 LearningRate 0.0011 Epoch: 19 Global Step: 79530 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:09:21,283-Speed 8708.59 samples/sec Loss 2.9243 
LearningRate 0.0011 Epoch: 19 Global Step: 79540 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:25,029-Speed 10937.65 samples/sec Loss 2.9269 LearningRate 0.0011 Epoch: 19 Global Step: 79550 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:28,959-Speed 10426.00 samples/sec Loss 2.9480 LearningRate 0.0011 Epoch: 19 Global Step: 79560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:32,479-Speed 11640.04 samples/sec Loss 2.9063 LearningRate 0.0011 Epoch: 19 Global Step: 79570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:36,057-Speed 11450.18 samples/sec Loss 2.9461 LearningRate 0.0011 Epoch: 19 Global Step: 79580 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:40,087-Speed 10172.55 samples/sec Loss 2.9132 LearningRate 0.0011 Epoch: 19 Global Step: 79590 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:43,646-Speed 11511.42 samples/sec Loss 2.9062 LearningRate 0.0010 Epoch: 19 Global Step: 79600 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:47,281-Speed 11274.14 samples/sec Loss 2.9120 LearningRate 0.0010 Epoch: 19 Global Step: 79610 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:50,808-Speed 11614.26 samples/sec Loss 2.9465 LearningRate 0.0010 Epoch: 19 Global Step: 79620 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:54,779-Speed 10317.24 samples/sec Loss 2.9058 LearningRate 0.0010 Epoch: 19 Global Step: 79630 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:09:58,209-Speed 11944.62 samples/sec Loss 2.9556 LearningRate 0.0010 Epoch: 19 Global Step: 79640 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:10:01,764-Speed 11526.23 samples/sec Loss 2.8923 LearningRate 0.0010 Epoch: 19 Global Step: 79650 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:10:05,401-Speed 11264.13 samples/sec Loss 2.9370 LearningRate 0.0010 Epoch: 19 
Global Step: 79660 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:10:08,991-Speed 11411.66 samples/sec Loss 2.9336 LearningRate 0.0010 Epoch: 19 Global Step: 79670 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:10:12,606-Speed 11332.20 samples/sec Loss 2.9306 LearningRate 0.0010 Epoch: 19 Global Step: 79680 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:10:16,268-Speed 11191.36 samples/sec Loss 2.9512 LearningRate 0.0010 Epoch: 19 Global Step: 79690 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:10:20,282-Speed 10206.53 samples/sec Loss 2.9138 LearningRate 0.0010 Epoch: 19 Global Step: 79700 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:23,825-Speed 11564.43 samples/sec Loss 2.9110 LearningRate 0.0010 Epoch: 19 Global Step: 79710 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:27,568-Speed 10944.42 samples/sec Loss 2.9225 LearningRate 0.0010 Epoch: 19 Global Step: 79720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:31,222-Speed 11211.38 samples/sec Loss 2.8875 LearningRate 0.0010 Epoch: 19 Global Step: 79730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:34,993-Speed 10863.99 samples/sec Loss 2.9381 LearningRate 0.0010 Epoch: 19 Global Step: 79740 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:38,430-Speed 11925.01 samples/sec Loss 2.9215 LearningRate 0.0010 Epoch: 19 Global Step: 79750 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:42,769-Speed 9442.38 samples/sec Loss 2.9244 LearningRate 0.0010 Epoch: 19 Global Step: 79760 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:46,153-Speed 12106.87 samples/sec Loss 2.8867 LearningRate 0.0010 Epoch: 19 Global Step: 79770 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:49,504-Speed 12225.97 samples/sec Loss 2.9156 LearningRate 0.0010 Epoch: 19 Global Step: 79780 Fp16 Grad 
Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:52,909-Speed 12033.69 samples/sec Loss 2.9230 LearningRate 0.0009 Epoch: 19 Global Step: 79790 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:10:56,323-Speed 12001.36 samples/sec Loss 2.9110 LearningRate 0.0009 Epoch: 19 Global Step: 79800 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:11:00,249-Speed 10432.87 samples/sec Loss 2.9173 LearningRate 0.0009 Epoch: 19 Global Step: 79810 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:03,951-Speed 11067.08 samples/sec Loss 2.9175 LearningRate 0.0009 Epoch: 19 Global Step: 79820 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:07,343-Speed 12079.81 samples/sec Loss 2.8995 LearningRate 0.0009 Epoch: 19 Global Step: 79830 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:10,700-Speed 12203.80 samples/sec Loss 2.9177 LearningRate 0.0009 Epoch: 19 Global Step: 79840 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:14,090-Speed 12086.05 samples/sec Loss 2.9161 LearningRate 0.0009 Epoch: 19 Global Step: 79850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:17,710-Speed 11320.16 samples/sec Loss 2.9321 LearningRate 0.0009 Epoch: 19 Global Step: 79860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:21,354-Speed 11240.47 samples/sec Loss 2.9200 LearningRate 0.0009 Epoch: 19 Global Step: 79870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:24,810-Speed 11856.39 samples/sec Loss 2.9177 LearningRate 0.0009 Epoch: 19 Global Step: 79880 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:28,389-Speed 11447.50 samples/sec Loss 2.9488 LearningRate 0.0009 Epoch: 19 Global Step: 79890 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:11:32,212-Speed 10716.19 samples/sec Loss 2.8873 LearningRate 0.0009 Epoch: 19 Global Step: 79900 Fp16 Grad Scale: 131072 Required: 0 hours 
Training: 2022-01-17 06:11:35,952-Speed 10953.32 samples/sec Loss 2.9772 LearningRate 0.0009 Epoch: 19 Global Step: 79910 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:11:39,573-Speed 11315.80 samples/sec Loss 2.9326 LearningRate 0.0009 Epoch: 19 Global Step: 79920 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:11:43,265-Speed 11098.66 samples/sec Loss 2.8983 LearningRate 0.0009 Epoch: 19 Global Step: 79930 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:11:46,949-Speed 11120.00 samples/sec Loss 2.9209 LearningRate 0.0009 Epoch: 19 Global Step: 79940 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:11:50,439-Speed 11741.62 samples/sec Loss 2.9135 LearningRate 0.0009 Epoch: 19 Global Step: 79950 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:11:53,886-Speed 11885.90 samples/sec Loss 2.9161 LearningRate 0.0009 Epoch: 19 Global Step: 79960 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:11:58,199-Speed 9497.79 samples/sec Loss 2.9186 LearningRate 0.0009 Epoch: 19 Global Step: 79970 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:12:01,795-Speed 11393.05 samples/sec Loss 2.9271 LearningRate 0.0008 Epoch: 19 Global Step: 79980 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:12:05,367-Speed 11468.56 samples/sec Loss 2.9059 LearningRate 0.0008 Epoch: 19 Global Step: 79990 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:12:08,780-Speed 12005.48 samples/sec Loss 2.9486 LearningRate 0.0008 Epoch: 19 Global Step: 80000 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:12:30,198-[lfw][80000]XNorm: 6.746882 Training: 2022-01-17 06:12:30,199-[lfw][80000]Accuracy-Flip: 0.99717+-0.00308 Training: 2022-01-17 06:12:30,199-[lfw][80000]Accuracy-Highest: 0.99733 Training: 2022-01-17 06:12:54,792-[cfp_fp][80000]XNorm: 5.745270 Training: 2022-01-17 06:12:54,792-[cfp_fp][80000]Accuracy-Flip: 0.97357+-0.00906 
Training: 2022-01-17 06:12:54,793-[cfp_fp][80000]Accuracy-Highest: 0.97457 Training: 2022-01-17 06:13:15,936-[agedb_30][80000]XNorm: 6.467416 Training: 2022-01-17 06:13:15,937-[agedb_30][80000]Accuracy-Flip: 0.97417+-0.00642 Training: 2022-01-17 06:13:15,938-[agedb_30][80000]Accuracy-Highest: 0.97417 Training: 2022-01-17 06:13:19,305-Speed 580.79 samples/sec Loss 2.9196 LearningRate 0.0008 Epoch: 19 Global Step: 80010 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:13:22,706-Speed 12047.33 samples/sec Loss 2.8836 LearningRate 0.0008 Epoch: 19 Global Step: 80020 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:13:26,073-Speed 12167.71 samples/sec Loss 2.9306 LearningRate 0.0008 Epoch: 19 Global Step: 80030 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:13:29,469-Speed 12064.38 samples/sec Loss 2.9434 LearningRate 0.0008 Epoch: 19 Global Step: 80040 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:13:32,841-Speed 12148.11 samples/sec Loss 2.9582 LearningRate 0.0008 Epoch: 19 Global Step: 80050 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:13:36,179-Speed 12274.82 samples/sec Loss 2.9114 LearningRate 0.0008 Epoch: 19 Global Step: 80060 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:13:39,710-Speed 11604.33 samples/sec Loss 2.9236 LearningRate 0.0008 Epoch: 19 Global Step: 80070 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:13:43,486-Speed 10847.39 samples/sec Loss 2.9231 LearningRate 0.0008 Epoch: 19 Global Step: 80080 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:13:47,022-Speed 11586.82 samples/sec Loss 2.9134 LearningRate 0.0008 Epoch: 19 Global Step: 80090 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:13:50,672-Speed 11224.58 samples/sec Loss 2.8856 LearningRate 0.0008 Epoch: 19 Global Step: 80100 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:13:54,397-Speed 10999.49 
samples/sec Loss 2.9547 LearningRate 0.0008 Epoch: 19 Global Step: 80110 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:13:57,965-Speed 11482.77 samples/sec Loss 2.9092 LearningRate 0.0008 Epoch: 19 Global Step: 80120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:01,691-Speed 10995.26 samples/sec Loss 2.8993 LearningRate 0.0008 Epoch: 19 Global Step: 80130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:05,068-Speed 12132.90 samples/sec Loss 2.9026 LearningRate 0.0008 Epoch: 19 Global Step: 80140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:08,499-Speed 11942.64 samples/sec Loss 2.9182 LearningRate 0.0008 Epoch: 19 Global Step: 80150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:11,934-Speed 11924.01 samples/sec Loss 2.9221 LearningRate 0.0008 Epoch: 19 Global Step: 80160 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:14:15,363-Speed 11949.01 samples/sec Loss 2.9054 LearningRate 0.0008 Epoch: 19 Global Step: 80170 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:14:18,804-Speed 11908.27 samples/sec Loss 2.9285 LearningRate 0.0008 Epoch: 19 Global Step: 80180 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:14:22,387-Speed 11435.37 samples/sec Loss 2.8925 LearningRate 0.0007 Epoch: 19 Global Step: 80190 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:14:25,970-Speed 11459.71 samples/sec Loss 2.9158 LearningRate 0.0007 Epoch: 19 Global Step: 80200 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:14:29,348-Speed 12127.37 samples/sec Loss 2.8842 LearningRate 0.0007 Epoch: 19 Global Step: 80210 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:14:32,821-Speed 11799.14 samples/sec Loss 2.9441 LearningRate 0.0007 Epoch: 19 Global Step: 80220 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:14:36,775-Speed 10361.74 samples/sec Loss 2.9396 
LearningRate 0.0007 Epoch: 19 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:40,356-Speed 11438.68 samples/sec Loss 2.9119 LearningRate 0.0007 Epoch: 19 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:44,423-Speed 10075.67 samples/sec Loss 2.9326 LearningRate 0.0007 Epoch: 19 Global Step: 80250 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:48,297-Speed 10574.68 samples/sec Loss 2.9509 LearningRate 0.0007 Epoch: 19 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:51,749-Speed 11869.35 samples/sec Loss 2.9032 LearningRate 0.0007 Epoch: 19 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:55,322-Speed 11467.78 samples/sec Loss 2.8998 LearningRate 0.0007 Epoch: 19 Global Step: 80280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:14:58,753-Speed 11941.94 samples/sec Loss 2.9152 LearningRate 0.0007 Epoch: 19 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:02,188-Speed 11928.23 samples/sec Loss 2.9335 LearningRate 0.0007 Epoch: 19 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:05,770-Speed 11436.51 samples/sec Loss 2.9196 LearningRate 0.0007 Epoch: 19 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:09,209-Speed 11913.37 samples/sec Loss 2.9167 LearningRate 0.0007 Epoch: 19 Global Step: 80320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:13,008-Speed 10783.37 samples/sec Loss 2.9329 LearningRate 0.0007 Epoch: 19 Global Step: 80330 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:15:16,554-Speed 11555.61 samples/sec Loss 2.8760 LearningRate 0.0007 Epoch: 19 Global Step: 80340 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:15:20,207-Speed 11215.15 samples/sec Loss 2.8987 LearningRate 0.0007 Epoch: 19 
Global Step: 80350 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:15:23,998-Speed 10809.30 samples/sec Loss 2.9041 LearningRate 0.0007 Epoch: 19 Global Step: 80360 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:15:27,525-Speed 11615.75 samples/sec Loss 2.9293 LearningRate 0.0007 Epoch: 19 Global Step: 80370 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:15:30,983-Speed 11847.52 samples/sec Loss 2.8925 LearningRate 0.0007 Epoch: 19 Global Step: 80380 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:15:34,858-Speed 10574.89 samples/sec Loss 2.9219 LearningRate 0.0007 Epoch: 19 Global Step: 80390 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:39,180-Speed 9479.84 samples/sec Loss 2.8979 LearningRate 0.0007 Epoch: 19 Global Step: 80400 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:43,136-Speed 10355.46 samples/sec Loss 2.9204 LearningRate 0.0007 Epoch: 19 Global Step: 80410 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:46,810-Speed 11151.75 samples/sec Loss 2.8558 LearningRate 0.0006 Epoch: 19 Global Step: 80420 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:50,250-Speed 11909.13 samples/sec Loss 2.9464 LearningRate 0.0006 Epoch: 19 Global Step: 80430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:53,837-Speed 11421.15 samples/sec Loss 2.9343 LearningRate 0.0006 Epoch: 19 Global Step: 80440 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:15:57,376-Speed 11578.76 samples/sec Loss 2.9363 LearningRate 0.0006 Epoch: 19 Global Step: 80450 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:01,259-Speed 10550.69 samples/sec Loss 2.9264 LearningRate 0.0006 Epoch: 19 Global Step: 80460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:05,060-Speed 10777.75 samples/sec Loss 2.9126 LearningRate 0.0006 Epoch: 19 Global Step: 80470 Fp16 Grad 
Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:08,701-Speed 11252.26 samples/sec Loss 2.8993 LearningRate 0.0006 Epoch: 19 Global Step: 80480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:12,403-Speed 11067.20 samples/sec Loss 2.9582 LearningRate 0.0006 Epoch: 19 Global Step: 80490 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:16:15,954-Speed 11539.12 samples/sec Loss 2.9331 LearningRate 0.0006 Epoch: 19 Global Step: 80500 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:16:19,355-Speed 12048.65 samples/sec Loss 2.9367 LearningRate 0.0006 Epoch: 19 Global Step: 80510 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:16:22,947-Speed 11406.62 samples/sec Loss 2.9178 LearningRate 0.0006 Epoch: 19 Global Step: 80520 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:16:26,599-Speed 11215.80 samples/sec Loss 2.9015 LearningRate 0.0006 Epoch: 19 Global Step: 80530 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:30,028-Speed 11951.55 samples/sec Loss 2.9261 LearningRate 0.0006 Epoch: 19 Global Step: 80540 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:33,904-Speed 10572.23 samples/sec Loss 2.9417 LearningRate 0.0006 Epoch: 19 Global Step: 80550 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:37,689-Speed 10824.04 samples/sec Loss 2.9292 LearningRate 0.0006 Epoch: 19 Global Step: 80560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:42,399-Speed 8696.62 samples/sec Loss 2.9357 LearningRate 0.0006 Epoch: 19 Global Step: 80570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:45,834-Speed 11930.10 samples/sec Loss 2.8941 LearningRate 0.0006 Epoch: 19 Global Step: 80580 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:49,259-Speed 11959.72 samples/sec Loss 2.8962 LearningRate 0.0006 Epoch: 19 Global Step: 80590 Fp16 Grad Scale: 131072 Required: 0 hours 
Training: 2022-01-17 06:16:52,799-Speed 11575.57 samples/sec Loss 2.9449 LearningRate 0.0006 Epoch: 19 Global Step: 80600 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:16:56,358-Speed 11512.66 samples/sec Loss 2.9096 LearningRate 0.0006 Epoch: 19 Global Step: 80610 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:17:00,011-Speed 11215.19 samples/sec Loss 2.9184 LearningRate 0.0006 Epoch: 19 Global Step: 80620 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:17:03,464-Speed 11865.80 samples/sec Loss 2.8804 LearningRate 0.0006 Epoch: 19 Global Step: 80630 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:17:06,994-Speed 11604.36 samples/sec Loss 2.9666 LearningRate 0.0006 Epoch: 19 Global Step: 80640 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:17:10,789-Speed 10796.19 samples/sec Loss 2.9119 LearningRate 0.0006 Epoch: 19 Global Step: 80650 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:17:14,570-Speed 10836.03 samples/sec Loss 2.9030 LearningRate 0.0005 Epoch: 19 Global Step: 80660 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:17:18,087-Speed 11651.60 samples/sec Loss 2.9421 LearningRate 0.0005 Epoch: 19 Global Step: 80670 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:17:21,919-Speed 10691.79 samples/sec Loss 2.9370 LearningRate 0.0005 Epoch: 19 Global Step: 80680 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:17:25,712-Speed 10802.40 samples/sec Loss 2.9179 LearningRate 0.0005 Epoch: 19 Global Step: 80690 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:17:29,299-Speed 11421.44 samples/sec Loss 2.9457 LearningRate 0.0005 Epoch: 19 Global Step: 80700 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:17:32,986-Speed 11111.67 samples/sec Loss 2.9413 LearningRate 0.0005 Epoch: 19 Global Step: 80710 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 
06:17:36,755-Speed 10869.02 samples/sec Loss 2.8999 LearningRate 0.0005 Epoch: 19 Global Step: 80720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:17:40,675-Speed 10451.93 samples/sec Loss 2.9450 LearningRate 0.0005 Epoch: 19 Global Step: 80730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:17:44,552-Speed 10566.91 samples/sec Loss 2.9174 LearningRate 0.0005 Epoch: 19 Global Step: 80740 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:17:48,122-Speed 11478.82 samples/sec Loss 2.8871 LearningRate 0.0005 Epoch: 19 Global Step: 80750 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:17:51,902-Speed 10839.34 samples/sec Loss 2.8891 LearningRate 0.0005 Epoch: 19 Global Step: 80760 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:17:55,356-Speed 11862.80 samples/sec Loss 2.9278 LearningRate 0.0005 Epoch: 19 Global Step: 80770 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:17:59,660-Speed 9517.96 samples/sec Loss 2.9387 LearningRate 0.0005 Epoch: 19 Global Step: 80780 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:18:03,356-Speed 11086.16 samples/sec Loss 2.9256 LearningRate 0.0005 Epoch: 19 Global Step: 80790 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:18:07,120-Speed 10884.59 samples/sec Loss 2.9290 LearningRate 0.0005 Epoch: 19 Global Step: 80800 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:18:11,009-Speed 10536.85 samples/sec Loss 2.9226 LearningRate 0.0005 Epoch: 19 Global Step: 80810 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:18:14,462-Speed 11864.74 samples/sec Loss 2.9083 LearningRate 0.0005 Epoch: 19 Global Step: 80820 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:18,448-Speed 10278.46 samples/sec Loss 2.9227 LearningRate 0.0005 Epoch: 19 Global Step: 80830 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:22,076-Speed 11291.10 
samples/sec Loss 2.9208 LearningRate 0.0005 Epoch: 19 Global Step: 80840 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:25,582-Speed 11688.76 samples/sec Loss 2.9021 LearningRate 0.0005 Epoch: 19 Global Step: 80850 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:29,035-Speed 11862.85 samples/sec Loss 2.9201 LearningRate 0.0005 Epoch: 19 Global Step: 80860 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:32,469-Speed 11931.72 samples/sec Loss 2.9213 LearningRate 0.0005 Epoch: 19 Global Step: 80870 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:35,881-Speed 12006.80 samples/sec Loss 2.9256 LearningRate 0.0005 Epoch: 19 Global Step: 80880 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:39,297-Speed 11994.20 samples/sec Loss 2.8955 LearningRate 0.0005 Epoch: 19 Global Step: 80890 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:42,895-Speed 11386.39 samples/sec Loss 2.9307 LearningRate 0.0005 Epoch: 19 Global Step: 80900 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:46,501-Speed 11362.57 samples/sec Loss 2.9290 LearningRate 0.0005 Epoch: 19 Global Step: 80910 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:18:50,032-Speed 11603.77 samples/sec Loss 2.9365 LearningRate 0.0005 Epoch: 19 Global Step: 80920 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:18:53,732-Speed 11075.01 samples/sec Loss 2.8944 LearningRate 0.0004 Epoch: 19 Global Step: 80930 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:18:57,383-Speed 11219.95 samples/sec Loss 2.8931 LearningRate 0.0004 Epoch: 19 Global Step: 80940 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:19:01,375-Speed 10263.55 samples/sec Loss 2.9557 LearningRate 0.0004 Epoch: 19 Global Step: 80950 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:19:04,741-Speed 12173.00 samples/sec Loss 2.9107 
LearningRate 0.0004 Epoch: 19 Global Step: 80960 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:19:08,295-Speed 11525.59 samples/sec Loss 2.9226 LearningRate 0.0004 Epoch: 19 Global Step: 80970 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:19:11,942-Speed 11235.12 samples/sec Loss 2.9208 LearningRate 0.0004 Epoch: 19 Global Step: 80980 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:19:16,565-Speed 8861.21 samples/sec Loss 2.9298 LearningRate 0.0004 Epoch: 19 Global Step: 80990 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:19:19,994-Speed 11948.24 samples/sec Loss 2.9407 LearningRate 0.0004 Epoch: 19 Global Step: 81000 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:19:23,339-Speed 12248.59 samples/sec Loss 2.9258 LearningRate 0.0004 Epoch: 19 Global Step: 81010 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:19:26,734-Speed 12067.58 samples/sec Loss 2.9237 LearningRate 0.0004 Epoch: 19 Global Step: 81020 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:30,112-Speed 12131.25 samples/sec Loss 2.9733 LearningRate 0.0004 Epoch: 19 Global Step: 81030 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:33,456-Speed 12251.63 samples/sec Loss 2.9299 LearningRate 0.0004 Epoch: 19 Global Step: 81040 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:36,834-Speed 12129.13 samples/sec Loss 2.9226 LearningRate 0.0004 Epoch: 19 Global Step: 81050 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:40,247-Speed 12001.95 samples/sec Loss 2.9212 LearningRate 0.0004 Epoch: 19 Global Step: 81060 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:43,707-Speed 11844.19 samples/sec Loss 2.9464 LearningRate 0.0004 Epoch: 19 Global Step: 81070 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:47,318-Speed 11346.16 samples/sec Loss 2.9268 LearningRate 0.0004 Epoch: 19 
Global Step: 81080 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:50,954-Speed 11266.78 samples/sec Loss 2.8991 LearningRate 0.0004 Epoch: 19 Global Step: 81090 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:54,375-Speed 11979.68 samples/sec Loss 2.9339 LearningRate 0.0004 Epoch: 19 Global Step: 81100 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:19:58,049-Speed 11151.07 samples/sec Loss 2.9387 LearningRate 0.0004 Epoch: 19 Global Step: 81110 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:02,214-Speed 9837.42 samples/sec Loss 2.9065 LearningRate 0.0004 Epoch: 19 Global Step: 81120 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:05,754-Speed 11574.71 samples/sec Loss 2.9125 LearningRate 0.0004 Epoch: 19 Global Step: 81130 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:09,398-Speed 11241.94 samples/sec Loss 2.9656 LearningRate 0.0004 Epoch: 19 Global Step: 81140 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:12,815-Speed 11989.31 samples/sec Loss 2.9019 LearningRate 0.0004 Epoch: 19 Global Step: 81150 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:16,422-Speed 11359.41 samples/sec Loss 2.9535 LearningRate 0.0004 Epoch: 19 Global Step: 81160 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:20,079-Speed 11202.57 samples/sec Loss 2.9137 LearningRate 0.0004 Epoch: 19 Global Step: 81170 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:23,531-Speed 11868.58 samples/sec Loss 2.9261 LearningRate 0.0004 Epoch: 19 Global Step: 81180 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:26,940-Speed 12018.61 samples/sec Loss 2.9220 LearningRate 0.0004 Epoch: 19 Global Step: 81190 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:30,504-Speed 11499.79 samples/sec Loss 2.9218 LearningRate 0.0004 Epoch: 19 Global Step: 81200 Fp16 Grad 
Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:34,313-Speed 10754.90 samples/sec Loss 2.9090 LearningRate 0.0004 Epoch: 19 Global Step: 81210 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:37,971-Speed 11199.03 samples/sec Loss 2.9535 LearningRate 0.0003 Epoch: 19 Global Step: 81220 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:41,684-Speed 11032.80 samples/sec Loss 2.9305 LearningRate 0.0003 Epoch: 19 Global Step: 81230 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:45,426-Speed 10951.87 samples/sec Loss 2.9067 LearningRate 0.0003 Epoch: 19 Global Step: 81240 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:49,076-Speed 11222.57 samples/sec Loss 2.9642 LearningRate 0.0003 Epoch: 19 Global Step: 81250 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:52,863-Speed 10818.64 samples/sec Loss 2.9292 LearningRate 0.0003 Epoch: 19 Global Step: 81260 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:56,266-Speed 12036.32 samples/sec Loss 2.9390 LearningRate 0.0003 Epoch: 19 Global Step: 81270 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:20:59,835-Speed 11480.88 samples/sec Loss 2.9084 LearningRate 0.0003 Epoch: 19 Global Step: 81280 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:21:03,419-Speed 11433.14 samples/sec Loss 2.9523 LearningRate 0.0003 Epoch: 19 Global Step: 81290 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:21:07,207-Speed 10817.04 samples/sec Loss 2.9324 LearningRate 0.0003 Epoch: 19 Global Step: 81300 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:11,033-Speed 10707.61 samples/sec Loss 2.9209 LearningRate 0.0003 Epoch: 19 Global Step: 81310 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:14,553-Speed 11638.22 samples/sec Loss 2.9300 LearningRate 0.0003 Epoch: 19 Global Step: 81320 Fp16 Grad Scale: 131072 Required: 0 hours 
Training: 2022-01-17 06:21:18,050-Speed 11713.48 samples/sec Loss 2.9352 LearningRate 0.0003 Epoch: 19 Global Step: 81330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:21,712-Speed 11190.44 samples/sec Loss 2.9210 LearningRate 0.0003 Epoch: 19 Global Step: 81340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:25,260-Speed 11546.33 samples/sec Loss 2.8830 LearningRate 0.0003 Epoch: 19 Global Step: 81350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:29,255-Speed 10255.21 samples/sec Loss 2.9484 LearningRate 0.0003 Epoch: 19 Global Step: 81360 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:32,941-Speed 11116.80 samples/sec Loss 2.9034 LearningRate 0.0003 Epoch: 19 Global Step: 81370 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:36,565-Speed 11304.82 samples/sec Loss 2.9229 LearningRate 0.0003 Epoch: 19 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:39,997-Speed 11938.82 samples/sec Loss 2.9416 LearningRate 0.0003 Epoch: 19 Global Step: 81390 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:21:43,729-Speed 10979.01 samples/sec Loss 2.9150 LearningRate 0.0003 Epoch: 19 Global Step: 81400 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:21:47,574-Speed 10653.03 samples/sec Loss 2.9363 LearningRate 0.0003 Epoch: 19 Global Step: 81410 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:21:51,553-Speed 10297.64 samples/sec Loss 2.9016 LearningRate 0.0003 Epoch: 19 Global Step: 81420 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:21:55,010-Speed 11850.84 samples/sec Loss 2.9249 LearningRate 0.0003 Epoch: 19 Global Step: 81430 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:21:58,649-Speed 11259.05 samples/sec Loss 2.9180 LearningRate 0.0003 Epoch: 19 Global Step: 81440 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 
06:22:02,607-Speed 10350.95 samples/sec Loss 2.9322 LearningRate 0.0003 Epoch: 19 Global Step: 81450 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:22:06,875-Speed 9599.55 samples/sec Loss 2.8993 LearningRate 0.0003 Epoch: 19 Global Step: 81460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:10,574-Speed 11075.35 samples/sec Loss 2.9390 LearningRate 0.0003 Epoch: 19 Global Step: 81470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:14,025-Speed 11874.95 samples/sec Loss 2.9253 LearningRate 0.0003 Epoch: 19 Global Step: 81480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:17,425-Speed 12048.74 samples/sec Loss 2.9156 LearningRate 0.0003 Epoch: 19 Global Step: 81490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:20,875-Speed 11874.95 samples/sec Loss 2.9729 LearningRate 0.0003 Epoch: 19 Global Step: 81500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:24,880-Speed 10228.90 samples/sec Loss 2.9525 LearningRate 0.0003 Epoch: 19 Global Step: 81510 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:28,698-Speed 10732.08 samples/sec Loss 2.9517 LearningRate 0.0003 Epoch: 19 Global Step: 81520 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:32,385-Speed 11112.61 samples/sec Loss 2.9163 LearningRate 0.0003 Epoch: 19 Global Step: 81530 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:35,808-Speed 11971.31 samples/sec Loss 2.8979 LearningRate 0.0003 Epoch: 19 Global Step: 81540 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:39,289-Speed 11769.58 samples/sec Loss 2.9384 LearningRate 0.0003 Epoch: 19 Global Step: 81550 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:43,174-Speed 10547.09 samples/sec Loss 2.9341 LearningRate 0.0003 Epoch: 19 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:46,744-Speed 11475.59 
samples/sec Loss 2.8975 LearningRate 0.0002 Epoch: 19 Global Step: 81570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:50,375-Speed 11285.13 samples/sec Loss 2.9298 LearningRate 0.0002 Epoch: 19 Global Step: 81580 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:54,070-Speed 11088.94 samples/sec Loss 2.9165 LearningRate 0.0002 Epoch: 19 Global Step: 81590 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:22:57,566-Speed 11718.95 samples/sec Loss 2.9301 LearningRate 0.0002 Epoch: 19 Global Step: 81600 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:01,217-Speed 11220.80 samples/sec Loss 2.8554 LearningRate 0.0002 Epoch: 19 Global Step: 81610 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:04,716-Speed 11711.77 samples/sec Loss 2.9201 LearningRate 0.0002 Epoch: 19 Global Step: 81620 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:08,689-Speed 10310.49 samples/sec Loss 2.9170 LearningRate 0.0002 Epoch: 19 Global Step: 81630 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:12,420-Speed 10984.06 samples/sec Loss 2.9336 LearningRate 0.0002 Epoch: 19 Global Step: 81640 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:16,344-Speed 10440.30 samples/sec Loss 2.9278 LearningRate 0.0002 Epoch: 19 Global Step: 81650 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:19,883-Speed 11575.07 samples/sec Loss 2.8912 LearningRate 0.0002 Epoch: 19 Global Step: 81660 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:23:23,495-Speed 11341.14 samples/sec Loss 2.9329 LearningRate 0.0002 Epoch: 19 Global Step: 81670 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:23:27,115-Speed 11318.05 samples/sec Loss 2.9419 LearningRate 0.0002 Epoch: 19 Global Step: 81680 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:23:30,907-Speed 10806.58 samples/sec Loss 2.9197 
LearningRate 0.0002 Epoch: 19 Global Step: 81690 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:23:34,368-Speed 11839.22 samples/sec Loss 2.9263 LearningRate 0.0002 Epoch: 19 Global Step: 81700 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:23:38,141-Speed 10858.28 samples/sec Loss 2.9029 LearningRate 0.0002 Epoch: 19 Global Step: 81710 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:42,271-Speed 9920.11 samples/sec Loss 2.9550 LearningRate 0.0002 Epoch: 19 Global Step: 81720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:45,764-Speed 11729.08 samples/sec Loss 2.8944 LearningRate 0.0002 Epoch: 19 Global Step: 81730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:49,291-Speed 11614.77 samples/sec Loss 2.9614 LearningRate 0.0002 Epoch: 19 Global Step: 81740 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:53,005-Speed 11029.79 samples/sec Loss 2.8759 LearningRate 0.0002 Epoch: 19 Global Step: 81750 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:56,537-Speed 11602.48 samples/sec Loss 2.9122 LearningRate 0.0002 Epoch: 19 Global Step: 81760 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:23:59,940-Speed 12038.74 samples/sec Loss 2.9362 LearningRate 0.0002 Epoch: 19 Global Step: 81770 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:03,490-Speed 11538.76 samples/sec Loss 2.9750 LearningRate 0.0002 Epoch: 19 Global Step: 81780 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:06,925-Speed 11931.57 samples/sec Loss 2.9679 LearningRate 0.0002 Epoch: 19 Global Step: 81790 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:11,141-Speed 9716.21 samples/sec Loss 2.8793 LearningRate 0.0002 Epoch: 19 Global Step: 81800 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:14,676-Speed 11590.82 samples/sec Loss 2.9200 LearningRate 0.0002 Epoch: 19 
Global Step: 81810 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:24:18,423-Speed 10932.42 samples/sec Loss 2.9404 LearningRate 0.0002 Epoch: 19 Global Step: 81820 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:24:22,339-Speed 10463.89 samples/sec Loss 2.9153 LearningRate 0.0002 Epoch: 19 Global Step: 81830 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:25,946-Speed 11357.26 samples/sec Loss 2.9453 LearningRate 0.0002 Epoch: 19 Global Step: 81840 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:29,607-Speed 11191.31 samples/sec Loss 2.9225 LearningRate 0.0002 Epoch: 19 Global Step: 81850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:33,103-Speed 11718.22 samples/sec Loss 2.9495 LearningRate 0.0002 Epoch: 19 Global Step: 81860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:36,518-Speed 11999.99 samples/sec Loss 2.9058 LearningRate 0.0002 Epoch: 19 Global Step: 81870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:40,377-Speed 10615.96 samples/sec Loss 2.9727 LearningRate 0.0002 Epoch: 19 Global Step: 81880 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:43,901-Speed 11626.65 samples/sec Loss 2.9432 LearningRate 0.0002 Epoch: 19 Global Step: 81890 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:47,350-Speed 11877.37 samples/sec Loss 2.9294 LearningRate 0.0002 Epoch: 19 Global Step: 81900 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:50,850-Speed 11707.21 samples/sec Loss 2.9382 LearningRate 0.0002 Epoch: 19 Global Step: 81910 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:54,505-Speed 11208.68 samples/sec Loss 2.9031 LearningRate 0.0002 Epoch: 19 Global Step: 81920 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:24:58,292-Speed 10819.16 samples/sec Loss 2.9500 LearningRate 0.0002 Epoch: 19 Global Step: 81930 Fp16 Grad 
Scale: 262144 Required: 0 hours Training: 2022-01-17 06:25:01,945-Speed 11214.67 samples/sec Loss 2.9362 LearningRate 0.0002 Epoch: 19 Global Step: 81940 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:25:05,720-Speed 10853.47 samples/sec Loss 2.8874 LearningRate 0.0002 Epoch: 19 Global Step: 81950 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:25:09,422-Speed 11067.07 samples/sec Loss 2.9597 LearningRate 0.0002 Epoch: 19 Global Step: 81960 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:25:14,026-Speed 8898.22 samples/sec Loss 2.9247 LearningRate 0.0002 Epoch: 19 Global Step: 81970 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:25:17,471-Speed 11894.93 samples/sec Loss 2.9227 LearningRate 0.0002 Epoch: 19 Global Step: 81980 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:25:20,939-Speed 11814.01 samples/sec Loss 2.9280 LearningRate 0.0001 Epoch: 19 Global Step: 81990 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:25:24,496-Speed 11515.54 samples/sec Loss 2.9410 LearningRate 0.0001 Epoch: 19 Global Step: 82000 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:25:28,334-Speed 10676.73 samples/sec Loss 2.9038 LearningRate 0.0001 Epoch: 19 Global Step: 82010 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:25:32,051-Speed 11022.15 samples/sec Loss 2.8873 LearningRate 0.0001 Epoch: 19 Global Step: 82020 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:25:35,650-Speed 11382.76 samples/sec Loss 2.9279 LearningRate 0.0001 Epoch: 19 Global Step: 82030 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:25:39,442-Speed 10805.32 samples/sec Loss 2.8885 LearningRate 0.0001 Epoch: 19 Global Step: 82040 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:25:42,901-Speed 11846.11 samples/sec Loss 2.8901 LearningRate 0.0001 Epoch: 19 Global Step: 82050 Fp16 Grad Scale: 131072 Required: 0 hours 
Training: 2022-01-17 06:25:46,383-Speed 11764.00 samples/sec Loss 2.9072 LearningRate 0.0001 Epoch: 19 Global Step: 82060 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:25:50,242-Speed 10615.76 samples/sec Loss 2.9313 LearningRate 0.0001 Epoch: 19 Global Step: 82070 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:25:53,806-Speed 11508.50 samples/sec Loss 2.9518 LearningRate 0.0001 Epoch: 19 Global Step: 82080 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:25:57,438-Speed 11283.02 samples/sec Loss 2.9394 LearningRate 0.0001 Epoch: 19 Global Step: 82090 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:26:00,945-Speed 11680.72 samples/sec Loss 2.9282 LearningRate 0.0001 Epoch: 19 Global Step: 82100 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:26:04,340-Speed 12068.36 samples/sec Loss 2.9160 LearningRate 0.0001 Epoch: 19 Global Step: 82110 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:26:07,816-Speed 11795.33 samples/sec Loss 2.9353 LearningRate 0.0001 Epoch: 19 Global Step: 82120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:26:11,303-Speed 11750.58 samples/sec Loss 2.9492 LearningRate 0.0001 Epoch: 19 Global Step: 82130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:26:15,444-Speed 9892.75 samples/sec Loss 2.9206 LearningRate 0.0001 Epoch: 19 Global Step: 82140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:26:18,911-Speed 11818.52 samples/sec Loss 2.9257 LearningRate 0.0001 Epoch: 19 Global Step: 82150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:26:22,770-Speed 10616.35 samples/sec Loss 2.9379 LearningRate 0.0001 Epoch: 19 Global Step: 82160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:26:26,244-Speed 11793.11 samples/sec Loss 2.9307 LearningRate 0.0001 Epoch: 19 Global Step: 82170 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 
06:26:29,798-Speed 11526.34 samples/sec Loss 2.9347 LearningRate 0.0001 Epoch: 19 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:26:33,582-Speed 10828.29 samples/sec Loss 2.9009 LearningRate 0.0001 Epoch: 19 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:26:37,059-Speed 11784.47 samples/sec Loss 2.8980 LearningRate 0.0001 Epoch: 19 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:26:40,987-Speed 10431.25 samples/sec Loss 2.9163 LearningRate 0.0001 Epoch: 19 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:26:44,559-Speed 11471.09 samples/sec Loss 2.8936 LearningRate 0.0001 Epoch: 19 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:26:48,240-Speed 11129.99 samples/sec Loss 2.9287 LearningRate 0.0001 Epoch: 19 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:26:51,731-Speed 11737.75 samples/sec Loss 2.9028 LearningRate 0.0001 Epoch: 19 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:26:55,207-Speed 11786.34 samples/sec Loss 2.9252 LearningRate 0.0001 Epoch: 19 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:26:58,745-Speed 11581.19 samples/sec Loss 2.9080 LearningRate 0.0001 Epoch: 19 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:27:02,328-Speed 11431.66 samples/sec Loss 2.8955 LearningRate 0.0001 Epoch: 19 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-17 06:27:05,905-Speed 11456.59 samples/sec Loss 2.9180 LearningRate 0.0001 Epoch: 19 Global Step: 82280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:09,473-Speed 11480.87 samples/sec Loss 2.9160 LearningRate 0.0001 Epoch: 19 Global Step: 82290 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:13,134-Speed 11192.15 samples/sec 
Loss 2.9211 LearningRate 0.0001 Epoch: 19 Global Step: 82300 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:16,621-Speed 11747.22 samples/sec Loss 2.9457 LearningRate 0.0001 Epoch: 19 Global Step: 82310 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:20,134-Speed 11664.84 samples/sec Loss 2.9057 LearningRate 0.0001 Epoch: 19 Global Step: 82320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:24,016-Speed 10553.46 samples/sec Loss 2.9223 LearningRate 0.0001 Epoch: 19 Global Step: 82330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:27,639-Speed 11306.68 samples/sec Loss 2.8943 LearningRate 0.0001 Epoch: 19 Global Step: 82340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:31,282-Speed 11247.91 samples/sec Loss 2.9597 LearningRate 0.0001 Epoch: 19 Global Step: 82350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:34,698-Speed 11995.15 samples/sec Loss 2.9118 LearningRate 0.0001 Epoch: 19 Global Step: 82360 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:38,949-Speed 9636.64 samples/sec Loss 2.9243 LearningRate 0.0001 Epoch: 19 Global Step: 82370 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:42,376-Speed 11958.83 samples/sec Loss 2.8990 LearningRate 0.0001 Epoch: 19 Global Step: 82380 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:27:46,101-Speed 10998.89 samples/sec Loss 2.9348 LearningRate 0.0001 Epoch: 19 Global Step: 82390 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:27:49,568-Speed 11820.35 samples/sec Loss 2.9196 LearningRate 0.0001 Epoch: 19 Global Step: 82400 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:27:53,090-Speed 11630.29 samples/sec Loss 2.9143 LearningRate 0.0001 Epoch: 19 Global Step: 82410 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:27:56,508-Speed 11987.83 samples/sec Loss 2.9355 LearningRate 0.0001 
Epoch: 19 Global Step: 82420 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:00,091-Speed 11435.35 samples/sec Loss 2.9245 LearningRate 0.0001 Epoch: 19 Global Step: 82430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:03,579-Speed 11743.77 samples/sec Loss 2.8972 LearningRate 0.0001 Epoch: 19 Global Step: 82440 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:07,374-Speed 10796.57 samples/sec Loss 2.8889 LearningRate 0.0001 Epoch: 19 Global Step: 82450 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:11,001-Speed 11295.76 samples/sec Loss 2.9449 LearningRate 0.0001 Epoch: 19 Global Step: 82460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:14,524-Speed 11629.48 samples/sec Loss 2.9277 LearningRate 0.0001 Epoch: 19 Global Step: 82470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:18,268-Speed 10941.48 samples/sec Loss 2.9577 LearningRate 0.0001 Epoch: 19 Global Step: 82480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:21,691-Speed 11970.35 samples/sec Loss 2.8902 LearningRate 0.0001 Epoch: 19 Global Step: 82490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:25,182-Speed 11736.74 samples/sec Loss 2.9134 LearningRate 0.0001 Epoch: 19 Global Step: 82500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:28,681-Speed 11710.85 samples/sec Loss 2.9147 LearningRate 0.0001 Epoch: 19 Global Step: 82510 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:28:32,312-Speed 11283.03 samples/sec Loss 2.9239 LearningRate 0.0001 Epoch: 19 Global Step: 82520 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:28:35,855-Speed 11562.97 samples/sec Loss 2.9068 LearningRate 0.0001 Epoch: 19 Global Step: 82530 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:28:40,014-Speed 9849.89 samples/sec Loss 2.9196 LearningRate 0.0001 Epoch: 19 Global Step: 82540 
Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:28:43,853-Speed 10672.54 samples/sec Loss 2.9003 LearningRate 0.0001 Epoch: 19 Global Step: 82550 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:28:47,453-Speed 11381.34 samples/sec Loss 2.9009 LearningRate 0.0001 Epoch: 19 Global Step: 82560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:50,924-Speed 11803.18 samples/sec Loss 2.9353 LearningRate 0.0001 Epoch: 19 Global Step: 82570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:54,642-Speed 11020.65 samples/sec Loss 2.9115 LearningRate 0.0001 Epoch: 19 Global Step: 82580 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:28:58,145-Speed 11694.22 samples/sec Loss 2.9342 LearningRate 0.0001 Epoch: 19 Global Step: 82590 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:01,578-Speed 11932.67 samples/sec Loss 2.9344 LearningRate 0.0001 Epoch: 19 Global Step: 82600 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:05,211-Speed 11278.44 samples/sec Loss 2.9259 LearningRate 0.0000 Epoch: 19 Global Step: 82610 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:08,551-Speed 12265.86 samples/sec Loss 2.9316 LearningRate 0.0000 Epoch: 19 Global Step: 82620 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:12,081-Speed 11607.01 samples/sec Loss 2.9522 LearningRate 0.0000 Epoch: 19 Global Step: 82630 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:15,981-Speed 10505.45 samples/sec Loss 2.9065 LearningRate 0.0000 Epoch: 19 Global Step: 82640 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:19,460-Speed 11775.77 samples/sec Loss 2.9270 LearningRate 0.0000 Epoch: 19 Global Step: 82650 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:23,225-Speed 10880.41 samples/sec Loss 2.9038 LearningRate 0.0000 Epoch: 19 Global Step: 82660 Fp16 Grad Scale: 262144 
Required: 0 hours Training: 2022-01-17 06:29:27,033-Speed 10760.91 samples/sec Loss 2.9238 LearningRate 0.0000 Epoch: 19 Global Step: 82670 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:29:30,826-Speed 10802.51 samples/sec Loss 2.8955 LearningRate 0.0000 Epoch: 19 Global Step: 82680 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:29:34,452-Speed 11298.65 samples/sec Loss 2.9287 LearningRate 0.0000 Epoch: 19 Global Step: 82690 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:29:37,962-Speed 11672.25 samples/sec Loss 2.9080 LearningRate 0.0000 Epoch: 19 Global Step: 82700 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:29:41,809-Speed 10647.70 samples/sec Loss 2.9219 LearningRate 0.0000 Epoch: 19 Global Step: 82710 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:29:45,313-Speed 11695.83 samples/sec Loss 2.9472 LearningRate 0.0000 Epoch: 19 Global Step: 82720 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:29:49,123-Speed 10752.23 samples/sec Loss 2.9388 LearningRate 0.0000 Epoch: 19 Global Step: 82730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:52,825-Speed 11066.31 samples/sec Loss 2.9327 LearningRate 0.0000 Epoch: 19 Global Step: 82740 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:56,348-Speed 11629.38 samples/sec Loss 2.9399 LearningRate 0.0000 Epoch: 19 Global Step: 82750 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:29:59,795-Speed 11885.38 samples/sec Loss 2.9218 LearningRate 0.0000 Epoch: 19 Global Step: 82760 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:30:03,333-Speed 11581.64 samples/sec Loss 2.9190 LearningRate 0.0000 Epoch: 19 Global Step: 82770 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:30:07,272-Speed 10400.36 samples/sec Loss 2.9200 LearningRate 0.0000 Epoch: 19 Global Step: 82780 Fp16 Grad Scale: 131072 Required: 0 hours Training: 
2022-01-17 06:30:11,023-Speed 10923.57 samples/sec Loss 2.9344 LearningRate 0.0000 Epoch: 19 Global Step: 82790 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:30:15,143-Speed 9944.18 samples/sec Loss 2.9059 LearningRate 0.0000 Epoch: 19 Global Step: 82800 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:30:18,690-Speed 11548.89 samples/sec Loss 2.9034 LearningRate 0.0000 Epoch: 19 Global Step: 82810 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:30:22,572-Speed 10556.57 samples/sec Loss 2.9261 LearningRate 0.0000 Epoch: 19 Global Step: 82820 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:30:26,410-Speed 10674.69 samples/sec Loss 2.9328 LearningRate 0.0000 Epoch: 19 Global Step: 82830 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:30:29,975-Speed 11491.71 samples/sec Loss 2.8973 LearningRate 0.0000 Epoch: 19 Global Step: 82840 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:30:33,609-Speed 11273.95 samples/sec Loss 2.9172 LearningRate 0.0000 Epoch: 19 Global Step: 82850 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:30:37,345-Speed 10964.70 samples/sec Loss 2.9451 LearningRate 0.0000 Epoch: 19 Global Step: 82860 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:30:40,855-Speed 11673.90 samples/sec Loss 2.8956 LearningRate 0.0000 Epoch: 19 Global Step: 82870 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:30:45,276-Speed 9267.46 samples/sec Loss 2.9186 LearningRate 0.0000 Epoch: 19 Global Step: 82880 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:30:48,702-Speed 11957.91 samples/sec Loss 2.9016 LearningRate 0.0000 Epoch: 19 Global Step: 82890 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:30:52,415-Speed 11034.68 samples/sec Loss 2.9157 LearningRate 0.0000 Epoch: 19 Global Step: 82900 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:30:56,338-Speed 
10442.19 samples/sec Loss 2.9526 LearningRate 0.0000 Epoch: 19 Global Step: 82910 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:30:59,768-Speed 11945.42 samples/sec Loss 2.8992 LearningRate 0.0000 Epoch: 19 Global Step: 82920 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:03,361-Speed 11402.50 samples/sec Loss 2.9450 LearningRate 0.0000 Epoch: 19 Global Step: 82930 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:07,111-Speed 10925.14 samples/sec Loss 2.9629 LearningRate 0.0000 Epoch: 19 Global Step: 82940 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:11,042-Speed 10421.96 samples/sec Loss 2.9289 LearningRate 0.0000 Epoch: 19 Global Step: 82950 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:14,545-Speed 11695.54 samples/sec Loss 2.9299 LearningRate 0.0000 Epoch: 19 Global Step: 82960 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:18,261-Speed 11026.53 samples/sec Loss 2.9452 LearningRate 0.0000 Epoch: 19 Global Step: 82970 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:22,082-Speed 10722.37 samples/sec Loss 2.9283 LearningRate 0.0000 Epoch: 19 Global Step: 82980 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:25,807-Speed 10999.49 samples/sec Loss 2.8796 LearningRate 0.0000 Epoch: 19 Global Step: 82990 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:29,659-Speed 10636.88 samples/sec Loss 2.9484 LearningRate 0.0000 Epoch: 19 Global Step: 83000 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:31:33,225-Speed 11488.01 samples/sec Loss 2.8971 LearningRate 0.0000 Epoch: 19 Global Step: 83010 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:31:36,808-Speed 11432.14 samples/sec Loss 2.9450 LearningRate 0.0000 Epoch: 19 Global Step: 83020 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:31:40,393-Speed 11429.03 samples/sec Loss 
2.9180 LearningRate 0.0000 Epoch: 19 Global Step: 83030 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:31:43,991-Speed 11389.40 samples/sec Loss 2.9583 LearningRate 0.0000 Epoch: 19 Global Step: 83040 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:31:47,509-Speed 11642.94 samples/sec Loss 2.9209 LearningRate 0.0000 Epoch: 19 Global Step: 83050 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:31:51,757-Speed 9645.12 samples/sec Loss 2.8841 LearningRate 0.0000 Epoch: 19 Global Step: 83060 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:31:55,222-Speed 11823.33 samples/sec Loss 2.9245 LearningRate 0.0000 Epoch: 19 Global Step: 83070 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:31:58,676-Speed 11864.30 samples/sec Loss 2.9365 LearningRate 0.0000 Epoch: 19 Global Step: 83080 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:32:02,180-Speed 11691.24 samples/sec Loss 2.9495 LearningRate 0.0000 Epoch: 19 Global Step: 83090 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:32:06,081-Speed 10505.14 samples/sec Loss 2.9010 LearningRate 0.0000 Epoch: 19 Global Step: 83100 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:32:10,166-Speed 10026.25 samples/sec Loss 2.8984 LearningRate 0.0000 Epoch: 19 Global Step: 83110 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:13,632-Speed 11824.38 samples/sec Loss 2.9208 LearningRate 0.0000 Epoch: 19 Global Step: 83120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:17,390-Speed 10900.21 samples/sec Loss 2.9163 LearningRate 0.0000 Epoch: 19 Global Step: 83130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:20,933-Speed 11565.74 samples/sec Loss 2.9214 LearningRate 0.0000 Epoch: 19 Global Step: 83140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:24,485-Speed 11535.24 samples/sec Loss 2.9069 LearningRate 0.0000 
Epoch: 19 Global Step: 83150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:28,219-Speed 10981.05 samples/sec Loss 2.9612 LearningRate 0.0000 Epoch: 19 Global Step: 83160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:31,911-Speed 11095.76 samples/sec Loss 2.8934 LearningRate 0.0000 Epoch: 19 Global Step: 83170 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:35,501-Speed 11412.40 samples/sec Loss 2.9220 LearningRate 0.0000 Epoch: 19 Global Step: 83180 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:39,029-Speed 11612.18 samples/sec Loss 2.9481 LearningRate 0.0000 Epoch: 19 Global Step: 83190 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:42,955-Speed 10438.97 samples/sec Loss 2.9090 LearningRate 0.0000 Epoch: 19 Global Step: 83200 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:32:46,907-Speed 10363.77 samples/sec Loss 2.8980 LearningRate 0.0000 Epoch: 19 Global Step: 83210 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:32:50,421-Speed 11661.18 samples/sec Loss 2.9165 LearningRate 0.0000 Epoch: 19 Global Step: 83220 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:32:54,199-Speed 10843.01 samples/sec Loss 2.9322 LearningRate 0.0000 Epoch: 19 Global Step: 83230 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:32:57,860-Speed 11192.47 samples/sec Loss 2.9156 LearningRate 0.0000 Epoch: 19 Global Step: 83240 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:01,389-Speed 11609.03 samples/sec Loss 2.9477 LearningRate 0.0000 Epoch: 19 Global Step: 83250 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:04,871-Speed 11767.30 samples/sec Loss 2.9288 LearningRate 0.0000 Epoch: 19 Global Step: 83260 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:08,374-Speed 11695.24 samples/sec Loss 2.8925 LearningRate 0.0000 Epoch: 19 Global Step: 83270 
Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:12,043-Speed 11167.58 samples/sec Loss 2.9373 LearningRate 0.0000 Epoch: 19 Global Step: 83280 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:15,684-Speed 11250.50 samples/sec Loss 2.9355 LearningRate 0.0000 Epoch: 19 Global Step: 83290 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:19,144-Speed 11841.99 samples/sec Loss 2.9418 LearningRate 0.0000 Epoch: 19 Global Step: 83300 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:22,615-Speed 11804.52 samples/sec Loss 2.9175 LearningRate 0.0000 Epoch: 19 Global Step: 83310 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:26,245-Speed 11288.23 samples/sec Loss 2.9145 LearningRate 0.0000 Epoch: 19 Global Step: 83320 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:29,888-Speed 11244.13 samples/sec Loss 2.9226 LearningRate 0.0000 Epoch: 19 Global Step: 83330 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:34,175-Speed 9556.49 samples/sec Loss 2.9372 LearningRate 0.0000 Epoch: 19 Global Step: 83340 Fp16 Grad Scale: 262144 Required: 0 hours Training: 2022-01-17 06:33:37,882-Speed 11052.14 samples/sec Loss 2.9372 LearningRate 0.0000 Epoch: 19 Global Step: 83350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:33:41,605-Speed 11005.37 samples/sec Loss 2.9367 LearningRate 0.0000 Epoch: 19 Global Step: 83360 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:33:45,333-Speed 10989.66 samples/sec Loss 2.9785 LearningRate 0.0000 Epoch: 19 Global Step: 83370 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:33:48,816-Speed 11761.32 samples/sec Loss 2.9266 LearningRate 0.0000 Epoch: 19 Global Step: 83380 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:33:52,327-Speed 11671.99 samples/sec Loss 2.9441 LearningRate 0.0000 Epoch: 19 Global Step: 83390 Fp16 Grad Scale: 131072 
Required: 0 hours Training: 2022-01-17 06:33:55,743-Speed 11993.36 samples/sec Loss 2.9383 LearningRate 0.0000 Epoch: 19 Global Step: 83400 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:33:59,570-Speed 10705.43 samples/sec Loss 2.9025 LearningRate 0.0000 Epoch: 19 Global Step: 83410 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:34:03,156-Speed 11425.60 samples/sec Loss 2.9363 LearningRate 0.0000 Epoch: 19 Global Step: 83420 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:34:07,241-Speed 10029.08 samples/sec Loss 2.9380 LearningRate 0.0000 Epoch: 19 Global Step: 83430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-17 06:34:10,609-Speed 12163.14 samples/sec Loss 2.9412 LearningRate 0.0000 Epoch: 19 Global Step: 83440 Fp16 Grad Scale: 131072 Required: -0 hours