Training: 2022-01-13 21:56:52,866-rank_id: 0
Training: 2022-01-13 21:57:15,756-rank_id: 0
Training: 2022-01-13 21:57:36,225-: loss cosface
Training: 2022-01-13 21:57:36,225-: network mbf
Training: 2022-01-13 21:57:36,225-: resume False
Training: 2022-01-13 21:57:36,225-: output work_dirs/webface42m_mobilefacenet_pfc02_bs8k_16gpus
Training: 2022-01-13 21:57:36,226-: embedding_size 512
Training: 2022-01-13 21:57:36,226-: sample_rate 0.2
Training: 2022-01-13 21:57:36,226-: fp16 True
Training: 2022-01-13 21:57:36,226-: momentum 0.9
Training: 2022-01-13 21:57:36,226-: weight_decay 0.0001
Training: 2022-01-13 21:57:36,226-: batch_size 512
Training: 2022-01-13 21:57:36,226-: lr 0.4
Training: 2022-01-13 21:57:36,226-: dali True
Training: 2022-01-13 21:57:36,226-: verbose 10000
Training: 2022-01-13 21:57:36,226-: frequent 10
Training: 2022-01-13 21:57:36,226-: score None
Training: 2022-01-13 21:57:36,226-: rec /train_tmp/WebFace42M
Training: 2022-01-13 21:57:36,226-: num_classes 2059906
Training: 2022-01-13 21:57:36,226-: num_image 42474557
Training: 2022-01-13 21:57:36,226-: num_epoch 20
Training: 2022-01-13 21:57:36,226-: warmup_epoch 2
Training: 2022-01-13 21:57:36,226-: val_targets []
Training: 2022-01-13 21:57:36,226-: warmup_step 10368
Training: 2022-01-13 21:57:36,226-: total_step 103680
Training: 2022-01-13 21:57:40,967-Reducer buckets have been rebuilt in this iteration.
Training: 2022-01-13 21:57:49,698-Speed 18478.05 samples/sec Loss 42.4905 LearningRate 0.0008 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-01-13 21:57:54,123-Speed 18521.48 samples/sec Loss 42.4926 LearningRate 0.0012 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-01-13 21:57:58,536-Speed 18570.98 samples/sec Loss 42.4971 LearningRate 0.0015 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-01-13 21:58:02,932-Speed 18641.72 samples/sec Loss 42.4976 LearningRate 0.0019 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-01-13 21:58:07,308-Speed 18723.59 samples/sec Loss 42.4922 LearningRate 0.0023 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-01-13 21:58:11,683-Speed 18730.38 samples/sec Loss 42.4977 LearningRate 0.0027 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-01-13 21:58:16,104-Speed 18535.82 samples/sec Loss 42.5029 LearningRate 0.0031 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-01-13 21:58:20,513-Speed 18583.41 samples/sec Loss 42.5007 LearningRate 0.0035 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-01-13 21:58:24,910-Speed 18636.70 samples/sec Loss 42.5020 LearningRate 0.0039 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-01-13 21:58:29,324-Speed 18561.83 samples/sec Loss 42.4798 LearningRate 0.0042 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-13 21:58:33,742-Speed 18550.20 samples/sec Loss 42.4923 LearningRate 0.0046 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-13 21:58:38,129-Speed 18679.38 samples/sec Loss 42.4888 LearningRate 0.0050 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-13 21:58:42,533-Speed 18609.50 samples/sec Loss 
42.4848 LearningRate 0.0054 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-01-13 21:58:46,936-Speed 18608.45 samples/sec Loss 42.4821 LearningRate 0.0058 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-13 21:58:51,345-Speed 18582.64 samples/sec Loss 42.4757 LearningRate 0.0062 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-13 21:58:55,734-Speed 18671.34 samples/sec Loss 42.4663 LearningRate 0.0066 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-13 21:59:00,177-Speed 18441.75 samples/sec Loss 42.4812 LearningRate 0.0069 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-13 21:59:04,723-Speed 18027.18 samples/sec Loss 42.4785 LearningRate 0.0073 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-13 21:59:09,113-Speed 18665.12 samples/sec Loss 42.4572 LearningRate 0.0077 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-01-13 21:59:13,539-Speed 18511.97 samples/sec Loss 42.4354 LearningRate 0.0081 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:17,935-Speed 18643.64 samples/sec Loss 42.4312 LearningRate 0.0085 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:23,350-Speed 15131.91 samples/sec Loss 42.4220 LearningRate 0.0089 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:27,812-Speed 18366.79 samples/sec Loss 42.4024 LearningRate 0.0093 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:32,259-Speed 18423.84 samples/sec Loss 42.3815 LearningRate 0.0096 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:36,678-Speed 18545.39 samples/sec Loss 42.3401 LearningRate 0.0100 Epoch: 0 Global Step: 260 
Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:41,068-Speed 18663.04 samples/sec Loss 42.2910 LearningRate 0.0104 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:45,476-Speed 18592.63 samples/sec Loss 42.2236 LearningRate 0.0108 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:49,866-Speed 18665.60 samples/sec Loss 42.1396 LearningRate 0.0112 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:54,328-Speed 18367.95 samples/sec Loss 42.0398 LearningRate 0.0116 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 21:59:58,739-Speed 18574.96 samples/sec Loss 41.8998 LearningRate 0.0120 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:03,153-Speed 18564.62 samples/sec Loss 41.7638 LearningRate 0.0123 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:07,570-Speed 18548.77 samples/sec Loss 41.6354 LearningRate 0.0127 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:12,226-Speed 17600.88 samples/sec Loss 41.4648 LearningRate 0.0131 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:16,643-Speed 18553.97 samples/sec Loss 41.3164 LearningRate 0.0135 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:21,069-Speed 18522.71 samples/sec Loss 41.1671 LearningRate 0.0139 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:25,477-Speed 18590.01 samples/sec Loss 41.0137 LearningRate 0.0143 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:30,042-Speed 17951.88 samples/sec Loss 40.8405 LearningRate 0.0147 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-01-13 22:00:34,477-Speed 18481.26 samples/sec Loss 40.7028 LearningRate 0.0150 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:38,891-Speed 18565.00 samples/sec Loss 40.5680 LearningRate 0.0154 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:43,358-Speed 18340.32 samples/sec Loss 40.4548 LearningRate 0.0158 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:47,821-Speed 18365.37 samples/sec Loss 40.3277 LearningRate 0.0162 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:52,254-Speed 18486.99 samples/sec Loss 40.2201 LearningRate 0.0166 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:00:56,679-Speed 18521.69 samples/sec Loss 40.1021 LearningRate 0.0170 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:01,127-Speed 18419.50 samples/sec Loss 40.0044 LearningRate 0.0174 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:05,520-Speed 18654.38 samples/sec Loss 39.9283 LearningRate 0.0177 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:09,978-Speed 18379.61 samples/sec Loss 39.8510 LearningRate 0.0181 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:14,531-Speed 17997.72 samples/sec Loss 39.7363 LearningRate 0.0185 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:19,017-Speed 18269.20 samples/sec Loss 39.6631 LearningRate 0.0189 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:23,468-Speed 18408.98 samples/sec Loss 39.6036 LearningRate 0.0193 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:27,864-Speed 18637.68 samples/sec Loss 
39.5415 LearningRate 0.0197 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:32,331-Speed 18345.59 samples/sec Loss 39.4746 LearningRate 0.0201 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:36,777-Speed 18431.11 samples/sec Loss 39.4110 LearningRate 0.0204 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:41,206-Speed 18499.41 samples/sec Loss 39.3675 LearningRate 0.0208 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:45,616-Speed 18580.83 samples/sec Loss 39.3010 LearningRate 0.0212 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:50,094-Speed 18296.99 samples/sec Loss 39.2824 LearningRate 0.0216 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:01:54,522-Speed 18504.08 samples/sec Loss 39.2220 LearningRate 0.0220 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:01:58,932-Speed 18584.87 samples/sec Loss 39.1891 LearningRate 0.0224 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:03,314-Speed 18697.29 samples/sec Loss 39.1533 LearningRate 0.0228 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:07,721-Speed 18594.11 samples/sec Loss 39.1355 LearningRate 0.0231 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:12,174-Speed 18401.02 samples/sec Loss 39.0847 LearningRate 0.0235 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:16,634-Speed 18370.05 samples/sec Loss 39.0573 LearningRate 0.0239 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:21,821-Speed 15798.93 samples/sec Loss 39.0350 LearningRate 0.0243 Epoch: 0 Global Step: 630 
Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:26,270-Speed 18418.75 samples/sec Loss 39.0001 LearningRate 0.0247 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:30,665-Speed 18644.85 samples/sec Loss 38.9944 LearningRate 0.0251 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:35,058-Speed 18652.84 samples/sec Loss 38.9734 LearningRate 0.0255 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:02:39,503-Speed 18433.49 samples/sec Loss 38.9649 LearningRate 0.0258 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:02:43,937-Speed 18482.40 samples/sec Loss 38.9605 LearningRate 0.0262 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:02:48,330-Speed 18651.47 samples/sec Loss 38.9634 LearningRate 0.0266 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:02:52,751-Speed 18536.58 samples/sec Loss 38.9415 LearningRate 0.0270 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:02:57,191-Speed 18451.72 samples/sec Loss 38.9339 LearningRate 0.0274 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:01,590-Speed 18631.62 samples/sec Loss 38.9172 LearningRate 0.0278 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:06,038-Speed 18420.82 samples/sec Loss 38.9244 LearningRate 0.0282 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:10,451-Speed 18567.76 samples/sec Loss 38.9204 LearningRate 0.0285 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:14,853-Speed 18610.56 samples/sec Loss 38.9092 LearningRate 0.0289 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-01-13 22:03:19,361-Speed 18177.43 samples/sec Loss 38.9007 LearningRate 0.0293 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:23,770-Speed 18583.40 samples/sec Loss 38.9057 LearningRate 0.0297 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:03:28,156-Speed 18685.24 samples/sec Loss 38.9090 LearningRate 0.0301 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:32,628-Speed 18320.33 samples/sec Loss 38.9425 LearningRate 0.0305 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:39,192-Speed 12482.97 samples/sec Loss 38.9328 LearningRate 0.0309 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:43,647-Speed 18389.57 samples/sec Loss 38.9307 LearningRate 0.0312 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:03:48,042-Speed 18642.32 samples/sec Loss 38.9312 LearningRate 0.0316 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:03:52,466-Speed 18523.59 samples/sec Loss 38.9435 LearningRate 0.0320 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:03:56,862-Speed 18638.67 samples/sec Loss 38.9311 LearningRate 0.0324 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:04:01,247-Speed 18686.03 samples/sec Loss 38.9630 LearningRate 0.0328 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:04:05,818-Speed 17926.62 samples/sec Loss 38.9677 LearningRate 0.0332 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:04:10,322-Speed 18195.16 samples/sec Loss 38.9757 LearningRate 0.0336 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:04:14,783-Speed 18367.14 samples/sec 
Loss 38.9701 LearningRate 0.0340 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:04:19,228-Speed 18431.63 samples/sec Loss 38.9719 LearningRate 0.0343 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:04:23,683-Speed 18392.55 samples/sec Loss 38.9752 LearningRate 0.0347 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:04:28,191-Speed 18178.99 samples/sec Loss 38.9786 LearningRate 0.0351 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-01-13 22:04:32,624-Speed 18480.95 samples/sec Loss 39.0067 LearningRate 0.0355 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:04:37,037-Speed 18569.04 samples/sec Loss 39.0036 LearningRate 0.0359 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:04:41,437-Speed 18622.98 samples/sec Loss 39.0112 LearningRate 0.0363 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:04:45,826-Speed 18666.57 samples/sec Loss 39.0126 LearningRate 0.0367 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:04:50,243-Speed 18551.72 samples/sec Loss 39.0325 LearningRate 0.0370 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:04:54,673-Speed 18494.83 samples/sec Loss 39.0157 LearningRate 0.0374 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:04:59,075-Speed 18614.23 samples/sec Loss 39.0165 LearningRate 0.0378 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:03,515-Speed 18454.57 samples/sec Loss 39.0360 LearningRate 0.0382 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:07,923-Speed 18590.66 samples/sec Loss 39.0351 LearningRate 0.0386 Epoch: 0 Global Step: 
1000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:12,492-Speed 17931.65 samples/sec Loss 39.0552 LearningRate 0.0390 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:16,879-Speed 18679.50 samples/sec Loss 39.0645 LearningRate 0.0394 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:21,331-Speed 18405.74 samples/sec Loss 39.0767 LearningRate 0.0397 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:25,764-Speed 18482.34 samples/sec Loss 39.0789 LearningRate 0.0401 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:30,151-Speed 18679.32 samples/sec Loss 39.0793 LearningRate 0.0405 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:34,563-Speed 18572.78 samples/sec Loss 39.0925 LearningRate 0.0409 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:38,977-Speed 18565.45 samples/sec Loss 39.0998 LearningRate 0.0413 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:43,457-Speed 18292.46 samples/sec Loss 39.0962 LearningRate 0.0417 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:47,857-Speed 18618.21 samples/sec Loss 39.1187 LearningRate 0.0421 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:52,295-Speed 18465.13 samples/sec Loss 39.1098 LearningRate 0.0424 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:05:56,707-Speed 18573.52 samples/sec Loss 39.1396 LearningRate 0.0428 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:01,114-Speed 18595.33 samples/sec Loss 39.1399 LearningRate 0.0432 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-01-13 22:06:05,522-Speed 18584.78 samples/sec Loss 39.1449 LearningRate 0.0436 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:10,177-Speed 17606.05 samples/sec Loss 39.1354 LearningRate 0.0440 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:14,707-Speed 18089.00 samples/sec Loss 39.1446 LearningRate 0.0444 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:19,217-Speed 18166.32 samples/sec Loss 39.1492 LearningRate 0.0448 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:24,096-Speed 16793.21 samples/sec Loss 39.1567 LearningRate 0.0451 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:28,550-Speed 18399.03 samples/sec Loss 39.1822 LearningRate 0.0455 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:32,947-Speed 18636.33 samples/sec Loss 39.1763 LearningRate 0.0459 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:37,411-Speed 18361.45 samples/sec Loss 39.1736 LearningRate 0.0463 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:41,811-Speed 18623.92 samples/sec Loss 39.1729 LearningRate 0.0467 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:46,215-Speed 18608.28 samples/sec Loss 39.1712 LearningRate 0.0471 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:06:50,646-Speed 18500.90 samples/sec Loss 39.1803 LearningRate 0.0475 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:06:55,026-Speed 18709.10 samples/sec Loss 39.1934 LearningRate 0.0478 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 
22:06:59,468-Speed 18450.60 samples/sec Loss 39.1951 LearningRate 0.0482 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:03,864-Speed 18639.24 samples/sec Loss 39.2105 LearningRate 0.0486 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:08,264-Speed 18622.93 samples/sec Loss 39.2162 LearningRate 0.0490 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:12,689-Speed 18522.96 samples/sec Loss 39.2113 LearningRate 0.0494 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:17,109-Speed 18538.01 samples/sec Loss 39.2181 LearningRate 0.0498 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:21,525-Speed 18557.37 samples/sec Loss 39.2115 LearningRate 0.0502 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:25,989-Speed 18361.82 samples/sec Loss 39.2222 LearningRate 0.0505 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:30,398-Speed 18585.52 samples/sec Loss 39.2230 LearningRate 0.0509 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:34,773-Speed 18729.22 samples/sec Loss 39.2256 LearningRate 0.0513 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:40,042-Speed 15556.46 samples/sec Loss 39.2262 LearningRate 0.0517 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:07:44,433-Speed 18659.53 samples/sec Loss 39.2257 LearningRate 0.0521 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:07:48,813-Speed 18710.17 samples/sec Loss 39.2357 LearningRate 0.0525 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:07:53,223-Speed 18580.95 samples/sec 
Loss 39.2063 LearningRate 0.0529 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:07:57,618-Speed 18644.19 samples/sec Loss 39.2197 LearningRate 0.0532 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:08:02,007-Speed 18667.83 samples/sec Loss 39.2307 LearningRate 0.0536 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:06,387-Speed 18708.77 samples/sec Loss 39.2235 LearningRate 0.0540 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:10,883-Speed 18225.55 samples/sec Loss 39.2293 LearningRate 0.0544 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:15,306-Speed 18527.38 samples/sec Loss 39.2260 LearningRate 0.0548 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:19,739-Speed 18484.47 samples/sec Loss 39.2152 LearningRate 0.0552 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:24,121-Speed 18702.84 samples/sec Loss 39.2106 LearningRate 0.0556 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:29,553-Speed 15083.39 samples/sec Loss 39.2074 LearningRate 0.0559 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:33,943-Speed 18665.15 samples/sec Loss 39.2278 LearningRate 0.0563 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:38,405-Speed 18366.62 samples/sec Loss 39.2200 LearningRate 0.0567 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:42,851-Speed 18427.43 samples/sec Loss 39.2040 LearningRate 0.0571 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:08:47,279-Speed 18504.73 samples/sec Loss 39.2074 LearningRate 0.0575 Epoch: 0 
Global Step: 1490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:08:51,689-Speed 18583.20 samples/sec Loss 39.2090 LearningRate 0.0579 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:08:56,190-Speed 18204.95 samples/sec Loss 39.2042 LearningRate 0.0583 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:09:00,586-Speed 18641.90 samples/sec Loss 39.1875 LearningRate 0.0586 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:05,041-Speed 18395.55 samples/sec Loss 39.1781 LearningRate 0.0590 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:09,467-Speed 18513.06 samples/sec Loss 39.1733 LearningRate 0.0594 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:13,874-Speed 18594.55 samples/sec Loss 39.1759 LearningRate 0.0598 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:18,282-Speed 18587.07 samples/sec Loss 39.1747 LearningRate 0.0602 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:22,690-Speed 18592.18 samples/sec Loss 39.1711 LearningRate 0.0606 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:27,094-Speed 18601.45 samples/sec Loss 39.1602 LearningRate 0.0610 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:31,500-Speed 18598.93 samples/sec Loss 39.1596 LearningRate 0.0613 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:35,932-Speed 18490.96 samples/sec Loss 39.1351 LearningRate 0.0617 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-13 22:09:40,370-Speed 18465.85 samples/sec Loss 39.1247 LearningRate 0.0621 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-01-13 22:09:44,771-Speed 18614.68 samples/sec Loss 39.1294 LearningRate 0.0625 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:09:49,188-Speed 18549.84 samples/sec Loss 39.1312 LearningRate 0.0629 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:09:53,622-Speed 18482.82 samples/sec Loss 39.1147 LearningRate 0.0633 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:09:58,097-Speed 18309.52 samples/sec Loss 39.1043 LearningRate 0.0637 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:03,731-Speed 14543.40 samples/sec Loss 39.0986 LearningRate 0.0640 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:08,136-Speed 18603.02 samples/sec Loss 39.0952 LearningRate 0.0644 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:12,573-Speed 18470.77 samples/sec Loss 39.0626 LearningRate 0.0648 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:17,022-Speed 18416.80 samples/sec Loss 39.0752 LearningRate 0.0652 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:21,460-Speed 18462.75 samples/sec Loss 39.0729 LearningRate 0.0656 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:25,886-Speed 18511.58 samples/sec Loss 39.0585 LearningRate 0.0660 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:30,308-Speed 18533.92 samples/sec Loss 39.0273 LearningRate 0.0664 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:10:34,772-Speed 18355.77 samples/sec Loss 39.0259 LearningRate 0.0667 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-01-13 22:10:39,174-Speed 18612.56 samples/sec Loss 39.0040 LearningRate 0.0671 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:43,577-Speed 18611.65 samples/sec Loss 39.0085 LearningRate 0.0675 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:48,011-Speed 18477.43 samples/sec Loss 39.0181 LearningRate 0.0679 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:52,448-Speed 18467.90 samples/sec Loss 38.9954 LearningRate 0.0683 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:10:56,833-Speed 18685.63 samples/sec Loss 38.9737 LearningRate 0.0687 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:11:01,227-Speed 18653.21 samples/sec Loss 38.9556 LearningRate 0.0691 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:11:05,669-Speed 18447.22 samples/sec Loss 38.9341 LearningRate 0.0694 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:11:10,094-Speed 18517.01 samples/sec Loss 38.9478 LearningRate 0.0698 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:11:14,513-Speed 18544.09 samples/sec Loss 38.9180 LearningRate 0.0702 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-13 22:11:19,150-Speed 17671.21 samples/sec Loss 38.9245 LearningRate 0.0706 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:11:23,609-Speed 18375.84 samples/sec Loss 38.8991 LearningRate 0.0710 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:11:27,994-Speed 18687.17 samples/sec Loss 38.8664 LearningRate 0.0714 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:11:33,980-Speed 
13688.91 samples/sec Loss 38.8371 LearningRate 0.0718 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:11:38,449-Speed 18334.25 samples/sec Loss 38.8532 LearningRate 0.0721 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:11:42,849-Speed 18628.65 samples/sec Loss 38.8252 LearningRate 0.0725 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:11:47,257-Speed 18586.95 samples/sec Loss 38.8215 LearningRate 0.0729 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:11:51,634-Speed 18724.28 samples/sec Loss 38.8017 LearningRate 0.0733 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:11:56,055-Speed 18534.07 samples/sec Loss 38.7867 LearningRate 0.0737 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:12:00,447-Speed 18656.91 samples/sec Loss 38.7722 LearningRate 0.0741 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:12:04,825-Speed 18720.07 samples/sec Loss 38.7350 LearningRate 0.0745 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:09,228-Speed 18608.95 samples/sec Loss 38.7316 LearningRate 0.0748 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:13,636-Speed 18590.65 samples/sec Loss 38.6948 LearningRate 0.0752 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:18,075-Speed 18460.24 samples/sec Loss 38.6831 LearningRate 0.0756 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:22,508-Speed 18487.69 samples/sec Loss 38.6613 LearningRate 0.0760 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:26,954-Speed 18428.77 samples/sec Loss 
38.6535 LearningRate 0.0764 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:31,314-Speed 18796.52 samples/sec Loss 38.6346 LearningRate 0.0768 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:35,797-Speed 18275.87 samples/sec Loss 38.6238 LearningRate 0.0772 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:40,241-Speed 18440.95 samples/sec Loss 38.5905 LearningRate 0.0775 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:44,647-Speed 18596.90 samples/sec Loss 38.5677 LearningRate 0.0779 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:49,103-Speed 18388.90 samples/sec Loss 38.5657 LearningRate 0.0783 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:12:58,330-Speed 8879.66 samples/sec Loss 38.5567 LearningRate 0.0787 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:02,761-Speed 18498.33 samples/sec Loss 38.5315 LearningRate 0.0791 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:07,171-Speed 18580.80 samples/sec Loss 38.5209 LearningRate 0.0795 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:12,439-Speed 15555.54 samples/sec Loss 38.5000 LearningRate 0.0799 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:16,956-Speed 18139.32 samples/sec Loss 38.4616 LearningRate 0.0802 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:21,445-Speed 18252.98 samples/sec Loss 38.4436 LearningRate 0.0806 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:25,870-Speed 18517.13 samples/sec Loss 38.4341 LearningRate 0.0810 
Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:30,362-Speed 18244.99 samples/sec Loss 38.3750 LearningRate 0.0814 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:34,785-Speed 18523.18 samples/sec Loss 38.4133 LearningRate 0.0818 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:39,195-Speed 18583.76 samples/sec Loss 38.3694 LearningRate 0.0822 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:44,443-Speed 15612.77 samples/sec Loss 38.3395 LearningRate 0.0826 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:48,842-Speed 18625.53 samples/sec Loss 38.3135 LearningRate 0.0829 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:53,235-Speed 18652.57 samples/sec Loss 38.2983 LearningRate 0.0833 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:13:57,626-Speed 18660.06 samples/sec Loss 38.2866 LearningRate 0.0837 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:02,029-Speed 18612.56 samples/sec Loss 38.2583 LearningRate 0.0841 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:06,454-Speed 18515.61 samples/sec Loss 38.2441 LearningRate 0.0845 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:10,850-Speed 18639.96 samples/sec Loss 38.2327 LearningRate 0.0849 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:15,287-Speed 18464.95 samples/sec Loss 38.1903 LearningRate 0.0853 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:19,685-Speed 18629.68 samples/sec Loss 38.1918 LearningRate 0.0856 Epoch: 0 Global Step: 2220 Fp16 
Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:24,090-Speed 18604.89 samples/sec Loss 38.1640 LearningRate 0.0860 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:28,575-Speed 18269.13 samples/sec Loss 38.1658 LearningRate 0.0864 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:33,773-Speed 15763.85 samples/sec Loss 38.1215 LearningRate 0.0868 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:38,203-Speed 18499.06 samples/sec Loss 38.0744 LearningRate 0.0872 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:42,626-Speed 18528.50 samples/sec Loss 38.0747 LearningRate 0.0876 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:47,016-Speed 18664.69 samples/sec Loss 38.0553 LearningRate 0.0880 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:51,439-Speed 18524.43 samples/sec Loss 38.0245 LearningRate 0.0883 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:14:55,847-Speed 18589.35 samples/sec Loss 37.9978 LearningRate 0.0887 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:00,566-Speed 17361.64 samples/sec Loss 37.9774 LearningRate 0.0891 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:05,015-Speed 18418.79 samples/sec Loss 37.9570 LearningRate 0.0895 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:09,410-Speed 18650.26 samples/sec Loss 37.9416 LearningRate 0.0899 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:13,867-Speed 18385.57 samples/sec Loss 37.9102 LearningRate 0.0903 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 524288 Required: 13 
hours Training: 2022-01-13 22:15:18,289-Speed 18526.93 samples/sec Loss 37.8968 LearningRate 0.0907 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:22,735-Speed 18431.95 samples/sec Loss 37.8736 LearningRate 0.0910 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:27,144-Speed 18585.98 samples/sec Loss 37.8381 LearningRate 0.0914 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:31,571-Speed 18510.17 samples/sec Loss 37.8193 LearningRate 0.0918 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:35,985-Speed 18562.40 samples/sec Loss 37.7893 LearningRate 0.0922 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:15:41,035-Speed 16225.83 samples/sec Loss 37.7567 LearningRate 0.0926 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:15:45,454-Speed 18545.93 samples/sec Loss 37.7268 LearningRate 0.0930 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:15:50,473-Speed 16330.51 samples/sec Loss 37.7081 LearningRate 0.0934 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:15:56,613-Speed 13345.99 samples/sec Loss 37.6712 LearningRate 0.0938 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:01,331-Speed 17365.65 samples/sec Loss 37.6656 LearningRate 0.0941 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:05,723-Speed 18656.10 samples/sec Loss 37.6113 LearningRate 0.0945 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:10,139-Speed 18558.67 samples/sec Loss 37.6093 LearningRate 0.0949 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 
22:16:14,603-Speed 18358.50 samples/sec Loss 37.5776 LearningRate 0.0953 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:19,026-Speed 18526.92 samples/sec Loss 37.5546 LearningRate 0.0957 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:23,444-Speed 18548.05 samples/sec Loss 37.5154 LearningRate 0.0961 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:27,833-Speed 18668.79 samples/sec Loss 37.4701 LearningRate 0.0965 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:16:32,248-Speed 18565.92 samples/sec Loss 37.4551 LearningRate 0.0968 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:16:36,679-Speed 18493.33 samples/sec Loss 37.4322 LearningRate 0.0972 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:16:41,127-Speed 18423.92 samples/sec Loss 37.4278 LearningRate 0.0976 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:45,540-Speed 18572.96 samples/sec Loss 37.3820 LearningRate 0.0980 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:49,966-Speed 18515.07 samples/sec Loss 37.3532 LearningRate 0.0984 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:54,432-Speed 18346.99 samples/sec Loss 37.3479 LearningRate 0.0988 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:16:58,851-Speed 18541.81 samples/sec Loss 37.2751 LearningRate 0.0992 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:03,275-Speed 18525.35 samples/sec Loss 37.2628 LearningRate 0.0995 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:07,652-Speed 18721.62 
samples/sec Loss 37.2244 LearningRate 0.0999 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:12,052-Speed 18621.53 samples/sec Loss 37.2223 LearningRate 0.1003 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:16,430-Speed 18717.54 samples/sec Loss 37.1605 LearningRate 0.1007 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:20,815-Speed 18683.63 samples/sec Loss 37.1732 LearningRate 0.1011 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:25,310-Speed 18233.24 samples/sec Loss 37.1461 LearningRate 0.1015 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:17:29,746-Speed 18472.99 samples/sec Loss 37.0930 LearningRate 0.1019 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:34,174-Speed 18502.09 samples/sec Loss 37.0915 LearningRate 0.1022 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:38,619-Speed 18437.61 samples/sec Loss 37.0420 LearningRate 0.1026 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:43,022-Speed 18608.44 samples/sec Loss 37.0393 LearningRate 0.1030 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:47,434-Speed 18574.05 samples/sec Loss 36.9834 LearningRate 0.1034 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:51,843-Speed 18584.73 samples/sec Loss 36.9588 LearningRate 0.1038 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:17:56,253-Speed 18578.84 samples/sec Loss 36.9333 LearningRate 0.1042 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:01,084-Speed 16962.68 samples/sec Loss 36.8939 
LearningRate 0.1046 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:05,527-Speed 18443.33 samples/sec Loss 36.8966 LearningRate 0.1049 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:09,952-Speed 18518.29 samples/sec Loss 36.8657 LearningRate 0.1053 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:14,351-Speed 18627.31 samples/sec Loss 36.8574 LearningRate 0.1057 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:18:18,732-Speed 18702.09 samples/sec Loss 36.8212 LearningRate 0.1061 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:23,134-Speed 18616.42 samples/sec Loss 36.7726 LearningRate 0.1065 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:27,535-Speed 18617.79 samples/sec Loss 36.7508 LearningRate 0.1069 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:31,964-Speed 18500.69 samples/sec Loss 36.7191 LearningRate 0.1073 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:36,416-Speed 18406.07 samples/sec Loss 36.6876 LearningRate 0.1076 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:40,843-Speed 18509.19 samples/sec Loss 36.6736 LearningRate 0.1080 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:45,257-Speed 18566.22 samples/sec Loss 36.6449 LearningRate 0.1084 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:49,653-Speed 18643.90 samples/sec Loss 36.6111 LearningRate 0.1088 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:54,070-Speed 18552.31 samples/sec Loss 36.5735 LearningRate 0.1092 Epoch: 0 
Global Step: 2830 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:18:58,502-Speed 18489.02 samples/sec Loss 36.5380 LearningRate 0.1096 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:02,942-Speed 18459.50 samples/sec Loss 36.4979 LearningRate 0.1100 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:19:07,357-Speed 18558.34 samples/sec Loss 36.4745 LearningRate 0.1103 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:11,775-Speed 18547.16 samples/sec Loss 36.4548 LearningRate 0.1107 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:17,251-Speed 14964.87 samples/sec Loss 36.4240 LearningRate 0.1111 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:21,653-Speed 18614.21 samples/sec Loss 36.3935 LearningRate 0.1115 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:26,045-Speed 18656.64 samples/sec Loss 36.3601 LearningRate 0.1119 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:30,433-Speed 18677.48 samples/sec Loss 36.3233 LearningRate 0.1123 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:34,855-Speed 18528.86 samples/sec Loss 36.3374 LearningRate 0.1127 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:39,255-Speed 18637.64 samples/sec Loss 36.2506 LearningRate 0.1130 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:43,670-Speed 18560.89 samples/sec Loss 36.2366 LearningRate 0.1134 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:48,084-Speed 18566.27 samples/sec Loss 36.2062 LearningRate 0.1138 Epoch: 0 Global Step: 2950 Fp16 Grad 
Scale: 262144 Required: 13 hours Training: 2022-01-13 22:19:52,478-Speed 18651.88 samples/sec Loss 36.1684 LearningRate 0.1142 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:19:56,895-Speed 18555.11 samples/sec Loss 36.1677 LearningRate 0.1146 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:20:01,272-Speed 18721.11 samples/sec Loss 36.1127 LearningRate 0.1150 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:05,711-Speed 18461.81 samples/sec Loss 36.1023 LearningRate 0.1154 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:10,127-Speed 18563.64 samples/sec Loss 36.0639 LearningRate 0.1157 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:14,512-Speed 18684.34 samples/sec Loss 36.0614 LearningRate 0.1161 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:18,914-Speed 18618.62 samples/sec Loss 35.9744 LearningRate 0.1165 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:23,314-Speed 18622.53 samples/sec Loss 35.9677 LearningRate 0.1169 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:27,737-Speed 18526.02 samples/sec Loss 35.9168 LearningRate 0.1173 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:32,099-Speed 18787.04 samples/sec Loss 35.9176 LearningRate 0.1177 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:36,471-Speed 18743.20 samples/sec Loss 35.8725 LearningRate 0.1181 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:40,866-Speed 18645.22 samples/sec Loss 35.7971 LearningRate 0.1184 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 262144 Required: 13 
hours Training: 2022-01-13 22:20:45,288-Speed 18529.61 samples/sec Loss 35.7884 LearningRate 0.1188 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:49,702-Speed 18568.81 samples/sec Loss 35.7533 LearningRate 0.1192 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:54,185-Speed 18279.22 samples/sec Loss 35.7324 LearningRate 0.1196 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:20:58,621-Speed 18467.93 samples/sec Loss 35.7039 LearningRate 0.1200 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:02,999-Speed 18721.43 samples/sec Loss 35.6394 LearningRate 0.1204 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:07,427-Speed 18499.17 samples/sec Loss 35.6177 LearningRate 0.1208 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:13,476-Speed 13545.59 samples/sec Loss 35.5698 LearningRate 0.1211 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:17,943-Speed 18344.53 samples/sec Loss 35.5525 LearningRate 0.1215 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:22,404-Speed 18367.24 samples/sec Loss 35.4906 LearningRate 0.1219 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:26,797-Speed 18652.99 samples/sec Loss 35.4890 LearningRate 0.1223 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:31,308-Speed 18164.23 samples/sec Loss 35.4201 LearningRate 0.1227 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:21:35,695-Speed 18681.40 samples/sec Loss 35.3965 LearningRate 0.1231 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 
22:21:40,131-Speed 18475.41 samples/sec Loss 35.3418 LearningRate 0.1235 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:44,585-Speed 18395.04 samples/sec Loss 35.3279 LearningRate 0.1238 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:49,010-Speed 18520.29 samples/sec Loss 35.2657 LearningRate 0.1242 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:53,502-Speed 18243.10 samples/sec Loss 35.2711 LearningRate 0.1246 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:21:57,901-Speed 18628.33 samples/sec Loss 35.2273 LearningRate 0.1250 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:02,314-Speed 18569.53 samples/sec Loss 35.1706 LearningRate 0.1254 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:06,719-Speed 18598.72 samples/sec Loss 35.1249 LearningRate 0.1258 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:11,114-Speed 18652.20 samples/sec Loss 35.0793 LearningRate 0.1262 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:15,532-Speed 18548.43 samples/sec Loss 35.0408 LearningRate 0.1265 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:19,966-Speed 18477.84 samples/sec Loss 34.9946 LearningRate 0.1269 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:24,386-Speed 18543.10 samples/sec Loss 34.9835 LearningRate 0.1273 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:28,822-Speed 18471.91 samples/sec Loss 34.9219 LearningRate 0.1277 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:33,279-Speed 18387.45 
samples/sec Loss 34.9001 LearningRate 0.1281 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:37,753-Speed 18312.12 samples/sec Loss 34.8601 LearningRate 0.1285 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:42,150-Speed 18636.88 samples/sec Loss 34.7768 LearningRate 0.1289 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:47,483-Speed 15365.47 samples/sec Loss 34.7374 LearningRate 0.1292 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:51,888-Speed 18604.43 samples/sec Loss 34.7206 LearningRate 0.1296 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:22:56,300-Speed 18580.52 samples/sec Loss 34.6636 LearningRate 0.1300 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:00,678-Speed 18715.97 samples/sec Loss 34.6259 LearningRate 0.1304 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:05,096-Speed 18546.74 samples/sec Loss 34.5788 LearningRate 0.1308 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:23:09,508-Speed 18577.69 samples/sec Loss 34.5548 LearningRate 0.1312 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:13,894-Speed 18682.99 samples/sec Loss 34.4993 LearningRate 0.1316 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:18,300-Speed 18597.74 samples/sec Loss 34.4599 LearningRate 0.1319 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:22,709-Speed 18588.57 samples/sec Loss 34.4401 LearningRate 0.1323 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:27,118-Speed 18583.75 samples/sec Loss 34.3661 
LearningRate 0.1327 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:31,534-Speed 18560.25 samples/sec Loss 34.3002 LearningRate 0.1331 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:35,932-Speed 18629.70 samples/sec Loss 34.2805 LearningRate 0.1335 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:40,324-Speed 18661.71 samples/sec Loss 34.2207 LearningRate 0.1339 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:44,721-Speed 18635.16 samples/sec Loss 34.1874 LearningRate 0.1343 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:49,099-Speed 18716.55 samples/sec Loss 34.1773 LearningRate 0.1346 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:53,518-Speed 18542.86 samples/sec Loss 34.1527 LearningRate 0.1350 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:23:57,895-Speed 18722.54 samples/sec Loss 34.0599 LearningRate 0.1354 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:02,298-Speed 18608.88 samples/sec Loss 34.0068 LearningRate 0.1358 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:06,689-Speed 18664.10 samples/sec Loss 33.9588 LearningRate 0.1362 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:11,099-Speed 18581.57 samples/sec Loss 33.9837 LearningRate 0.1366 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:15,516-Speed 18549.91 samples/sec Loss 33.8908 LearningRate 0.1370 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:19,901-Speed 18684.41 samples/sec Loss 33.8483 LearningRate 0.1373 Epoch: 0 
Global Step: 3560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:24,279-Speed 18719.69 samples/sec Loss 33.7746 LearningRate 0.1377 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:28,685-Speed 18597.70 samples/sec Loss 33.7318 LearningRate 0.1381 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:33,061-Speed 18721.64 samples/sec Loss 33.7277 LearningRate 0.1385 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:37,482-Speed 18533.92 samples/sec Loss 33.6645 LearningRate 0.1389 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 524288 Required: 13 hours Training: 2022-01-13 22:24:41,904-Speed 18533.43 samples/sec Loss 33.6251 LearningRate 0.1393 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:46,303-Speed 18624.54 samples/sec Loss 33.5681 LearningRate 0.1397 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:50,691-Speed 18676.61 samples/sec Loss 33.5591 LearningRate 0.1400 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:55,070-Speed 18713.53 samples/sec Loss 33.4999 LearningRate 0.1404 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:24:59,464-Speed 18647.29 samples/sec Loss 33.4561 LearningRate 0.1408 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:25:03,872-Speed 18591.26 samples/sec Loss 33.3746 LearningRate 0.1412 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:25:08,281-Speed 18583.49 samples/sec Loss 33.3254 LearningRate 0.1416 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-13 22:25:12,708-Speed 18510.85 samples/sec Loss 33.3126 LearningRate 0.1420 Epoch: 0 Global Step: 3680 Fp16 Grad 
Scale: 262144 Required: 13 hours Training: 2022-01-13 22:25:17,132-Speed 18520.12 samples/sec Loss 33.2610 LearningRate 0.1424 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:21,519-Speed 18676.92 samples/sec Loss 33.2172 LearningRate 0.1427 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:25,949-Speed 18497.04 samples/sec Loss 33.1835 LearningRate 0.1431 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:30,393-Speed 18437.41 samples/sec Loss 33.1454 LearningRate 0.1435 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:34,782-Speed 18671.88 samples/sec Loss 33.1198 LearningRate 0.1439 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:39,175-Speed 18651.47 samples/sec Loss 33.0019 LearningRate 0.1443 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:43,573-Speed 18630.46 samples/sec Loss 32.9498 LearningRate 0.1447 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:47,956-Speed 18692.26 samples/sec Loss 32.9066 LearningRate 0.1451 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:52,339-Speed 18694.79 samples/sec Loss 32.8928 LearningRate 0.1454 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:25:56,743-Speed 18608.65 samples/sec Loss 32.8769 LearningRate 0.1458 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:01,187-Speed 18438.05 samples/sec Loss 32.7847 LearningRate 0.1462 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:05,612-Speed 18516.76 samples/sec Loss 32.7491 LearningRate 0.1466 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 262144 Required: 12 
hours Training: 2022-01-13 22:26:10,012-Speed 18628.54 samples/sec Loss 32.6811 LearningRate 0.1470 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:14,472-Speed 18371.51 samples/sec Loss 32.6962 LearningRate 0.1474 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:18,865-Speed 18649.66 samples/sec Loss 32.6536 LearningRate 0.1478 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:23,265-Speed 18627.60 samples/sec Loss 32.5857 LearningRate 0.1481 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:27,679-Speed 18563.66 samples/sec Loss 32.5159 LearningRate 0.1485 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:32,075-Speed 18642.03 samples/sec Loss 32.4630 LearningRate 0.1489 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:36,450-Speed 18725.20 samples/sec Loss 32.3958 LearningRate 0.1493 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:40,860-Speed 18584.03 samples/sec Loss 32.3540 LearningRate 0.1497 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:45,243-Speed 18698.36 samples/sec Loss 32.3206 LearningRate 0.1501 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:49,641-Speed 18629.95 samples/sec Loss 32.2985 LearningRate 0.1505 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:26:59,338-Speed 8449.12 samples/sec Loss 32.2679 LearningRate 0.1508 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:03,737-Speed 18630.96 samples/sec Loss 32.1810 LearningRate 0.1512 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 
22:27:08,143-Speed 18597.49 samples/sec Loss 32.1220 LearningRate 0.1516 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:12,526-Speed 18694.63 samples/sec Loss 32.1109 LearningRate 0.1520 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:16,902-Speed 18725.86 samples/sec Loss 32.0332 LearningRate 0.1524 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:21,315-Speed 18567.36 samples/sec Loss 32.0317 LearningRate 0.1528 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:25,733-Speed 18548.38 samples/sec Loss 31.9436 LearningRate 0.1532 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:30,115-Speed 18705.89 samples/sec Loss 31.9293 LearningRate 0.1535 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:34,521-Speed 18605.25 samples/sec Loss 31.9242 LearningRate 0.1539 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:38,931-Speed 18582.29 samples/sec Loss 31.8077 LearningRate 0.1543 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:43,315-Speed 18687.09 samples/sec Loss 31.7833 LearningRate 0.1547 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:47,727-Speed 18576.56 samples/sec Loss 31.7074 LearningRate 0.1551 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:52,114-Speed 18685.96 samples/sec Loss 31.6591 LearningRate 0.1555 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:27:56,535-Speed 18533.67 samples/sec Loss 31.6571 LearningRate 0.1559 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:00,969-Speed 18480.99 
samples/sec Loss 31.5999 LearningRate 0.1562 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:05,368-Speed 18627.41 samples/sec Loss 31.5320 LearningRate 0.1566 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:09,793-Speed 18519.63 samples/sec Loss 31.4854 LearningRate 0.1570 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:14,205-Speed 18569.46 samples/sec Loss 31.4276 LearningRate 0.1574 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:18,595-Speed 18666.86 samples/sec Loss 31.4152 LearningRate 0.1578 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:22,990-Speed 18640.84 samples/sec Loss 31.3192 LearningRate 0.1582 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:27,385-Speed 18644.94 samples/sec Loss 31.3148 LearningRate 0.1586 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:31,783-Speed 18634.67 samples/sec Loss 31.1753 LearningRate 0.1590 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:36,189-Speed 18598.29 samples/sec Loss 31.1751 LearningRate 0.1593 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:40,611-Speed 18532.21 samples/sec Loss 31.1638 LearningRate 0.1597 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:44,999-Speed 18672.19 samples/sec Loss 31.1261 LearningRate 0.1601 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:49,449-Speed 18412.64 samples/sec Loss 31.0830 LearningRate 0.1605 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:53,891-Speed 18444.67 samples/sec Loss 30.9663 
LearningRate 0.1609 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:28:58,344-Speed 18400.02 samples/sec Loss 30.9424 LearningRate 0.1613 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:02,770-Speed 18515.63 samples/sec Loss 30.8892 LearningRate 0.1617 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:07,170-Speed 18622.59 samples/sec Loss 30.8154 LearningRate 0.1620 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:11,618-Speed 18425.33 samples/sec Loss 30.8607 LearningRate 0.1624 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 524288 Required: 12 hours Training: 2022-01-13 22:29:16,063-Speed 18434.69 samples/sec Loss 30.7677 LearningRate 0.1628 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:20,509-Speed 18426.34 samples/sec Loss 30.7041 LearningRate 0.1632 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:24,927-Speed 18551.05 samples/sec Loss 30.6829 LearningRate 0.1636 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:29,382-Speed 18395.24 samples/sec Loss 30.6150 LearningRate 0.1640 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:33,824-Speed 18454.55 samples/sec Loss 30.5498 LearningRate 0.1644 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:38,230-Speed 18596.09 samples/sec Loss 30.4995 LearningRate 0.1647 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:42,629-Speed 18627.19 samples/sec Loss 30.4768 LearningRate 0.1651 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:47,036-Speed 18594.74 samples/sec Loss 30.3643 LearningRate 0.1655 Epoch: 0 
Global Step: 4290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:51,426-Speed 18663.09 samples/sec Loss 30.3653 LearningRate 0.1659 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:29:55,826-Speed 18623.82 samples/sec Loss 30.3395 LearningRate 0.1663 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:00,220-Speed 18648.72 samples/sec Loss 30.2740 LearningRate 0.1667 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:04,628-Speed 18590.79 samples/sec Loss 30.2739 LearningRate 0.1671 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:09,045-Speed 18553.36 samples/sec Loss 30.1535 LearningRate 0.1674 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:13,432-Speed 18675.43 samples/sec Loss 30.1278 LearningRate 0.1678 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:17,838-Speed 18596.35 samples/sec Loss 30.0684 LearningRate 0.1682 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:22,248-Speed 18584.30 samples/sec Loss 30.0416 LearningRate 0.1686 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:26,656-Speed 18586.60 samples/sec Loss 29.9864 LearningRate 0.1690 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:31,084-Speed 18505.33 samples/sec Loss 29.9261 LearningRate 0.1694 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:35,479-Speed 18645.18 samples/sec Loss 29.8559 LearningRate 0.1698 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:39,879-Speed 18624.99 samples/sec Loss 29.8589 LearningRate 0.1701 Epoch: 0 Global Step: 4410 Fp16 Grad 
Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:44,297-Speed 18545.75 samples/sec Loss 29.7568 LearningRate 0.1705 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 524288 Required: 12 hours Training: 2022-01-13 22:30:48,693-Speed 18636.43 samples/sec Loss 29.7713 LearningRate 0.1709 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:53,116-Speed 18530.70 samples/sec Loss 29.6575 LearningRate 0.1713 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:30:57,521-Speed 18602.19 samples/sec Loss 29.6542 LearningRate 0.1717 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:01,968-Speed 18428.31 samples/sec Loss 29.5636 LearningRate 0.1721 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:06,360-Speed 18655.38 samples/sec Loss 29.5728 LearningRate 0.1725 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:10,740-Speed 18708.72 samples/sec Loss 29.5279 LearningRate 0.1728 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:15,184-Speed 18437.13 samples/sec Loss 29.4627 LearningRate 0.1732 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:19,614-Speed 18500.16 samples/sec Loss 29.4304 LearningRate 0.1736 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:24,023-Speed 18584.41 samples/sec Loss 29.2936 LearningRate 0.1740 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:28,405-Speed 18695.30 samples/sec Loss 29.3130 LearningRate 0.1744 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:32,798-Speed 18657.21 samples/sec Loss 29.2288 LearningRate 0.1748 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 262144 Required: 12 
hours Training: 2022-01-13 22:31:37,218-Speed 18539.17 samples/sec Loss 29.1593 LearningRate 0.1752 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:41,626-Speed 18587.92 samples/sec Loss 29.0852 LearningRate 0.1755 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:46,047-Speed 18536.05 samples/sec Loss 29.1480 LearningRate 0.1759 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:50,469-Speed 18529.35 samples/sec Loss 29.0425 LearningRate 0.1763 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:54,883-Speed 18567.12 samples/sec Loss 29.0519 LearningRate 0.1767 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:31:59,274-Speed 18657.74 samples/sec Loss 28.9565 LearningRate 0.1771 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:03,691-Speed 18551.60 samples/sec Loss 28.8814 LearningRate 0.1775 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:08,078-Speed 18679.64 samples/sec Loss 28.8544 LearningRate 0.1779 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:12,471-Speed 18655.33 samples/sec Loss 28.8553 LearningRate 0.1782 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:16,871-Speed 18626.40 samples/sec Loss 28.7947 LearningRate 0.1786 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:21,282-Speed 18580.65 samples/sec Loss 28.7471 LearningRate 0.1790 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:25,715-Speed 18484.03 samples/sec Loss 28.7054 LearningRate 0.1794 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 
22:32:30,144-Speed 18502.47 samples/sec Loss 28.6127 LearningRate 0.1798 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:34,575-Speed 18495.10 samples/sec Loss 28.5607 LearningRate 0.1802 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:38,992-Speed 18555.12 samples/sec Loss 28.5398 LearningRate 0.1806 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:43,454-Speed 18369.01 samples/sec Loss 28.4959 LearningRate 0.1809 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:47,881-Speed 18506.91 samples/sec Loss 28.4734 LearningRate 0.1813 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:52,292-Speed 18579.19 samples/sec Loss 28.3606 LearningRate 0.1817 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:32:56,684-Speed 18656.36 samples/sec Loss 28.3693 LearningRate 0.1821 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:01,073-Speed 18671.54 samples/sec Loss 28.2819 LearningRate 0.1825 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:05,483-Speed 18582.65 samples/sec Loss 28.2582 LearningRate 0.1829 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:09,885-Speed 18616.41 samples/sec Loss 28.2063 LearningRate 0.1833 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:14,283-Speed 18629.57 samples/sec Loss 28.1638 LearningRate 0.1836 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:18,722-Speed 18459.36 samples/sec Loss 28.0978 LearningRate 0.1840 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:23,131-Speed 18588.06 
samples/sec Loss 28.0746 LearningRate 0.1844 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:27,525-Speed 18647.47 samples/sec Loss 27.9729 LearningRate 0.1848 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:31,965-Speed 18454.37 samples/sec Loss 27.9465 LearningRate 0.1852 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:36,374-Speed 18588.12 samples/sec Loss 27.8535 LearningRate 0.1856 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:40,786-Speed 18573.38 samples/sec Loss 27.8533 LearningRate 0.1860 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:45,225-Speed 18456.10 samples/sec Loss 27.8274 LearningRate 0.1863 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:49,627-Speed 18614.71 samples/sec Loss 27.7889 LearningRate 0.1867 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:54,027-Speed 18623.11 samples/sec Loss 27.7351 LearningRate 0.1871 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:33:58,441-Speed 18565.02 samples/sec Loss 27.6332 LearningRate 0.1875 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:02,845-Speed 18613.22 samples/sec Loss 27.5910 LearningRate 0.1879 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:07,246-Speed 18622.79 samples/sec Loss 27.5795 LearningRate 0.1883 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:11,634-Speed 18675.11 samples/sec Loss 27.5535 LearningRate 0.1887 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:16,042-Speed 18589.27 samples/sec Loss 27.4712 
LearningRate 0.1890 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:20,447-Speed 18602.71 samples/sec Loss 27.4395 LearningRate 0.1894 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:24,878-Speed 18494.44 samples/sec Loss 27.3201 LearningRate 0.1898 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:29,258-Speed 18704.41 samples/sec Loss 27.3178 LearningRate 0.1902 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 524288 Required: 12 hours Training: 2022-01-13 22:34:33,693-Speed 18479.93 samples/sec Loss 27.2805 LearningRate 0.1906 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:38,554-Speed 16857.81 samples/sec Loss 27.1972 LearningRate 0.1910 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:42,980-Speed 18514.98 samples/sec Loss 27.2404 LearningRate 0.1914 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:47,387-Speed 18591.30 samples/sec Loss 27.1685 LearningRate 0.1917 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:51,847-Speed 18372.21 samples/sec Loss 27.0446 LearningRate 0.1921 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:34:56,228-Speed 18705.57 samples/sec Loss 27.0278 LearningRate 0.1925 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:00,635-Speed 18589.92 samples/sec Loss 26.9504 LearningRate 0.1929 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:05,041-Speed 18601.36 samples/sec Loss 26.9321 LearningRate 0.1933 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:09,415-Speed 18733.32 samples/sec Loss 26.8817 LearningRate 0.1937 Epoch: 0 
Global Step: 5020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:13,812-Speed 18639.66 samples/sec Loss 26.8303 LearningRate 0.1941 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:18,206-Speed 18648.48 samples/sec Loss 26.8353 LearningRate 0.1944 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:22,608-Speed 18612.48 samples/sec Loss 26.7690 LearningRate 0.1948 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:27,015-Speed 18593.75 samples/sec Loss 26.6817 LearningRate 0.1952 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:31,406-Speed 18663.31 samples/sec Loss 26.6304 LearningRate 0.1956 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:35,801-Speed 18641.41 samples/sec Loss 26.5855 LearningRate 0.1960 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:40,209-Speed 18588.37 samples/sec Loss 26.5823 LearningRate 0.1964 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:44,613-Speed 18606.87 samples/sec Loss 26.5292 LearningRate 0.1968 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:48,967-Speed 18819.04 samples/sec Loss 26.5515 LearningRate 0.1971 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:53,364-Speed 18636.09 samples/sec Loss 26.4285 LearningRate 0.1975 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:35:57,790-Speed 18513.83 samples/sec Loss 26.3821 LearningRate 0.1979 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:02,236-Speed 18430.03 samples/sec Loss 26.3823 LearningRate 0.1983 Epoch: 0 Global Step: 5140 Fp16 Grad 
Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:06,643-Speed 18589.76 samples/sec Loss 26.2693 LearningRate 0.1987 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:11,112-Speed 18339.33 samples/sec Loss 26.2524 LearningRate 0.1991 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:15,536-Speed 18518.39 samples/sec Loss 26.2319 LearningRate 0.1995 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:19,957-Speed 18534.00 samples/sec Loss 26.1452 LearningRate 0.1998 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:38,539-Speed 4408.93 samples/sec Loss 26.0670 LearningRate 0.2002 Epoch: 1 Global Step: 5190 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:42,950-Speed 18577.34 samples/sec Loss 26.1034 LearningRate 0.2006 Epoch: 1 Global Step: 5200 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:47,351-Speed 18619.67 samples/sec Loss 25.9505 LearningRate 0.2010 Epoch: 1 Global Step: 5210 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:51,760-Speed 18583.79 samples/sec Loss 25.9217 LearningRate 0.2014 Epoch: 1 Global Step: 5220 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:36:56,145-Speed 18686.43 samples/sec Loss 25.9320 LearningRate 0.2018 Epoch: 1 Global Step: 5230 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:00,525-Speed 18705.71 samples/sec Loss 25.8327 LearningRate 0.2022 Epoch: 1 Global Step: 5240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:04,908-Speed 18696.40 samples/sec Loss 25.8208 LearningRate 0.2025 Epoch: 1 Global Step: 5250 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:09,312-Speed 18609.63 samples/sec Loss 25.7827 LearningRate 0.2029 Epoch: 1 Global Step: 5260 Fp16 Grad Scale: 262144 Required: 12 hours 
Training: 2022-01-13 22:37:13,700-Speed 18671.72 samples/sec Loss 25.7121 LearningRate 0.2033 Epoch: 1 Global Step: 5270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:18,080-Speed 18713.01 samples/sec Loss 25.6181 LearningRate 0.2037 Epoch: 1 Global Step: 5280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:22,495-Speed 18568.16 samples/sec Loss 25.6363 LearningRate 0.2041 Epoch: 1 Global Step: 5290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:26,904-Speed 18585.26 samples/sec Loss 25.5875 LearningRate 0.2045 Epoch: 1 Global Step: 5300 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:31,307-Speed 18614.69 samples/sec Loss 25.5342 LearningRate 0.2049 Epoch: 1 Global Step: 5310 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:35,695-Speed 18670.63 samples/sec Loss 25.4716 LearningRate 0.2052 Epoch: 1 Global Step: 5320 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:40,087-Speed 18661.97 samples/sec Loss 25.4413 LearningRate 0.2056 Epoch: 1 Global Step: 5330 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:44,484-Speed 18634.16 samples/sec Loss 25.4158 LearningRate 0.2060 Epoch: 1 Global Step: 5340 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:48,906-Speed 18533.64 samples/sec Loss 25.3795 LearningRate 0.2064 Epoch: 1 Global Step: 5350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:53,324-Speed 18546.64 samples/sec Loss 25.3066 LearningRate 0.2068 Epoch: 1 Global Step: 5360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:37:57,692-Speed 18759.67 samples/sec Loss 25.2499 LearningRate 0.2072 Epoch: 1 Global Step: 5370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:38:02,086-Speed 18650.94 samples/sec Loss 25.2180 LearningRate 0.2076 Epoch: 1 Global Step: 5380 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 
22:38:06,475-Speed 18672.81 samples/sec Loss 25.2605 LearningRate 0.2079 Epoch: 1 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:10,900-Speed 18518.16 samples/sec Loss 25.1549 LearningRate 0.2083 Epoch: 1 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:15,314-Speed 18562.97 samples/sec Loss 25.0886 LearningRate 0.2087 Epoch: 1 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:19,718-Speed 18608.36 samples/sec Loss 25.0442 LearningRate 0.2091 Epoch: 1 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:24,122-Speed 18607.36 samples/sec Loss 24.9436 LearningRate 0.2095 Epoch: 1 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:28,565-Speed 18443.51 samples/sec Loss 24.9161 LearningRate 0.2099 Epoch: 1 Global Step: 5440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:32,966-Speed 18617.96 samples/sec Loss 24.9147 LearningRate 0.2103 Epoch: 1 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:37,352-Speed 18687.05 samples/sec Loss 24.9153 LearningRate 0.2106 Epoch: 1 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:41,820-Speed 18337.05 samples/sec Loss 24.8177 LearningRate 0.2110 Epoch: 1 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:46,208-Speed 18675.95 samples/sec Loss 24.7879 LearningRate 0.2114 Epoch: 1 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:38:50,697-Speed 18251.10 samples/sec Loss 24.7497 LearningRate 0.2118 Epoch: 1 Global Step: 5490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:38:55,114-Speed 18549.50 samples/sec Loss 24.6854 LearningRate 0.2122 Epoch: 1 Global Step: 5500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:38:59,487-Speed 18740.69 
samples/sec Loss 24.6594 LearningRate 0.2126 Epoch: 1 Global Step: 5510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:03,886-Speed 18636.32 samples/sec Loss 24.5709 LearningRate 0.2130 Epoch: 1 Global Step: 5520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:08,291-Speed 18606.39 samples/sec Loss 24.5406 LearningRate 0.2133 Epoch: 1 Global Step: 5530 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:12,693-Speed 18616.71 samples/sec Loss 24.5256 LearningRate 0.2137 Epoch: 1 Global Step: 5540 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:17,152-Speed 18377.41 samples/sec Loss 24.4876 LearningRate 0.2141 Epoch: 1 Global Step: 5550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:21,571-Speed 18547.21 samples/sec Loss 24.4020 LearningRate 0.2145 Epoch: 1 Global Step: 5560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:25,980-Speed 18588.97 samples/sec Loss 24.4011 LearningRate 0.2149 Epoch: 1 Global Step: 5570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:30,396-Speed 18555.35 samples/sec Loss 24.2788 LearningRate 0.2153 Epoch: 1 Global Step: 5580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:34,807-Speed 18579.01 samples/sec Loss 24.2704 LearningRate 0.2157 Epoch: 1 Global Step: 5590 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:39,173-Speed 18764.98 samples/sec Loss 24.2363 LearningRate 0.2160 Epoch: 1 Global Step: 5600 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:43,553-Speed 18708.19 samples/sec Loss 24.1763 LearningRate 0.2164 Epoch: 1 Global Step: 5610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:47,937-Speed 18694.80 samples/sec Loss 24.2106 LearningRate 0.2168 Epoch: 1 Global Step: 5620 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:52,343-Speed 18597.12 samples/sec Loss 24.0785 
LearningRate 0.2172 Epoch: 1 Global Step: 5630 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:39:56,723-Speed 18704.33 samples/sec Loss 24.0341 LearningRate 0.2176 Epoch: 1 Global Step: 5640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:40:07,330-Speed 7724.17 samples/sec Loss 24.0548 LearningRate 0.2180 Epoch: 1 Global Step: 5650 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:40:11,753-Speed 18529.07 samples/sec Loss 24.0415 LearningRate 0.2184 Epoch: 1 Global Step: 5660 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:40:16,163-Speed 18579.96 samples/sec Loss 23.9319 LearningRate 0.2188 Epoch: 1 Global Step: 5670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:40:20,574-Speed 18577.76 samples/sec Loss 23.8908 LearningRate 0.2191 Epoch: 1 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:40:24,952-Speed 18717.81 samples/sec Loss 23.8628 LearningRate 0.2195 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:40:29,369-Speed 18548.91 samples/sec Loss 23.8589 LearningRate 0.2199 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:40:33,769-Speed 18623.49 samples/sec Loss 23.7436 LearningRate 0.2203 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:40:38,239-Speed 18333.26 samples/sec Loss 23.7709 LearningRate 0.2207 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:40:42,640-Speed 18617.25 samples/sec Loss 23.6495 LearningRate 0.2211 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:40:47,092-Speed 18408.01 samples/sec Loss 23.6532 LearningRate 0.2215 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:40:51,874-Speed 17135.45 samples/sec Loss 23.5825 LearningRate 0.2218 Epoch: 1 
Global Step: 5750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:40:56,394-Speed 18126.55 samples/sec Loss 23.4906 LearningRate 0.2222 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:41:00,808-Speed 18566.57 samples/sec Loss 23.5464 LearningRate 0.2226 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:41:05,202-Speed 18643.62 samples/sec Loss 23.3932 LearningRate 0.2230 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:09,591-Speed 18673.49 samples/sec Loss 23.4524 LearningRate 0.2234 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:14,043-Speed 18406.05 samples/sec Loss 23.3653 LearningRate 0.2238 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:18,497-Speed 18399.35 samples/sec Loss 23.3862 LearningRate 0.2242 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:22,921-Speed 18520.12 samples/sec Loss 23.2357 LearningRate 0.2245 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:27,348-Speed 18510.38 samples/sec Loss 23.2766 LearningRate 0.2249 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:31,772-Speed 18521.50 samples/sec Loss 23.1291 LearningRate 0.2253 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:36,162-Speed 18664.53 samples/sec Loss 23.1732 LearningRate 0.2257 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:40,547-Speed 18688.95 samples/sec Loss 23.1938 LearningRate 0.2261 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:44,923-Speed 18726.02 samples/sec Loss 23.0258 LearningRate 0.2265 Epoch: 1 Global Step: 5870 Fp16 Grad 
Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:49,308-Speed 18688.95 samples/sec Loss 23.0125 LearningRate 0.2269 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 524288 Required: 12 hours Training: 2022-01-13 22:41:53,693-Speed 18686.75 samples/sec Loss 22.9732 LearningRate 0.2272 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:41:58,096-Speed 18610.70 samples/sec Loss 22.9686 LearningRate 0.2276 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:42:02,516-Speed 18540.28 samples/sec Loss 22.9499 LearningRate 0.2280 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:42:06,925-Speed 18590.22 samples/sec Loss 22.8994 LearningRate 0.2284 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:42:11,324-Speed 18630.52 samples/sec Loss 22.8714 LearningRate 0.2288 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:42:15,713-Speed 18665.11 samples/sec Loss 22.8089 LearningRate 0.2292 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:42:20,113-Speed 18630.72 samples/sec Loss 22.7467 LearningRate 0.2296 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:42:24,519-Speed 18602.64 samples/sec Loss 22.6852 LearningRate 0.2299 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:42:28,955-Speed 18470.41 samples/sec Loss 22.6695 LearningRate 0.2303 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:42:33,376-Speed 18538.41 samples/sec Loss 22.6601 LearningRate 0.2307 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:42:37,776-Speed 18620.87 samples/sec Loss 22.6487 LearningRate 0.2311 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-01-13 22:42:42,151-Speed 18730.32 samples/sec Loss 22.5459 LearningRate 0.2315 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:42:46,562-Speed 18579.16 samples/sec Loss 22.5184 LearningRate 0.2319 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:42:50,972-Speed 18579.10 samples/sec Loss 22.4904 LearningRate 0.2323 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:42:55,385-Speed 18573.38 samples/sec Loss 22.4128 LearningRate 0.2326 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:42:59,782-Speed 18639.28 samples/sec Loss 22.3814 LearningRate 0.2330 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:43:04,280-Speed 18214.34 samples/sec Loss 22.4140 LearningRate 0.2334 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:43:08,658-Speed 18717.87 samples/sec Loss 22.3158 LearningRate 0.2338 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:43:13,036-Speed 18718.25 samples/sec Loss 22.3072 LearningRate 0.2342 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:43:17,406-Speed 18752.18 samples/sec Loss 22.3250 LearningRate 0.2346 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:43:21,790-Speed 18692.21 samples/sec Loss 22.2360 LearningRate 0.2350 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:43:26,198-Speed 18588.84 samples/sec Loss 22.2511 LearningRate 0.2353 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:43:30,582-Speed 18688.97 samples/sec Loss 22.1344 LearningRate 0.2357 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 
22:43:34,971-Speed 18678.46 samples/sec Loss 22.0266 LearningRate 0.2361 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:43:39,390-Speed 18545.60 samples/sec Loss 22.0532 LearningRate 0.2365 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:43:43,839-Speed 18418.04 samples/sec Loss 22.0237 LearningRate 0.2369 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:43:48,260-Speed 18536.46 samples/sec Loss 21.9434 LearningRate 0.2373 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:43:52,712-Speed 18407.90 samples/sec Loss 22.0145 LearningRate 0.2377 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:43:57,099-Speed 18679.90 samples/sec Loss 21.9080 LearningRate 0.2380 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:44:01,537-Speed 18468.66 samples/sec Loss 21.8052 LearningRate 0.2384 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:05,939-Speed 18617.29 samples/sec Loss 21.8108 LearningRate 0.2388 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:10,347-Speed 18588.13 samples/sec Loss 21.7194 LearningRate 0.2392 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:14,758-Speed 18575.48 samples/sec Loss 21.7621 LearningRate 0.2396 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:19,137-Speed 18716.97 samples/sec Loss 21.6948 LearningRate 0.2400 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:23,545-Speed 18590.15 samples/sec Loss 21.6408 LearningRate 0.2404 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:27,971-Speed 18516.65 
samples/sec Loss 21.6267 LearningRate 0.2407 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:32,402-Speed 18488.91 samples/sec Loss 21.6179 LearningRate 0.2411 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:36,813-Speed 18575.06 samples/sec Loss 21.5067 LearningRate 0.2415 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:41,240-Speed 18509.80 samples/sec Loss 21.4930 LearningRate 0.2419 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:45,992-Speed 17244.21 samples/sec Loss 21.4372 LearningRate 0.2423 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:44:50,404-Speed 18572.52 samples/sec Loss 21.4641 LearningRate 0.2427 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:54,824-Speed 18538.32 samples/sec Loss 21.4428 LearningRate 0.2431 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:44:59,261-Speed 18471.69 samples/sec Loss 21.3503 LearningRate 0.2434 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:45:03,688-Speed 18505.83 samples/sec Loss 21.3440 LearningRate 0.2438 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:45:08,393-Speed 17416.34 samples/sec Loss 21.3327 LearningRate 0.2442 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:45:13,359-Speed 16501.85 samples/sec Loss 21.2769 LearningRate 0.2446 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:45:17,800-Speed 18452.18 samples/sec Loss 21.2991 LearningRate 0.2450 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:45:22,224-Speed 18519.63 samples/sec Loss 21.1998 
LearningRate 0.2454 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:45:26,674-Speed 18414.43 samples/sec Loss 21.2156 LearningRate 0.2458 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:45:31,099-Speed 18517.84 samples/sec Loss 21.1734 LearningRate 0.2461 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:45:35,531-Speed 18486.67 samples/sec Loss 21.1230 LearningRate 0.2465 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:45:40,006-Speed 18313.70 samples/sec Loss 21.0287 LearningRate 0.2469 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:45:44,446-Speed 18454.96 samples/sec Loss 21.1260 LearningRate 0.2473 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:45:48,855-Speed 18587.18 samples/sec Loss 20.9596 LearningRate 0.2477 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:45:53,268-Speed 18566.20 samples/sec Loss 20.9414 LearningRate 0.2481 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:45:57,680-Speed 18573.66 samples/sec Loss 20.8915 LearningRate 0.2485 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:46:02,412-Speed 17314.64 samples/sec Loss 20.8920 LearningRate 0.2488 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:08,072-Speed 14477.68 samples/sec Loss 20.8673 LearningRate 0.2492 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:12,490-Speed 18547.54 samples/sec Loss 20.8523 LearningRate 0.2496 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:16,938-Speed 18421.03 samples/sec Loss 20.8412 LearningRate 0.2500 Epoch: 1 
Global Step: 6480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:21,340-Speed 18617.02 samples/sec Loss 20.7933 LearningRate 0.2504 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:25,719-Speed 18712.24 samples/sec Loss 20.7281 LearningRate 0.2508 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:30,114-Speed 18643.28 samples/sec Loss 20.7102 LearningRate 0.2512 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:34,509-Speed 18643.25 samples/sec Loss 20.7015 LearningRate 0.2515 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:38,908-Speed 18624.96 samples/sec Loss 20.6651 LearningRate 0.2519 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:43,271-Speed 18783.10 samples/sec Loss 20.5738 LearningRate 0.2523 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:46:47,697-Speed 18515.03 samples/sec Loss 20.5564 LearningRate 0.2527 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:46:52,141-Speed 18437.31 samples/sec Loss 20.5081 LearningRate 0.2531 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:46:56,536-Speed 18642.80 samples/sec Loss 20.5080 LearningRate 0.2535 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:47:00,941-Speed 18603.67 samples/sec Loss 20.5033 LearningRate 0.2539 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:47:06,299-Speed 15292.12 samples/sec Loss 20.4707 LearningRate 0.2542 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:47:10,679-Speed 18707.34 samples/sec Loss 20.4206 LearningRate 0.2546 Epoch: 1 Global Step: 6600 Fp16 Grad 
Scale: 262144 Required: 12 hours Training: 2022-01-13 22:47:15,063-Speed 18690.77 samples/sec Loss 20.3724 LearningRate 0.2550 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:47:19,445-Speed 18700.38 samples/sec Loss 20.3392 LearningRate 0.2554 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:47:23,853-Speed 18590.47 samples/sec Loss 20.3282 LearningRate 0.2558 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:47:28,253-Speed 18623.77 samples/sec Loss 20.2889 LearningRate 0.2562 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:47:32,670-Speed 18548.13 samples/sec Loss 20.2312 LearningRate 0.2566 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:47:37,086-Speed 18558.82 samples/sec Loss 20.1626 LearningRate 0.2569 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:47:41,469-Speed 18694.08 samples/sec Loss 20.1934 LearningRate 0.2573 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:47:45,861-Speed 18660.18 samples/sec Loss 20.1086 LearningRate 0.2577 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:47:50,272-Speed 18576.29 samples/sec Loss 20.1232 LearningRate 0.2581 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:47:54,696-Speed 18520.66 samples/sec Loss 20.0301 LearningRate 0.2585 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:47:59,088-Speed 18654.23 samples/sec Loss 19.9981 LearningRate 0.2589 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:03,493-Speed 18599.76 samples/sec Loss 20.0725 LearningRate 0.2593 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-01-13 22:48:07,892-Speed 18629.99 samples/sec Loss 19.9872 LearningRate 0.2596 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:12,282-Speed 18665.20 samples/sec Loss 19.9482 LearningRate 0.2600 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:16,755-Speed 18320.23 samples/sec Loss 19.8939 LearningRate 0.2604 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:48:21,140-Speed 18687.16 samples/sec Loss 19.8381 LearningRate 0.2608 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:25,514-Speed 18733.59 samples/sec Loss 19.8757 LearningRate 0.2612 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:30,572-Speed 16200.18 samples/sec Loss 19.8173 LearningRate 0.2616 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:34,980-Speed 18590.13 samples/sec Loss 19.7998 LearningRate 0.2620 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:39,389-Speed 18584.70 samples/sec Loss 19.7559 LearningRate 0.2623 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:43,790-Speed 18619.40 samples/sec Loss 19.7738 LearningRate 0.2627 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:48,203-Speed 18575.21 samples/sec Loss 19.7077 LearningRate 0.2631 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:52,582-Speed 18713.87 samples/sec Loss 19.6891 LearningRate 0.2635 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:48:57,003-Speed 18534.08 samples/sec Loss 19.6613 LearningRate 0.2639 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 
22:49:01,812-Speed 17038.70 samples/sec Loss 19.5937 LearningRate 0.2643 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:49:06,243-Speed 18494.08 samples/sec Loss 19.5529 LearningRate 0.2647 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:49:10,648-Speed 18598.72 samples/sec Loss 19.5258 LearningRate 0.2650 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:49:15,071-Speed 18528.48 samples/sec Loss 19.5488 LearningRate 0.2654 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:49:19,537-Speed 18346.19 samples/sec Loss 19.4526 LearningRate 0.2658 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:49:23,925-Speed 18672.58 samples/sec Loss 19.4598 LearningRate 0.2662 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:49:28,350-Speed 18519.81 samples/sec Loss 19.4080 LearningRate 0.2666 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:49:32,753-Speed 18610.49 samples/sec Loss 19.4396 LearningRate 0.2670 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:49:37,159-Speed 18593.77 samples/sec Loss 19.3922 LearningRate 0.2674 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:49:41,550-Speed 18663.50 samples/sec Loss 19.3957 LearningRate 0.2677 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:49:45,954-Speed 18604.34 samples/sec Loss 19.2983 LearningRate 0.2681 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:49:50,437-Speed 18280.63 samples/sec Loss 19.3008 LearningRate 0.2685 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:49:56,343-Speed 13870.58 
samples/sec Loss 19.2688 LearningRate 0.2689 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:50:01,004-Speed 17580.71 samples/sec Loss 19.2403 LearningRate 0.2693 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:50:05,408-Speed 18606.92 samples/sec Loss 19.1979 LearningRate 0.2697 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:50:09,837-Speed 18501.47 samples/sec Loss 19.2180 LearningRate 0.2701 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:50:14,231-Speed 18646.27 samples/sec Loss 19.1726 LearningRate 0.2704 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:50:18,631-Speed 18623.28 samples/sec Loss 19.1147 LearningRate 0.2708 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:50:23,032-Speed 18618.55 samples/sec Loss 19.1055 LearningRate 0.2712 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:50:27,435-Speed 18612.34 samples/sec Loss 19.0581 LearningRate 0.2716 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:50:31,865-Speed 18494.88 samples/sec Loss 19.0407 LearningRate 0.2720 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:50:36,325-Speed 18370.72 samples/sec Loss 19.0226 LearningRate 0.2724 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:50:40,748-Speed 18530.04 samples/sec Loss 18.9818 LearningRate 0.2728 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:50:45,196-Speed 18422.97 samples/sec Loss 18.9865 LearningRate 0.2731 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:50:49,617-Speed 18537.79 samples/sec Loss 18.9341 LearningRate 
0.2735 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:50:54,033-Speed 18555.12 samples/sec Loss 18.9074 LearningRate 0.2739 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:50:58,409-Speed 18727.02 samples/sec Loss 18.8767 LearningRate 0.2743 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:02,800-Speed 18661.33 samples/sec Loss 18.7843 LearningRate 0.2747 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:07,202-Speed 18616.09 samples/sec Loss 18.8120 LearningRate 0.2751 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:51:11,642-Speed 18454.74 samples/sec Loss 18.7360 LearningRate 0.2755 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:51:16,038-Speed 18642.62 samples/sec Loss 18.7913 LearningRate 0.2758 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:51:20,438-Speed 18624.29 samples/sec Loss 18.7312 LearningRate 0.2762 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:51:24,876-Speed 18463.19 samples/sec Loss 18.7570 LearningRate 0.2766 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:29,265-Speed 18674.68 samples/sec Loss 18.6892 LearningRate 0.2770 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:33,646-Speed 18702.22 samples/sec Loss 18.6591 LearningRate 0.2774 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:38,063-Speed 18549.96 samples/sec Loss 18.5951 LearningRate 0.2778 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:42,500-Speed 18469.47 samples/sec Loss 18.5911 LearningRate 0.2782 Epoch: 1 Global Step: 
7210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:46,919-Speed 18541.79 samples/sec Loss 18.6338 LearningRate 0.2785 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:51,419-Speed 18212.77 samples/sec Loss 18.5870 LearningRate 0.2789 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:51:55,811-Speed 18655.47 samples/sec Loss 18.6141 LearningRate 0.2793 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:00,198-Speed 18681.40 samples/sec Loss 18.4635 LearningRate 0.2797 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:04,660-Speed 18364.98 samples/sec Loss 18.3811 LearningRate 0.2801 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:09,068-Speed 18589.03 samples/sec Loss 18.4152 LearningRate 0.2805 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:52:13,490-Speed 18534.18 samples/sec Loss 18.4248 LearningRate 0.2809 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:52:17,898-Speed 18587.04 samples/sec Loss 18.3892 LearningRate 0.2812 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:22,315-Speed 18549.20 samples/sec Loss 18.3369 LearningRate 0.2816 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:26,723-Speed 18594.66 samples/sec Loss 18.3086 LearningRate 0.2820 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:31,107-Speed 18688.63 samples/sec Loss 18.3086 LearningRate 0.2824 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:35,509-Speed 18613.56 samples/sec Loss 18.2234 LearningRate 0.2828 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-13 22:52:39,932-Speed 18528.09 samples/sec Loss 18.2383 LearningRate 0.2832 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:44,358-Speed 18512.15 samples/sec Loss 18.2853 LearningRate 0.2836 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:48,793-Speed 18475.93 samples/sec Loss 18.2578 LearningRate 0.2840 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:53,209-Speed 18554.15 samples/sec Loss 18.1770 LearningRate 0.2843 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:52:57,591-Speed 18700.52 samples/sec Loss 18.1490 LearningRate 0.2847 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:01,986-Speed 18644.66 samples/sec Loss 18.1149 LearningRate 0.2851 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:06,373-Speed 18677.37 samples/sec Loss 18.0907 LearningRate 0.2855 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:10,773-Speed 18622.52 samples/sec Loss 18.0964 LearningRate 0.2859 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:15,162-Speed 18670.44 samples/sec Loss 18.0568 LearningRate 0.2863 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:19,616-Speed 18400.87 samples/sec Loss 18.0673 LearningRate 0.2867 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:23,985-Speed 18755.10 samples/sec Loss 18.0825 LearningRate 0.2870 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:28,407-Speed 18529.34 samples/sec Loss 18.0053 LearningRate 0.2874 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-01-13 22:53:32,869-Speed 18364.99 samples/sec Loss 18.0240 LearningRate 0.2878 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:37,301-Speed 18488.72 samples/sec Loss 17.9745 LearningRate 0.2882 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:41,751-Speed 18411.84 samples/sec Loss 17.9075 LearningRate 0.2886 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:46,172-Speed 18534.15 samples/sec Loss 17.8998 LearningRate 0.2890 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:53:50,578-Speed 18601.26 samples/sec Loss 17.8617 LearningRate 0.2894 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:53:54,961-Speed 18691.42 samples/sec Loss 17.9220 LearningRate 0.2897 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:53:59,373-Speed 18574.41 samples/sec Loss 17.8201 LearningRate 0.2901 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:54:03,784-Speed 18573.54 samples/sec Loss 17.8464 LearningRate 0.2905 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:54:10,780-Speed 11712.12 samples/sec Loss 17.8355 LearningRate 0.2909 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:54:15,266-Speed 18261.54 samples/sec Loss 17.7653 LearningRate 0.2913 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:54:19,698-Speed 18489.53 samples/sec Loss 17.7399 LearningRate 0.2917 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:54:24,212-Speed 18155.70 samples/sec Loss 17.7322 LearningRate 0.2921 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:54:28,629-Speed 
18551.34 samples/sec Loss 17.7425 LearningRate 0.2924 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:54:33,054-Speed 18519.08 samples/sec Loss 17.6606 LearningRate 0.2928 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:54:37,479-Speed 18516.05 samples/sec Loss 17.6595 LearningRate 0.2932 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:54:41,906-Speed 18511.95 samples/sec Loss 17.6228 LearningRate 0.2936 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:54:46,292-Speed 18680.67 samples/sec Loss 17.6854 LearningRate 0.2940 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:54:50,688-Speed 18639.52 samples/sec Loss 17.6162 LearningRate 0.2944 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:54:55,087-Speed 18630.98 samples/sec Loss 17.5339 LearningRate 0.2948 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:54:59,477-Speed 18664.48 samples/sec Loss 17.5829 LearningRate 0.2951 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:55:03,886-Speed 18582.74 samples/sec Loss 17.4963 LearningRate 0.2955 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:55:08,312-Speed 18512.95 samples/sec Loss 17.5416 LearningRate 0.2959 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:55:12,745-Speed 18486.92 samples/sec Loss 17.4643 LearningRate 0.2963 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:55:17,128-Speed 18690.64 samples/sec Loss 17.4596 LearningRate 0.2967 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:55:21,553-Speed 18526.40 samples/sec Loss 17.4308 
LearningRate 0.2971 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:55:25,968-Speed 18560.34 samples/sec Loss 17.3758 LearningRate 0.2975 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:55:30,374-Speed 18598.14 samples/sec Loss 17.3800 LearningRate 0.2978 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:55:34,755-Speed 18702.16 samples/sec Loss 17.3992 LearningRate 0.2982 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:55:39,168-Speed 18568.40 samples/sec Loss 17.3545 LearningRate 0.2986 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:55:43,570-Speed 18613.82 samples/sec Loss 17.3325 LearningRate 0.2990 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:55:47,991-Speed 18534.03 samples/sec Loss 17.3444 LearningRate 0.2994 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:55:52,390-Speed 18627.33 samples/sec Loss 17.2622 LearningRate 0.2998 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:55:56,809-Speed 18545.61 samples/sec Loss 17.2925 LearningRate 0.3002 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:56:01,275-Speed 18343.46 samples/sec Loss 17.3005 LearningRate 0.3005 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:56:05,669-Speed 18646.90 samples/sec Loss 17.2353 LearningRate 0.3009 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:56:10,100-Speed 18494.29 samples/sec Loss 17.2290 LearningRate 0.3013 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:56:14,497-Speed 18636.17 samples/sec Loss 17.1632 LearningRate 0.3017 Epoch: 1 Global 
Step: 7820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:56:18,906-Speed 18588.21 samples/sec Loss 17.1640 LearningRate 0.3021 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:56:23,286-Speed 18706.47 samples/sec Loss 17.1710 LearningRate 0.3025 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:56:27,667-Speed 18705.41 samples/sec Loss 17.1771 LearningRate 0.3029 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:56:32,061-Speed 18647.90 samples/sec Loss 17.0910 LearningRate 0.3032 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:56:36,469-Speed 18590.49 samples/sec Loss 17.1080 LearningRate 0.3036 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:56:42,133-Speed 14467.20 samples/sec Loss 17.0432 LearningRate 0.3040 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:56:46,537-Speed 18606.12 samples/sec Loss 17.1093 LearningRate 0.3044 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:56:50,939-Speed 18615.71 samples/sec Loss 17.0587 LearningRate 0.3048 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:56:55,348-Speed 18585.86 samples/sec Loss 16.9772 LearningRate 0.3052 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:56:59,776-Speed 18501.65 samples/sec Loss 16.9937 LearningRate 0.3056 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:57:04,162-Speed 18682.80 samples/sec Loss 17.0265 LearningRate 0.3059 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:57:08,580-Speed 18547.77 samples/sec Loss 16.8708 LearningRate 0.3063 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-13 22:57:12,988-Speed 18589.83 samples/sec Loss 16.9094 LearningRate 0.3067 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:57:18,176-Speed 15795.55 samples/sec Loss 16.8696 LearningRate 0.3071 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:57:22,796-Speed 17735.23 samples/sec Loss 16.8982 LearningRate 0.3075 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:57:29,308-Speed 12581.63 samples/sec Loss 16.9162 LearningRate 0.3079 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:57:33,702-Speed 18650.58 samples/sec Loss 16.8304 LearningRate 0.3083 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:57:38,291-Speed 17856.24 samples/sec Loss 16.8604 LearningRate 0.3086 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:57:42,685-Speed 18649.45 samples/sec Loss 16.8427 LearningRate 0.3090 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:57:47,117-Speed 18487.67 samples/sec Loss 16.8471 LearningRate 0.3094 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:57:51,524-Speed 18595.45 samples/sec Loss 16.8035 LearningRate 0.3098 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:57:57,070-Speed 14774.26 samples/sec Loss 16.7741 LearningRate 0.3102 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:58:01,496-Speed 18512.80 samples/sec Loss 16.7765 LearningRate 0.3106 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 22:58:05,908-Speed 18579.79 samples/sec Loss 16.7875 LearningRate 0.3110 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 
22:58:10,321-Speed 18570.79 samples/sec Loss 16.7205 LearningRate 0.3113 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:14,719-Speed 18631.85 samples/sec Loss 16.6585 LearningRate 0.3117 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:19,157-Speed 18466.39 samples/sec Loss 16.6480 LearningRate 0.3121 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:23,564-Speed 18593.07 samples/sec Loss 16.6343 LearningRate 0.3125 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:27,932-Speed 18762.73 samples/sec Loss 16.5833 LearningRate 0.3129 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:32,305-Speed 18739.55 samples/sec Loss 16.6235 LearningRate 0.3133 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:37,105-Speed 17070.40 samples/sec Loss 16.6038 LearningRate 0.3137 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:42,205-Speed 16068.56 samples/sec Loss 16.5159 LearningRate 0.3140 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:46,643-Speed 18466.11 samples/sec Loss 16.5601 LearningRate 0.3144 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:58:51,081-Speed 18466.95 samples/sec Loss 16.6019 LearningRate 0.3148 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:58:55,483-Speed 18619.42 samples/sec Loss 16.5694 LearningRate 0.3152 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:59:00,285-Speed 17065.93 samples/sec Loss 16.4860 LearningRate 0.3156 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:05,145-Speed 16860.66 
samples/sec Loss 16.5601 LearningRate 0.3160 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:09,549-Speed 18603.60 samples/sec Loss 16.5299 LearningRate 0.3164 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:13,969-Speed 18541.05 samples/sec Loss 16.4507 LearningRate 0.3167 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:18,406-Speed 18468.64 samples/sec Loss 16.4278 LearningRate 0.3171 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:22,797-Speed 18663.45 samples/sec Loss 16.3599 LearningRate 0.3175 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:27,195-Speed 18631.47 samples/sec Loss 16.3956 LearningRate 0.3179 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:31,586-Speed 18661.85 samples/sec Loss 16.3980 LearningRate 0.3183 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:35,991-Speed 18597.83 samples/sec Loss 16.3354 LearningRate 0.3187 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:40,399-Speed 18591.69 samples/sec Loss 16.3424 LearningRate 0.3191 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:44,810-Speed 18577.66 samples/sec Loss 16.3145 LearningRate 0.3194 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:59:49,244-Speed 18480.31 samples/sec Loss 16.3502 LearningRate 0.3198 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 22:59:53,670-Speed 18513.91 samples/sec Loss 16.3321 LearningRate 0.3202 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 22:59:58,058-Speed 18674.86 samples/sec Loss 16.2335 
LearningRate 0.3206 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:02,474-Speed 18554.37 samples/sec Loss 16.2241 LearningRate 0.3210 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:06,864-Speed 18666.87 samples/sec Loss 16.2039 LearningRate 0.3214 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:11,243-Speed 18715.11 samples/sec Loss 16.2365 LearningRate 0.3218 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:15,630-Speed 18680.58 samples/sec Loss 16.1910 LearningRate 0.3221 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:20,016-Speed 18682.43 samples/sec Loss 16.1911 LearningRate 0.3225 Epoch: 1 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:24,438-Speed 18527.74 samples/sec Loss 16.2075 LearningRate 0.3229 Epoch: 1 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:28,825-Speed 18680.26 samples/sec Loss 16.1354 LearningRate 0.3233 Epoch: 1 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:33,211-Speed 18680.31 samples/sec Loss 16.1477 LearningRate 0.3237 Epoch: 1 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:00:37,592-Speed 18703.97 samples/sec Loss 16.1127 LearningRate 0.3241 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:00:41,945-Speed 18826.61 samples/sec Loss 16.0731 LearningRate 0.3245 Epoch: 1 Global Step: 8410 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:00:46,435-Speed 18245.71 samples/sec Loss 16.1319 LearningRate 0.3248 Epoch: 1 Global Step: 8420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:00:50,825-Speed 18667.51 samples/sec Loss 16.0691 LearningRate 0.3252 Epoch: 1 
Global Step: 8430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:00:55,197-Speed 18740.09 samples/sec Loss 16.0582 LearningRate 0.3256 Epoch: 1 Global Step: 8440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:00:59,606-Speed 18589.67 samples/sec Loss 16.0277 LearningRate 0.3260 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:01:04,000-Speed 18646.82 samples/sec Loss 16.0362 LearningRate 0.3264 Epoch: 1 Global Step: 8460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:01:08,379-Speed 18709.88 samples/sec Loss 16.0231 LearningRate 0.3268 Epoch: 1 Global Step: 8470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:01:12,767-Speed 18673.95 samples/sec Loss 16.0342 LearningRate 0.3272 Epoch: 1 Global Step: 8480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:01:17,191-Speed 18520.24 samples/sec Loss 15.9847 LearningRate 0.3275 Epoch: 1 Global Step: 8490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:01:21,604-Speed 18573.63 samples/sec Loss 15.9993 LearningRate 0.3279 Epoch: 1 Global Step: 8500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:01:25,980-Speed 18723.17 samples/sec Loss 15.9872 LearningRate 0.3283 Epoch: 1 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:01:30,368-Speed 18674.27 samples/sec Loss 15.9708 LearningRate 0.3287 Epoch: 1 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:01:34,798-Speed 18496.00 samples/sec Loss 15.9244 LearningRate 0.3291 Epoch: 1 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:01:39,180-Speed 18700.34 samples/sec Loss 15.9241 LearningRate 0.3295 Epoch: 1 Global Step: 8540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:01:43,585-Speed 18606.71 samples/sec Loss 15.8553 LearningRate 0.3299 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-01-13 23:01:47,999-Speed 18563.60 samples/sec Loss 15.8522 LearningRate 0.3302 Epoch: 1 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:01:52,360-Speed 18789.68 samples/sec Loss 15.8077 LearningRate 0.3306 Epoch: 1 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:01:56,758-Speed 18631.95 samples/sec Loss 15.8514 LearningRate 0.3310 Epoch: 1 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:01,154-Speed 18641.86 samples/sec Loss 15.7940 LearningRate 0.3314 Epoch: 1 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:05,568-Speed 18563.18 samples/sec Loss 15.8096 LearningRate 0.3318 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:09,940-Speed 18741.11 samples/sec Loss 15.8104 LearningRate 0.3322 Epoch: 1 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:14,340-Speed 18624.72 samples/sec Loss 15.8345 LearningRate 0.3326 Epoch: 1 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:18,778-Speed 18466.69 samples/sec Loss 15.8266 LearningRate 0.3329 Epoch: 1 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:23,159-Speed 18701.39 samples/sec Loss 15.7519 LearningRate 0.3333 Epoch: 1 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:27,557-Speed 18635.02 samples/sec Loss 15.7339 LearningRate 0.3337 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:31,937-Speed 18706.33 samples/sec Loss 15.7172 LearningRate 0.3341 Epoch: 1 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:36,347-Speed 18578.85 samples/sec Loss 15.6810 LearningRate 0.3345 Epoch: 1 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-01-13 23:02:40,768-Speed 18539.14 samples/sec Loss 15.7084 LearningRate 0.3349 Epoch: 1 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:45,668-Speed 16725.01 samples/sec Loss 15.6906 LearningRate 0.3353 Epoch: 1 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:50,063-Speed 18645.11 samples/sec Loss 15.6482 LearningRate 0.3356 Epoch: 1 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:54,430-Speed 18763.82 samples/sec Loss 15.6132 LearningRate 0.3360 Epoch: 1 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:02:58,813-Speed 18698.60 samples/sec Loss 15.6334 LearningRate 0.3364 Epoch: 1 Global Step: 8720 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:03:03,225-Speed 18569.30 samples/sec Loss 15.6006 LearningRate 0.3368 Epoch: 1 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:07,622-Speed 18638.25 samples/sec Loss 15.6393 LearningRate 0.3372 Epoch: 1 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:12,032-Speed 18582.79 samples/sec Loss 15.6230 LearningRate 0.3376 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:16,413-Speed 18703.41 samples/sec Loss 15.5373 LearningRate 0.3380 Epoch: 1 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:20,835-Speed 18531.16 samples/sec Loss 15.5488 LearningRate 0.3383 Epoch: 1 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:25,228-Speed 18658.22 samples/sec Loss 15.5292 LearningRate 0.3387 Epoch: 1 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:29,645-Speed 18557.33 samples/sec Loss 15.4930 LearningRate 0.3391 Epoch: 1 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 
23:03:34,063-Speed 18549.04 samples/sec Loss 15.5237 LearningRate 0.3395 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:38,487-Speed 18524.12 samples/sec Loss 15.4512 LearningRate 0.3399 Epoch: 1 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:42,884-Speed 18636.37 samples/sec Loss 15.5762 LearningRate 0.3403 Epoch: 1 Global Step: 8820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:03:47,309-Speed 18518.99 samples/sec Loss 15.5547 LearningRate 0.3407 Epoch: 1 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:03:51,696-Speed 18678.18 samples/sec Loss 15.5038 LearningRate 0.3410 Epoch: 1 Global Step: 8840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:03:56,090-Speed 18652.12 samples/sec Loss 15.4693 LearningRate 0.3414 Epoch: 1 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:00,460-Speed 18750.67 samples/sec Loss 15.4135 LearningRate 0.3418 Epoch: 1 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:04,845-Speed 18689.80 samples/sec Loss 15.4450 LearningRate 0.3422 Epoch: 1 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:09,234-Speed 18673.56 samples/sec Loss 15.4181 LearningRate 0.3426 Epoch: 1 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:13,866-Speed 17698.17 samples/sec Loss 15.3938 LearningRate 0.3430 Epoch: 1 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:18,251-Speed 18685.15 samples/sec Loss 15.4052 LearningRate 0.3434 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:22,635-Speed 18696.21 samples/sec Loss 15.3915 LearningRate 0.3438 Epoch: 1 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:27,528-Speed 16747.99 
samples/sec Loss 15.3282 LearningRate 0.3441 Epoch: 1 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:32,089-Speed 17967.31 samples/sec Loss 15.3278 LearningRate 0.3445 Epoch: 1 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:36,505-Speed 18555.41 samples/sec Loss 15.3306 LearningRate 0.3449 Epoch: 1 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:40,889-Speed 18689.97 samples/sec Loss 15.2790 LearningRate 0.3453 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:04:45,273-Speed 18689.60 samples/sec Loss 15.2868 LearningRate 0.3457 Epoch: 1 Global Step: 8960 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:04:49,697-Speed 18521.88 samples/sec Loss 15.3068 LearningRate 0.3461 Epoch: 1 Global Step: 8970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:04:54,095-Speed 18631.43 samples/sec Loss 15.2602 LearningRate 0.3465 Epoch: 1 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:04:58,497-Speed 18613.86 samples/sec Loss 15.2961 LearningRate 0.3468 Epoch: 1 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:02,935-Speed 18465.03 samples/sec Loss 15.2182 LearningRate 0.3472 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:07,357-Speed 18529.38 samples/sec Loss 15.1912 LearningRate 0.3476 Epoch: 1 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:11,722-Speed 18771.19 samples/sec Loss 15.2176 LearningRate 0.3480 Epoch: 1 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:18,353-Speed 12356.39 samples/sec Loss 15.2298 LearningRate 0.3484 Epoch: 1 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:22,795-Speed 18446.94 samples/sec Loss 15.2120 
LearningRate 0.3488 Epoch: 1 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:27,209-Speed 18564.30 samples/sec Loss 15.2394 LearningRate 0.3492 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:32,575-Speed 15268.68 samples/sec Loss 15.1951 LearningRate 0.3495 Epoch: 1 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:37,372-Speed 17079.68 samples/sec Loss 15.1442 LearningRate 0.3499 Epoch: 1 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:41,815-Speed 18444.17 samples/sec Loss 15.1722 LearningRate 0.3503 Epoch: 1 Global Step: 9080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:05:46,500-Speed 17489.66 samples/sec Loss 15.1754 LearningRate 0.3507 Epoch: 1 Global Step: 9090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:50,890-Speed 18666.18 samples/sec Loss 15.1265 LearningRate 0.3511 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:05:55,796-Speed 16701.07 samples/sec Loss 15.0679 LearningRate 0.3515 Epoch: 1 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:00,209-Speed 18567.14 samples/sec Loss 15.1105 LearningRate 0.3519 Epoch: 1 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:04,630-Speed 18533.96 samples/sec Loss 15.0971 LearningRate 0.3522 Epoch: 1 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:09,025-Speed 18646.72 samples/sec Loss 15.0890 LearningRate 0.3526 Epoch: 1 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:13,420-Speed 18645.56 samples/sec Loss 15.0418 LearningRate 0.3530 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:17,811-Speed 18658.10 samples/sec Loss 15.0359 LearningRate 0.3534 Epoch: 1 
Global Step: 9160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:22,791-Speed 16455.65 samples/sec Loss 15.0318 LearningRate 0.3538 Epoch: 1 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:27,195-Speed 18607.77 samples/sec Loss 15.0254 LearningRate 0.3542 Epoch: 1 Global Step: 9180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:31,602-Speed 18592.35 samples/sec Loss 15.0984 LearningRate 0.3546 Epoch: 1 Global Step: 9190 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:06:35,995-Speed 18652.34 samples/sec Loss 15.1414 LearningRate 0.3549 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:06:40,381-Speed 18680.66 samples/sec Loss 15.0069 LearningRate 0.3553 Epoch: 1 Global Step: 9210 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:06:44,790-Speed 18586.25 samples/sec Loss 14.9893 LearningRate 0.3557 Epoch: 1 Global Step: 9220 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:06:49,169-Speed 18719.12 samples/sec Loss 14.9931 LearningRate 0.3561 Epoch: 1 Global Step: 9230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:53,592-Speed 18529.64 samples/sec Loss 14.9628 LearningRate 0.3565 Epoch: 1 Global Step: 9240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:06:58,587-Speed 16406.15 samples/sec Loss 14.9670 LearningRate 0.3569 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:07:03,065-Speed 18300.11 samples/sec Loss 14.9795 LearningRate 0.3573 Epoch: 1 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:07:07,544-Speed 18293.07 samples/sec Loss 14.9806 LearningRate 0.3576 Epoch: 1 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:07:12,014-Speed 18331.23 samples/sec Loss 14.8895 LearningRate 0.3580 Epoch: 1 Global Step: 9280 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-01-13 23:07:16,479-Speed 18350.41 samples/sec Loss 14.9019 LearningRate 0.3584 Epoch: 1 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:07:20,991-Speed 18163.18 samples/sec Loss 14.8837 LearningRate 0.3588 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:07:25,496-Speed 18192.45 samples/sec Loss 14.8775 LearningRate 0.3592 Epoch: 1 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:07:29,972-Speed 18310.75 samples/sec Loss 14.8733 LearningRate 0.3596 Epoch: 1 Global Step: 9320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:07:34,473-Speed 18205.74 samples/sec Loss 14.8945 LearningRate 0.3600 Epoch: 1 Global Step: 9330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:07:38,967-Speed 18231.66 samples/sec Loss 14.8756 LearningRate 0.3603 Epoch: 1 Global Step: 9340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:07:43,465-Speed 18217.99 samples/sec Loss 14.8389 LearningRate 0.3607 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:07:47,969-Speed 18191.62 samples/sec Loss 14.7719 LearningRate 0.3611 Epoch: 1 Global Step: 9360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:07:52,522-Speed 17997.11 samples/sec Loss 14.7949 LearningRate 0.3615 Epoch: 1 Global Step: 9370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:07:57,000-Speed 18299.70 samples/sec Loss 14.8447 LearningRate 0.3619 Epoch: 1 Global Step: 9380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:08:01,503-Speed 18200.72 samples/sec Loss 14.7910 LearningRate 0.3623 Epoch: 1 Global Step: 9390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:08:05,966-Speed 18357.13 samples/sec Loss 14.8636 LearningRate 0.3627 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-01-13 23:08:10,431-Speed 18351.39 samples/sec Loss 14.7581 LearningRate 0.3630 Epoch: 1 Global Step: 9410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:08:14,925-Speed 18235.99 samples/sec Loss 14.8006 LearningRate 0.3634 Epoch: 1 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:08:19,389-Speed 18359.44 samples/sec Loss 14.7702 LearningRate 0.3638 Epoch: 1 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:08:23,883-Speed 18233.62 samples/sec Loss 14.7653 LearningRate 0.3642 Epoch: 1 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:08:28,377-Speed 18230.80 samples/sec Loss 14.6503 LearningRate 0.3646 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:08:32,881-Speed 18193.12 samples/sec Loss 14.7220 LearningRate 0.3650 Epoch: 1 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:08:37,418-Speed 18064.21 samples/sec Loss 14.7106 LearningRate 0.3654 Epoch: 1 Global Step: 9470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:08:41,883-Speed 18350.93 samples/sec Loss 14.7469 LearningRate 0.3657 Epoch: 1 Global Step: 9480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:08:46,240-Speed 18807.99 samples/sec Loss 14.7252 LearningRate 0.3661 Epoch: 1 Global Step: 9490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:08:50,613-Speed 18735.68 samples/sec Loss 14.6569 LearningRate 0.3665 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:08:55,025-Speed 18569.62 samples/sec Loss 14.6635 LearningRate 0.3669 Epoch: 1 Global Step: 9510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:08:59,446-Speed 18536.60 samples/sec Loss 14.6220 LearningRate 0.3673 Epoch: 1 Global Step: 9520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 
23:09:03,862-Speed 18558.84 samples/sec Loss 14.6959 LearningRate 0.3677 Epoch: 1 Global Step: 9530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:09:08,261-Speed 18626.14 samples/sec Loss 14.6722 LearningRate 0.3681 Epoch: 1 Global Step: 9540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:09:12,689-Speed 18504.44 samples/sec Loss 14.5839 LearningRate 0.3684 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:09:17,115-Speed 18514.31 samples/sec Loss 14.6130 LearningRate 0.3688 Epoch: 1 Global Step: 9560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:09:21,555-Speed 18456.50 samples/sec Loss 14.6330 LearningRate 0.3692 Epoch: 1 Global Step: 9570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:09:25,968-Speed 18565.00 samples/sec Loss 14.6021 LearningRate 0.3696 Epoch: 1 Global Step: 9580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:09:30,358-Speed 18669.34 samples/sec Loss 14.6229 LearningRate 0.3700 Epoch: 1 Global Step: 9590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:09:34,781-Speed 18532.39 samples/sec Loss 14.5944 LearningRate 0.3704 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:09:39,212-Speed 18498.18 samples/sec Loss 14.6259 LearningRate 0.3708 Epoch: 1 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:09:43,675-Speed 18362.56 samples/sec Loss 14.5630 LearningRate 0.3711 Epoch: 1 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:09:48,070-Speed 18644.30 samples/sec Loss 14.5589 LearningRate 0.3715 Epoch: 1 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:09:52,468-Speed 18632.06 samples/sec Loss 14.5160 LearningRate 0.3719 Epoch: 1 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:09:56,854-Speed 18688.71 
samples/sec Loss 14.5077 LearningRate 0.3723 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:10:01,277-Speed 18530.67 samples/sec Loss 14.4979 LearningRate 0.3727 Epoch: 1 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:10:05,678-Speed 18618.53 samples/sec Loss 14.5419 LearningRate 0.3731 Epoch: 1 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:10:10,054-Speed 18724.33 samples/sec Loss 14.5314 LearningRate 0.3735 Epoch: 1 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:10:14,463-Speed 18588.27 samples/sec Loss 14.5183 LearningRate 0.3738 Epoch: 1 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:10:18,872-Speed 18583.59 samples/sec Loss 14.4755 LearningRate 0.3742 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:10:23,312-Speed 18457.37 samples/sec Loss 14.4916 LearningRate 0.3746 Epoch: 1 Global Step: 9710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:10:27,685-Speed 18738.35 samples/sec Loss 14.4648 LearningRate 0.3750 Epoch: 1 Global Step: 9720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:10:32,080-Speed 18644.98 samples/sec Loss 14.4579 LearningRate 0.3754 Epoch: 1 Global Step: 9730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:10:36,522-Speed 18447.56 samples/sec Loss 14.4663 LearningRate 0.3758 Epoch: 1 Global Step: 9740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:10:40,947-Speed 18516.56 samples/sec Loss 14.4954 LearningRate 0.3762 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:10:45,346-Speed 18631.00 samples/sec Loss 14.4151 LearningRate 0.3765 Epoch: 1 Global Step: 9760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:10:49,754-Speed 18587.79 samples/sec Loss 14.4743 
LearningRate 0.3769 Epoch: 1 Global Step: 9770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:10:54,137-Speed 18694.84 samples/sec Loss 14.3821 LearningRate 0.3773 Epoch: 1 Global Step: 9780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:10:58,502-Speed 18775.57 samples/sec Loss 14.3939 LearningRate 0.3777 Epoch: 1 Global Step: 9790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:02,880-Speed 18716.87 samples/sec Loss 14.4071 LearningRate 0.3781 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:07,277-Speed 18634.21 samples/sec Loss 14.3988 LearningRate 0.3785 Epoch: 1 Global Step: 9810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:11,689-Speed 18572.12 samples/sec Loss 14.4360 LearningRate 0.3789 Epoch: 1 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:11:16,088-Speed 18626.39 samples/sec Loss 14.3555 LearningRate 0.3792 Epoch: 1 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:11:20,513-Speed 18518.99 samples/sec Loss 14.4068 LearningRate 0.3796 Epoch: 1 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:11:24,883-Speed 18753.97 samples/sec Loss 14.3286 LearningRate 0.3800 Epoch: 1 Global Step: 9850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:29,282-Speed 18629.38 samples/sec Loss 14.3180 LearningRate 0.3804 Epoch: 1 Global Step: 9860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:33,681-Speed 18626.09 samples/sec Loss 14.3324 LearningRate 0.3808 Epoch: 1 Global Step: 9870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:38,090-Speed 18587.22 samples/sec Loss 14.3230 LearningRate 0.3812 Epoch: 1 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:42,506-Speed 18554.13 samples/sec Loss 14.3395 LearningRate 0.3816 Epoch: 1 Global 
Step: 9890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:46,890-Speed 18690.73 samples/sec Loss 14.3310 LearningRate 0.3819 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:51,510-Speed 17739.07 samples/sec Loss 14.3007 LearningRate 0.3823 Epoch: 1 Global Step: 9910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:11:55,917-Speed 18590.87 samples/sec Loss 14.2717 LearningRate 0.3827 Epoch: 1 Global Step: 9920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:00,289-Speed 18744.89 samples/sec Loss 14.2979 LearningRate 0.3831 Epoch: 1 Global Step: 9930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:04,634-Speed 18857.40 samples/sec Loss 14.2523 LearningRate 0.3835 Epoch: 1 Global Step: 9940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:09,009-Speed 18727.84 samples/sec Loss 14.2797 LearningRate 0.3839 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:12:13,427-Speed 18550.13 samples/sec Loss 14.3133 LearningRate 0.3843 Epoch: 1 Global Step: 9960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:17,816-Speed 18669.70 samples/sec Loss 14.2455 LearningRate 0.3846 Epoch: 1 Global Step: 9970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:22,192-Speed 18726.86 samples/sec Loss 14.2413 LearningRate 0.3850 Epoch: 1 Global Step: 9980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:26,553-Speed 18787.32 samples/sec Loss 14.2350 LearningRate 0.3854 Epoch: 1 Global Step: 9990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:30,977-Speed 18521.64 samples/sec Loss 14.2963 LearningRate 0.3858 Epoch: 1 Global Step: 10000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:35,378-Speed 18621.88 samples/sec Loss 14.2646 LearningRate 0.3862 Epoch: 1 Global Step: 10010 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-01-13 23:12:39,750-Speed 18742.20 samples/sec Loss 14.2091 LearningRate 0.3866 Epoch: 1 Global Step: 10020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:44,136-Speed 18686.34 samples/sec Loss 14.1992 LearningRate 0.3870 Epoch: 1 Global Step: 10030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:48,576-Speed 18452.12 samples/sec Loss 14.1879 LearningRate 0.3873 Epoch: 1 Global Step: 10040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:52,957-Speed 18710.12 samples/sec Loss 14.1879 LearningRate 0.3877 Epoch: 1 Global Step: 10050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:12:57,335-Speed 18716.65 samples/sec Loss 14.1947 LearningRate 0.3881 Epoch: 1 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:13:01,715-Speed 18707.02 samples/sec Loss 14.1988 LearningRate 0.3885 Epoch: 1 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:13:06,107-Speed 18660.20 samples/sec Loss 14.1766 LearningRate 0.3889 Epoch: 1 Global Step: 10080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:10,495-Speed 18672.36 samples/sec Loss 14.1518 LearningRate 0.3893 Epoch: 1 Global Step: 10090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:14,899-Speed 18605.85 samples/sec Loss 14.1434 LearningRate 0.3897 Epoch: 1 Global Step: 10100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:19,849-Speed 16552.70 samples/sec Loss 14.1546 LearningRate 0.3900 Epoch: 1 Global Step: 10110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:24,269-Speed 18539.68 samples/sec Loss 14.1465 LearningRate 0.3904 Epoch: 1 Global Step: 10120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:28,675-Speed 18594.96 samples/sec Loss 14.1216 LearningRate 0.3908 Epoch: 1 Global Step: 10130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-01-13 23:13:33,111-Speed 18473.83 samples/sec Loss 14.0858 LearningRate 0.3912 Epoch: 1 Global Step: 10140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:37,527-Speed 18561.33 samples/sec Loss 14.0935 LearningRate 0.3916 Epoch: 1 Global Step: 10150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:41,909-Speed 18695.58 samples/sec Loss 14.1213 LearningRate 0.3920 Epoch: 1 Global Step: 10160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:46,311-Speed 18617.76 samples/sec Loss 14.1124 LearningRate 0.3924 Epoch: 1 Global Step: 10170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:13:50,695-Speed 18692.81 samples/sec Loss 14.0869 LearningRate 0.3927 Epoch: 1 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:13:55,058-Speed 18781.46 samples/sec Loss 14.0415 LearningRate 0.3931 Epoch: 1 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:13:59,484-Speed 18516.48 samples/sec Loss 14.0581 LearningRate 0.3935 Epoch: 1 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:03,857-Speed 18735.45 samples/sec Loss 14.1059 LearningRate 0.3939 Epoch: 1 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:08,296-Speed 18457.98 samples/sec Loss 14.0066 LearningRate 0.3943 Epoch: 1 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:12,750-Speed 18402.19 samples/sec Loss 14.0371 LearningRate 0.3947 Epoch: 1 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:17,144-Speed 18648.04 samples/sec Loss 14.0888 LearningRate 0.3951 Epoch: 1 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:21,546-Speed 18612.43 samples/sec Loss 14.0553 LearningRate 0.3954 Epoch: 1 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 
23:14:25,983-Speed 18474.22 samples/sec Loss 14.0359 LearningRate 0.3958 Epoch: 1 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:30,408-Speed 18522.60 samples/sec Loss 14.0047 LearningRate 0.3962 Epoch: 1 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:34,817-Speed 18587.33 samples/sec Loss 14.0000 LearningRate 0.3966 Epoch: 1 Global Step: 10280 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:14:39,190-Speed 18736.13 samples/sec Loss 14.0418 LearningRate 0.3970 Epoch: 1 Global Step: 10290 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:14:43,594-Speed 18615.81 samples/sec Loss 14.0014 LearningRate 0.3974 Epoch: 1 Global Step: 10300 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:14:47,983-Speed 18666.34 samples/sec Loss 13.9946 LearningRate 0.3978 Epoch: 1 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:52,365-Speed 18699.74 samples/sec Loss 14.0167 LearningRate 0.3981 Epoch: 1 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:14:56,738-Speed 18742.43 samples/sec Loss 13.9796 LearningRate 0.3985 Epoch: 1 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:01,122-Speed 18693.44 samples/sec Loss 13.9596 LearningRate 0.3989 Epoch: 1 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:05,522-Speed 18626.29 samples/sec Loss 14.0055 LearningRate 0.3993 Epoch: 1 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:09,942-Speed 18538.79 samples/sec Loss 14.0696 LearningRate 0.3997 Epoch: 1 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:14,409-Speed 18346.25 samples/sec Loss 14.0486 LearningRate 0.4000 Epoch: 1 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:32,894-Speed 
4432.26 samples/sec Loss 13.8806 LearningRate 0.3999 Epoch: 2 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:37,290-Speed 18645.54 samples/sec Loss 13.9724 LearningRate 0.3998 Epoch: 2 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:41,711-Speed 18533.80 samples/sec Loss 13.9115 LearningRate 0.3997 Epoch: 2 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:46,093-Speed 18698.95 samples/sec Loss 13.9088 LearningRate 0.3996 Epoch: 2 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:50,497-Speed 18608.81 samples/sec Loss 13.8927 LearningRate 0.3996 Epoch: 2 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:54,914-Speed 18554.07 samples/sec Loss 13.8869 LearningRate 0.3995 Epoch: 2 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:15:59,310-Speed 18641.06 samples/sec Loss 13.8721 LearningRate 0.3994 Epoch: 2 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:16:03,738-Speed 18502.73 samples/sec Loss 13.8732 LearningRate 0.3993 Epoch: 2 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:16:08,163-Speed 18515.58 samples/sec Loss 13.9085 LearningRate 0.3992 Epoch: 2 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:16:12,552-Speed 18672.03 samples/sec Loss 13.8205 LearningRate 0.3991 Epoch: 2 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:16:16,932-Speed 18716.03 samples/sec Loss 13.8412 LearningRate 0.3990 Epoch: 2 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:16:21,351-Speed 18547.60 samples/sec Loss 13.8275 LearningRate 0.3990 Epoch: 2 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:16:25,740-Speed 18667.31 samples/sec 
Loss 13.8461 LearningRate 0.3989 Epoch: 2 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:16:30,142-Speed 18615.75 samples/sec Loss 13.8217 LearningRate 0.3988 Epoch: 2 Global Step: 10510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:16:34,609-Speed 18346.19 samples/sec Loss 13.7942 LearningRate 0.3987 Epoch: 2 Global Step: 10520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:16:38,978-Speed 18758.27 samples/sec Loss 13.7909 LearningRate 0.3986 Epoch: 2 Global Step: 10530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:16:43,395-Speed 18552.07 samples/sec Loss 13.8201 LearningRate 0.3985 Epoch: 2 Global Step: 10540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:16:47,777-Speed 18701.41 samples/sec Loss 13.7928 LearningRate 0.3984 Epoch: 2 Global Step: 10550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:16:52,158-Speed 18703.62 samples/sec Loss 13.7789 LearningRate 0.3984 Epoch: 2 Global Step: 10560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:16:56,558-Speed 18621.40 samples/sec Loss 13.7790 LearningRate 0.3983 Epoch: 2 Global Step: 10570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:17:00,952-Speed 18647.47 samples/sec Loss 13.7549 LearningRate 0.3982 Epoch: 2 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:17:05,350-Speed 18630.91 samples/sec Loss 13.7950 LearningRate 0.3981 Epoch: 2 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:17:09,748-Speed 18630.48 samples/sec Loss 13.7126 LearningRate 0.3980 Epoch: 2 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:17:14,162-Speed 18564.26 samples/sec Loss 13.8396 LearningRate 0.3979 Epoch: 2 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:17:18,590-Speed 18504.62 samples/sec Loss 13.7204 LearningRate 
0.3978 Epoch: 2 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:17:22,994-Speed 18606.03 samples/sec Loss 13.6964 LearningRate 0.3978 Epoch: 2 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:17:27,397-Speed 18611.48 samples/sec Loss 13.6955 LearningRate 0.3977 Epoch: 2 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:17:31,858-Speed 18366.90 samples/sec Loss 13.6615 LearningRate 0.3976 Epoch: 2 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:17:36,262-Speed 18602.72 samples/sec Loss 13.7138 LearningRate 0.3975 Epoch: 2 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:17:40,662-Speed 18622.22 samples/sec Loss 13.6944 LearningRate 0.3974 Epoch: 2 Global Step: 10670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:17:45,025-Speed 18782.26 samples/sec Loss 13.6582 LearningRate 0.3973 Epoch: 2 Global Step: 10680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:17:49,473-Speed 18422.09 samples/sec Loss 13.6722 LearningRate 0.3972 Epoch: 2 Global Step: 10690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:17:53,921-Speed 18420.69 samples/sec Loss 13.6540 LearningRate 0.3972 Epoch: 2 Global Step: 10700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:17:58,348-Speed 18507.31 samples/sec Loss 13.6531 LearningRate 0.3971 Epoch: 2 Global Step: 10710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:18:02,747-Speed 18630.84 samples/sec Loss 13.6231 LearningRate 0.3970 Epoch: 2 Global Step: 10720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:18:07,178-Speed 18490.02 samples/sec Loss 13.6507 LearningRate 0.3969 Epoch: 2 Global Step: 10730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:18:11,612-Speed 18483.73 samples/sec Loss 13.6607 LearningRate 0.3968 Epoch: 2 Global 
Step: 10740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:18:19,708-Speed 10120.57 samples/sec Loss 13.6533 LearningRate 0.3967 Epoch: 2 Global Step: 10750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:18:24,130-Speed 18531.01 samples/sec Loss 13.7110 LearningRate 0.3966 Epoch: 2 Global Step: 10760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-13 23:18:28,536-Speed 18597.26 samples/sec Loss 13.6866 LearningRate 0.3966 Epoch: 2 Global Step: 10770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:18:32,975-Speed 18458.95 samples/sec Loss 13.6574 LearningRate 0.3965 Epoch: 2 Global Step: 10780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:18:37,398-Speed 18526.84 samples/sec Loss 13.6495 LearningRate 0.3964 Epoch: 2 Global Step: 10790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:18:41,792-Speed 18650.34 samples/sec Loss 13.6170 LearningRate 0.3963 Epoch: 2 Global Step: 10800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:18:46,205-Speed 18569.87 samples/sec Loss 13.6246 LearningRate 0.3962 Epoch: 2 Global Step: 10810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:18:50,660-Speed 18393.51 samples/sec Loss 13.5331 LearningRate 0.3961 Epoch: 2 Global Step: 10820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:18:55,088-Speed 18507.77 samples/sec Loss 13.5681 LearningRate 0.3960 Epoch: 2 Global Step: 10830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:18:59,503-Speed 18559.20 samples/sec Loss 13.6053 LearningRate 0.3960 Epoch: 2 Global Step: 10840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:19:03,966-Speed 18361.94 samples/sec Loss 13.5241 LearningRate 0.3959 Epoch: 2 Global Step: 10850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:19:08,350-Speed 18688.23 samples/sec Loss 13.5914 LearningRate 0.3958 Epoch: 2 Global Step: 10860 Fp16 Grad Scale: 
65536 Required: 12 hours Training: 2022-01-13 23:19:12,819-Speed 18336.11 samples/sec Loss 13.5180 LearningRate 0.3957 Epoch: 2 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:17,241-Speed 18527.46 samples/sec Loss 13.4976 LearningRate 0.3956 Epoch: 2 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:21,658-Speed 18556.85 samples/sec Loss 13.4978 LearningRate 0.3955 Epoch: 2 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:26,091-Speed 18482.35 samples/sec Loss 13.5381 LearningRate 0.3955 Epoch: 2 Global Step: 10900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:30,496-Speed 18599.32 samples/sec Loss 13.5367 LearningRate 0.3954 Epoch: 2 Global Step: 10910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:34,956-Speed 18370.54 samples/sec Loss 13.5537 LearningRate 0.3953 Epoch: 2 Global Step: 10920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:39,354-Speed 18635.47 samples/sec Loss 13.4802 LearningRate 0.3952 Epoch: 2 Global Step: 10930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:43,760-Speed 18595.99 samples/sec Loss 13.5354 LearningRate 0.3951 Epoch: 2 Global Step: 10940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:48,224-Speed 18355.08 samples/sec Loss 13.4897 LearningRate 0.3950 Epoch: 2 Global Step: 10950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:19:52,619-Speed 18647.20 samples/sec Loss 13.4072 LearningRate 0.3949 Epoch: 2 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:19:57,104-Speed 18271.11 samples/sec Loss 13.4709 LearningRate 0.3949 Epoch: 2 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:20:01,543-Speed 18461.71 samples/sec Loss 13.4221 LearningRate 0.3948 Epoch: 2 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 12 
hours Training: 2022-01-13 23:20:05,955-Speed 18571.06 samples/sec Loss 13.4430 LearningRate 0.3947 Epoch: 2 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:20:10,406-Speed 18409.51 samples/sec Loss 13.3901 LearningRate 0.3946 Epoch: 2 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:20:14,816-Speed 18580.43 samples/sec Loss 13.4086 LearningRate 0.3945 Epoch: 2 Global Step: 11010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:20:19,213-Speed 18638.41 samples/sec Loss 13.4040 LearningRate 0.3944 Epoch: 2 Global Step: 11020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:20:23,616-Speed 18609.37 samples/sec Loss 13.4452 LearningRate 0.3943 Epoch: 2 Global Step: 11030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:20:28,042-Speed 18515.93 samples/sec Loss 13.4224 LearningRate 0.3943 Epoch: 2 Global Step: 11040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:20:32,531-Speed 18252.89 samples/sec Loss 13.3749 LearningRate 0.3942 Epoch: 2 Global Step: 11050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:20:36,953-Speed 18529.17 samples/sec Loss 13.3466 LearningRate 0.3941 Epoch: 2 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:20:41,373-Speed 18537.17 samples/sec Loss 13.3712 LearningRate 0.3940 Epoch: 2 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:20:45,761-Speed 18676.32 samples/sec Loss 13.3658 LearningRate 0.3939 Epoch: 2 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:20:50,190-Speed 18502.98 samples/sec Loss 13.3049 LearningRate 0.3938 Epoch: 2 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:20:54,601-Speed 18572.46 samples/sec Loss 13.3086 LearningRate 0.3937 Epoch: 2 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 
23:20:59,002-Speed 18621.04 samples/sec Loss 13.3393 LearningRate 0.3937 Epoch: 2 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:21:03,450-Speed 18424.62 samples/sec Loss 13.3488 LearningRate 0.3936 Epoch: 2 Global Step: 11120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:21:07,912-Speed 18360.77 samples/sec Loss 13.2974 LearningRate 0.3935 Epoch: 2 Global Step: 11130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:12,340-Speed 18507.11 samples/sec Loss 13.3119 LearningRate 0.3934 Epoch: 2 Global Step: 11140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:16,734-Speed 18646.94 samples/sec Loss 13.3454 LearningRate 0.3933 Epoch: 2 Global Step: 11150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:21,112-Speed 18719.65 samples/sec Loss 13.3374 LearningRate 0.3932 Epoch: 2 Global Step: 11160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:25,525-Speed 18565.61 samples/sec Loss 13.3317 LearningRate 0.3932 Epoch: 2 Global Step: 11170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:30,021-Speed 18225.45 samples/sec Loss 13.3286 LearningRate 0.3931 Epoch: 2 Global Step: 11180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:34,442-Speed 18534.55 samples/sec Loss 13.3319 LearningRate 0.3930 Epoch: 2 Global Step: 11190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:38,870-Speed 18505.29 samples/sec Loss 13.2872 LearningRate 0.3929 Epoch: 2 Global Step: 11200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:43,245-Speed 18730.05 samples/sec Loss 13.2674 LearningRate 0.3928 Epoch: 2 Global Step: 11210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:47,660-Speed 18556.56 samples/sec Loss 13.2014 LearningRate 0.3927 Epoch: 2 Global Step: 11220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:21:52,073-Speed 18571.25 
samples/sec Loss 13.2176 LearningRate 0.3926 Epoch: 2 Global Step: 11230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:21:56,465-Speed 18657.27 samples/sec Loss 13.2568 LearningRate 0.3926 Epoch: 2 Global Step: 11240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:00,855-Speed 18663.79 samples/sec Loss 13.2612 LearningRate 0.3925 Epoch: 2 Global Step: 11250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:05,246-Speed 18662.18 samples/sec Loss 13.1972 LearningRate 0.3924 Epoch: 2 Global Step: 11260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:09,667-Speed 18534.83 samples/sec Loss 13.2240 LearningRate 0.3923 Epoch: 2 Global Step: 11270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:14,066-Speed 18627.98 samples/sec Loss 13.2286 LearningRate 0.3922 Epoch: 2 Global Step: 11280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:18,512-Speed 18427.17 samples/sec Loss 13.2263 LearningRate 0.3921 Epoch: 2 Global Step: 11290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:22,960-Speed 18430.07 samples/sec Loss 13.2147 LearningRate 0.3920 Epoch: 2 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:27,361-Speed 18620.51 samples/sec Loss 13.1614 LearningRate 0.3920 Epoch: 2 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:31,782-Speed 18539.43 samples/sec Loss 13.1554 LearningRate 0.3919 Epoch: 2 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:22:36,192-Speed 18581.25 samples/sec Loss 13.2054 LearningRate 0.3918 Epoch: 2 Global Step: 11330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:22:40,580-Speed 18675.39 samples/sec Loss 13.1750 LearningRate 0.3917 Epoch: 2 Global Step: 11340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:22:44,968-Speed 18675.30 samples/sec Loss 
13.1773 LearningRate 0.3916 Epoch: 2 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:22:49,405-Speed 18466.61 samples/sec Loss 13.1667 LearningRate 0.3915 Epoch: 2 Global Step: 11360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:22:53,827-Speed 18533.12 samples/sec Loss 13.1033 LearningRate 0.3915 Epoch: 2 Global Step: 11370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:22:58,219-Speed 18653.30 samples/sec Loss 13.1626 LearningRate 0.3914 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:23:02,626-Speed 18601.78 samples/sec Loss 13.1121 LearningRate 0.3913 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:23:07,088-Speed 18365.81 samples/sec Loss 13.1068 LearningRate 0.3912 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:23:11,491-Speed 18610.82 samples/sec Loss 13.0975 LearningRate 0.3911 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:23:15,894-Speed 18614.06 samples/sec Loss 13.0647 LearningRate 0.3910 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-13 23:23:20,330-Speed 18472.50 samples/sec Loss 13.0610 LearningRate 0.3909 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:23:24,716-Speed 18681.74 samples/sec Loss 13.1168 LearningRate 0.3909 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:23:29,153-Speed 18467.76 samples/sec Loss 13.1264 LearningRate 0.3908 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:23:33,596-Speed 18439.73 samples/sec Loss 13.0564 LearningRate 0.3907 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:23:38,002-Speed 18598.86 samples/sec Loss 13.0726 LearningRate 0.3906 
Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:23:42,431-Speed 18503.90 samples/sec Loss 13.0281 LearningRate 0.3905 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:23:46,863-Speed 18487.10 samples/sec Loss 13.0732 LearningRate 0.3904 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:23:51,307-Speed 18441.31 samples/sec Loss 13.0960 LearningRate 0.3904 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:23:55,750-Speed 18441.71 samples/sec Loss 13.0995 LearningRate 0.3903 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:00,128-Speed 18715.11 samples/sec Loss 13.0380 LearningRate 0.3902 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:04,559-Speed 18493.58 samples/sec Loss 13.0614 LearningRate 0.3901 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:24:09,079-Speed 18127.97 samples/sec Loss 13.0541 LearningRate 0.3900 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:24:13,495-Speed 18559.20 samples/sec Loss 13.0017 LearningRate 0.3899 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:24:17,934-Speed 18459.77 samples/sec Loss 12.9999 LearningRate 0.3898 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:24:22,346-Speed 18570.23 samples/sec Loss 12.9638 LearningRate 0.3898 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-13 23:24:26,752-Speed 18599.70 samples/sec Loss 12.9915 LearningRate 0.3897 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:31,131-Speed 18713.49 samples/sec Loss 12.9889 LearningRate 0.3896 Epoch: 2 Global 
Step: 11590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:35,530-Speed 18628.98 samples/sec Loss 12.9652 LearningRate 0.3895 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:39,904-Speed 18735.40 samples/sec Loss 12.9262 LearningRate 0.3894 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:44,294-Speed 18664.65 samples/sec Loss 12.8922 LearningRate 0.3893 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:48,698-Speed 18606.69 samples/sec Loss 13.0281 LearningRate 0.3893 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:53,103-Speed 18603.43 samples/sec Loss 13.0252 LearningRate 0.3892 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-13 23:24:57,522-Speed 18540.16 samples/sec Loss 12.9533 LearningRate 0.3891 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:01,920-Speed 18635.54 samples/sec Loss 12.9695 LearningRate 0.3890 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:06,323-Speed 18611.40 samples/sec Loss 12.9209 LearningRate 0.3889 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:10,709-Speed 18683.29 samples/sec Loss 12.9166 LearningRate 0.3888 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:25:15,090-Speed 18700.72 samples/sec Loss 12.9092 LearningRate 0.3887 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:19,492-Speed 18612.06 samples/sec Loss 12.9425 LearningRate 0.3887 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:23,879-Speed 18678.93 samples/sec Loss 12.9378 LearningRate 0.3886 Epoch: 2 Global Step: 11710 Fp16 
Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:28,276-Speed 18637.91 samples/sec Loss 12.9219 LearningRate 0.3885 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:32,668-Speed 18658.05 samples/sec Loss 12.9927 LearningRate 0.3884 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:37,071-Speed 18609.24 samples/sec Loss 12.8773 LearningRate 0.3883 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:41,488-Speed 18548.81 samples/sec Loss 12.8997 LearningRate 0.3882 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:45,948-Speed 18372.47 samples/sec Loss 12.9089 LearningRate 0.3882 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:25:50,444-Speed 18225.45 samples/sec Loss 12.8306 LearningRate 0.3881 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:25:54,835-Speed 18662.01 samples/sec Loss 12.8657 LearningRate 0.3880 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:25:59,234-Speed 18629.79 samples/sec Loss 12.8909 LearningRate 0.3879 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:26:03,650-Speed 18554.85 samples/sec Loss 12.9105 LearningRate 0.3878 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:26:08,089-Speed 18458.67 samples/sec Loss 12.8891 LearningRate 0.3877 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:26:12,519-Speed 18497.22 samples/sec Loss 12.8341 LearningRate 0.3876 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:26:16,924-Speed 18603.66 samples/sec Loss 12.8741 LearningRate 0.3876 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-01-13 23:26:21,333-Speed 18592.79 samples/sec Loss 12.8242 LearningRate 0.3875 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:26:25,727-Speed 18654.63 samples/sec Loss 12.8229 LearningRate 0.3874 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:26:30,131-Speed 18607.24 samples/sec Loss 12.8224 LearningRate 0.3873 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:26:34,534-Speed 18611.93 samples/sec Loss 12.8015 LearningRate 0.3872 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:26:38,964-Speed 18493.70 samples/sec Loss 12.8246 LearningRate 0.3871 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:26:43,351-Speed 18678.18 samples/sec Loss 12.8240 LearningRate 0.3871 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:26:47,763-Speed 18572.84 samples/sec Loss 12.7751 LearningRate 0.3870 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:26:52,222-Speed 18379.84 samples/sec Loss 12.7769 LearningRate 0.3869 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:26:56,728-Speed 18182.95 samples/sec Loss 12.7948 LearningRate 0.3868 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:27:01,209-Speed 18294.78 samples/sec Loss 12.7596 LearningRate 0.3867 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:27:05,671-Speed 18363.61 samples/sec Loss 12.8312 LearningRate 0.3866 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:27:10,062-Speed 18663.41 samples/sec Loss 12.7424 LearningRate 0.3866 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-01-13 23:27:14,495-Speed 18483.28 samples/sec Loss 12.7153 LearningRate 0.3865 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:27:18,913-Speed 18546.08 samples/sec Loss 12.7090 LearningRate 0.3864 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:27:23,325-Speed 18575.76 samples/sec Loss 12.7438 LearningRate 0.3863 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:27:27,776-Speed 18411.27 samples/sec Loss 12.7006 LearningRate 0.3862 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:27:32,236-Speed 18373.17 samples/sec Loss 12.7377 LearningRate 0.3861 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:27:36,671-Speed 18479.25 samples/sec Loss 12.7566 LearningRate 0.3860 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:27:41,063-Speed 18659.03 samples/sec Loss 12.7351 LearningRate 0.3860 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:27:45,510-Speed 18426.31 samples/sec Loss 12.6723 LearningRate 0.3859 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:27:49,905-Speed 18641.89 samples/sec Loss 12.6903 LearningRate 0.3858 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:27:54,303-Speed 18632.78 samples/sec Loss 12.6038 LearningRate 0.3857 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:27:58,686-Speed 18696.00 samples/sec Loss 12.7222 LearningRate 0.3856 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:28:03,087-Speed 18622.44 samples/sec Loss 12.7078 LearningRate 0.3855 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 
23:28:07,536-Speed 18417.75 samples/sec Loss 12.6565 LearningRate 0.3855 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:28:11,963-Speed 18508.21 samples/sec Loss 12.6321 LearningRate 0.3854 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:28:16,383-Speed 18540.08 samples/sec Loss 12.6530 LearningRate 0.3853 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:28:20,787-Speed 18606.73 samples/sec Loss 12.6683 LearningRate 0.3852 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:28:25,250-Speed 18362.31 samples/sec Loss 12.6125 LearningRate 0.3851 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:28:29,685-Speed 18473.52 samples/sec Loss 12.6481 LearningRate 0.3850 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:28:34,097-Speed 18574.37 samples/sec Loss 12.5733 LearningRate 0.3850 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:28:38,505-Speed 18589.08 samples/sec Loss 12.5782 LearningRate 0.3849 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:28:42,953-Speed 18423.92 samples/sec Loss 12.6022 LearningRate 0.3848 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:28:47,406-Speed 18408.46 samples/sec Loss 12.6365 LearningRate 0.3847 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:28:51,836-Speed 18495.07 samples/sec Loss 12.5972 LearningRate 0.3846 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:28:56,263-Speed 18509.66 samples/sec Loss 12.5781 LearningRate 0.3845 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:29:00,741-Speed 18303.32 
samples/sec Loss 12.5606 LearningRate 0.3844 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:29:05,157-Speed 18554.39 samples/sec Loss 12.5405 LearningRate 0.3844 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:29:09,593-Speed 18476.06 samples/sec Loss 12.5426 LearningRate 0.3843 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:29:14,041-Speed 18421.90 samples/sec Loss 12.5791 LearningRate 0.3842 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:29:18,415-Speed 18736.90 samples/sec Loss 12.5964 LearningRate 0.3841 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:22,819-Speed 18607.65 samples/sec Loss 12.6110 LearningRate 0.3840 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:27,271-Speed 18402.73 samples/sec Loss 12.5190 LearningRate 0.3839 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:31,692-Speed 18537.47 samples/sec Loss 12.5019 LearningRate 0.3839 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:36,170-Speed 18300.48 samples/sec Loss 12.5565 LearningRate 0.3838 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:40,582-Speed 18569.56 samples/sec Loss 12.5355 LearningRate 0.3837 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:45,056-Speed 18314.86 samples/sec Loss 12.5882 LearningRate 0.3836 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:49,490-Speed 18479.78 samples/sec Loss 12.5776 LearningRate 0.3835 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:54,037-Speed 18018.23 samples/sec Loss 12.5213 
LearningRate 0.3834 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:29:58,461-Speed 18521.67 samples/sec Loss 12.4617 LearningRate 0.3834 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:30:02,890-Speed 18500.49 samples/sec Loss 12.5032 LearningRate 0.3833 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:30:07,350-Speed 18372.08 samples/sec Loss 12.4711 LearningRate 0.3832 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:30:11,788-Speed 18464.55 samples/sec Loss 12.4895 LearningRate 0.3831 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:30:16,189-Speed 18620.00 samples/sec Loss 12.5574 LearningRate 0.3830 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:30:20,585-Speed 18638.60 samples/sec Loss 12.4802 LearningRate 0.3829 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:30:24,997-Speed 18572.95 samples/sec Loss 12.4934 LearningRate 0.3829 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:30:29,397-Speed 18624.09 samples/sec Loss 12.4638 LearningRate 0.3828 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:30:33,828-Speed 18490.37 samples/sec Loss 12.4264 LearningRate 0.3827 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:30:38,228-Speed 18622.71 samples/sec Loss 12.4442 LearningRate 0.3826 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:30:42,694-Speed 18345.35 samples/sec Loss 12.4966 LearningRate 0.3825 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:30:47,130-Speed 18470.86 samples/sec Loss 12.4699 LearningRate 0.3824 
Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:30:51,598-Speed 18340.05 samples/sec Loss 12.4802 LearningRate 0.3823 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:30:56,022-Speed 18524.34 samples/sec Loss 12.4397 LearningRate 0.3823 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:31:00,403-Speed 18701.12 samples/sec Loss 12.3727 LearningRate 0.3822 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:31:04,791-Speed 18676.12 samples/sec Loss 12.3726 LearningRate 0.3821 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:31:09,252-Speed 18367.25 samples/sec Loss 12.3960 LearningRate 0.3820 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:31:13,624-Speed 18743.99 samples/sec Loss 12.3802 LearningRate 0.3819 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:18,035-Speed 18573.25 samples/sec Loss 12.4584 LearningRate 0.3818 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:22,417-Speed 18701.82 samples/sec Loss 12.4351 LearningRate 0.3818 Epoch: 2 Global Step: 12520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:26,799-Speed 18701.71 samples/sec Loss 12.4607 LearningRate 0.3817 Epoch: 2 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:31,194-Speed 18645.07 samples/sec Loss 12.4080 LearningRate 0.3816 Epoch: 2 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:35,622-Speed 18504.02 samples/sec Loss 12.4701 LearningRate 0.3815 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:40,027-Speed 18599.82 samples/sec Loss 12.3388 LearningRate 0.3814 Epoch: 2 Global Step: 
12560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:44,426-Speed 18627.93 samples/sec Loss 12.3545 LearningRate 0.3813 Epoch: 2 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:48,834-Speed 18585.20 samples/sec Loss 12.3204 LearningRate 0.3813 Epoch: 2 Global Step: 12580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:53,250-Speed 18560.99 samples/sec Loss 12.3653 LearningRate 0.3812 Epoch: 2 Global Step: 12590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:31:57,666-Speed 18552.11 samples/sec Loss 12.3410 LearningRate 0.3811 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:32:02,100-Speed 18481.74 samples/sec Loss 12.3567 LearningRate 0.3810 Epoch: 2 Global Step: 12610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:06,501-Speed 18614.20 samples/sec Loss 12.3560 LearningRate 0.3809 Epoch: 2 Global Step: 12620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:10,912-Speed 18577.71 samples/sec Loss 12.3944 LearningRate 0.3808 Epoch: 2 Global Step: 12630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:15,308-Speed 18642.46 samples/sec Loss 12.3342 LearningRate 0.3808 Epoch: 2 Global Step: 12640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:19,705-Speed 18636.99 samples/sec Loss 12.3135 LearningRate 0.3807 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:24,153-Speed 18426.78 samples/sec Loss 12.3337 LearningRate 0.3806 Epoch: 2 Global Step: 12660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:28,629-Speed 18308.80 samples/sec Loss 12.3432 LearningRate 0.3805 Epoch: 2 Global Step: 12670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:33,069-Speed 18457.75 samples/sec Loss 12.3157 LearningRate 0.3804 Epoch: 2 Global Step: 12680 Fp16 Grad Scale: 
65536 Required: 11 hours Training: 2022-01-13 23:32:40,091-Speed 11668.53 samples/sec Loss 12.3012 LearningRate 0.3803 Epoch: 2 Global Step: 12690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:44,488-Speed 18634.27 samples/sec Loss 12.3191 LearningRate 0.3803 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:32:48,972-Speed 18283.33 samples/sec Loss 12.2897 LearningRate 0.3802 Epoch: 2 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:32:53,374-Speed 18617.02 samples/sec Loss 12.2944 LearningRate 0.3801 Epoch: 2 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:32:57,764-Speed 18668.29 samples/sec Loss 12.3188 LearningRate 0.3800 Epoch: 2 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:33:02,193-Speed 18502.36 samples/sec Loss 12.2833 LearningRate 0.3799 Epoch: 2 Global Step: 12740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:06,616-Speed 18541.20 samples/sec Loss 12.2854 LearningRate 0.3798 Epoch: 2 Global Step: 12750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:11,051-Speed 18473.77 samples/sec Loss 12.2506 LearningRate 0.3798 Epoch: 2 Global Step: 12760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:15,464-Speed 18568.88 samples/sec Loss 12.2329 LearningRate 0.3797 Epoch: 2 Global Step: 12770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:19,919-Speed 18394.75 samples/sec Loss 12.2790 LearningRate 0.3796 Epoch: 2 Global Step: 12780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:24,382-Speed 18360.33 samples/sec Loss 12.2632 LearningRate 0.3795 Epoch: 2 Global Step: 12790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:28,842-Speed 18372.42 samples/sec Loss 12.2808 LearningRate 0.3794 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-01-13 23:33:33,287-Speed 18438.06 samples/sec Loss 12.2412 LearningRate 0.3793 Epoch: 2 Global Step: 12810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:37,731-Speed 18440.85 samples/sec Loss 12.2210 LearningRate 0.3793 Epoch: 2 Global Step: 12820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:42,181-Speed 18412.13 samples/sec Loss 12.2382 LearningRate 0.3792 Epoch: 2 Global Step: 12830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:33:46,558-Speed 18725.76 samples/sec Loss 12.2869 LearningRate 0.3791 Epoch: 2 Global Step: 12840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:33:51,015-Speed 18383.01 samples/sec Loss 12.2877 LearningRate 0.3790 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:33:55,426-Speed 18581.68 samples/sec Loss 12.2116 LearningRate 0.3789 Epoch: 2 Global Step: 12860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:33:59,828-Speed 18614.86 samples/sec Loss 12.2353 LearningRate 0.3788 Epoch: 2 Global Step: 12870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:34:04,314-Speed 18266.65 samples/sec Loss 12.1521 LearningRate 0.3788 Epoch: 2 Global Step: 12880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:34:08,749-Speed 18478.71 samples/sec Loss 12.1652 LearningRate 0.3787 Epoch: 2 Global Step: 12890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:34:13,150-Speed 18623.19 samples/sec Loss 12.1408 LearningRate 0.3786 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:34:17,560-Speed 18587.25 samples/sec Loss 12.1887 LearningRate 0.3785 Epoch: 2 Global Step: 12910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:34:21,980-Speed 18541.09 samples/sec Loss 12.1877 LearningRate 0.3784 Epoch: 2 Global Step: 12920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 
23:34:26,391-Speed 18576.70 samples/sec Loss 12.1815 LearningRate 0.3783 Epoch: 2 Global Step: 12930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:34:30,819-Speed 18503.58 samples/sec Loss 12.1720 LearningRate 0.3783 Epoch: 2 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:34:35,211-Speed 18655.80 samples/sec Loss 12.1564 LearningRate 0.3782 Epoch: 2 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:34:39,613-Speed 18615.04 samples/sec Loss 12.1859 LearningRate 0.3781 Epoch: 2 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:34:44,052-Speed 18463.14 samples/sec Loss 12.1164 LearningRate 0.3780 Epoch: 2 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:34:48,473-Speed 18528.91 samples/sec Loss 12.1743 LearningRate 0.3779 Epoch: 2 Global Step: 12980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:34:52,840-Speed 18769.79 samples/sec Loss 12.1758 LearningRate 0.3778 Epoch: 2 Global Step: 12990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:34:57,230-Speed 18661.93 samples/sec Loss 12.2012 LearningRate 0.3778 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:01,661-Speed 18493.60 samples/sec Loss 12.1297 LearningRate 0.3777 Epoch: 2 Global Step: 13010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:06,069-Speed 18589.00 samples/sec Loss 12.1469 LearningRate 0.3776 Epoch: 2 Global Step: 13020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:10,539-Speed 18330.21 samples/sec Loss 12.1395 LearningRate 0.3775 Epoch: 2 Global Step: 13030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:15,009-Speed 18332.15 samples/sec Loss 12.1568 LearningRate 0.3774 Epoch: 2 Global Step: 13040 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:35:19,473-Speed 
18357.42 samples/sec Loss 12.1484 LearningRate 0.3773 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:35:23,911-Speed 18466.63 samples/sec Loss 12.1189 LearningRate 0.3773 Epoch: 2 Global Step: 13060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:28,364-Speed 18401.75 samples/sec Loss 12.1118 LearningRate 0.3772 Epoch: 2 Global Step: 13070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:32,794-Speed 18501.64 samples/sec Loss 12.0649 LearningRate 0.3771 Epoch: 2 Global Step: 13080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:37,220-Speed 18513.12 samples/sec Loss 12.0673 LearningRate 0.3770 Epoch: 2 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:41,633-Speed 18564.43 samples/sec Loss 12.1197 LearningRate 0.3769 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:46,022-Speed 18674.61 samples/sec Loss 12.0923 LearningRate 0.3768 Epoch: 2 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:50,446-Speed 18525.78 samples/sec Loss 12.1062 LearningRate 0.3768 Epoch: 2 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:54,868-Speed 18531.99 samples/sec Loss 12.1255 LearningRate 0.3767 Epoch: 2 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:35:59,265-Speed 18637.47 samples/sec Loss 12.1018 LearningRate 0.3766 Epoch: 2 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:36:03,702-Speed 18465.63 samples/sec Loss 12.1022 LearningRate 0.3765 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:36:08,211-Speed 18178.13 samples/sec Loss 12.0817 LearningRate 0.3764 Epoch: 2 Global Step: 13160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:36:12,679-Speed 18340.10 
samples/sec Loss 12.1199 LearningRate 0.3763 Epoch: 2 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:36:17,117-Speed 18463.69 samples/sec Loss 12.0642 LearningRate 0.3763 Epoch: 2 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:36:21,569-Speed 18402.59 samples/sec Loss 12.0120 LearningRate 0.3762 Epoch: 2 Global Step: 13190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:36:25,998-Speed 18503.23 samples/sec Loss 12.1082 LearningRate 0.3761 Epoch: 2 Global Step: 13200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:36:30,426-Speed 18503.65 samples/sec Loss 12.0496 LearningRate 0.3760 Epoch: 2 Global Step: 13210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:36:34,851-Speed 18521.42 samples/sec Loss 12.0577 LearningRate 0.3759 Epoch: 2 Global Step: 13220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:36:39,288-Speed 18468.07 samples/sec Loss 12.0471 LearningRate 0.3758 Epoch: 2 Global Step: 13230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:36:43,737-Speed 18418.54 samples/sec Loss 12.0306 LearningRate 0.3758 Epoch: 2 Global Step: 13240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:36:48,118-Speed 18703.25 samples/sec Loss 12.0706 LearningRate 0.3757 Epoch: 2 Global Step: 13250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:36:52,540-Speed 18533.18 samples/sec Loss 12.0104 LearningRate 0.3756 Epoch: 2 Global Step: 13260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:36:56,957-Speed 18551.08 samples/sec Loss 11.9871 LearningRate 0.3755 Epoch: 2 Global Step: 13270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:01,392-Speed 18473.57 samples/sec Loss 12.0392 LearningRate 0.3754 Epoch: 2 Global Step: 13280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:05,832-Speed 18456.96 samples/sec Loss 12.0380 
LearningRate 0.3753 Epoch: 2 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:37:10,235-Speed 18610.04 samples/sec Loss 12.0120 LearningRate 0.3753 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:37:14,630-Speed 18664.22 samples/sec Loss 11.9764 LearningRate 0.3752 Epoch: 2 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:37:21,640-Speed 11688.17 samples/sec Loss 11.9901 LearningRate 0.3751 Epoch: 2 Global Step: 13320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:26,049-Speed 18586.04 samples/sec Loss 11.9983 LearningRate 0.3750 Epoch: 2 Global Step: 13330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:30,466-Speed 18553.80 samples/sec Loss 11.9747 LearningRate 0.3749 Epoch: 2 Global Step: 13340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:34,856-Speed 18669.61 samples/sec Loss 11.9501 LearningRate 0.3748 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:39,299-Speed 18449.97 samples/sec Loss 11.9936 LearningRate 0.3748 Epoch: 2 Global Step: 13360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:43,698-Speed 18627.64 samples/sec Loss 11.9631 LearningRate 0.3747 Epoch: 2 Global Step: 13370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:48,130-Speed 18490.29 samples/sec Loss 11.9741 LearningRate 0.3746 Epoch: 2 Global Step: 13380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:52,551-Speed 18536.89 samples/sec Loss 11.9470 LearningRate 0.3745 Epoch: 2 Global Step: 13390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:37:56,987-Speed 18472.44 samples/sec Loss 11.9538 LearningRate 0.3744 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:38:01,459-Speed 18324.73 samples/sec Loss 11.9646 LearningRate 0.3743 Epoch: 2 
Global Step: 13410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:38:05,889-Speed 18496.84 samples/sec Loss 11.9479 LearningRate 0.3743 Epoch: 2 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:10,284-Speed 18646.66 samples/sec Loss 11.9618 LearningRate 0.3742 Epoch: 2 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:14,771-Speed 18261.79 samples/sec Loss 11.9637 LearningRate 0.3741 Epoch: 2 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:19,206-Speed 18474.65 samples/sec Loss 11.9097 LearningRate 0.3740 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:23,654-Speed 18428.87 samples/sec Loss 11.9361 LearningRate 0.3739 Epoch: 2 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:28,099-Speed 18435.30 samples/sec Loss 11.9129 LearningRate 0.3738 Epoch: 2 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:32,548-Speed 18420.96 samples/sec Loss 11.8560 LearningRate 0.3738 Epoch: 2 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:36,969-Speed 18532.82 samples/sec Loss 11.8679 LearningRate 0.3737 Epoch: 2 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:41,393-Speed 18520.18 samples/sec Loss 11.8948 LearningRate 0.3736 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:45,841-Speed 18428.03 samples/sec Loss 11.8787 LearningRate 0.3735 Epoch: 2 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:38:50,251-Speed 18584.27 samples/sec Loss 11.8946 LearningRate 0.3734 Epoch: 2 Global Step: 13520 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:38:54,701-Speed 18414.70 samples/sec Loss 11.8400 LearningRate 0.3734 Epoch: 2 Global Step: 13530 
Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:38:59,116-Speed 18562.59 samples/sec Loss 11.8914 LearningRate 0.3733 Epoch: 2 Global Step: 13540 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:39:03,561-Speed 18433.75 samples/sec Loss 11.8747 LearningRate 0.3732 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:39:08,010-Speed 18415.07 samples/sec Loss 11.9006 LearningRate 0.3731 Epoch: 2 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:39:12,462-Speed 18402.43 samples/sec Loss 11.9008 LearningRate 0.3730 Epoch: 2 Global Step: 13570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:39:16,904-Speed 18450.42 samples/sec Loss 11.8314 LearningRate 0.3729 Epoch: 2 Global Step: 13580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:39:21,325-Speed 18535.09 samples/sec Loss 11.8752 LearningRate 0.3729 Epoch: 2 Global Step: 13590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:39:25,770-Speed 18433.48 samples/sec Loss 11.8952 LearningRate 0.3728 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:39:30,210-Speed 18456.24 samples/sec Loss 11.8091 LearningRate 0.3727 Epoch: 2 Global Step: 13610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:39:34,659-Speed 18418.34 samples/sec Loss 11.8416 LearningRate 0.3726 Epoch: 2 Global Step: 13620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:39:39,107-Speed 18418.96 samples/sec Loss 11.8585 LearningRate 0.3725 Epoch: 2 Global Step: 13630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:39:43,559-Speed 18411.99 samples/sec Loss 11.8129 LearningRate 0.3724 Epoch: 2 Global Step: 13640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:39:47,996-Speed 18470.57 samples/sec Loss 11.8231 LearningRate 0.3724 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-01-13 23:39:52,455-Speed 18381.44 samples/sec Loss 11.8261 LearningRate 0.3723 Epoch: 2 Global Step: 13660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:39:56,921-Speed 18349.54 samples/sec Loss 11.7752 LearningRate 0.3722 Epoch: 2 Global Step: 13670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:40:01,396-Speed 18314.21 samples/sec Loss 11.7900 LearningRate 0.3721 Epoch: 2 Global Step: 13680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:40:05,832-Speed 18471.43 samples/sec Loss 11.7934 LearningRate 0.3720 Epoch: 2 Global Step: 13690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:40:10,277-Speed 18431.70 samples/sec Loss 11.8352 LearningRate 0.3719 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:40:14,718-Speed 18453.34 samples/sec Loss 11.8237 LearningRate 0.3719 Epoch: 2 Global Step: 13710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:40:19,137-Speed 18542.41 samples/sec Loss 11.7978 LearningRate 0.3718 Epoch: 2 Global Step: 13720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:40:23,614-Speed 18303.88 samples/sec Loss 11.7490 LearningRate 0.3717 Epoch: 2 Global Step: 13730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:40:28,031-Speed 18553.93 samples/sec Loss 11.8414 LearningRate 0.3716 Epoch: 2 Global Step: 13740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:40:32,465-Speed 18478.74 samples/sec Loss 11.7316 LearningRate 0.3715 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:40:36,911-Speed 18434.02 samples/sec Loss 11.7964 LearningRate 0.3714 Epoch: 2 Global Step: 13760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:40:41,345-Speed 18478.22 samples/sec Loss 11.7809 LearningRate 0.3714 Epoch: 2 Global Step: 13770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-01-13 23:40:45,788-Speed 18444.54 samples/sec Loss 11.7475 LearningRate 0.3713 Epoch: 2 Global Step: 13780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:40:50,242-Speed 18396.19 samples/sec Loss 11.7559 LearningRate 0.3712 Epoch: 2 Global Step: 13790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:40:54,680-Speed 18466.53 samples/sec Loss 11.7573 LearningRate 0.3711 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:40:59,115-Speed 18473.15 samples/sec Loss 11.7396 LearningRate 0.3710 Epoch: 2 Global Step: 13810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:41:03,586-Speed 18328.84 samples/sec Loss 11.7535 LearningRate 0.3710 Epoch: 2 Global Step: 13820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:41:08,029-Speed 18441.64 samples/sec Loss 11.7573 LearningRate 0.3709 Epoch: 2 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:41:12,549-Speed 18129.63 samples/sec Loss 11.7312 LearningRate 0.3708 Epoch: 2 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:41:17,024-Speed 18312.26 samples/sec Loss 11.7143 LearningRate 0.3707 Epoch: 2 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:41:21,443-Speed 18540.81 samples/sec Loss 11.7079 LearningRate 0.3706 Epoch: 2 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:41:25,857-Speed 18564.19 samples/sec Loss 11.6891 LearningRate 0.3705 Epoch: 2 Global Step: 13870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:41:30,304-Speed 18426.85 samples/sec Loss 11.7405 LearningRate 0.3705 Epoch: 2 Global Step: 13880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:41:34,772-Speed 18343.96 samples/sec Loss 11.6918 LearningRate 0.3704 Epoch: 2 Global Step: 13890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 
23:41:39,245-Speed 18319.62 samples/sec Loss 11.6676 LearningRate 0.3703 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:41:43,712-Speed 18340.81 samples/sec Loss 11.6709 LearningRate 0.3702 Epoch: 2 Global Step: 13910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:41:48,179-Speed 18345.08 samples/sec Loss 11.7163 LearningRate 0.3701 Epoch: 2 Global Step: 13920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:41:52,609-Speed 18496.46 samples/sec Loss 11.7068 LearningRate 0.3700 Epoch: 2 Global Step: 13930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:41:57,028-Speed 18543.48 samples/sec Loss 11.6904 LearningRate 0.3700 Epoch: 2 Global Step: 13940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:42:01,519-Speed 18245.25 samples/sec Loss 11.6797 LearningRate 0.3699 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:42:08,985-Speed 10973.03 samples/sec Loss 11.6427 LearningRate 0.3698 Epoch: 2 Global Step: 13960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:42:13,438-Speed 18400.37 samples/sec Loss 11.6762 LearningRate 0.3697 Epoch: 2 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:17,868-Speed 18497.62 samples/sec Loss 11.6517 LearningRate 0.3696 Epoch: 2 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:22,296-Speed 18507.90 samples/sec Loss 11.6248 LearningRate 0.3695 Epoch: 2 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:26,729-Speed 18481.59 samples/sec Loss 11.6856 LearningRate 0.3695 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:31,166-Speed 18470.29 samples/sec Loss 11.6776 LearningRate 0.3694 Epoch: 2 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:35,633-Speed 
18343.46 samples/sec Loss 11.6635 LearningRate 0.3693 Epoch: 2 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:40,167-Speed 18072.48 samples/sec Loss 11.5977 LearningRate 0.3692 Epoch: 2 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:44,626-Speed 18378.10 samples/sec Loss 11.6203 LearningRate 0.3691 Epoch: 2 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:49,077-Speed 18410.21 samples/sec Loss 11.6478 LearningRate 0.3691 Epoch: 2 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:53,493-Speed 18556.69 samples/sec Loss 11.6319 LearningRate 0.3690 Epoch: 2 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:42:57,914-Speed 18531.49 samples/sec Loss 11.6412 LearningRate 0.3689 Epoch: 2 Global Step: 14070 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:43:02,346-Speed 18498.30 samples/sec Loss 11.6110 LearningRate 0.3688 Epoch: 2 Global Step: 14080 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:43:06,760-Speed 18573.25 samples/sec Loss 11.6194 LearningRate 0.3687 Epoch: 2 Global Step: 14090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:43:11,247-Speed 18261.27 samples/sec Loss 11.6855 LearningRate 0.3686 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:43:15,691-Speed 18442.85 samples/sec Loss 11.5920 LearningRate 0.3686 Epoch: 2 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:43:20,140-Speed 18417.25 samples/sec Loss 11.6192 LearningRate 0.3685 Epoch: 2 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:43:24,550-Speed 18579.55 samples/sec Loss 11.5570 LearningRate 0.3684 Epoch: 2 Global Step: 14130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:43:28,954-Speed 18610.05 samples/sec 
Loss 11.5694 LearningRate 0.3683 Epoch: 2 Global Step: 14140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:43:33,401-Speed 18431.06 samples/sec Loss 11.5928 LearningRate 0.3682 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:43:37,831-Speed 18494.56 samples/sec Loss 11.6147 LearningRate 0.3682 Epoch: 2 Global Step: 14160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:43:42,247-Speed 18552.62 samples/sec Loss 11.6136 LearningRate 0.3681 Epoch: 2 Global Step: 14170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:43:46,646-Speed 18628.60 samples/sec Loss 11.6234 LearningRate 0.3680 Epoch: 2 Global Step: 14180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:43:51,101-Speed 18393.41 samples/sec Loss 11.5067 LearningRate 0.3679 Epoch: 2 Global Step: 14190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:43:55,488-Speed 18678.80 samples/sec Loss 11.5540 LearningRate 0.3678 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:43:59,869-Speed 18702.20 samples/sec Loss 11.5803 LearningRate 0.3677 Epoch: 2 Global Step: 14210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:44:04,261-Speed 18660.36 samples/sec Loss 11.5271 LearningRate 0.3677 Epoch: 2 Global Step: 14220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:44:08,657-Speed 18644.20 samples/sec Loss 11.5308 LearningRate 0.3676 Epoch: 2 Global Step: 14230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:13,084-Speed 18513.64 samples/sec Loss 11.5570 LearningRate 0.3675 Epoch: 2 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:17,482-Speed 18634.99 samples/sec Loss 11.5259 LearningRate 0.3674 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:21,855-Speed 18735.54 samples/sec Loss 11.5257 LearningRate 
0.3673 Epoch: 2 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:26,285-Speed 18493.45 samples/sec Loss 11.5276 LearningRate 0.3672 Epoch: 2 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:30,694-Speed 18592.21 samples/sec Loss 11.5572 LearningRate 0.3672 Epoch: 2 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:35,084-Speed 18667.92 samples/sec Loss 11.5183 LearningRate 0.3671 Epoch: 2 Global Step: 14290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:39,509-Speed 18517.21 samples/sec Loss 11.4901 LearningRate 0.3670 Epoch: 2 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:43,940-Speed 18493.66 samples/sec Loss 11.5752 LearningRate 0.3669 Epoch: 2 Global Step: 14310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:48,376-Speed 18474.38 samples/sec Loss 11.5290 LearningRate 0.3668 Epoch: 2 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:44:53,377-Speed 16384.11 samples/sec Loss 11.5481 LearningRate 0.3668 Epoch: 2 Global Step: 14330 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:44:57,776-Speed 18628.22 samples/sec Loss 11.5260 LearningRate 0.3667 Epoch: 2 Global Step: 14340 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:45:02,188-Speed 18570.79 samples/sec Loss 11.4724 LearningRate 0.3666 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:45:06,634-Speed 18433.14 samples/sec Loss 11.4953 LearningRate 0.3665 Epoch: 2 Global Step: 14360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:45:11,025-Speed 18658.02 samples/sec Loss 11.4931 LearningRate 0.3664 Epoch: 2 Global Step: 14370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:45:15,477-Speed 18412.93 samples/sec Loss 11.5626 LearningRate 0.3663 Epoch: 2 
Global Step: 14380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:45:19,933-Speed 18393.90 samples/sec Loss 11.5493 LearningRate 0.3663 Epoch: 2 Global Step: 14390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:45:24,378-Speed 18442.02 samples/sec Loss 11.5199 LearningRate 0.3662 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:45:28,899-Speed 18129.30 samples/sec Loss 11.5388 LearningRate 0.3661 Epoch: 2 Global Step: 14410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:45:33,320-Speed 18535.82 samples/sec Loss 11.5013 LearningRate 0.3660 Epoch: 2 Global Step: 14420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:45:37,729-Speed 18584.04 samples/sec Loss 11.5439 LearningRate 0.3659 Epoch: 2 Global Step: 14430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:45:42,186-Speed 18384.89 samples/sec Loss 11.4828 LearningRate 0.3659 Epoch: 2 Global Step: 14440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:45:46,632-Speed 18435.99 samples/sec Loss 11.4970 LearningRate 0.3658 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:45:51,040-Speed 18591.65 samples/sec Loss 11.4740 LearningRate 0.3657 Epoch: 2 Global Step: 14460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:45:55,468-Speed 18510.78 samples/sec Loss 11.4424 LearningRate 0.3656 Epoch: 2 Global Step: 14470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:45:59,853-Speed 18684.93 samples/sec Loss 11.4471 LearningRate 0.3655 Epoch: 2 Global Step: 14480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:46:04,245-Speed 18658.94 samples/sec Loss 11.4196 LearningRate 0.3654 Epoch: 2 Global Step: 14490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:46:08,665-Speed 18541.22 samples/sec Loss 11.4510 LearningRate 0.3654 Epoch: 2 Global Step: 14500 Fp16 
Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:46:13,068-Speed 18609.22 samples/sec Loss 11.4404 LearningRate 0.3653 Epoch: 2 Global Step: 14510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:46:17,505-Speed 18468.01 samples/sec Loss 11.3965 LearningRate 0.3652 Epoch: 2 Global Step: 14520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:46:21,923-Speed 18547.10 samples/sec Loss 11.4080 LearningRate 0.3651 Epoch: 2 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:46:26,354-Speed 18495.06 samples/sec Loss 11.4115 LearningRate 0.3650 Epoch: 2 Global Step: 14540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:46:30,762-Speed 18588.71 samples/sec Loss 11.3700 LearningRate 0.3649 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:46:35,143-Speed 18707.03 samples/sec Loss 11.3707 LearningRate 0.3649 Epoch: 2 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:46:39,566-Speed 18523.80 samples/sec Loss 11.4116 LearningRate 0.3648 Epoch: 2 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:46:43,959-Speed 18652.00 samples/sec Loss 11.4016 LearningRate 0.3647 Epoch: 2 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:46:48,405-Speed 18432.46 samples/sec Loss 11.4191 LearningRate 0.3646 Epoch: 2 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:46:52,846-Speed 18452.81 samples/sec Loss 11.4127 LearningRate 0.3645 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:46:57,295-Speed 18419.26 samples/sec Loss 11.3891 LearningRate 0.3645 Epoch: 2 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:47:01,756-Speed 18369.09 samples/sec Loss 11.3683 LearningRate 0.3644 Epoch: 2 Global Step: 14620 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-13 23:47:06,159-Speed 18609.05 samples/sec Loss 11.4430 LearningRate 0.3643 Epoch: 2 Global Step: 14630 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:47:10,568-Speed 18584.05 samples/sec Loss 11.4410 LearningRate 0.3642 Epoch: 2 Global Step: 14640 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:47:15,004-Speed 18472.35 samples/sec Loss 11.3602 LearningRate 0.3641 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:47:19,404-Speed 18623.79 samples/sec Loss 11.3430 LearningRate 0.3640 Epoch: 2 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:47:23,861-Speed 18385.55 samples/sec Loss 11.3815 LearningRate 0.3640 Epoch: 2 Global Step: 14670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:47:28,275-Speed 18567.39 samples/sec Loss 11.3729 LearningRate 0.3639 Epoch: 2 Global Step: 14680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:47:32,711-Speed 18476.00 samples/sec Loss 11.3271 LearningRate 0.3638 Epoch: 2 Global Step: 14690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:47:37,163-Speed 18407.41 samples/sec Loss 11.3676 LearningRate 0.3637 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:47:41,610-Speed 18423.55 samples/sec Loss 11.3560 LearningRate 0.3636 Epoch: 2 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:47:46,059-Speed 18416.39 samples/sec Loss 11.3938 LearningRate 0.3636 Epoch: 2 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:47:50,495-Speed 18474.49 samples/sec Loss 11.2879 LearningRate 0.3635 Epoch: 2 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:47:54,884-Speed 18669.24 samples/sec Loss 11.2974 LearningRate 0.3634 Epoch: 2 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-13 23:47:59,267-Speed 18695.61 samples/sec Loss 11.2801 LearningRate 0.3633 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:48:03,700-Speed 18482.41 samples/sec Loss 11.3150 LearningRate 0.3632 Epoch: 2 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:48:08,183-Speed 18279.93 samples/sec Loss 11.3684 LearningRate 0.3632 Epoch: 2 Global Step: 14770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:48:12,607-Speed 18524.96 samples/sec Loss 11.2997 LearningRate 0.3631 Epoch: 2 Global Step: 14780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:48:17,067-Speed 18371.22 samples/sec Loss 11.3381 LearningRate 0.3630 Epoch: 2 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:48:21,465-Speed 18629.99 samples/sec Loss 11.2983 LearningRate 0.3629 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:48:25,879-Speed 18566.35 samples/sec Loss 11.3230 LearningRate 0.3628 Epoch: 2 Global Step: 14810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:48:30,281-Speed 18615.10 samples/sec Loss 11.3697 LearningRate 0.3627 Epoch: 2 Global Step: 14820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:48:34,681-Speed 18624.87 samples/sec Loss 11.2999 LearningRate 0.3627 Epoch: 2 Global Step: 14830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:48:39,107-Speed 18513.54 samples/sec Loss 11.2546 LearningRate 0.3626 Epoch: 2 Global Step: 14840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:48:43,528-Speed 18538.99 samples/sec Loss 11.3710 LearningRate 0.3625 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:48:47,967-Speed 18467.58 samples/sec Loss 11.3168 LearningRate 0.3624 Epoch: 2 Global Step: 14860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 
23:48:52,387-Speed 18538.40 samples/sec Loss 11.2503 LearningRate 0.3623 Epoch: 2 Global Step: 14870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:48:56,794-Speed 18590.96 samples/sec Loss 11.3001 LearningRate 0.3623 Epoch: 2 Global Step: 14880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:49:01,192-Speed 18630.61 samples/sec Loss 11.2981 LearningRate 0.3622 Epoch: 2 Global Step: 14890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:49:05,598-Speed 18601.01 samples/sec Loss 11.3008 LearningRate 0.3621 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:49:10,032-Speed 18480.97 samples/sec Loss 11.2748 LearningRate 0.3620 Epoch: 2 Global Step: 14910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:49:14,462-Speed 18495.23 samples/sec Loss 11.3152 LearningRate 0.3619 Epoch: 2 Global Step: 14920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:49:18,988-Speed 18102.96 samples/sec Loss 11.2908 LearningRate 0.3618 Epoch: 2 Global Step: 14930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:23,444-Speed 18391.11 samples/sec Loss 11.2681 LearningRate 0.3618 Epoch: 2 Global Step: 14940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:27,919-Speed 18314.75 samples/sec Loss 11.2266 LearningRate 0.3617 Epoch: 2 Global Step: 14950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:32,334-Speed 18561.32 samples/sec Loss 11.2307 LearningRate 0.3616 Epoch: 2 Global Step: 14960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:36,764-Speed 18502.84 samples/sec Loss 11.2382 LearningRate 0.3615 Epoch: 2 Global Step: 14970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:41,216-Speed 18411.81 samples/sec Loss 11.2956 LearningRate 0.3614 Epoch: 2 Global Step: 14980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:45,614-Speed 18628.87 
samples/sec Loss 11.2529 LearningRate 0.3614 Epoch: 2 Global Step: 14990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:50,002-Speed 18674.36 samples/sec Loss 11.2475 LearningRate 0.3613 Epoch: 2 Global Step: 15000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:54,407-Speed 18603.11 samples/sec Loss 11.2375 LearningRate 0.3612 Epoch: 2 Global Step: 15010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:49:58,883-Speed 18304.72 samples/sec Loss 11.2709 LearningRate 0.3611 Epoch: 2 Global Step: 15020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:50:03,287-Speed 18608.48 samples/sec Loss 11.2570 LearningRate 0.3610 Epoch: 2 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:07,696-Speed 18587.54 samples/sec Loss 11.2301 LearningRate 0.3609 Epoch: 2 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:12,089-Speed 18653.91 samples/sec Loss 11.2149 LearningRate 0.3609 Epoch: 2 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:16,499-Speed 18579.92 samples/sec Loss 11.1978 LearningRate 0.3608 Epoch: 2 Global Step: 15060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:20,939-Speed 18459.66 samples/sec Loss 11.1679 LearningRate 0.3607 Epoch: 2 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:25,354-Speed 18560.88 samples/sec Loss 11.1431 LearningRate 0.3606 Epoch: 2 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:29,817-Speed 18358.54 samples/sec Loss 11.2280 LearningRate 0.3605 Epoch: 2 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:34,255-Speed 18469.22 samples/sec Loss 11.1227 LearningRate 0.3605 Epoch: 2 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:38,711-Speed 18388.38 samples/sec Loss 
11.1927 LearningRate 0.3604 Epoch: 2 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:43,168-Speed 18385.98 samples/sec Loss 11.1498 LearningRate 0.3603 Epoch: 2 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:50:47,591-Speed 18532.13 samples/sec Loss 11.2079 LearningRate 0.3602 Epoch: 2 Global Step: 15130 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:50:51,995-Speed 18606.63 samples/sec Loss 11.1870 LearningRate 0.3601 Epoch: 2 Global Step: 15140 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:50:56,431-Speed 18471.89 samples/sec Loss 11.2057 LearningRate 0.3601 Epoch: 2 Global Step: 15150 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:51:00,892-Speed 18370.50 samples/sec Loss 11.2063 LearningRate 0.3600 Epoch: 2 Global Step: 15160 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:51:05,359-Speed 18342.41 samples/sec Loss 11.1793 LearningRate 0.3599 Epoch: 2 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:09,787-Speed 18509.34 samples/sec Loss 11.1468 LearningRate 0.3598 Epoch: 2 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:14,288-Speed 18202.22 samples/sec Loss 11.1729 LearningRate 0.3597 Epoch: 2 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:18,715-Speed 18509.76 samples/sec Loss 11.1913 LearningRate 0.3596 Epoch: 2 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:23,128-Speed 18571.07 samples/sec Loss 11.1525 LearningRate 0.3596 Epoch: 2 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:27,580-Speed 18402.85 samples/sec Loss 11.1505 LearningRate 0.3595 Epoch: 2 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:32,017-Speed 18476.05 samples/sec Loss 11.1454 
LearningRate 0.3594 Epoch: 2 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:36,421-Speed 18607.78 samples/sec Loss 11.1234 LearningRate 0.3593 Epoch: 2 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:40,857-Speed 18472.93 samples/sec Loss 11.1628 LearningRate 0.3592 Epoch: 2 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:45,292-Speed 18477.03 samples/sec Loss 11.0831 LearningRate 0.3592 Epoch: 2 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:49,806-Speed 18154.32 samples/sec Loss 11.1133 LearningRate 0.3591 Epoch: 2 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:54,237-Speed 18499.51 samples/sec Loss 11.0799 LearningRate 0.3590 Epoch: 2 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:51:58,626-Speed 18673.95 samples/sec Loss 11.1677 LearningRate 0.3589 Epoch: 2 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:52:03,085-Speed 18381.15 samples/sec Loss 11.1224 LearningRate 0.3588 Epoch: 2 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:52:07,493-Speed 18592.26 samples/sec Loss 11.1567 LearningRate 0.3588 Epoch: 2 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:52:11,932-Speed 18460.66 samples/sec Loss 11.1589 LearningRate 0.3587 Epoch: 2 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:52:16,351-Speed 18539.89 samples/sec Loss 11.1596 LearningRate 0.3586 Epoch: 2 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:52:20,860-Speed 18177.82 samples/sec Loss 11.1101 LearningRate 0.3585 Epoch: 2 Global Step: 15340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:52:25,308-Speed 18425.36 samples/sec Loss 11.0907 LearningRate 0.3584 
Epoch: 2 Global Step: 15350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:52:29,790-Speed 18281.42 samples/sec Loss 11.1244 LearningRate 0.3583 Epoch: 2 Global Step: 15360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:52:34,185-Speed 18644.31 samples/sec Loss 11.1287 LearningRate 0.3583 Epoch: 2 Global Step: 15370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:52:38,609-Speed 18526.04 samples/sec Loss 11.0467 LearningRate 0.3582 Epoch: 2 Global Step: 15380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:52:43,041-Speed 18485.19 samples/sec Loss 11.1338 LearningRate 0.3581 Epoch: 2 Global Step: 15390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:52:47,549-Speed 18180.57 samples/sec Loss 11.1298 LearningRate 0.3580 Epoch: 2 Global Step: 15400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:52:51,974-Speed 18515.58 samples/sec Loss 11.0076 LearningRate 0.3579 Epoch: 2 Global Step: 15410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:52:56,354-Speed 18709.01 samples/sec Loss 11.0921 LearningRate 0.3579 Epoch: 2 Global Step: 15420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:53:00,794-Speed 18455.30 samples/sec Loss 11.1316 LearningRate 0.3578 Epoch: 2 Global Step: 15430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:53:05,191-Speed 18636.04 samples/sec Loss 11.1302 LearningRate 0.3577 Epoch: 2 Global Step: 15440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-13 23:53:09,590-Speed 18626.77 samples/sec Loss 11.0761 LearningRate 0.3576 Epoch: 2 Global Step: 15450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:13,980-Speed 18668.00 samples/sec Loss 11.0566 LearningRate 0.3575 Epoch: 2 Global Step: 15460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:18,395-Speed 18559.97 samples/sec Loss 11.0069 LearningRate 0.3575 Epoch: 2 Global Step: 15470 
Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:22,765-Speed 18760.03 samples/sec Loss 11.0618 LearningRate 0.3574 Epoch: 2 Global Step: 15480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:27,170-Speed 18607.77 samples/sec Loss 11.0398 LearningRate 0.3573 Epoch: 2 Global Step: 15490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:31,568-Speed 18630.61 samples/sec Loss 11.0650 LearningRate 0.3572 Epoch: 2 Global Step: 15500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:35,951-Speed 18695.84 samples/sec Loss 11.0219 LearningRate 0.3571 Epoch: 2 Global Step: 15510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:40,369-Speed 18549.81 samples/sec Loss 11.0583 LearningRate 0.3570 Epoch: 2 Global Step: 15520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:44,849-Speed 18290.30 samples/sec Loss 11.0866 LearningRate 0.3570 Epoch: 2 Global Step: 15530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:49,239-Speed 18674.84 samples/sec Loss 11.0796 LearningRate 0.3569 Epoch: 2 Global Step: 15540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:53:53,677-Speed 18461.05 samples/sec Loss 11.0583 LearningRate 0.3568 Epoch: 2 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:54:11,980-Speed 4476.13 samples/sec Loss 11.0253 LearningRate 0.3567 Epoch: 3 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:54:16,372-Speed 18659.23 samples/sec Loss 11.0281 LearningRate 0.3566 Epoch: 3 Global Step: 15570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:54:20,752-Speed 18709.73 samples/sec Loss 11.0433 LearningRate 0.3566 Epoch: 3 Global Step: 15580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:54:25,181-Speed 18500.54 samples/sec Loss 10.9564 LearningRate 0.3565 Epoch: 3 Global Step: 15590 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-13 23:54:29,602-Speed 18535.65 samples/sec Loss 10.9363 LearningRate 0.3564 Epoch: 3 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:54:34,061-Speed 18378.08 samples/sec Loss 11.0171 LearningRate 0.3563 Epoch: 3 Global Step: 15610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:54:38,484-Speed 18530.35 samples/sec Loss 11.0003 LearningRate 0.3562 Epoch: 3 Global Step: 15620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:54:42,871-Speed 18678.19 samples/sec Loss 10.9911 LearningRate 0.3562 Epoch: 3 Global Step: 15630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:54:47,263-Speed 18665.08 samples/sec Loss 10.9721 LearningRate 0.3561 Epoch: 3 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:54:51,657-Speed 18650.28 samples/sec Loss 11.0056 LearningRate 0.3560 Epoch: 3 Global Step: 15650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:54:56,044-Speed 18685.46 samples/sec Loss 10.9937 LearningRate 0.3559 Epoch: 3 Global Step: 15660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:55:00,453-Speed 18586.17 samples/sec Loss 10.9872 LearningRate 0.3558 Epoch: 3 Global Step: 15670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:55:04,846-Speed 18651.15 samples/sec Loss 10.9766 LearningRate 0.3558 Epoch: 3 Global Step: 15680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:55:09,244-Speed 18634.04 samples/sec Loss 10.9726 LearningRate 0.3557 Epoch: 3 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:55:13,613-Speed 18755.65 samples/sec Loss 10.9524 LearningRate 0.3556 Epoch: 3 Global Step: 15700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:55:18,009-Speed 18638.00 samples/sec Loss 10.9823 LearningRate 0.3555 Epoch: 3 Global Step: 15710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-13 23:55:22,388-Speed 18715.78 samples/sec Loss 10.9793 LearningRate 0.3554 Epoch: 3 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:55:26,798-Speed 18578.54 samples/sec Loss 11.0114 LearningRate 0.3554 Epoch: 3 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:55:31,232-Speed 18478.78 samples/sec Loss 10.9691 LearningRate 0.3553 Epoch: 3 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:55:35,646-Speed 18565.08 samples/sec Loss 10.9874 LearningRate 0.3552 Epoch: 3 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:55:40,089-Speed 18442.85 samples/sec Loss 10.9622 LearningRate 0.3551 Epoch: 3 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:55:44,529-Speed 18457.95 samples/sec Loss 10.9716 LearningRate 0.3550 Epoch: 3 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:55:48,953-Speed 18520.87 samples/sec Loss 10.9917 LearningRate 0.3549 Epoch: 3 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:55:53,465-Speed 18165.65 samples/sec Loss 10.9443 LearningRate 0.3549 Epoch: 3 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:55:57,874-Speed 18590.20 samples/sec Loss 10.9727 LearningRate 0.3548 Epoch: 3 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:56:02,288-Speed 18580.31 samples/sec Loss 10.9904 LearningRate 0.3547 Epoch: 3 Global Step: 15810 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:56:06,688-Speed 18623.53 samples/sec Loss 10.9567 LearningRate 0.3546 Epoch: 3 Global Step: 15820 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:56:11,102-Speed 18566.01 samples/sec Loss 10.9359 LearningRate 0.3545 Epoch: 3 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 
23:56:15,554-Speed 18407.72 samples/sec Loss 11.0108 LearningRate 0.3545 Epoch: 3 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:56:19,943-Speed 18668.23 samples/sec Loss 10.9675 LearningRate 0.3544 Epoch: 3 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:56:24,374-Speed 18497.94 samples/sec Loss 10.8928 LearningRate 0.3543 Epoch: 3 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:56:28,793-Speed 18543.63 samples/sec Loss 10.9274 LearningRate 0.3542 Epoch: 3 Global Step: 15870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:56:33,225-Speed 18489.29 samples/sec Loss 10.8850 LearningRate 0.3541 Epoch: 3 Global Step: 15880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:56:37,610-Speed 18687.99 samples/sec Loss 11.0134 LearningRate 0.3541 Epoch: 3 Global Step: 15890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:56:42,055-Speed 18434.15 samples/sec Loss 10.9823 LearningRate 0.3540 Epoch: 3 Global Step: 15900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:56:46,466-Speed 18574.01 samples/sec Loss 10.9195 LearningRate 0.3539 Epoch: 3 Global Step: 15910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:56:50,925-Speed 18378.05 samples/sec Loss 10.9491 LearningRate 0.3538 Epoch: 3 Global Step: 15920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:56:55,314-Speed 18669.61 samples/sec Loss 10.8968 LearningRate 0.3537 Epoch: 3 Global Step: 15930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:56:59,737-Speed 18524.98 samples/sec Loss 10.8892 LearningRate 0.3537 Epoch: 3 Global Step: 15940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:57:04,196-Speed 18377.57 samples/sec Loss 10.9284 LearningRate 0.3536 Epoch: 3 Global Step: 15950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:57:08,642-Speed 18432.69 
samples/sec Loss 10.9162 LearningRate 0.3535 Epoch: 3 Global Step: 15960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:57:13,118-Speed 18305.04 samples/sec Loss 10.9148 LearningRate 0.3534 Epoch: 3 Global Step: 15970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:57:17,565-Speed 18426.97 samples/sec Loss 10.9059 LearningRate 0.3533 Epoch: 3 Global Step: 15980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:57:22,014-Speed 18418.09 samples/sec Loss 10.9271 LearningRate 0.3533 Epoch: 3 Global Step: 15990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:57:26,429-Speed 18564.98 samples/sec Loss 10.9252 LearningRate 0.3532 Epoch: 3 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:57:30,847-Speed 18546.67 samples/sec Loss 10.8687 LearningRate 0.3531 Epoch: 3 Global Step: 16010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:57:35,333-Speed 18262.36 samples/sec Loss 10.8576 LearningRate 0.3530 Epoch: 3 Global Step: 16020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:57:39,752-Speed 18547.66 samples/sec Loss 10.9018 LearningRate 0.3529 Epoch: 3 Global Step: 16030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:57:44,175-Speed 18524.38 samples/sec Loss 10.9417 LearningRate 0.3528 Epoch: 3 Global Step: 16040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:57:48,582-Speed 18596.69 samples/sec Loss 10.9120 LearningRate 0.3528 Epoch: 3 Global Step: 16050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:57:52,968-Speed 18679.34 samples/sec Loss 10.8725 LearningRate 0.3527 Epoch: 3 Global Step: 16060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:57:57,375-Speed 18599.84 samples/sec Loss 10.8839 LearningRate 0.3526 Epoch: 3 Global Step: 16070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:58:01,777-Speed 18609.66 samples/sec Loss 
10.9043 LearningRate 0.3525 Epoch: 3 Global Step: 16080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:58:06,190-Speed 18570.89 samples/sec Loss 10.9010 LearningRate 0.3524 Epoch: 3 Global Step: 16090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:58:10,609-Speed 18543.85 samples/sec Loss 10.8267 LearningRate 0.3524 Epoch: 3 Global Step: 16100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:58:15,069-Speed 18372.11 samples/sec Loss 10.8595 LearningRate 0.3523 Epoch: 3 Global Step: 16110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:58:19,505-Speed 18468.99 samples/sec Loss 10.8855 LearningRate 0.3522 Epoch: 3 Global Step: 16120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:58:23,932-Speed 18513.56 samples/sec Loss 10.9053 LearningRate 0.3521 Epoch: 3 Global Step: 16130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:58:28,323-Speed 18658.63 samples/sec Loss 10.8759 LearningRate 0.3520 Epoch: 3 Global Step: 16140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:58:32,741-Speed 18543.10 samples/sec Loss 10.8872 LearningRate 0.3520 Epoch: 3 Global Step: 16150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:58:37,180-Speed 18460.28 samples/sec Loss 10.7978 LearningRate 0.3519 Epoch: 3 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:58:41,574-Speed 18648.78 samples/sec Loss 10.8662 LearningRate 0.3518 Epoch: 3 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:58:45,982-Speed 18599.95 samples/sec Loss 10.8820 LearningRate 0.3517 Epoch: 3 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:58:50,415-Speed 18483.97 samples/sec Loss 10.8539 LearningRate 0.3516 Epoch: 3 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:58:54,908-Speed 18238.83 samples/sec Loss 10.8533 LearningRate 
0.3516 Epoch: 3 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:58:59,385-Speed 18300.98 samples/sec Loss 10.8470 LearningRate 0.3515 Epoch: 3 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:59:03,837-Speed 18402.66 samples/sec Loss 10.8363 LearningRate 0.3514 Epoch: 3 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:59:08,275-Speed 18470.80 samples/sec Loss 10.8001 LearningRate 0.3513 Epoch: 3 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:59:12,716-Speed 18455.07 samples/sec Loss 10.8268 LearningRate 0.3512 Epoch: 3 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-13 23:59:17,175-Speed 18376.22 samples/sec Loss 10.8299 LearningRate 0.3512 Epoch: 3 Global Step: 16250 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-13 23:59:21,593-Speed 18552.43 samples/sec Loss 10.8297 LearningRate 0.3511 Epoch: 3 Global Step: 16260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:59:26,014-Speed 18536.08 samples/sec Loss 10.8278 LearningRate 0.3510 Epoch: 3 Global Step: 16270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:59:30,413-Speed 18627.75 samples/sec Loss 10.8270 LearningRate 0.3509 Epoch: 3 Global Step: 16280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:59:34,848-Speed 18475.03 samples/sec Loss 10.8097 LearningRate 0.3508 Epoch: 3 Global Step: 16290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:59:39,306-Speed 18382.63 samples/sec Loss 10.7906 LearningRate 0.3508 Epoch: 3 Global Step: 16300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:59:43,682-Speed 18724.19 samples/sec Loss 10.8221 LearningRate 0.3507 Epoch: 3 Global Step: 16310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:59:48,073-Speed 18660.57 samples/sec Loss 10.8100 LearningRate 0.3506 Epoch: 3 Global 
Step: 16320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:59:52,495-Speed 18530.18 samples/sec Loss 10.8121 LearningRate 0.3505 Epoch: 3 Global Step: 16330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-13 23:59:56,881-Speed 18687.86 samples/sec Loss 10.7900 LearningRate 0.3504 Epoch: 3 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:00:01,336-Speed 18398.69 samples/sec Loss 10.7932 LearningRate 0.3504 Epoch: 3 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:00:05,750-Speed 18568.98 samples/sec Loss 10.7684 LearningRate 0.3503 Epoch: 3 Global Step: 16360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:00:10,181-Speed 18495.41 samples/sec Loss 10.7988 LearningRate 0.3502 Epoch: 3 Global Step: 16370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:00:14,609-Speed 18505.99 samples/sec Loss 10.8022 LearningRate 0.3501 Epoch: 3 Global Step: 16380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:00:19,036-Speed 18511.71 samples/sec Loss 10.7841 LearningRate 0.3500 Epoch: 3 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:00:23,506-Speed 18334.39 samples/sec Loss 10.8066 LearningRate 0.3500 Epoch: 3 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:00:27,897-Speed 18661.67 samples/sec Loss 10.8321 LearningRate 0.3499 Epoch: 3 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:00:32,287-Speed 18663.13 samples/sec Loss 10.8396 LearningRate 0.3498 Epoch: 3 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:00:36,716-Speed 18508.52 samples/sec Loss 10.7605 LearningRate 0.3497 Epoch: 3 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:00:41,107-Speed 18661.58 samples/sec Loss 10.8088 LearningRate 0.3496 Epoch: 3 Global Step: 16440 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-01-14 00:00:45,471-Speed 18775.65 samples/sec Loss 10.7534 LearningRate 0.3496 Epoch: 3 Global Step: 16450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:00:49,859-Speed 18676.86 samples/sec Loss 10.7466 LearningRate 0.3495 Epoch: 3 Global Step: 16460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:00:54,277-Speed 18549.76 samples/sec Loss 10.7867 LearningRate 0.3494 Epoch: 3 Global Step: 16470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:00:58,679-Speed 18610.25 samples/sec Loss 10.8071 LearningRate 0.3493 Epoch: 3 Global Step: 16480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:01:03,094-Speed 18559.33 samples/sec Loss 10.8476 LearningRate 0.3492 Epoch: 3 Global Step: 16490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:01:07,530-Speed 18477.07 samples/sec Loss 10.7853 LearningRate 0.3492 Epoch: 3 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:01:11,974-Speed 18435.50 samples/sec Loss 10.7729 LearningRate 0.3491 Epoch: 3 Global Step: 16510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:01:16,378-Speed 18609.28 samples/sec Loss 10.7314 LearningRate 0.3490 Epoch: 3 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:01:20,793-Speed 18566.53 samples/sec Loss 10.8080 LearningRate 0.3489 Epoch: 3 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:01:25,192-Speed 18631.40 samples/sec Loss 10.7297 LearningRate 0.3488 Epoch: 3 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:01:29,582-Speed 18669.55 samples/sec Loss 10.7727 LearningRate 0.3488 Epoch: 3 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:01:36,616-Speed 11648.98 samples/sec Loss 10.7521 LearningRate 0.3487 Epoch: 3 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 11 
hours Training: 2022-01-14 00:01:41,019-Speed 18609.83 samples/sec Loss 10.7662 LearningRate 0.3486 Epoch: 3 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:01:45,435-Speed 18564.33 samples/sec Loss 10.7550 LearningRate 0.3485 Epoch: 3 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:01:49,859-Speed 18523.16 samples/sec Loss 10.6976 LearningRate 0.3484 Epoch: 3 Global Step: 16590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:01:54,300-Speed 18453.36 samples/sec Loss 10.6824 LearningRate 0.3484 Epoch: 3 Global Step: 16600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:01:58,698-Speed 18629.57 samples/sec Loss 10.7717 LearningRate 0.3483 Epoch: 3 Global Step: 16610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:02:03,147-Speed 18415.98 samples/sec Loss 10.7254 LearningRate 0.3482 Epoch: 3 Global Step: 16620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:02:07,555-Speed 18592.83 samples/sec Loss 10.7492 LearningRate 0.3481 Epoch: 3 Global Step: 16630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:02:11,946-Speed 18662.68 samples/sec Loss 10.7523 LearningRate 0.3480 Epoch: 3 Global Step: 16640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:02:16,412-Speed 18346.82 samples/sec Loss 10.6592 LearningRate 0.3480 Epoch: 3 Global Step: 16650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:02:20,828-Speed 18556.26 samples/sec Loss 10.7145 LearningRate 0.3479 Epoch: 3 Global Step: 16660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:02:25,217-Speed 18670.23 samples/sec Loss 10.7938 LearningRate 0.3478 Epoch: 3 Global Step: 16670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:02:29,701-Speed 18272.46 samples/sec Loss 10.7522 LearningRate 0.3477 Epoch: 3 Global Step: 16680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 
00:02:34,115-Speed 18561.45 samples/sec Loss 10.7178 LearningRate 0.3476 Epoch: 3 Global Step: 16690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:02:38,509-Speed 18646.90 samples/sec Loss 10.7118 LearningRate 0.3476 Epoch: 3 Global Step: 16700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:02:42,963-Speed 18399.10 samples/sec Loss 10.7298 LearningRate 0.3475 Epoch: 3 Global Step: 16710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:02:47,353-Speed 18662.70 samples/sec Loss 10.7126 LearningRate 0.3474 Epoch: 3 Global Step: 16720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:02:51,817-Speed 18354.58 samples/sec Loss 10.7109 LearningRate 0.3473 Epoch: 3 Global Step: 16730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:02:56,223-Speed 18598.81 samples/sec Loss 10.6998 LearningRate 0.3472 Epoch: 3 Global Step: 16740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:03:00,637-Speed 18559.47 samples/sec Loss 10.7198 LearningRate 0.3472 Epoch: 3 Global Step: 16750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:05,084-Speed 18427.53 samples/sec Loss 10.7215 LearningRate 0.3471 Epoch: 3 Global Step: 16760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:09,468-Speed 18690.32 samples/sec Loss 10.7291 LearningRate 0.3470 Epoch: 3 Global Step: 16770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:13,879-Speed 18574.41 samples/sec Loss 10.6919 LearningRate 0.3469 Epoch: 3 Global Step: 16780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:18,335-Speed 18390.47 samples/sec Loss 10.7434 LearningRate 0.3468 Epoch: 3 Global Step: 16790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:22,772-Speed 18465.77 samples/sec Loss 10.6852 LearningRate 0.3468 Epoch: 3 Global Step: 16800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:27,176-Speed 18606.25 
samples/sec Loss 10.7303 LearningRate 0.3467 Epoch: 3 Global Step: 16810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:31,578-Speed 18612.41 samples/sec Loss 10.6926 LearningRate 0.3466 Epoch: 3 Global Step: 16820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:35,989-Speed 18574.93 samples/sec Loss 10.7501 LearningRate 0.3465 Epoch: 3 Global Step: 16830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:40,435-Speed 18431.19 samples/sec Loss 10.6951 LearningRate 0.3464 Epoch: 3 Global Step: 16840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:03:44,869-Speed 18483.63 samples/sec Loss 10.6825 LearningRate 0.3464 Epoch: 3 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:03:49,287-Speed 18547.87 samples/sec Loss 10.7156 LearningRate 0.3463 Epoch: 3 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:03:53,696-Speed 18584.93 samples/sec Loss 10.7255 LearningRate 0.3462 Epoch: 3 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:03:58,189-Speed 18236.22 samples/sec Loss 10.6769 LearningRate 0.3461 Epoch: 3 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:02,647-Speed 18378.11 samples/sec Loss 10.7040 LearningRate 0.3460 Epoch: 3 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:07,081-Speed 18480.82 samples/sec Loss 10.6772 LearningRate 0.3460 Epoch: 3 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:11,528-Speed 18426.50 samples/sec Loss 10.6642 LearningRate 0.3459 Epoch: 3 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:16,000-Speed 18323.86 samples/sec Loss 10.7010 LearningRate 0.3458 Epoch: 3 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:20,434-Speed 18479.60 samples/sec Loss 
10.7104 LearningRate 0.3457 Epoch: 3 Global Step: 16930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:24,913-Speed 18297.25 samples/sec Loss 10.6366 LearningRate 0.3456 Epoch: 3 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:29,354-Speed 18449.18 samples/sec Loss 10.6867 LearningRate 0.3456 Epoch: 3 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:33,778-Speed 18525.33 samples/sec Loss 10.6656 LearningRate 0.3455 Epoch: 3 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:38,242-Speed 18355.58 samples/sec Loss 10.6604 LearningRate 0.3454 Epoch: 3 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:42,661-Speed 18543.08 samples/sec Loss 10.6735 LearningRate 0.3453 Epoch: 3 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:47,129-Speed 18339.70 samples/sec Loss 10.6623 LearningRate 0.3452 Epoch: 3 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:51,607-Speed 18296.53 samples/sec Loss 10.6596 LearningRate 0.3452 Epoch: 3 Global Step: 17000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:04:56,058-Speed 18411.51 samples/sec Loss 10.6751 LearningRate 0.3451 Epoch: 3 Global Step: 17010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:00,507-Speed 18414.08 samples/sec Loss 10.6664 LearningRate 0.3450 Epoch: 3 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:04,934-Speed 18506.94 samples/sec Loss 10.6763 LearningRate 0.3449 Epoch: 3 Global Step: 17030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:09,425-Speed 18246.00 samples/sec Loss 10.6625 LearningRate 0.3448 Epoch: 3 Global Step: 17040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:13,902-Speed 18304.02 samples/sec Loss 10.7043 
LearningRate 0.3448 Epoch: 3 Global Step: 17050 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-14 00:05:18,305-Speed 18611.30 samples/sec Loss 10.6416 LearningRate 0.3447 Epoch: 3 Global Step: 17060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:22,700-Speed 18642.40 samples/sec Loss 10.5637 LearningRate 0.3446 Epoch: 3 Global Step: 17070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:27,101-Speed 18617.44 samples/sec Loss 10.6105 LearningRate 0.3445 Epoch: 3 Global Step: 17080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:31,543-Speed 18446.87 samples/sec Loss 10.6094 LearningRate 0.3444 Epoch: 3 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:35,950-Speed 18593.32 samples/sec Loss 10.6197 LearningRate 0.3444 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:40,379-Speed 18503.10 samples/sec Loss 10.6102 LearningRate 0.3443 Epoch: 3 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:44,839-Speed 18373.38 samples/sec Loss 10.6261 LearningRate 0.3442 Epoch: 3 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:49,217-Speed 18717.09 samples/sec Loss 10.6341 LearningRate 0.3441 Epoch: 3 Global Step: 17130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:53,647-Speed 18499.42 samples/sec Loss 10.6767 LearningRate 0.3440 Epoch: 3 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:05:58,054-Speed 18595.17 samples/sec Loss 10.6294 LearningRate 0.3440 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:02,443-Speed 18671.79 samples/sec Loss 10.6370 LearningRate 0.3439 Epoch: 3 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:06,821-Speed 18719.37 samples/sec Loss 10.5972 LearningRate 0.3438 
Epoch: 3 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:11,228-Speed 18592.66 samples/sec Loss 10.6366 LearningRate 0.3437 Epoch: 3 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:15,705-Speed 18303.70 samples/sec Loss 10.6138 LearningRate 0.3437 Epoch: 3 Global Step: 17190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:20,101-Speed 18640.10 samples/sec Loss 10.6257 LearningRate 0.3436 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:24,499-Speed 18631.37 samples/sec Loss 10.5735 LearningRate 0.3435 Epoch: 3 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:28,943-Speed 18442.01 samples/sec Loss 10.5992 LearningRate 0.3434 Epoch: 3 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:33,372-Speed 18499.80 samples/sec Loss 10.6599 LearningRate 0.3433 Epoch: 3 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:37,855-Speed 18280.66 samples/sec Loss 10.6006 LearningRate 0.3433 Epoch: 3 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:42,353-Speed 18218.22 samples/sec Loss 10.5959 LearningRate 0.3432 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:46,819-Speed 18346.74 samples/sec Loss 10.6003 LearningRate 0.3431 Epoch: 3 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:51,237-Speed 18548.53 samples/sec Loss 10.5771 LearningRate 0.3430 Epoch: 3 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:06:55,706-Speed 18336.44 samples/sec Loss 10.5647 LearningRate 0.3429 Epoch: 3 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:07:00,112-Speed 18596.58 samples/sec Loss 10.6350 LearningRate 0.3429 Epoch: 3 Global 
Step: 17290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:04,552-Speed 18457.74 samples/sec Loss 10.6064 LearningRate 0.3428 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:08,957-Speed 18599.05 samples/sec Loss 10.6001 LearningRate 0.3427 Epoch: 3 Global Step: 17310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:13,383-Speed 18514.55 samples/sec Loss 10.5960 LearningRate 0.3426 Epoch: 3 Global Step: 17320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:17,819-Speed 18473.51 samples/sec Loss 10.5789 LearningRate 0.3425 Epoch: 3 Global Step: 17330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:22,222-Speed 18608.14 samples/sec Loss 10.5952 LearningRate 0.3425 Epoch: 3 Global Step: 17340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:26,633-Speed 18582.36 samples/sec Loss 10.5421 LearningRate 0.3424 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:31,112-Speed 18293.20 samples/sec Loss 10.6094 LearningRate 0.3423 Epoch: 3 Global Step: 17360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:35,493-Speed 18703.03 samples/sec Loss 10.5525 LearningRate 0.3422 Epoch: 3 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:39,889-Speed 18641.36 samples/sec Loss 10.6234 LearningRate 0.3421 Epoch: 3 Global Step: 17380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:07:44,329-Speed 18453.89 samples/sec Loss 10.5969 LearningRate 0.3421 Epoch: 3 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:07:48,717-Speed 18684.30 samples/sec Loss 10.5686 LearningRate 0.3420 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:07:53,194-Speed 18305.31 samples/sec Loss 10.5294 LearningRate 0.3419 Epoch: 3 Global Step: 17410 Fp16 Grad Scale: 
131072 Required: 11 hours Training: 2022-01-14 00:07:57,583-Speed 18677.87 samples/sec Loss 10.5412 LearningRate 0.3418 Epoch: 3 Global Step: 17420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:01,988-Speed 18606.04 samples/sec Loss 10.5521 LearningRate 0.3417 Epoch: 3 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:06,426-Speed 18461.72 samples/sec Loss 10.5378 LearningRate 0.3417 Epoch: 3 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:10,842-Speed 18556.44 samples/sec Loss 10.5386 LearningRate 0.3416 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:15,266-Speed 18519.85 samples/sec Loss 10.5252 LearningRate 0.3415 Epoch: 3 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:19,680-Speed 18566.98 samples/sec Loss 10.5427 LearningRate 0.3414 Epoch: 3 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:24,145-Speed 18349.68 samples/sec Loss 10.5232 LearningRate 0.3413 Epoch: 3 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:28,577-Speed 18489.14 samples/sec Loss 10.5426 LearningRate 0.3413 Epoch: 3 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:32,994-Speed 18551.71 samples/sec Loss 10.5656 LearningRate 0.3412 Epoch: 3 Global Step: 17500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:08:37,413-Speed 18542.50 samples/sec Loss 10.5271 LearningRate 0.3411 Epoch: 3 Global Step: 17510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:08:41,830-Speed 18549.66 samples/sec Loss 10.5340 LearningRate 0.3410 Epoch: 3 Global Step: 17520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:08:46,300-Speed 18334.04 samples/sec Loss 10.5165 LearningRate 0.3410 Epoch: 3 Global Step: 17530 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-01-14 00:08:50,699-Speed 18626.14 samples/sec Loss 10.5443 LearningRate 0.3409 Epoch: 3 Global Step: 17540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:08:55,158-Speed 18376.85 samples/sec Loss 10.5706 LearningRate 0.3408 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:08:59,640-Speed 18281.69 samples/sec Loss 10.5054 LearningRate 0.3407 Epoch: 3 Global Step: 17560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:04,052-Speed 18569.60 samples/sec Loss 10.5124 LearningRate 0.3406 Epoch: 3 Global Step: 17570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:08,465-Speed 18571.84 samples/sec Loss 10.5445 LearningRate 0.3406 Epoch: 3 Global Step: 17580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:12,867-Speed 18611.39 samples/sec Loss 10.5521 LearningRate 0.3405 Epoch: 3 Global Step: 17590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:17,271-Speed 18606.10 samples/sec Loss 10.5014 LearningRate 0.3404 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:21,648-Speed 18719.72 samples/sec Loss 10.5212 LearningRate 0.3403 Epoch: 3 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:09:26,038-Speed 18669.42 samples/sec Loss 10.5552 LearningRate 0.3402 Epoch: 3 Global Step: 17620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:09:30,468-Speed 18496.27 samples/sec Loss 10.5486 LearningRate 0.3402 Epoch: 3 Global Step: 17630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:34,876-Speed 18585.32 samples/sec Loss 10.4522 LearningRate 0.3401 Epoch: 3 Global Step: 17640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:39,285-Speed 18588.25 samples/sec Loss 10.4816 LearningRate 0.3400 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 
00:09:43,659-Speed 18734.70 samples/sec Loss 10.4616 LearningRate 0.3399 Epoch: 3 Global Step: 17660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:48,150-Speed 18245.05 samples/sec Loss 10.5217 LearningRate 0.3398 Epoch: 3 Global Step: 17670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:52,612-Speed 18365.16 samples/sec Loss 10.4514 LearningRate 0.3398 Epoch: 3 Global Step: 17680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:09:57,053-Speed 18453.39 samples/sec Loss 10.4367 LearningRate 0.3397 Epoch: 3 Global Step: 17690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:01,455-Speed 18611.11 samples/sec Loss 10.4425 LearningRate 0.3396 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:05,857-Speed 18613.88 samples/sec Loss 10.4755 LearningRate 0.3395 Epoch: 3 Global Step: 17710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:10,269-Speed 18573.64 samples/sec Loss 10.5015 LearningRate 0.3395 Epoch: 3 Global Step: 17720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:14,677-Speed 18587.65 samples/sec Loss 10.4996 LearningRate 0.3394 Epoch: 3 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:10:19,108-Speed 18494.15 samples/sec Loss 10.4303 LearningRate 0.3393 Epoch: 3 Global Step: 17740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:23,558-Speed 18415.79 samples/sec Loss 10.4531 LearningRate 0.3392 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:27,962-Speed 18606.56 samples/sec Loss 10.4285 LearningRate 0.3391 Epoch: 3 Global Step: 17760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:32,363-Speed 18619.53 samples/sec Loss 10.5205 LearningRate 0.3391 Epoch: 3 Global Step: 17770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:36,728-Speed 18769.49 
samples/sec Loss 10.4894 LearningRate 0.3390 Epoch: 3 Global Step: 17780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:41,189-Speed 18372.59 samples/sec Loss 10.5075 LearningRate 0.3389 Epoch: 3 Global Step: 17790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:45,578-Speed 18668.11 samples/sec Loss 10.4821 LearningRate 0.3388 Epoch: 3 Global Step: 17800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:49,984-Speed 18599.66 samples/sec Loss 10.4749 LearningRate 0.3387 Epoch: 3 Global Step: 17810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:54,377-Speed 18654.22 samples/sec Loss 10.4648 LearningRate 0.3387 Epoch: 3 Global Step: 17820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:10:58,768-Speed 18658.43 samples/sec Loss 10.4736 LearningRate 0.3386 Epoch: 3 Global Step: 17830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:03,168-Speed 18623.28 samples/sec Loss 10.4882 LearningRate 0.3385 Epoch: 3 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:11:07,599-Speed 18494.02 samples/sec Loss 10.4696 LearningRate 0.3384 Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:12,051-Speed 18405.60 samples/sec Loss 10.4741 LearningRate 0.3383 Epoch: 3 Global Step: 17860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:16,464-Speed 18568.37 samples/sec Loss 10.4397 LearningRate 0.3383 Epoch: 3 Global Step: 17870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:20,927-Speed 18361.44 samples/sec Loss 10.4154 LearningRate 0.3382 Epoch: 3 Global Step: 17880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:25,368-Speed 18453.74 samples/sec Loss 10.4685 LearningRate 0.3381 Epoch: 3 Global Step: 17890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:29,775-Speed 18590.70 samples/sec Loss 10.4353 
LearningRate 0.3380 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:34,192-Speed 18552.95 samples/sec Loss 10.4806 LearningRate 0.3380 Epoch: 3 Global Step: 17910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:38,573-Speed 18705.66 samples/sec Loss 10.4903 LearningRate 0.3379 Epoch: 3 Global Step: 17920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:42,969-Speed 18639.20 samples/sec Loss 10.4526 LearningRate 0.3378 Epoch: 3 Global Step: 17930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:47,411-Speed 18449.10 samples/sec Loss 10.4399 LearningRate 0.3377 Epoch: 3 Global Step: 17940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:11:51,775-Speed 18776.14 samples/sec Loss 10.4356 LearningRate 0.3376 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:11:56,171-Speed 18641.67 samples/sec Loss 10.4688 LearningRate 0.3376 Epoch: 3 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:12:00,578-Speed 18593.27 samples/sec Loss 10.4274 LearningRate 0.3375 Epoch: 3 Global Step: 17970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:05,001-Speed 18523.82 samples/sec Loss 10.4167 LearningRate 0.3374 Epoch: 3 Global Step: 17980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:09,427-Speed 18520.27 samples/sec Loss 10.4585 LearningRate 0.3373 Epoch: 3 Global Step: 17990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:13,816-Speed 18666.60 samples/sec Loss 10.5103 LearningRate 0.3372 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:18,214-Speed 18633.47 samples/sec Loss 10.4707 LearningRate 0.3372 Epoch: 3 Global Step: 18010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:22,602-Speed 18673.67 samples/sec Loss 10.4623 LearningRate 0.3371 Epoch: 3 
Global Step: 18020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:26,990-Speed 18679.34 samples/sec Loss 10.4016 LearningRate 0.3370 Epoch: 3 Global Step: 18030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:31,392-Speed 18612.96 samples/sec Loss 10.4000 LearningRate 0.3369 Epoch: 3 Global Step: 18040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:35,874-Speed 18283.62 samples/sec Loss 10.4574 LearningRate 0.3369 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:40,293-Speed 18544.32 samples/sec Loss 10.4167 LearningRate 0.3368 Epoch: 3 Global Step: 18060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:44,739-Speed 18433.91 samples/sec Loss 10.4216 LearningRate 0.3367 Epoch: 3 Global Step: 18070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:49,169-Speed 18498.81 samples/sec Loss 10.4291 LearningRate 0.3366 Epoch: 3 Global Step: 18080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:53,554-Speed 18688.63 samples/sec Loss 10.3685 LearningRate 0.3365 Epoch: 3 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:12:58,009-Speed 18396.01 samples/sec Loss 10.4420 LearningRate 0.3365 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:13:02,517-Speed 18177.20 samples/sec Loss 10.4430 LearningRate 0.3364 Epoch: 3 Global Step: 18110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:13:13,912-Speed 7189.93 samples/sec Loss 10.4495 LearningRate 0.3363 Epoch: 3 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:13:18,377-Speed 18351.98 samples/sec Loss 10.3910 LearningRate 0.3362 Epoch: 3 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:13:22,805-Speed 18503.53 samples/sec Loss 10.4362 LearningRate 0.3361 Epoch: 3 Global Step: 18140 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-01-14 00:13:27,217-Speed 18576.40 samples/sec Loss 10.4571 LearningRate 0.3361 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:13:31,635-Speed 18549.32 samples/sec Loss 10.4168 LearningRate 0.3360 Epoch: 3 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:13:36,134-Speed 18211.05 samples/sec Loss 10.3798 LearningRate 0.3359 Epoch: 3 Global Step: 18170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:13:40,561-Speed 18510.59 samples/sec Loss 10.3825 LearningRate 0.3358 Epoch: 3 Global Step: 18180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:13:44,963-Speed 18612.55 samples/sec Loss 10.4067 LearningRate 0.3357 Epoch: 3 Global Step: 18190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:13:49,398-Speed 18478.06 samples/sec Loss 10.3997 LearningRate 0.3357 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:13:53,871-Speed 18320.04 samples/sec Loss 10.4450 LearningRate 0.3356 Epoch: 3 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:13:58,260-Speed 18670.46 samples/sec Loss 10.4181 LearningRate 0.3355 Epoch: 3 Global Step: 18220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:02,756-Speed 18224.04 samples/sec Loss 10.3509 LearningRate 0.3354 Epoch: 3 Global Step: 18230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:07,185-Speed 18503.29 samples/sec Loss 10.3996 LearningRate 0.3354 Epoch: 3 Global Step: 18240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:11,614-Speed 18500.41 samples/sec Loss 10.2927 LearningRate 0.3353 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:16,061-Speed 18429.55 samples/sec Loss 10.4038 LearningRate 0.3352 Epoch: 3 Global Step: 18260 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-01-14 00:14:20,478-Speed 18549.78 samples/sec Loss 10.3923 LearningRate 0.3351 Epoch: 3 Global Step: 18270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:24,954-Speed 18309.56 samples/sec Loss 10.3879 LearningRate 0.3350 Epoch: 3 Global Step: 18280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:29,463-Speed 18171.35 samples/sec Loss 10.3787 LearningRate 0.3350 Epoch: 3 Global Step: 18290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:33,895-Speed 18487.91 samples/sec Loss 10.3830 LearningRate 0.3349 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:38,357-Speed 18365.30 samples/sec Loss 10.3452 LearningRate 0.3348 Epoch: 3 Global Step: 18310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:42,795-Speed 18465.97 samples/sec Loss 10.3254 LearningRate 0.3347 Epoch: 3 Global Step: 18320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:14:47,221-Speed 18512.09 samples/sec Loss 10.3535 LearningRate 0.3347 Epoch: 3 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:14:51,636-Speed 18558.87 samples/sec Loss 10.3706 LearningRate 0.3346 Epoch: 3 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:14:56,079-Speed 18445.74 samples/sec Loss 10.3320 LearningRate 0.3345 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:00,494-Speed 18558.71 samples/sec Loss 10.3430 LearningRate 0.3344 Epoch: 3 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:04,940-Speed 18428.37 samples/sec Loss 10.3975 LearningRate 0.3343 Epoch: 3 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:09,365-Speed 18518.40 samples/sec Loss 10.4016 LearningRate 0.3343 Epoch: 3 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 
00:15:13,831-Speed 18350.58 samples/sec Loss 10.3403 LearningRate 0.3342 Epoch: 3 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:18,275-Speed 18438.26 samples/sec Loss 10.3610 LearningRate 0.3341 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:22,665-Speed 18665.19 samples/sec Loss 10.3672 LearningRate 0.3340 Epoch: 3 Global Step: 18410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:27,079-Speed 18560.41 samples/sec Loss 10.3608 LearningRate 0.3339 Epoch: 3 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:31,573-Speed 18234.68 samples/sec Loss 10.3473 LearningRate 0.3339 Epoch: 3 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:36,020-Speed 18429.93 samples/sec Loss 10.3822 LearningRate 0.3338 Epoch: 3 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:15:40,472-Speed 18407.02 samples/sec Loss 10.3498 LearningRate 0.3337 Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:15:44,894-Speed 18525.90 samples/sec Loss 10.3885 LearningRate 0.3336 Epoch: 3 Global Step: 18460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:49,328-Speed 18484.17 samples/sec Loss 10.3503 LearningRate 0.3336 Epoch: 3 Global Step: 18470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:53,750-Speed 18530.77 samples/sec Loss 10.3341 LearningRate 0.3335 Epoch: 3 Global Step: 18480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:15:58,168-Speed 18544.01 samples/sec Loss 10.3701 LearningRate 0.3334 Epoch: 3 Global Step: 18490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:02,613-Speed 18434.70 samples/sec Loss 10.3494 LearningRate 0.3333 Epoch: 3 Global Step: 18500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:07,070-Speed 18387.24 
samples/sec Loss 10.3577 LearningRate 0.3332 Epoch: 3 Global Step: 18510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:11,491-Speed 18532.52 samples/sec Loss 10.2600 LearningRate 0.3332 Epoch: 3 Global Step: 18520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:15,891-Speed 18621.77 samples/sec Loss 10.3252 LearningRate 0.3331 Epoch: 3 Global Step: 18530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:20,315-Speed 18526.06 samples/sec Loss 10.3658 LearningRate 0.3330 Epoch: 3 Global Step: 18540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:24,761-Speed 18431.36 samples/sec Loss 10.3601 LearningRate 0.3329 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:29,204-Speed 18437.75 samples/sec Loss 10.3721 LearningRate 0.3328 Epoch: 3 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:16:33,608-Speed 18610.50 samples/sec Loss 10.3139 LearningRate 0.3328 Epoch: 3 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:16:38,068-Speed 18369.53 samples/sec Loss 10.3556 LearningRate 0.3327 Epoch: 3 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:16:42,505-Speed 18468.52 samples/sec Loss 10.2798 LearningRate 0.3326 Epoch: 3 Global Step: 18590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:46,908-Speed 18609.57 samples/sec Loss 10.3201 LearningRate 0.3325 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:16:51,288-Speed 18709.87 samples/sec Loss 10.3270 LearningRate 0.3325 Epoch: 3 Global Step: 18610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:16:55,677-Speed 18668.07 samples/sec Loss 10.3242 LearningRate 0.3324 Epoch: 3 Global Step: 18620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:00,100-Speed 18524.71 samples/sec Loss 10.2680 
LearningRate 0.3323 Epoch: 3 Global Step: 18630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:04,506-Speed 18602.59 samples/sec Loss 10.3000 LearningRate 0.3322 Epoch: 3 Global Step: 18640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:08,970-Speed 18355.58 samples/sec Loss 10.2890 LearningRate 0.3321 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:13,446-Speed 18306.05 samples/sec Loss 10.2762 LearningRate 0.3321 Epoch: 3 Global Step: 18660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:17,855-Speed 18587.96 samples/sec Loss 10.3484 LearningRate 0.3320 Epoch: 3 Global Step: 18670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:22,238-Speed 18698.83 samples/sec Loss 10.3099 LearningRate 0.3319 Epoch: 3 Global Step: 18680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:26,670-Speed 18490.01 samples/sec Loss 10.3117 LearningRate 0.3318 Epoch: 3 Global Step: 18690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:31,087-Speed 18555.69 samples/sec Loss 10.3142 LearningRate 0.3318 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:17:35,483-Speed 18638.67 samples/sec Loss 10.2695 LearningRate 0.3317 Epoch: 3 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:17:39,893-Speed 18580.51 samples/sec Loss 10.3239 LearningRate 0.3316 Epoch: 3 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:17:44,320-Speed 18509.91 samples/sec Loss 10.3120 LearningRate 0.3315 Epoch: 3 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:17:48,742-Speed 18533.04 samples/sec Loss 10.3027 LearningRate 0.3314 Epoch: 3 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:17:53,141-Speed 18626.49 samples/sec Loss 10.2123 LearningRate 0.3314 Epoch: 3 
Global Step: 18750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:17:57,580-Speed 18456.96 samples/sec Loss 10.2816 LearningRate 0.3313 Epoch: 3 Global Step: 18760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:18:01,961-Speed 18708.55 samples/sec Loss 10.2903 LearningRate 0.3312 Epoch: 3 Global Step: 18770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:18:06,344-Speed 18695.46 samples/sec Loss 10.2842 LearningRate 0.3311 Epoch: 3 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:18:10,763-Speed 18544.72 samples/sec Loss 10.3030 LearningRate 0.3311 Epoch: 3 Global Step: 18790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:18:15,134-Speed 18746.72 samples/sec Loss 10.2525 LearningRate 0.3310 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:18:19,554-Speed 18537.53 samples/sec Loss 10.2526 LearningRate 0.3309 Epoch: 3 Global Step: 18810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:24,028-Speed 18318.07 samples/sec Loss 10.2405 LearningRate 0.3308 Epoch: 3 Global Step: 18820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:28,440-Speed 18572.45 samples/sec Loss 10.2269 LearningRate 0.3307 Epoch: 3 Global Step: 18830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:32,835-Speed 18642.59 samples/sec Loss 10.2464 LearningRate 0.3307 Epoch: 3 Global Step: 18840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:37,290-Speed 18392.98 samples/sec Loss 10.3247 LearningRate 0.3306 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:41,770-Speed 18292.73 samples/sec Loss 10.3308 LearningRate 0.3305 Epoch: 3 Global Step: 18860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:46,195-Speed 18514.03 samples/sec Loss 10.2563 LearningRate 0.3304 Epoch: 3 Global Step: 18870 Fp16 
Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:50,590-Speed 18646.31 samples/sec Loss 10.2513 LearningRate 0.3304 Epoch: 3 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:55,001-Speed 18576.86 samples/sec Loss 10.2659 LearningRate 0.3303 Epoch: 3 Global Step: 18890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:18:59,444-Speed 18441.48 samples/sec Loss 10.2535 LearningRate 0.3302 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:19:03,841-Speed 18643.47 samples/sec Loss 10.2656 LearningRate 0.3301 Epoch: 3 Global Step: 18910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:08,292-Speed 18409.81 samples/sec Loss 10.2908 LearningRate 0.3300 Epoch: 3 Global Step: 18920 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:12,701-Speed 18586.51 samples/sec Loss 10.2593 LearningRate 0.3300 Epoch: 3 Global Step: 18930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:17,125-Speed 18520.59 samples/sec Loss 10.2770 LearningRate 0.3299 Epoch: 3 Global Step: 18940 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:21,553-Speed 18504.91 samples/sec Loss 10.2150 LearningRate 0.3298 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:25,982-Speed 18504.34 samples/sec Loss 10.2603 LearningRate 0.3297 Epoch: 3 Global Step: 18960 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:30,365-Speed 18693.08 samples/sec Loss 10.1709 LearningRate 0.3297 Epoch: 3 Global Step: 18970 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:34,739-Speed 18739.16 samples/sec Loss 10.2910 LearningRate 0.3296 Epoch: 3 Global Step: 18980 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:39,130-Speed 18658.49 samples/sec Loss 10.2417 LearningRate 0.3295 Epoch: 3 Global Step: 18990 Fp16 Grad Scale: 16384 Required: 
11 hours Training: 2022-01-14 00:19:43,520-Speed 18666.72 samples/sec Loss 10.2099 LearningRate 0.3294 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-01-14 00:19:47,922-Speed 18615.32 samples/sec Loss 10.1930 LearningRate 0.3293 Epoch: 3 Global Step: 19010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:19:52,369-Speed 18427.61 samples/sec Loss 10.1632 LearningRate 0.3293 Epoch: 3 Global Step: 19020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:19:56,761-Speed 18656.49 samples/sec Loss 10.1874 LearningRate 0.3292 Epoch: 3 Global Step: 19030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:20:01,171-Speed 18584.12 samples/sec Loss 10.2076 LearningRate 0.3291 Epoch: 3 Global Step: 19040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:20:05,591-Speed 18539.04 samples/sec Loss 10.1599 LearningRate 0.3290 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:20:10,009-Speed 18548.25 samples/sec Loss 10.2731 LearningRate 0.3290 Epoch: 3 Global Step: 19060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:20:14,421-Speed 18572.56 samples/sec Loss 10.2249 LearningRate 0.3289 Epoch: 3 Global Step: 19070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:20:18,889-Speed 18338.55 samples/sec Loss 10.2449 LearningRate 0.3288 Epoch: 3 Global Step: 19080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:20:23,315-Speed 18515.82 samples/sec Loss 10.2089 LearningRate 0.3287 Epoch: 3 Global Step: 19090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:20:27,721-Speed 18594.92 samples/sec Loss 10.2151 LearningRate 0.3286 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-14 00:20:32,125-Speed 18609.39 samples/sec Loss 10.2480 LearningRate 0.3286 Epoch: 3 Global Step: 19110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 
00:20:36,506-Speed 18698.58 samples/sec Loss 10.2549 LearningRate 0.3285 Epoch: 3 Global Step: 19120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:20:40,966-Speed 18371.64 samples/sec Loss 10.2192 LearningRate 0.3284 Epoch: 3 Global Step: 19130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:20:45,401-Speed 18479.59 samples/sec Loss 10.2253 LearningRate 0.3283 Epoch: 3 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:20:49,795-Speed 18648.53 samples/sec Loss 10.2446 LearningRate 0.3283 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:20:54,191-Speed 18649.23 samples/sec Loss 10.2352 LearningRate 0.3282 Epoch: 3 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:20:58,607-Speed 18562.30 samples/sec Loss 10.1829 LearningRate 0.3281 Epoch: 3 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:21:03,026-Speed 18545.88 samples/sec Loss 10.1571 LearningRate 0.3280 Epoch: 3 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:21:07,458-Speed 18483.10 samples/sec Loss 10.1882 LearningRate 0.3279 Epoch: 3 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:21:11,855-Speed 18641.86 samples/sec Loss 10.1609 LearningRate 0.3279 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:21:16,234-Speed 18714.87 samples/sec Loss 10.2006 LearningRate 0.3278 Epoch: 3 Global Step: 19210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:20,625-Speed 18664.07 samples/sec Loss 10.1634 LearningRate 0.3277 Epoch: 3 Global Step: 19220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:25,051-Speed 18514.63 samples/sec Loss 10.1420 LearningRate 0.3276 Epoch: 3 Global Step: 19230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:29,501-Speed 18409.76 
samples/sec Loss 10.1875 LearningRate 0.3276 Epoch: 3 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:33,906-Speed 18615.35 samples/sec Loss 10.2182 LearningRate 0.3275 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:38,383-Speed 18299.02 samples/sec Loss 10.1860 LearningRate 0.3274 Epoch: 3 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:42,843-Speed 18373.30 samples/sec Loss 10.1980 LearningRate 0.3273 Epoch: 3 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:47,224-Speed 18703.56 samples/sec Loss 10.1733 LearningRate 0.3272 Epoch: 3 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:51,621-Speed 18637.73 samples/sec Loss 10.2064 LearningRate 0.3272 Epoch: 3 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:21:55,997-Speed 18724.00 samples/sec Loss 10.1611 LearningRate 0.3271 Epoch: 3 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:22:00,418-Speed 18538.71 samples/sec Loss 10.1838 LearningRate 0.3270 Epoch: 3 Global Step: 19310 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-14 00:22:04,862-Speed 18437.78 samples/sec Loss 10.1407 LearningRate 0.3269 Epoch: 3 Global Step: 19320 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-14 00:22:09,263-Speed 18622.29 samples/sec Loss 10.1802 LearningRate 0.3269 Epoch: 3 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:22:13,643-Speed 18715.25 samples/sec Loss 10.1905 LearningRate 0.3268 Epoch: 3 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:22:18,046-Speed 18607.81 samples/sec Loss 10.1804 LearningRate 0.3267 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:22:22,457-Speed 18578.15 samples/sec Loss 
10.1809 LearningRate 0.3266 Epoch: 3 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:22:26,882-Speed 18516.14 samples/sec Loss 10.1483 LearningRate 0.3265 Epoch: 3 Global Step: 19370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:22:31,375-Speed 18236.92 samples/sec Loss 10.1120 LearningRate 0.3265 Epoch: 3 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-14 00:22:35,777-Speed 18616.89 samples/sec Loss 10.1824 LearningRate 0.3264 Epoch: 3 Global Step: 19390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:22:40,165-Speed 18675.83 samples/sec Loss 10.1593 LearningRate 0.3263 Epoch: 3 Global Step: 19400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:22:44,618-Speed 18400.01 samples/sec Loss 10.1297 LearningRate 0.3262 Epoch: 3 Global Step: 19410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:22:49,083-Speed 18352.19 samples/sec Loss 10.1256 LearningRate 0.3262 Epoch: 3 Global Step: 19420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-14 00:22:53,469-Speed 18682.89 samples/sec Loss 10.1508 LearningRate 0.3261 Epoch: 3 Global Step: 19430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:22:57,923-Speed 18398.90 samples/sec Loss 10.2119 LearningRate 0.3260 Epoch: 3 Global Step: 19440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:02,367-Speed 18441.94 samples/sec Loss 10.0923 LearningRate 0.3259 Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:06,753-Speed 18681.35 samples/sec Loss 10.0960 LearningRate 0.3258 Epoch: 3 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:11,205-Speed 18405.55 samples/sec Loss 10.1135 LearningRate 0.3258 Epoch: 3 Global Step: 19470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:15,717-Speed 18162.23 samples/sec Loss 10.0706 LearningRate 0.3257 
Epoch: 3 Global Step: 19480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:20,128-Speed 18577.06 samples/sec Loss 10.1643 LearningRate 0.3256 Epoch: 3 Global Step: 19490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:23:24,552-Speed 18524.24 samples/sec Loss 10.1415 LearningRate 0.3255 Epoch: 3 Global Step: 19500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:23:28,990-Speed 18464.02 samples/sec Loss 10.1781 LearningRate 0.3255 Epoch: 3 Global Step: 19510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:23:33,406-Speed 18555.42 samples/sec Loss 10.1459 LearningRate 0.3254 Epoch: 3 Global Step: 19520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:23:37,827-Speed 18535.71 samples/sec Loss 10.1630 LearningRate 0.3253 Epoch: 3 Global Step: 19530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:23:42,238-Speed 18577.37 samples/sec Loss 10.1492 LearningRate 0.3252 Epoch: 3 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:46,651-Speed 18568.57 samples/sec Loss 10.1327 LearningRate 0.3252 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:51,079-Speed 18505.66 samples/sec Loss 10.1519 LearningRate 0.3251 Epoch: 3 Global Step: 19560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:55,487-Speed 18590.72 samples/sec Loss 10.0987 LearningRate 0.3250 Epoch: 3 Global Step: 19570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:23:59,933-Speed 18432.70 samples/sec Loss 10.1257 LearningRate 0.3249 Epoch: 3 Global Step: 19580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:24:04,326-Speed 18652.49 samples/sec Loss 10.1761 LearningRate 0.3248 Epoch: 3 Global Step: 19590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:24:08,700-Speed 18734.22 samples/sec Loss 10.1023 LearningRate 0.3248 Epoch: 3 Global Step: 
19600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:24:13,140-Speed 18456.49 samples/sec Loss 10.1184 LearningRate 0.3247 Epoch: 3 Global Step: 19610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:24:17,596-Speed 18389.07 samples/sec Loss 10.1168 LearningRate 0.3246 Epoch: 3 Global Step: 19620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:24:22,122-Speed 18105.11 samples/sec Loss 10.1225 LearningRate 0.3245 Epoch: 3 Global Step: 19630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:24:26,588-Speed 18346.52 samples/sec Loss 10.1296 LearningRate 0.3245 Epoch: 3 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:24:31,025-Speed 18470.61 samples/sec Loss 10.1072 LearningRate 0.3244 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:24:35,428-Speed 18610.26 samples/sec Loss 10.0995 LearningRate 0.3243 Epoch: 3 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:24:39,903-Speed 18311.16 samples/sec Loss 10.1432 LearningRate 0.3242 Epoch: 3 Global Step: 19670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:24:44,351-Speed 18424.40 samples/sec Loss 10.0883 LearningRate 0.3241 Epoch: 3 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:24:48,726-Speed 18727.77 samples/sec Loss 10.0803 LearningRate 0.3241 Epoch: 3 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:24:53,144-Speed 18548.51 samples/sec Loss 10.0651 LearningRate 0.3240 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:24:57,624-Speed 18291.71 samples/sec Loss 10.1110 LearningRate 0.3239 Epoch: 3 Global Step: 19710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:02,036-Speed 18580.09 samples/sec Loss 10.1264 LearningRate 0.3238 Epoch: 3 Global Step: 19720 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-01-14 00:25:06,403-Speed 18765.38 samples/sec Loss 10.1218 LearningRate 0.3238 Epoch: 3 Global Step: 19730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:10,793-Speed 18668.27 samples/sec Loss 10.1010 LearningRate 0.3237 Epoch: 3 Global Step: 19740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:15,216-Speed 18524.63 samples/sec Loss 10.0784 LearningRate 0.3236 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:19,679-Speed 18359.08 samples/sec Loss 10.0632 LearningRate 0.3235 Epoch: 3 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:24,088-Speed 18588.30 samples/sec Loss 10.1043 LearningRate 0.3235 Epoch: 3 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:28,505-Speed 18552.41 samples/sec Loss 10.0274 LearningRate 0.3234 Epoch: 3 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:32,947-Speed 18444.93 samples/sec Loss 10.0878 LearningRate 0.3233 Epoch: 3 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:37,387-Speed 18458.91 samples/sec Loss 10.1026 LearningRate 0.3232 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:25:41,759-Speed 18749.55 samples/sec Loss 10.0784 LearningRate 0.3231 Epoch: 3 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:46,166-Speed 18591.21 samples/sec Loss 10.0935 LearningRate 0.3231 Epoch: 3 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:50,545-Speed 18709.32 samples/sec Loss 10.0699 LearningRate 0.3230 Epoch: 3 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:25:54,917-Speed 18744.77 samples/sec Loss 10.0661 LearningRate 0.3229 Epoch: 3 Global Step: 19840 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-01-14 00:25:59,310-Speed 18647.54 samples/sec Loss 10.1425 LearningRate 0.3228 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:03,717-Speed 18597.19 samples/sec Loss 10.0465 LearningRate 0.3228 Epoch: 3 Global Step: 19860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:08,111-Speed 18644.96 samples/sec Loss 10.0988 LearningRate 0.3227 Epoch: 3 Global Step: 19870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:12,547-Speed 18474.23 samples/sec Loss 10.0528 LearningRate 0.3226 Epoch: 3 Global Step: 19880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:16,951-Speed 18602.94 samples/sec Loss 10.0663 LearningRate 0.3225 Epoch: 3 Global Step: 19890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:21,341-Speed 18668.25 samples/sec Loss 10.0725 LearningRate 0.3225 Epoch: 3 Global Step: 19900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:25,801-Speed 18373.21 samples/sec Loss 10.0209 LearningRate 0.3224 Epoch: 3 Global Step: 19910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:26:30,195-Speed 18645.96 samples/sec Loss 10.0927 LearningRate 0.3223 Epoch: 3 Global Step: 19920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:34,650-Speed 18394.13 samples/sec Loss 10.0481 LearningRate 0.3222 Epoch: 3 Global Step: 19930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:39,063-Speed 18571.05 samples/sec Loss 10.0564 LearningRate 0.3221 Epoch: 3 Global Step: 19940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:43,451-Speed 18674.78 samples/sec Loss 10.0223 LearningRate 0.3221 Epoch: 3 Global Step: 19950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:47,868-Speed 18551.73 samples/sec Loss 9.9835 LearningRate 0.3220 Epoch: 3 Global Step: 19960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 
00:26:52,281-Speed 18565.98 samples/sec Loss 10.0055 LearningRate 0.3219 Epoch: 3 Global Step: 19970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:26:56,672-Speed 18663.49 samples/sec Loss 10.0653 LearningRate 0.3218 Epoch: 3 Global Step: 19980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:27:01,094-Speed 18531.12 samples/sec Loss 10.0880 LearningRate 0.3218 Epoch: 3 Global Step: 19990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:27:05,511-Speed 18553.09 samples/sec Loss 10.0643 LearningRate 0.3217 Epoch: 3 Global Step: 20000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:27:09,919-Speed 18586.22 samples/sec Loss 10.0696 LearningRate 0.3216 Epoch: 3 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:27:14,333-Speed 18566.82 samples/sec Loss 10.0598 LearningRate 0.3215 Epoch: 3 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:18,762-Speed 18501.64 samples/sec Loss 10.0251 LearningRate 0.3215 Epoch: 3 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:23,162-Speed 18622.53 samples/sec Loss 10.0491 LearningRate 0.3214 Epoch: 3 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:27,556-Speed 18647.12 samples/sec Loss 9.9982 LearningRate 0.3213 Epoch: 3 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:31,975-Speed 18547.36 samples/sec Loss 10.0494 LearningRate 0.3212 Epoch: 3 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:36,372-Speed 18631.14 samples/sec Loss 10.1050 LearningRate 0.3211 Epoch: 3 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:40,795-Speed 18529.63 samples/sec Loss 10.0618 LearningRate 0.3211 Epoch: 3 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:45,174-Speed 
18708.82 samples/sec Loss 10.0001 LearningRate 0.3210 Epoch: 3 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:49,610-Speed 18474.80 samples/sec Loss 9.9717 LearningRate 0.3209 Epoch: 3 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:54,008-Speed 18631.11 samples/sec Loss 9.9974 LearningRate 0.3208 Epoch: 3 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:27:58,419-Speed 18576.16 samples/sec Loss 10.0554 LearningRate 0.3208 Epoch: 3 Global Step: 20120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-14 00:28:02,822-Speed 18615.74 samples/sec Loss 10.0564 LearningRate 0.3207 Epoch: 3 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:07,215-Speed 18648.33 samples/sec Loss 10.0192 LearningRate 0.3206 Epoch: 3 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:11,611-Speed 18643.59 samples/sec Loss 10.0538 LearningRate 0.3205 Epoch: 3 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:16,003-Speed 18653.12 samples/sec Loss 9.9888 LearningRate 0.3205 Epoch: 3 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:20,394-Speed 18660.75 samples/sec Loss 9.9763 LearningRate 0.3204 Epoch: 3 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:24,786-Speed 18659.12 samples/sec Loss 9.9789 LearningRate 0.3203 Epoch: 3 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:29,205-Speed 18544.37 samples/sec Loss 10.0098 LearningRate 0.3202 Epoch: 3 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:33,596-Speed 18663.60 samples/sec Loss 10.0424 LearningRate 0.3201 Epoch: 3 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:38,026-Speed 18499.29 samples/sec 
Loss 10.0287 LearningRate 0.3201 Epoch: 3 Global Step: 20210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:28:42,383-Speed 18808.22 samples/sec Loss 9.9991 LearningRate 0.3200 Epoch: 3 Global Step: 20220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:28:46,761-Speed 18719.10 samples/sec Loss 10.0437 LearningRate 0.3199 Epoch: 3 Global Step: 20230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:28:51,157-Speed 18642.63 samples/sec Loss 10.0392 LearningRate 0.3198 Epoch: 3 Global Step: 20240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:28:55,538-Speed 18709.53 samples/sec Loss 10.0097 LearningRate 0.3198 Epoch: 3 Global Step: 20250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:28:59,910-Speed 18742.35 samples/sec Loss 9.9534 LearningRate 0.3197 Epoch: 3 Global Step: 20260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:04,313-Speed 18610.97 samples/sec Loss 9.9656 LearningRate 0.3196 Epoch: 3 Global Step: 20270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:08,763-Speed 18412.77 samples/sec Loss 9.9716 LearningRate 0.3195 Epoch: 3 Global Step: 20280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:13,209-Speed 18435.63 samples/sec Loss 9.9213 LearningRate 0.3195 Epoch: 3 Global Step: 20290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:17,603-Speed 18648.80 samples/sec Loss 9.9907 LearningRate 0.3194 Epoch: 3 Global Step: 20300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:22,016-Speed 18568.45 samples/sec Loss 10.0008 LearningRate 0.3193 Epoch: 3 Global Step: 20310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:26,420-Speed 18606.71 samples/sec Loss 9.9722 LearningRate 0.3192 Epoch: 3 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:29:30,854-Speed 18476.70 samples/sec Loss 9.9598 LearningRate 0.3192 
Epoch: 3 Global Step: 20330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:35,245-Speed 18664.74 samples/sec Loss 9.9938 LearningRate 0.3191 Epoch: 3 Global Step: 20340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:39,681-Speed 18472.76 samples/sec Loss 10.0059 LearningRate 0.3190 Epoch: 3 Global Step: 20350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:44,081-Speed 18623.14 samples/sec Loss 9.9167 LearningRate 0.3189 Epoch: 3 Global Step: 20360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:48,473-Speed 18655.49 samples/sec Loss 9.9815 LearningRate 0.3188 Epoch: 3 Global Step: 20370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:52,870-Speed 18634.97 samples/sec Loss 10.0103 LearningRate 0.3188 Epoch: 3 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:29:57,282-Speed 18578.05 samples/sec Loss 9.9746 LearningRate 0.3187 Epoch: 3 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:01,719-Speed 18466.22 samples/sec Loss 9.9801 LearningRate 0.3186 Epoch: 3 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:06,231-Speed 18162.04 samples/sec Loss 9.9461 LearningRate 0.3185 Epoch: 3 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:10,661-Speed 18497.58 samples/sec Loss 9.9824 LearningRate 0.3185 Epoch: 3 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:15,088-Speed 18509.04 samples/sec Loss 9.9788 LearningRate 0.3184 Epoch: 3 Global Step: 20430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:19,502-Speed 18565.44 samples/sec Loss 9.9655 LearningRate 0.3183 Epoch: 3 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:23,978-Speed 18304.73 samples/sec Loss 10.0061 LearningRate 0.3182 Epoch: 3 Global Step: 20450 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:28,389-Speed 18576.70 samples/sec Loss 9.9646 LearningRate 0.3182 Epoch: 3 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:32,789-Speed 18624.99 samples/sec Loss 9.9826 LearningRate 0.3181 Epoch: 3 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:37,200-Speed 18575.57 samples/sec Loss 9.9362 LearningRate 0.3180 Epoch: 3 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:41,603-Speed 18609.84 samples/sec Loss 9.9553 LearningRate 0.3179 Epoch: 3 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:46,018-Speed 18559.20 samples/sec Loss 9.9402 LearningRate 0.3179 Epoch: 3 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:50,432-Speed 18561.30 samples/sec Loss 9.9882 LearningRate 0.3178 Epoch: 3 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:54,887-Speed 18394.98 samples/sec Loss 9.9646 LearningRate 0.3177 Epoch: 3 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:30:59,291-Speed 18603.92 samples/sec Loss 9.9511 LearningRate 0.3176 Epoch: 3 Global Step: 20530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:31:03,697-Speed 18596.77 samples/sec Loss 9.9409 LearningRate 0.3175 Epoch: 3 Global Step: 20540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:08,106-Speed 18584.26 samples/sec Loss 9.9310 LearningRate 0.3175 Epoch: 3 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:12,512-Speed 18601.44 samples/sec Loss 9.9681 LearningRate 0.3174 Epoch: 3 Global Step: 20560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:16,912-Speed 18620.31 samples/sec Loss 9.9375 LearningRate 0.3173 Epoch: 3 Global Step: 20570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-14 00:31:21,339-Speed 18511.39 samples/sec Loss 9.9586 LearningRate 0.3172 Epoch: 3 Global Step: 20580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:25,737-Speed 18633.08 samples/sec Loss 9.9537 LearningRate 0.3172 Epoch: 3 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:30,142-Speed 18600.75 samples/sec Loss 9.9777 LearningRate 0.3171 Epoch: 3 Global Step: 20600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:34,569-Speed 18510.74 samples/sec Loss 9.9716 LearningRate 0.3170 Epoch: 3 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:39,055-Speed 18264.47 samples/sec Loss 9.9397 LearningRate 0.3169 Epoch: 3 Global Step: 20620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:43,513-Speed 18381.92 samples/sec Loss 9.9447 LearningRate 0.3169 Epoch: 3 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:47,950-Speed 18467.86 samples/sec Loss 9.9977 LearningRate 0.3168 Epoch: 3 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:52,406-Speed 18391.93 samples/sec Loss 9.9210 LearningRate 0.3167 Epoch: 3 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:31:56,853-Speed 18429.06 samples/sec Loss 9.9724 LearningRate 0.3166 Epoch: 3 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:01,251-Speed 18628.83 samples/sec Loss 9.9001 LearningRate 0.3166 Epoch: 3 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:05,678-Speed 18511.43 samples/sec Loss 9.8797 LearningRate 0.3165 Epoch: 3 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:10,147-Speed 18334.22 samples/sec Loss 9.8954 LearningRate 0.3164 Epoch: 3 Global Step: 20690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:14,580-Speed 18482.66 
samples/sec Loss 9.9283 LearningRate 0.3163 Epoch: 3 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:19,040-Speed 18372.79 samples/sec Loss 9.9756 LearningRate 0.3162 Epoch: 3 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:23,436-Speed 18638.64 samples/sec Loss 9.9609 LearningRate 0.3162 Epoch: 3 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:27,981-Speed 18030.40 samples/sec Loss 9.9565 LearningRate 0.3161 Epoch: 3 Global Step: 20730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:32,355-Speed 18736.63 samples/sec Loss 9.9395 LearningRate 0.3160 Epoch: 3 Global Step: 20740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:32:51,360-Speed 4310.62 samples/sec Loss 9.8792 LearningRate 0.3159 Epoch: 4 Global Step: 20750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:32:55,746-Speed 18685.80 samples/sec Loss 9.9008 LearningRate 0.3159 Epoch: 4 Global Step: 20760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:00,141-Speed 18644.29 samples/sec Loss 9.9452 LearningRate 0.3158 Epoch: 4 Global Step: 20770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:04,532-Speed 18666.22 samples/sec Loss 9.8822 LearningRate 0.3157 Epoch: 4 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:08,947-Speed 18561.73 samples/sec Loss 9.8583 LearningRate 0.3156 Epoch: 4 Global Step: 20790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:13,340-Speed 18654.46 samples/sec Loss 9.8942 LearningRate 0.3156 Epoch: 4 Global Step: 20800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:17,724-Speed 18690.46 samples/sec Loss 9.9165 LearningRate 0.3155 Epoch: 4 Global Step: 20810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:22,128-Speed 18605.62 samples/sec Loss 9.8317 LearningRate 0.3154 
Epoch: 4 Global Step: 20820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:26,613-Speed 18289.58 samples/sec Loss 9.9164 LearningRate 0.3153 Epoch: 4 Global Step: 20830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:31,080-Speed 18346.35 samples/sec Loss 9.9255 LearningRate 0.3153 Epoch: 4 Global Step: 20840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:35,511-Speed 18498.39 samples/sec Loss 9.9017 LearningRate 0.3152 Epoch: 4 Global Step: 20850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:39,908-Speed 18641.51 samples/sec Loss 9.9221 LearningRate 0.3151 Epoch: 4 Global Step: 20860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:44,297-Speed 18672.35 samples/sec Loss 9.9103 LearningRate 0.3150 Epoch: 4 Global Step: 20870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:48,695-Speed 18635.46 samples/sec Loss 9.7943 LearningRate 0.3150 Epoch: 4 Global Step: 20880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:53,084-Speed 18667.87 samples/sec Loss 9.8103 LearningRate 0.3149 Epoch: 4 Global Step: 20890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:33:57,466-Speed 18700.70 samples/sec Loss 9.9176 LearningRate 0.3148 Epoch: 4 Global Step: 20900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:01,846-Speed 18712.42 samples/sec Loss 9.8880 LearningRate 0.3147 Epoch: 4 Global Step: 20910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:06,254-Speed 18589.55 samples/sec Loss 9.8751 LearningRate 0.3146 Epoch: 4 Global Step: 20920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:10,663-Speed 18584.80 samples/sec Loss 9.8921 LearningRate 0.3146 Epoch: 4 Global Step: 20930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:15,057-Speed 18650.74 samples/sec Loss 9.8895 LearningRate 0.3145 Epoch: 4 Global Step: 20940 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:19,462-Speed 18602.34 samples/sec Loss 9.8405 LearningRate 0.3144 Epoch: 4 Global Step: 20950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:34:23,876-Speed 18565.69 samples/sec Loss 9.8951 LearningRate 0.3143 Epoch: 4 Global Step: 20960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:34:28,334-Speed 18376.29 samples/sec Loss 9.8825 LearningRate 0.3143 Epoch: 4 Global Step: 20970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:34:32,752-Speed 18549.72 samples/sec Loss 9.8868 LearningRate 0.3142 Epoch: 4 Global Step: 20980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:34:37,145-Speed 18655.52 samples/sec Loss 9.8702 LearningRate 0.3141 Epoch: 4 Global Step: 20990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:34:41,527-Speed 18701.40 samples/sec Loss 9.8611 LearningRate 0.3140 Epoch: 4 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:45,918-Speed 18660.34 samples/sec Loss 9.9122 LearningRate 0.3140 Epoch: 4 Global Step: 21010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:50,396-Speed 18299.60 samples/sec Loss 9.8072 LearningRate 0.3139 Epoch: 4 Global Step: 21020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:54,824-Speed 18507.46 samples/sec Loss 9.8976 LearningRate 0.3138 Epoch: 4 Global Step: 21030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:34:59,235-Speed 18576.44 samples/sec Loss 9.9204 LearningRate 0.3137 Epoch: 4 Global Step: 21040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:35:03,647-Speed 18572.14 samples/sec Loss 9.8324 LearningRate 0.3137 Epoch: 4 Global Step: 21050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:35:08,057-Speed 18581.51 samples/sec Loss 9.8756 LearningRate 0.3136 Epoch: 4 Global Step: 21060 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-01-14 00:35:12,452-Speed 18644.92 samples/sec Loss 9.9099 LearningRate 0.3135 Epoch: 4 Global Step: 21070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:16,863-Speed 18578.87 samples/sec Loss 9.8818 LearningRate 0.3134 Epoch: 4 Global Step: 21080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:21,310-Speed 18425.40 samples/sec Loss 9.8242 LearningRate 0.3134 Epoch: 4 Global Step: 21090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:25,741-Speed 18490.58 samples/sec Loss 9.8896 LearningRate 0.3133 Epoch: 4 Global Step: 21100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:30,163-Speed 18531.44 samples/sec Loss 9.8795 LearningRate 0.3132 Epoch: 4 Global Step: 21110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:34,554-Speed 18664.42 samples/sec Loss 9.8538 LearningRate 0.3131 Epoch: 4 Global Step: 21120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:39,036-Speed 18284.22 samples/sec Loss 9.8420 LearningRate 0.3131 Epoch: 4 Global Step: 21130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:43,450-Speed 18564.50 samples/sec Loss 9.8455 LearningRate 0.3130 Epoch: 4 Global Step: 21140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:47,848-Speed 18630.70 samples/sec Loss 9.8621 LearningRate 0.3129 Epoch: 4 Global Step: 21150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:52,237-Speed 18672.04 samples/sec Loss 9.8290 LearningRate 0.3128 Epoch: 4 Global Step: 21160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:35:56,662-Speed 18520.54 samples/sec Loss 9.8704 LearningRate 0.3128 Epoch: 4 Global Step: 21170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:01,080-Speed 18544.43 samples/sec Loss 9.8424 LearningRate 0.3127 Epoch: 4 Global Step: 21180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:05,494-Speed 
18565.96 samples/sec Loss 9.8592 LearningRate 0.3126 Epoch: 4 Global Step: 21190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:09,927-Speed 18486.30 samples/sec Loss 9.8594 LearningRate 0.3125 Epoch: 4 Global Step: 21200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:14,428-Speed 18207.02 samples/sec Loss 9.8626 LearningRate 0.3124 Epoch: 4 Global Step: 21210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:18,819-Speed 18663.26 samples/sec Loss 9.8755 LearningRate 0.3124 Epoch: 4 Global Step: 21220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:23,211-Speed 18659.04 samples/sec Loss 9.8796 LearningRate 0.3123 Epoch: 4 Global Step: 21230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:27,631-Speed 18539.23 samples/sec Loss 9.8248 LearningRate 0.3122 Epoch: 4 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:32,076-Speed 18431.78 samples/sec Loss 9.8344 LearningRate 0.3121 Epoch: 4 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:36,493-Speed 18553.76 samples/sec Loss 9.7827 LearningRate 0.3121 Epoch: 4 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:40,896-Speed 18610.24 samples/sec Loss 9.8253 LearningRate 0.3120 Epoch: 4 Global Step: 21270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:36:45,297-Speed 18616.79 samples/sec Loss 9.8173 LearningRate 0.3119 Epoch: 4 Global Step: 21280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:36:49,726-Speed 18503.67 samples/sec Loss 9.8380 LearningRate 0.3118 Epoch: 4 Global Step: 21290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:36:54,176-Speed 18411.64 samples/sec Loss 9.7837 LearningRate 0.3118 Epoch: 4 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:36:58,602-Speed 18512.02 samples/sec Loss 9.8180 
LearningRate 0.3117 Epoch: 4 Global Step: 21310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:37:03,053-Speed 18410.46 samples/sec Loss 9.8355 LearningRate 0.3116 Epoch: 4 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:37:07,549-Speed 18458.37 samples/sec Loss 9.8410 LearningRate 0.3115 Epoch: 4 Global Step: 21330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:37:11,989-Speed 18453.92 samples/sec Loss 9.7976 LearningRate 0.3115 Epoch: 4 Global Step: 21340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:37:16,406-Speed 18552.66 samples/sec Loss 9.8146 LearningRate 0.3114 Epoch: 4 Global Step: 21350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:37:20,809-Speed 18609.32 samples/sec Loss 9.8349 LearningRate 0.3113 Epoch: 4 Global Step: 21360 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:37:28,611-Speed 10501.76 samples/sec Loss 9.8014 LearningRate 0.3112 Epoch: 4 Global Step: 21370 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:37:33,007-Speed 18641.53 samples/sec Loss 9.8450 LearningRate 0.3112 Epoch: 4 Global Step: 21380 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:37:37,421-Speed 18565.52 samples/sec Loss 9.7806 LearningRate 0.3111 Epoch: 4 Global Step: 21390 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:37:41,837-Speed 18553.53 samples/sec Loss 9.8495 LearningRate 0.3110 Epoch: 4 Global Step: 21400 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:37:46,262-Speed 18525.20 samples/sec Loss 9.8085 LearningRate 0.3109 Epoch: 4 Global Step: 21410 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:37:50,693-Speed 18488.57 samples/sec Loss 9.7776 LearningRate 0.3109 Epoch: 4 Global Step: 21420 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:37:55,124-Speed 18495.43 samples/sec Loss 9.7945 LearningRate 0.3108 Epoch: 4 Global Step: 21430 
Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:37:59,544-Speed 18538.24 samples/sec Loss 9.7888 LearningRate 0.3107 Epoch: 4 Global Step: 21440 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:38:03,968-Speed 18522.31 samples/sec Loss 9.7926 LearningRate 0.3106 Epoch: 4 Global Step: 21450 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-01-14 00:38:08,372-Speed 18605.68 samples/sec Loss 9.7711 LearningRate 0.3106 Epoch: 4 Global Step: 21460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:12,814-Speed 18450.53 samples/sec Loss 9.8229 LearningRate 0.3105 Epoch: 4 Global Step: 21470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:17,246-Speed 18486.38 samples/sec Loss 9.8153 LearningRate 0.3104 Epoch: 4 Global Step: 21480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:21,655-Speed 18585.79 samples/sec Loss 9.7965 LearningRate 0.3103 Epoch: 4 Global Step: 21490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:26,073-Speed 18552.58 samples/sec Loss 9.7482 LearningRate 0.3103 Epoch: 4 Global Step: 21500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:30,475-Speed 18614.14 samples/sec Loss 9.7534 LearningRate 0.3102 Epoch: 4 Global Step: 21510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:34,893-Speed 18550.01 samples/sec Loss 9.7508 LearningRate 0.3101 Epoch: 4 Global Step: 21520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:39,336-Speed 18445.65 samples/sec Loss 9.8122 LearningRate 0.3100 Epoch: 4 Global Step: 21530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:43,757-Speed 18534.20 samples/sec Loss 9.8142 LearningRate 0.3100 Epoch: 4 Global Step: 21540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-01-14 00:38:48,169-Speed 18573.62 samples/sec Loss 9.7394 LearningRate 0.3099 Epoch: 4 Global Step: 21550 Fp16 Grad Scale: 16384 Required: 10 hours 
Training: 2022-01-14 00:38:52,560-Speed 18660.88 samples/sec Loss 9.7658 LearningRate 0.3098 Epoch: 4 Global Step: 21560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:38:56,969-Speed 18589.64 samples/sec Loss 9.8139 LearningRate 0.3097 Epoch: 4 Global Step: 21570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:01,369-Speed 18629.91 samples/sec Loss 9.7951 LearningRate 0.3097 Epoch: 4 Global Step: 21580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:05,789-Speed 18541.46 samples/sec Loss 9.7760 LearningRate 0.3096 Epoch: 4 Global Step: 21590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:10,358-Speed 17933.93 samples/sec Loss 9.7981 LearningRate 0.3095 Epoch: 4 Global Step: 21600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:14,772-Speed 18570.33 samples/sec Loss 9.8109 LearningRate 0.3094 Epoch: 4 Global Step: 21610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:19,216-Speed 18436.65 samples/sec Loss 9.7896 LearningRate 0.3093 Epoch: 4 Global Step: 21620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:23,664-Speed 18424.65 samples/sec Loss 9.8261 LearningRate 0.3093 Epoch: 4 Global Step: 21630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:28,148-Speed 18275.91 samples/sec Loss 9.8016 LearningRate 0.3092 Epoch: 4 Global Step: 21640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:32,572-Speed 18518.99 samples/sec Loss 9.7947 LearningRate 0.3091 Epoch: 4 Global Step: 21650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:39:36,972-Speed 18625.12 samples/sec Loss 9.7087 LearningRate 0.3090 Epoch: 4 Global Step: 21660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:39:41,361-Speed 18671.85 samples/sec Loss 9.7702 LearningRate 0.3090 Epoch: 4 Global Step: 21670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:39:45,810-Speed 
18415.73 samples/sec Loss 9.7573 LearningRate 0.3089 Epoch: 4 Global Step: 21680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:39:50,269-Speed 18378.44 samples/sec Loss 9.7431 LearningRate 0.3088 Epoch: 4 Global Step: 21690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:39:54,702-Speed 18486.10 samples/sec Loss 9.7847 LearningRate 0.3087 Epoch: 4 Global Step: 21700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:39:59,106-Speed 18603.05 samples/sec Loss 9.7978 LearningRate 0.3087 Epoch: 4 Global Step: 21710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:03,517-Speed 18576.99 samples/sec Loss 9.7589 LearningRate 0.3086 Epoch: 4 Global Step: 21720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:07,903-Speed 18680.54 samples/sec Loss 9.7881 LearningRate 0.3085 Epoch: 4 Global Step: 21730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:12,321-Speed 18546.04 samples/sec Loss 9.7362 LearningRate 0.3084 Epoch: 4 Global Step: 21740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:16,788-Speed 18343.60 samples/sec Loss 9.7599 LearningRate 0.3084 Epoch: 4 Global Step: 21750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:21,198-Speed 18580.84 samples/sec Loss 9.7508 LearningRate 0.3083 Epoch: 4 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:40:25,615-Speed 18555.43 samples/sec Loss 9.7635 LearningRate 0.3082 Epoch: 4 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:40:30,060-Speed 18434.12 samples/sec Loss 9.7312 LearningRate 0.3081 Epoch: 4 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:40:34,527-Speed 18341.30 samples/sec Loss 9.7533 LearningRate 0.3081 Epoch: 4 Global Step: 21790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:38,941-Speed 18565.59 samples/sec Loss 9.7015 
LearningRate 0.3080 Epoch: 4 Global Step: 21800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:43,333-Speed 18659.81 samples/sec Loss 9.7724 LearningRate 0.3079 Epoch: 4 Global Step: 21810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:47,751-Speed 18546.28 samples/sec Loss 9.7269 LearningRate 0.3078 Epoch: 4 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:52,154-Speed 18611.40 samples/sec Loss 9.7282 LearningRate 0.3078 Epoch: 4 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:40:56,559-Speed 18605.35 samples/sec Loss 9.7644 LearningRate 0.3077 Epoch: 4 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:41:01,005-Speed 18427.37 samples/sec Loss 9.7874 LearningRate 0.3076 Epoch: 4 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:41:05,539-Speed 18074.26 samples/sec Loss 9.7716 LearningRate 0.3075 Epoch: 4 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:41:10,032-Speed 18238.66 samples/sec Loss 9.7636 LearningRate 0.3075 Epoch: 4 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:41:14,486-Speed 18395.37 samples/sec Loss 9.7398 LearningRate 0.3074 Epoch: 4 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:41:18,934-Speed 18424.19 samples/sec Loss 9.7575 LearningRate 0.3073 Epoch: 4 Global Step: 21890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:23,370-Speed 18471.93 samples/sec Loss 9.7401 LearningRate 0.3072 Epoch: 4 Global Step: 21900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:27,790-Speed 18537.68 samples/sec Loss 9.7370 LearningRate 0.3072 Epoch: 4 Global Step: 21910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:32,212-Speed 18535.80 samples/sec Loss 9.7433 LearningRate 0.3071 Epoch: 4 Global 
Step: 21920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:36,665-Speed 18399.72 samples/sec Loss 9.7452 LearningRate 0.3070 Epoch: 4 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:41,087-Speed 18534.41 samples/sec Loss 9.7636 LearningRate 0.3069 Epoch: 4 Global Step: 21940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:45,500-Speed 18566.63 samples/sec Loss 9.7372 LearningRate 0.3069 Epoch: 4 Global Step: 21950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:49,912-Speed 18571.39 samples/sec Loss 9.7083 LearningRate 0.3068 Epoch: 4 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:54,320-Speed 18593.83 samples/sec Loss 9.7459 LearningRate 0.3067 Epoch: 4 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:41:58,736-Speed 18556.35 samples/sec Loss 9.7471 LearningRate 0.3066 Epoch: 4 Global Step: 21980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:03,124-Speed 18671.21 samples/sec Loss 9.6923 LearningRate 0.3066 Epoch: 4 Global Step: 21990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:07,541-Speed 18553.96 samples/sec Loss 9.7760 LearningRate 0.3065 Epoch: 4 Global Step: 22000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:11,952-Speed 18575.84 samples/sec Loss 9.7410 LearningRate 0.3064 Epoch: 4 Global Step: 22010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:16,380-Speed 18506.70 samples/sec Loss 9.6957 LearningRate 0.3063 Epoch: 4 Global Step: 22020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:20,794-Speed 18565.93 samples/sec Loss 9.7368 LearningRate 0.3063 Epoch: 4 Global Step: 22030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:25,177-Speed 18694.50 samples/sec Loss 9.7586 LearningRate 0.3062 Epoch: 4 Global Step: 22040 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-01-14 00:42:29,590-Speed 18568.97 samples/sec Loss 9.7423 LearningRate 0.3061 Epoch: 4 Global Step: 22050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:33,994-Speed 18604.87 samples/sec Loss 9.7203 LearningRate 0.3060 Epoch: 4 Global Step: 22060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:38,433-Speed 18461.27 samples/sec Loss 9.6496 LearningRate 0.3060 Epoch: 4 Global Step: 22070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:42,846-Speed 18568.41 samples/sec Loss 9.6747 LearningRate 0.3059 Epoch: 4 Global Step: 22080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:42:47,248-Speed 18615.34 samples/sec Loss 9.7471 LearningRate 0.3058 Epoch: 4 Global Step: 22090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:51,677-Speed 18502.31 samples/sec Loss 9.7547 LearningRate 0.3057 Epoch: 4 Global Step: 22100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:42:56,160-Speed 18277.67 samples/sec Loss 9.6653 LearningRate 0.3057 Epoch: 4 Global Step: 22110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:00,625-Speed 18350.28 samples/sec Loss 9.7126 LearningRate 0.3056 Epoch: 4 Global Step: 22120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:05,100-Speed 18311.48 samples/sec Loss 9.6960 LearningRate 0.3055 Epoch: 4 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:09,486-Speed 18681.30 samples/sec Loss 9.7352 LearningRate 0.3054 Epoch: 4 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:13,887-Speed 18622.89 samples/sec Loss 9.7545 LearningRate 0.3054 Epoch: 4 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:18,323-Speed 18469.03 samples/sec Loss 9.7260 LearningRate 0.3053 Epoch: 4 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 
00:43:22,786-Speed 18362.15 samples/sec Loss 9.7248 LearningRate 0.3052 Epoch: 4 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:27,188-Speed 18613.26 samples/sec Loss 9.6736 LearningRate 0.3051 Epoch: 4 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:31,590-Speed 18616.54 samples/sec Loss 9.7459 LearningRate 0.3051 Epoch: 4 Global Step: 22190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:43:35,989-Speed 18625.74 samples/sec Loss 9.6927 LearningRate 0.3050 Epoch: 4 Global Step: 22200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:43:40,435-Speed 18428.11 samples/sec Loss 9.6871 LearningRate 0.3049 Epoch: 4 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:43:44,858-Speed 18529.07 samples/sec Loss 9.7064 LearningRate 0.3048 Epoch: 4 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:49,256-Speed 18631.45 samples/sec Loss 9.6255 LearningRate 0.3048 Epoch: 4 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:53,663-Speed 18597.04 samples/sec Loss 9.6938 LearningRate 0.3047 Epoch: 4 Global Step: 22240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:43:58,132-Speed 18334.31 samples/sec Loss 9.6973 LearningRate 0.3046 Epoch: 4 Global Step: 22250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:44:02,546-Speed 18563.14 samples/sec Loss 9.7266 LearningRate 0.3045 Epoch: 4 Global Step: 22260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:44:06,950-Speed 18609.03 samples/sec Loss 9.7208 LearningRate 0.3045 Epoch: 4 Global Step: 22270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:44:11,361-Speed 18575.86 samples/sec Loss 9.7100 LearningRate 0.3044 Epoch: 4 Global Step: 22280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:44:15,841-Speed 18290.12 samples/sec 
Loss 9.6808 LearningRate 0.3043 Epoch: 4 Global Step: 22290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:44:20,260-Speed 18545.28 samples/sec Loss 9.6556 LearningRate 0.3042 Epoch: 4 Global Step: 22300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:44:24,699-Speed 18459.84 samples/sec Loss 9.7045 LearningRate 0.3042 Epoch: 4 Global Step: 22310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:44:29,131-Speed 18491.64 samples/sec Loss 9.6914 LearningRate 0.3041 Epoch: 4 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:44:33,559-Speed 18502.41 samples/sec Loss 9.6775 LearningRate 0.3040 Epoch: 4 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:44:37,958-Speed 18631.67 samples/sec Loss 9.6786 LearningRate 0.3039 Epoch: 4 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:44:42,392-Speed 18479.61 samples/sec Loss 9.6521 LearningRate 0.3039 Epoch: 4 Global Step: 22350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:44:46,822-Speed 18502.64 samples/sec Loss 9.7221 LearningRate 0.3038 Epoch: 4 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:44:51,270-Speed 18427.54 samples/sec Loss 9.7072 LearningRate 0.3037 Epoch: 4 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:44:55,730-Speed 18373.95 samples/sec Loss 9.6709 LearningRate 0.3036 Epoch: 4 Global Step: 22380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:00,211-Speed 18282.80 samples/sec Loss 9.6772 LearningRate 0.3036 Epoch: 4 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:04,631-Speed 18538.10 samples/sec Loss 9.6413 LearningRate 0.3035 Epoch: 4 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:09,067-Speed 18475.39 samples/sec Loss 9.6391 LearningRate 0.3034 Epoch: 
4 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:13,496-Speed 18508.77 samples/sec Loss 9.6465 LearningRate 0.3033 Epoch: 4 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:17,990-Speed 18230.01 samples/sec Loss 9.6936 LearningRate 0.3033 Epoch: 4 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:22,406-Speed 18564.54 samples/sec Loss 9.6304 LearningRate 0.3032 Epoch: 4 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:26,804-Speed 18634.72 samples/sec Loss 9.6670 LearningRate 0.3031 Epoch: 4 Global Step: 22450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:31,220-Speed 18557.50 samples/sec Loss 9.6455 LearningRate 0.3030 Epoch: 4 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:45:35,626-Speed 18599.65 samples/sec Loss 9.6608 LearningRate 0.3030 Epoch: 4 Global Step: 22470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:45:40,069-Speed 18438.18 samples/sec Loss 9.6421 LearningRate 0.3029 Epoch: 4 Global Step: 22480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:45:44,462-Speed 18658.45 samples/sec Loss 9.6182 LearningRate 0.3028 Epoch: 4 Global Step: 22490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:45:48,862-Speed 18620.46 samples/sec Loss 9.6748 LearningRate 0.3027 Epoch: 4 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:45:53,249-Speed 18682.30 samples/sec Loss 9.6370 LearningRate 0.3027 Epoch: 4 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:45:57,661-Speed 18570.08 samples/sec Loss 9.6456 LearningRate 0.3026 Epoch: 4 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:02,060-Speed 18625.57 samples/sec Loss 9.6443 LearningRate 0.3025 Epoch: 4 Global Step: 22530 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-01-14 00:46:06,485-Speed 18519.68 samples/sec Loss 9.6483 LearningRate 0.3025 Epoch: 4 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:10,924-Speed 18457.33 samples/sec Loss 9.6585 LearningRate 0.3024 Epoch: 4 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:15,339-Speed 18563.52 samples/sec Loss 9.6694 LearningRate 0.3023 Epoch: 4 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:19,780-Speed 18450.96 samples/sec Loss 9.6755 LearningRate 0.3022 Epoch: 4 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:24,211-Speed 18493.95 samples/sec Loss 9.6587 LearningRate 0.3022 Epoch: 4 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:28,651-Speed 18452.82 samples/sec Loss 9.6731 LearningRate 0.3021 Epoch: 4 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:33,077-Speed 18513.71 samples/sec Loss 9.6923 LearningRate 0.3020 Epoch: 4 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:37,475-Speed 18631.77 samples/sec Loss 9.6520 LearningRate 0.3019 Epoch: 4 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:41,888-Speed 18567.19 samples/sec Loss 9.6345 LearningRate 0.3019 Epoch: 4 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:46,294-Speed 18603.94 samples/sec Loss 9.6481 LearningRate 0.3018 Epoch: 4 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:50,752-Speed 18379.36 samples/sec Loss 9.6289 LearningRate 0.3017 Epoch: 4 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:46:55,198-Speed 18431.50 samples/sec Loss 9.6079 LearningRate 0.3016 Epoch: 4 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-01-14 00:46:59,642-Speed 18437.57 samples/sec Loss 9.6098 LearningRate 0.3016 Epoch: 4 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:47:04,079-Speed 18467.55 samples/sec Loss 9.6560 LearningRate 0.3015 Epoch: 4 Global Step: 22670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:08,511-Speed 18489.88 samples/sec Loss 9.6512 LearningRate 0.3014 Epoch: 4 Global Step: 22680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:12,933-Speed 18529.65 samples/sec Loss 9.6741 LearningRate 0.3013 Epoch: 4 Global Step: 22690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:17,385-Speed 18406.41 samples/sec Loss 9.5932 LearningRate 0.3013 Epoch: 4 Global Step: 22700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:21,841-Speed 18387.75 samples/sec Loss 9.6329 LearningRate 0.3012 Epoch: 4 Global Step: 22710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:26,268-Speed 18510.83 samples/sec Loss 9.6243 LearningRate 0.3011 Epoch: 4 Global Step: 22720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:30,687-Speed 18541.88 samples/sec Loss 9.6067 LearningRate 0.3010 Epoch: 4 Global Step: 22730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:35,108-Speed 18532.66 samples/sec Loss 9.6796 LearningRate 0.3010 Epoch: 4 Global Step: 22740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:39,556-Speed 18425.99 samples/sec Loss 9.6519 LearningRate 0.3009 Epoch: 4 Global Step: 22750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:43,986-Speed 18495.07 samples/sec Loss 9.6061 LearningRate 0.3008 Epoch: 4 Global Step: 22760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:47:48,413-Speed 18510.72 samples/sec Loss 9.5963 LearningRate 0.3007 Epoch: 4 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:47:52,838-Speed 
18517.70 samples/sec Loss 9.5859 LearningRate 0.3007 Epoch: 4 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:47:57,256-Speed 18543.75 samples/sec Loss 9.5954 LearningRate 0.3006 Epoch: 4 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:48:01,706-Speed 18415.58 samples/sec Loss 9.6326 LearningRate 0.3005 Epoch: 4 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:48:06,198-Speed 18239.57 samples/sec Loss 9.5999 LearningRate 0.3004 Epoch: 4 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:48:10,679-Speed 18287.54 samples/sec Loss 9.5976 LearningRate 0.3004 Epoch: 4 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:48:15,120-Speed 18451.08 samples/sec Loss 9.5643 LearningRate 0.3003 Epoch: 4 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:48:19,538-Speed 18546.97 samples/sec Loss 9.5409 LearningRate 0.3002 Epoch: 4 Global Step: 22840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:23,951-Speed 18570.82 samples/sec Loss 9.5932 LearningRate 0.3001 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:28,396-Speed 18431.28 samples/sec Loss 9.6065 LearningRate 0.3001 Epoch: 4 Global Step: 22860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:32,818-Speed 18528.34 samples/sec Loss 9.6074 LearningRate 0.3000 Epoch: 4 Global Step: 22870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:37,205-Speed 18679.99 samples/sec Loss 9.6242 LearningRate 0.2999 Epoch: 4 Global Step: 22880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:41,646-Speed 18449.94 samples/sec Loss 9.5959 LearningRate 0.2998 Epoch: 4 Global Step: 22890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:46,073-Speed 18508.37 samples/sec Loss 9.5492 
LearningRate 0.2998 Epoch: 4 Global Step: 22900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:50,473-Speed 18623.39 samples/sec Loss 9.5780 LearningRate 0.2997 Epoch: 4 Global Step: 22910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:54,952-Speed 18294.32 samples/sec Loss 9.6147 LearningRate 0.2996 Epoch: 4 Global Step: 22920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:48:59,393-Speed 18449.95 samples/sec Loss 9.6595 LearningRate 0.2996 Epoch: 4 Global Step: 22930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:49:03,905-Speed 18161.52 samples/sec Loss 9.5945 LearningRate 0.2995 Epoch: 4 Global Step: 22940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:08,334-Speed 18502.86 samples/sec Loss 9.5911 LearningRate 0.2994 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:12,771-Speed 18467.53 samples/sec Loss 9.6454 LearningRate 0.2993 Epoch: 4 Global Step: 22960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:17,205-Speed 18482.72 samples/sec Loss 9.5999 LearningRate 0.2993 Epoch: 4 Global Step: 22970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:21,706-Speed 18201.69 samples/sec Loss 9.6318 LearningRate 0.2992 Epoch: 4 Global Step: 22980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:26,120-Speed 18564.84 samples/sec Loss 9.6009 LearningRate 0.2991 Epoch: 4 Global Step: 22990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:30,553-Speed 18482.18 samples/sec Loss 9.5826 LearningRate 0.2990 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:34,968-Speed 18558.42 samples/sec Loss 9.5579 LearningRate 0.2990 Epoch: 4 Global Step: 23010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:39,388-Speed 18537.58 samples/sec Loss 9.5728 LearningRate 0.2989 Epoch: 4 Global 
Step: 23020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:43,784-Speed 18641.98 samples/sec Loss 9.5912 LearningRate 0.2988 Epoch: 4 Global Step: 23030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:48,217-Speed 18481.66 samples/sec Loss 9.5747 LearningRate 0.2987 Epoch: 4 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:52,657-Speed 18458.33 samples/sec Loss 9.5598 LearningRate 0.2987 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:49:57,077-Speed 18532.59 samples/sec Loss 9.6493 LearningRate 0.2986 Epoch: 4 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:50:01,501-Speed 18520.86 samples/sec Loss 9.5680 LearningRate 0.2985 Epoch: 4 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:50:05,934-Speed 18485.32 samples/sec Loss 9.5857 LearningRate 0.2984 Epoch: 4 Global Step: 23080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:10,413-Speed 18293.36 samples/sec Loss 9.5936 LearningRate 0.2984 Epoch: 4 Global Step: 23090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:14,860-Speed 18424.13 samples/sec Loss 9.5470 LearningRate 0.2983 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:19,327-Speed 18342.86 samples/sec Loss 9.5834 LearningRate 0.2982 Epoch: 4 Global Step: 23110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:23,834-Speed 18183.39 samples/sec Loss 9.5718 LearningRate 0.2981 Epoch: 4 Global Step: 23120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:28,254-Speed 18538.20 samples/sec Loss 9.5871 LearningRate 0.2981 Epoch: 4 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:32,673-Speed 18542.00 samples/sec Loss 9.5721 LearningRate 0.2980 Epoch: 4 Global Step: 23140 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-01-14 00:50:37,103-Speed 18503.92 samples/sec Loss 9.5261 LearningRate 0.2979 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:41,580-Speed 18301.19 samples/sec Loss 9.5923 LearningRate 0.2978 Epoch: 4 Global Step: 23160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:46,045-Speed 18351.82 samples/sec Loss 9.6197 LearningRate 0.2978 Epoch: 4 Global Step: 23170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:50:50,472-Speed 18510.67 samples/sec Loss 9.5961 LearningRate 0.2977 Epoch: 4 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:50:54,908-Speed 18472.56 samples/sec Loss 9.5549 LearningRate 0.2976 Epoch: 4 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:50:59,322-Speed 18565.82 samples/sec Loss 9.5327 LearningRate 0.2976 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:03,835-Speed 18165.14 samples/sec Loss 9.5453 LearningRate 0.2975 Epoch: 4 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:08,287-Speed 18413.64 samples/sec Loss 9.5845 LearningRate 0.2974 Epoch: 4 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:12,714-Speed 18508.80 samples/sec Loss 9.5989 LearningRate 0.2973 Epoch: 4 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:17,113-Speed 18630.47 samples/sec Loss 9.5736 LearningRate 0.2973 Epoch: 4 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:21,514-Speed 18617.56 samples/sec Loss 9.5674 LearningRate 0.2972 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:25,923-Speed 18587.67 samples/sec Loss 9.5340 LearningRate 0.2971 Epoch: 4 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-14 00:51:30,328-Speed 18600.29 samples/sec Loss 9.5542 LearningRate 0.2970 Epoch: 4 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:34,775-Speed 18434.27 samples/sec Loss 9.5379 LearningRate 0.2970 Epoch: 4 Global Step: 23280 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-14 00:51:39,200-Speed 18520.05 samples/sec Loss 9.5600 LearningRate 0.2969 Epoch: 4 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:43,562-Speed 18783.68 samples/sec Loss 9.5868 LearningRate 0.2968 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:51:47,988-Speed 18516.23 samples/sec Loss 9.5646 LearningRate 0.2967 Epoch: 4 Global Step: 23310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:51:52,462-Speed 18313.06 samples/sec Loss 9.5782 LearningRate 0.2967 Epoch: 4 Global Step: 23320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:51:56,877-Speed 18563.34 samples/sec Loss 9.5736 LearningRate 0.2966 Epoch: 4 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:52:01,293-Speed 18552.25 samples/sec Loss 9.5887 LearningRate 0.2965 Epoch: 4 Global Step: 23340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:52:05,675-Speed 18699.03 samples/sec Loss 9.5214 LearningRate 0.2964 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:52:10,053-Speed 18718.72 samples/sec Loss 9.5558 LearningRate 0.2964 Epoch: 4 Global Step: 23360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:52:14,447-Speed 18653.51 samples/sec Loss 9.5441 LearningRate 0.2963 Epoch: 4 Global Step: 23370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:52:18,840-Speed 18650.54 samples/sec Loss 9.5459 LearningRate 0.2962 Epoch: 4 Global Step: 23380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:52:23,258-Speed 18550.83 
samples/sec Loss 9.5682 LearningRate 0.2961 Epoch: 4 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:52:27,640-Speed 18698.35 samples/sec Loss 9.5197 LearningRate 0.2961 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:52:32,031-Speed 18659.58 samples/sec Loss 9.5654 LearningRate 0.2960 Epoch: 4 Global Step: 23410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:52:36,442-Speed 18580.47 samples/sec Loss 9.5461 LearningRate 0.2959 Epoch: 4 Global Step: 23420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:52:40,901-Speed 18375.71 samples/sec Loss 9.5233 LearningRate 0.2959 Epoch: 4 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:52:45,313-Speed 18571.89 samples/sec Loss 9.5094 LearningRate 0.2958 Epoch: 4 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:52:49,715-Speed 18616.30 samples/sec Loss 9.4882 LearningRate 0.2957 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:52:54,102-Speed 18679.16 samples/sec Loss 9.5648 LearningRate 0.2956 Epoch: 4 Global Step: 23460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:52:58,508-Speed 18598.51 samples/sec Loss 9.5619 LearningRate 0.2956 Epoch: 4 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:53:02,887-Speed 18714.53 samples/sec Loss 9.5527 LearningRate 0.2955 Epoch: 4 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:53:07,292-Speed 18602.86 samples/sec Loss 9.5245 LearningRate 0.2954 Epoch: 4 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:53:11,725-Speed 18481.05 samples/sec Loss 9.5236 LearningRate 0.2953 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:53:16,113-Speed 18675.44 samples/sec Loss 9.5070 
LearningRate 0.2953 Epoch: 4 Global Step: 23510 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-14 00:53:20,515-Speed 18612.53 samples/sec Loss 9.5977 LearningRate 0.2952 Epoch: 4 Global Step: 23520 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-14 00:53:24,924-Speed 18587.84 samples/sec Loss 9.6074 LearningRate 0.2951 Epoch: 4 Global Step: 23530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:53:29,308-Speed 18690.57 samples/sec Loss 9.5440 LearningRate 0.2950 Epoch: 4 Global Step: 23540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:53:33,718-Speed 18580.30 samples/sec Loss 9.5283 LearningRate 0.2950 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:53:38,147-Speed 18503.13 samples/sec Loss 9.5210 LearningRate 0.2949 Epoch: 4 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:53:42,579-Speed 18486.37 samples/sec Loss 9.4940 LearningRate 0.2948 Epoch: 4 Global Step: 23570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:53:46,993-Speed 18566.99 samples/sec Loss 9.4965 LearningRate 0.2947 Epoch: 4 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:53:51,441-Speed 18420.20 samples/sec Loss 9.5166 LearningRate 0.2947 Epoch: 4 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:53:55,856-Speed 18558.33 samples/sec Loss 9.5588 LearningRate 0.2946 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:54:00,265-Speed 18585.93 samples/sec Loss 9.5132 LearningRate 0.2945 Epoch: 4 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:54:04,695-Speed 18493.95 samples/sec Loss 9.5004 LearningRate 0.2945 Epoch: 4 Global Step: 23620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:54:09,072-Speed 18724.37 samples/sec Loss 9.5051 LearningRate 0.2944 Epoch: 4 Global 
Step: 23630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:54:13,551-Speed 18294.61 samples/sec Loss 9.4753 LearningRate 0.2943 Epoch: 4 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:54:17,972-Speed 18535.51 samples/sec Loss 9.4904 LearningRate 0.2942 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:54:22,422-Speed 18412.60 samples/sec Loss 9.4972 LearningRate 0.2942 Epoch: 4 Global Step: 23660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:54:26,891-Speed 18337.73 samples/sec Loss 9.5144 LearningRate 0.2941 Epoch: 4 Global Step: 23670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:54:31,318-Speed 18507.53 samples/sec Loss 9.5033 LearningRate 0.2940 Epoch: 4 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:54:35,742-Speed 18521.60 samples/sec Loss 9.5198 LearningRate 0.2939 Epoch: 4 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:54:40,139-Speed 18639.48 samples/sec Loss 9.5246 LearningRate 0.2939 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:54:44,547-Speed 18589.26 samples/sec Loss 9.5473 LearningRate 0.2938 Epoch: 4 Global Step: 23710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:54:49,000-Speed 18400.23 samples/sec Loss 9.4938 LearningRate 0.2937 Epoch: 4 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:54:53,436-Speed 18473.99 samples/sec Loss 9.5334 LearningRate 0.2936 Epoch: 4 Global Step: 23730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:54:57,841-Speed 18603.62 samples/sec Loss 9.5306 LearningRate 0.2936 Epoch: 4 Global Step: 23740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:55:02,309-Speed 18338.90 samples/sec Loss 9.4834 LearningRate 0.2935 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-01-14 00:55:06,698-Speed 18670.24 samples/sec Loss 9.5180 LearningRate 0.2934 Epoch: 4 Global Step: 23760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:55:12,639-Speed 13791.56 samples/sec Loss 9.5301 LearningRate 0.2934 Epoch: 4 Global Step: 23770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:55:17,039-Speed 18621.05 samples/sec Loss 9.5314 LearningRate 0.2933 Epoch: 4 Global Step: 23780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:55:21,448-Speed 18586.66 samples/sec Loss 9.4794 LearningRate 0.2932 Epoch: 4 Global Step: 23790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:55:25,857-Speed 18586.03 samples/sec Loss 9.5013 LearningRate 0.2931 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:55:30,277-Speed 18538.15 samples/sec Loss 9.5285 LearningRate 0.2931 Epoch: 4 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:55:34,772-Speed 18228.67 samples/sec Loss 9.4829 LearningRate 0.2930 Epoch: 4 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:55:39,168-Speed 18642.74 samples/sec Loss 9.5360 LearningRate 0.2929 Epoch: 4 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:55:43,590-Speed 18532.39 samples/sec Loss 9.4893 LearningRate 0.2928 Epoch: 4 Global Step: 23840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:55:48,025-Speed 18477.31 samples/sec Loss 9.4867 LearningRate 0.2928 Epoch: 4 Global Step: 23850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:55:52,478-Speed 18399.99 samples/sec Loss 9.4961 LearningRate 0.2927 Epoch: 4 Global Step: 23860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:55:56,898-Speed 18539.64 samples/sec Loss 9.4757 LearningRate 0.2926 Epoch: 4 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 
00:56:01,369-Speed 18327.37 samples/sec Loss 9.4429 LearningRate 0.2925 Epoch: 4 Global Step: 23880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:56:05,772-Speed 18608.50 samples/sec Loss 9.4764 LearningRate 0.2925 Epoch: 4 Global Step: 23890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:56:10,165-Speed 18655.56 samples/sec Loss 9.4529 LearningRate 0.2924 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:56:14,584-Speed 18543.36 samples/sec Loss 9.4842 LearningRate 0.2923 Epoch: 4 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:56:18,968-Speed 18695.83 samples/sec Loss 9.4699 LearningRate 0.2923 Epoch: 4 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:56:23,357-Speed 18674.18 samples/sec Loss 9.4606 LearningRate 0.2922 Epoch: 4 Global Step: 23930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:56:27,794-Speed 18470.54 samples/sec Loss 9.4509 LearningRate 0.2921 Epoch: 4 Global Step: 23940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:56:32,185-Speed 18678.38 samples/sec Loss 9.4160 LearningRate 0.2920 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:56:36,609-Speed 18523.08 samples/sec Loss 9.4577 LearningRate 0.2920 Epoch: 4 Global Step: 23960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:56:40,993-Speed 18691.65 samples/sec Loss 9.4667 LearningRate 0.2919 Epoch: 4 Global Step: 23970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:56:45,442-Speed 18421.72 samples/sec Loss 9.4351 LearningRate 0.2918 Epoch: 4 Global Step: 23980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:56:49,850-Speed 18587.75 samples/sec Loss 9.4320 LearningRate 0.2917 Epoch: 4 Global Step: 23990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:56:54,346-Speed 18230.90 samples/sec 
Loss 9.4970 LearningRate 0.2917 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:56:58,772-Speed 18517.88 samples/sec Loss 9.5015 LearningRate 0.2916 Epoch: 4 Global Step: 24010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:57:03,173-Speed 18619.84 samples/sec Loss 9.4263 LearningRate 0.2915 Epoch: 4 Global Step: 24020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:57:07,593-Speed 18536.57 samples/sec Loss 9.4375 LearningRate 0.2914 Epoch: 4 Global Step: 24030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:57:12,029-Speed 18471.54 samples/sec Loss 9.4518 LearningRate 0.2914 Epoch: 4 Global Step: 24040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:57:16,474-Speed 18435.29 samples/sec Loss 9.4185 LearningRate 0.2913 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:57:20,856-Speed 18701.70 samples/sec Loss 9.4877 LearningRate 0.2912 Epoch: 4 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:57:25,319-Speed 18369.47 samples/sec Loss 9.4289 LearningRate 0.2912 Epoch: 4 Global Step: 24070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:57:29,758-Speed 18464.69 samples/sec Loss 9.4646 LearningRate 0.2911 Epoch: 4 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:57:34,164-Speed 18595.87 samples/sec Loss 9.4410 LearningRate 0.2910 Epoch: 4 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:57:38,622-Speed 18381.18 samples/sec Loss 9.4429 LearningRate 0.2909 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:57:43,005-Speed 18693.51 samples/sec Loss 9.4649 LearningRate 0.2909 Epoch: 4 Global Step: 24110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:57:47,427-Speed 18536.04 samples/sec Loss 9.4801 LearningRate 0.2908 Epoch: 4 
Global Step: 24120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:57:51,851-Speed 18520.78 samples/sec Loss 9.4667 LearningRate 0.2907 Epoch: 4 Global Step: 24130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:57:56,272-Speed 18535.25 samples/sec Loss 9.4041 LearningRate 0.2906 Epoch: 4 Global Step: 24140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:58:00,693-Speed 18535.90 samples/sec Loss 9.4798 LearningRate 0.2906 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:58:05,151-Speed 18380.48 samples/sec Loss 9.4475 LearningRate 0.2905 Epoch: 4 Global Step: 24160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:58:09,610-Speed 18376.19 samples/sec Loss 9.4737 LearningRate 0.2904 Epoch: 4 Global Step: 24170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:58:14,065-Speed 18394.16 samples/sec Loss 9.4333 LearningRate 0.2903 Epoch: 4 Global Step: 24180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:58:18,477-Speed 18573.51 samples/sec Loss 9.4541 LearningRate 0.2903 Epoch: 4 Global Step: 24190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:58:22,910-Speed 18485.65 samples/sec Loss 9.4687 LearningRate 0.2902 Epoch: 4 Global Step: 24200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:58:27,352-Speed 18447.35 samples/sec Loss 9.3958 LearningRate 0.2901 Epoch: 4 Global Step: 24210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:58:31,775-Speed 18530.28 samples/sec Loss 9.3666 LearningRate 0.2901 Epoch: 4 Global Step: 24220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:58:36,188-Speed 18565.68 samples/sec Loss 9.4058 LearningRate 0.2900 Epoch: 4 Global Step: 24230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:58:40,620-Speed 18487.84 samples/sec Loss 9.4186 LearningRate 0.2899 Epoch: 4 Global Step: 24240 Fp16 Grad Scale: 
32768 Required: 10 hours Training: 2022-01-14 00:58:45,043-Speed 18525.35 samples/sec Loss 9.4748 LearningRate 0.2898 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:58:49,485-Speed 18447.82 samples/sec Loss 9.4092 LearningRate 0.2898 Epoch: 4 Global Step: 24260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:58:53,915-Speed 18498.52 samples/sec Loss 9.3931 LearningRate 0.2897 Epoch: 4 Global Step: 24270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:58:58,317-Speed 18612.27 samples/sec Loss 9.4387 LearningRate 0.2896 Epoch: 4 Global Step: 24280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:59:02,778-Speed 18370.82 samples/sec Loss 9.3870 LearningRate 0.2895 Epoch: 4 Global Step: 24290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 00:59:07,255-Speed 18301.29 samples/sec Loss 9.3819 LearningRate 0.2895 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:11,690-Speed 18475.14 samples/sec Loss 9.4078 LearningRate 0.2894 Epoch: 4 Global Step: 24310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:16,124-Speed 18480.45 samples/sec Loss 9.3831 LearningRate 0.2893 Epoch: 4 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:20,537-Speed 18564.17 samples/sec Loss 9.4274 LearningRate 0.2893 Epoch: 4 Global Step: 24330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:24,932-Speed 18644.29 samples/sec Loss 9.4325 LearningRate 0.2892 Epoch: 4 Global Step: 24340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:29,347-Speed 18557.21 samples/sec Loss 9.4109 LearningRate 0.2891 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:33,793-Speed 18430.48 samples/sec Loss 9.4251 LearningRate 0.2890 Epoch: 4 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-14 00:59:38,227-Speed 18481.84 samples/sec Loss 9.3732 LearningRate 0.2890 Epoch: 4 Global Step: 24370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:42,657-Speed 18498.03 samples/sec Loss 9.4156 LearningRate 0.2889 Epoch: 4 Global Step: 24380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:47,122-Speed 18354.28 samples/sec Loss 9.4182 LearningRate 0.2888 Epoch: 4 Global Step: 24390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 00:59:51,551-Speed 18498.31 samples/sec Loss 9.4356 LearningRate 0.2887 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 00:59:55,945-Speed 18650.27 samples/sec Loss 9.3895 LearningRate 0.2887 Epoch: 4 Global Step: 24410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:00:00,368-Speed 18526.88 samples/sec Loss 9.4422 LearningRate 0.2886 Epoch: 4 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:00:04,802-Speed 18479.42 samples/sec Loss 9.3944 LearningRate 0.2885 Epoch: 4 Global Step: 24430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:09,217-Speed 18561.57 samples/sec Loss 9.4409 LearningRate 0.2885 Epoch: 4 Global Step: 24440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:13,675-Speed 18378.82 samples/sec Loss 9.4106 LearningRate 0.2884 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:18,103-Speed 18505.11 samples/sec Loss 9.3946 LearningRate 0.2883 Epoch: 4 Global Step: 24460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:22,522-Speed 18544.88 samples/sec Loss 9.4178 LearningRate 0.2882 Epoch: 4 Global Step: 24470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:26,927-Speed 18602.46 samples/sec Loss 9.4034 LearningRate 0.2882 Epoch: 4 Global Step: 24480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:31,319-Speed 18661.63 
samples/sec Loss 9.3770 LearningRate 0.2881 Epoch: 4 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:35,761-Speed 18445.82 samples/sec Loss 9.4264 LearningRate 0.2880 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:40,184-Speed 18524.18 samples/sec Loss 9.4043 LearningRate 0.2879 Epoch: 4 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:44,595-Speed 18581.74 samples/sec Loss 9.3604 LearningRate 0.2879 Epoch: 4 Global Step: 24520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:00:49,016-Speed 18532.89 samples/sec Loss 9.4311 LearningRate 0.2878 Epoch: 4 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:00:53,404-Speed 18676.83 samples/sec Loss 9.3606 LearningRate 0.2877 Epoch: 4 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:00:57,817-Speed 18574.39 samples/sec Loss 9.4383 LearningRate 0.2877 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:01:02,248-Speed 18493.33 samples/sec Loss 9.4025 LearningRate 0.2876 Epoch: 4 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:01:06,660-Speed 18572.87 samples/sec Loss 9.3616 LearningRate 0.2875 Epoch: 4 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:01:11,108-Speed 18419.62 samples/sec Loss 9.4324 LearningRate 0.2874 Epoch: 4 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:01:15,525-Speed 18547.65 samples/sec Loss 9.3849 LearningRate 0.2874 Epoch: 4 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:01:19,975-Speed 18422.10 samples/sec Loss 9.4368 LearningRate 0.2873 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:24,407-Speed 18489.83 samples/sec Loss 9.3859 LearningRate 
0.2872 Epoch: 4 Global Step: 24610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:28,834-Speed 18509.87 samples/sec Loss 9.3626 LearningRate 0.2871 Epoch: 4 Global Step: 24620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:33,360-Speed 18103.12 samples/sec Loss 9.3286 LearningRate 0.2871 Epoch: 4 Global Step: 24630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:37,821-Speed 18384.84 samples/sec Loss 9.4022 LearningRate 0.2870 Epoch: 4 Global Step: 24640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:42,263-Speed 18444.17 samples/sec Loss 9.3788 LearningRate 0.2869 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:46,724-Speed 18370.09 samples/sec Loss 9.3525 LearningRate 0.2869 Epoch: 4 Global Step: 24660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:51,165-Speed 18446.36 samples/sec Loss 9.3530 LearningRate 0.2868 Epoch: 4 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:55,590-Speed 18520.52 samples/sec Loss 9.3861 LearningRate 0.2867 Epoch: 4 Global Step: 24680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:01:59,999-Speed 18586.00 samples/sec Loss 9.4587 LearningRate 0.2866 Epoch: 4 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:02:04,514-Speed 18146.58 samples/sec Loss 9.3527 LearningRate 0.2866 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:02:08,944-Speed 18494.89 samples/sec Loss 9.3800 LearningRate 0.2865 Epoch: 4 Global Step: 24710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:02:13,361-Speed 18548.39 samples/sec Loss 9.3856 LearningRate 0.2864 Epoch: 4 Global Step: 24720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:02:17,739-Speed 18716.00 samples/sec Loss 9.3822 LearningRate 0.2863 Epoch: 4 Global Step: 24730 Fp16 
Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:02:22,154-Speed 18559.14 samples/sec Loss 9.3708 LearningRate 0.2863 Epoch: 4 Global Step: 24740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:02:26,575-Speed 18531.71 samples/sec Loss 9.3833 LearningRate 0.2862 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:02:30,995-Speed 18540.10 samples/sec Loss 9.3692 LearningRate 0.2861 Epoch: 4 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:02:35,463-Speed 18338.76 samples/sec Loss 9.3791 LearningRate 0.2861 Epoch: 4 Global Step: 24770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:02:39,940-Speed 18303.52 samples/sec Loss 9.3597 LearningRate 0.2860 Epoch: 4 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:02:44,384-Speed 18438.28 samples/sec Loss 9.3751 LearningRate 0.2859 Epoch: 4 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:02:48,817-Speed 18481.24 samples/sec Loss 9.3803 LearningRate 0.2858 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:02:53,260-Speed 18445.75 samples/sec Loss 9.3747 LearningRate 0.2858 Epoch: 4 Global Step: 24810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:02:57,696-Speed 18470.31 samples/sec Loss 9.3903 LearningRate 0.2857 Epoch: 4 Global Step: 24820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:02,110-Speed 18563.07 samples/sec Loss 9.3579 LearningRate 0.2856 Epoch: 4 Global Step: 24830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:06,523-Speed 18568.44 samples/sec Loss 9.3148 LearningRate 0.2855 Epoch: 4 Global Step: 24840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:10,958-Speed 18476.34 samples/sec Loss 9.3416 LearningRate 0.2855 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-01-14 01:03:15,365-Speed 18592.24 samples/sec Loss 9.3643 LearningRate 0.2854 Epoch: 4 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:03:19,764-Speed 18624.49 samples/sec Loss 9.3425 LearningRate 0.2853 Epoch: 4 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:03:24,182-Speed 18547.56 samples/sec Loss 9.3474 LearningRate 0.2853 Epoch: 4 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:03:28,647-Speed 18348.24 samples/sec Loss 9.3578 LearningRate 0.2852 Epoch: 4 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:03:33,096-Speed 18422.06 samples/sec Loss 9.3516 LearningRate 0.2851 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:37,512-Speed 18552.16 samples/sec Loss 9.3503 LearningRate 0.2850 Epoch: 4 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:41,976-Speed 18359.01 samples/sec Loss 9.3747 LearningRate 0.2850 Epoch: 4 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:46,405-Speed 18499.03 samples/sec Loss 9.3545 LearningRate 0.2849 Epoch: 4 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:50,801-Speed 18640.94 samples/sec Loss 9.3409 LearningRate 0.2848 Epoch: 4 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:55,207-Speed 18594.07 samples/sec Loss 9.3517 LearningRate 0.2848 Epoch: 4 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:03:59,661-Speed 18396.55 samples/sec Loss 9.3569 LearningRate 0.2847 Epoch: 4 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:04,104-Speed 18443.69 samples/sec Loss 9.3319 LearningRate 0.2846 Epoch: 4 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:08,575-Speed 
18327.95 samples/sec Loss 9.3659 LearningRate 0.2845 Epoch: 4 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:12,991-Speed 18554.75 samples/sec Loss 9.3703 LearningRate 0.2845 Epoch: 4 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:17,391-Speed 18624.25 samples/sec Loss 9.2904 LearningRate 0.2844 Epoch: 4 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:21,778-Speed 18674.14 samples/sec Loss 9.2983 LearningRate 0.2843 Epoch: 4 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:26,203-Speed 18519.48 samples/sec Loss 9.3579 LearningRate 0.2842 Epoch: 4 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:30,632-Speed 18504.94 samples/sec Loss 9.2967 LearningRate 0.2842 Epoch: 4 Global Step: 25030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:35,079-Speed 18427.08 samples/sec Loss 9.3684 LearningRate 0.2841 Epoch: 4 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:39,520-Speed 18452.39 samples/sec Loss 9.3277 LearningRate 0.2840 Epoch: 4 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:43,998-Speed 18296.79 samples/sec Loss 9.3548 LearningRate 0.2840 Epoch: 4 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:48,416-Speed 18545.56 samples/sec Loss 9.3603 LearningRate 0.2839 Epoch: 4 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:52,838-Speed 18532.61 samples/sec Loss 9.3651 LearningRate 0.2838 Epoch: 4 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:04:57,346-Speed 18178.69 samples/sec Loss 9.3259 LearningRate 0.2837 Epoch: 4 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:01,781-Speed 18481.11 samples/sec Loss 9.3575 
LearningRate 0.2837 Epoch: 4 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:05:06,219-Speed 18463.57 samples/sec Loss 9.3203 LearningRate 0.2836 Epoch: 4 Global Step: 25110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:10,684-Speed 18355.75 samples/sec Loss 9.3059 LearningRate 0.2835 Epoch: 4 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:15,110-Speed 18513.83 samples/sec Loss 9.3018 LearningRate 0.2835 Epoch: 4 Global Step: 25130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:19,518-Speed 18587.36 samples/sec Loss 9.3379 LearningRate 0.2834 Epoch: 4 Global Step: 25140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:23,948-Speed 18498.97 samples/sec Loss 9.3526 LearningRate 0.2833 Epoch: 4 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:28,373-Speed 18519.93 samples/sec Loss 9.2859 LearningRate 0.2832 Epoch: 4 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:32,782-Speed 18586.37 samples/sec Loss 9.3025 LearningRate 0.2832 Epoch: 4 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:37,178-Speed 18643.15 samples/sec Loss 9.3550 LearningRate 0.2831 Epoch: 4 Global Step: 25180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:41,585-Speed 18600.82 samples/sec Loss 9.3019 LearningRate 0.2830 Epoch: 4 Global Step: 25190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:45,967-Speed 18697.35 samples/sec Loss 9.2758 LearningRate 0.2829 Epoch: 4 Global Step: 25200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:50,387-Speed 18536.43 samples/sec Loss 9.3453 LearningRate 0.2829 Epoch: 4 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:05:54,820-Speed 18489.34 samples/sec Loss 9.3491 LearningRate 0.2828 Epoch: 4 Global Step: 
25220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:05:59,240-Speed 18538.42 samples/sec Loss 9.3112 LearningRate 0.2827 Epoch: 4 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:03,673-Speed 18485.24 samples/sec Loss 9.2997 LearningRate 0.2827 Epoch: 4 Global Step: 25240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:08,103-Speed 18494.37 samples/sec Loss 9.3403 LearningRate 0.2826 Epoch: 4 Global Step: 25250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:12,516-Speed 18573.22 samples/sec Loss 9.2487 LearningRate 0.2825 Epoch: 4 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:16,916-Speed 18622.05 samples/sec Loss 9.2638 LearningRate 0.2824 Epoch: 4 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:21,318-Speed 18615.98 samples/sec Loss 9.2604 LearningRate 0.2824 Epoch: 4 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:25,740-Speed 18534.03 samples/sec Loss 9.3388 LearningRate 0.2823 Epoch: 4 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:30,163-Speed 18528.23 samples/sec Loss 9.2865 LearningRate 0.2822 Epoch: 4 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:34,572-Speed 18586.94 samples/sec Loss 9.3107 LearningRate 0.2822 Epoch: 4 Global Step: 25310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:06:38,963-Speed 18661.93 samples/sec Loss 9.3097 LearningRate 0.2821 Epoch: 4 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:06:43,396-Speed 18484.26 samples/sec Loss 9.2818 LearningRate 0.2820 Epoch: 4 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:06:47,817-Speed 18538.17 samples/sec Loss 9.2596 LearningRate 0.2819 Epoch: 4 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 
10 hours Training: 2022-01-14 01:06:52,246-Speed 18501.95 samples/sec Loss 9.3070 LearningRate 0.2819 Epoch: 4 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:06:56,656-Speed 18580.57 samples/sec Loss 9.3012 LearningRate 0.2818 Epoch: 4 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:07:01,061-Speed 18606.50 samples/sec Loss 9.3105 LearningRate 0.2817 Epoch: 4 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:07:05,463-Speed 18613.16 samples/sec Loss 9.3275 LearningRate 0.2816 Epoch: 4 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:09,892-Speed 18509.11 samples/sec Loss 9.3190 LearningRate 0.2816 Epoch: 4 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:14,375-Speed 18282.32 samples/sec Loss 9.2714 LearningRate 0.2815 Epoch: 4 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:18,839-Speed 18358.75 samples/sec Loss 9.2822 LearningRate 0.2814 Epoch: 4 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:23,245-Speed 18603.59 samples/sec Loss 9.2593 LearningRate 0.2814 Epoch: 4 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:27,642-Speed 18633.26 samples/sec Loss 9.2468 LearningRate 0.2813 Epoch: 4 Global Step: 25430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:32,075-Speed 18484.75 samples/sec Loss 9.3161 LearningRate 0.2812 Epoch: 4 Global Step: 25440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:36,531-Speed 18389.32 samples/sec Loss 9.3476 LearningRate 0.2811 Epoch: 4 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:40,939-Speed 18591.93 samples/sec Loss 9.3031 LearningRate 0.2811 Epoch: 4 Global Step: 25460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 
01:07:45,348-Speed 18587.48 samples/sec Loss 9.3293 LearningRate 0.2810 Epoch: 4 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:07:49,776-Speed 18500.62 samples/sec Loss 9.3084 LearningRate 0.2809 Epoch: 4 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:08:02,170-Speed 6610.26 samples/sec Loss 9.2936 LearningRate 0.2809 Epoch: 4 Global Step: 25490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:06,578-Speed 18592.09 samples/sec Loss 9.2416 LearningRate 0.2808 Epoch: 4 Global Step: 25500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:10,983-Speed 18600.54 samples/sec Loss 9.2766 LearningRate 0.2807 Epoch: 4 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:15,393-Speed 18581.12 samples/sec Loss 9.3448 LearningRate 0.2806 Epoch: 4 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:19,807-Speed 18561.61 samples/sec Loss 9.2645 LearningRate 0.2806 Epoch: 4 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:24,269-Speed 18366.70 samples/sec Loss 9.3161 LearningRate 0.2805 Epoch: 4 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:28,712-Speed 18439.74 samples/sec Loss 9.2695 LearningRate 0.2804 Epoch: 4 Global Step: 25550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:33,146-Speed 18480.99 samples/sec Loss 9.2770 LearningRate 0.2804 Epoch: 4 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:37,573-Speed 18506.79 samples/sec Loss 9.2953 LearningRate 0.2803 Epoch: 4 Global Step: 25570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:42,005-Speed 18491.20 samples/sec Loss 9.2935 LearningRate 0.2802 Epoch: 4 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:08:46,425-Speed 18535.80 samples/sec 
Loss 9.2331 LearningRate 0.2801 Epoch: 4 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:08:50,834-Speed 18584.60 samples/sec Loss 9.2357 LearningRate 0.2801 Epoch: 4 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:08:55,309-Speed 18313.34 samples/sec Loss 9.2841 LearningRate 0.2800 Epoch: 4 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:08:59,755-Speed 18430.05 samples/sec Loss 9.2589 LearningRate 0.2799 Epoch: 4 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:09:04,179-Speed 18524.83 samples/sec Loss 9.2798 LearningRate 0.2799 Epoch: 4 Global Step: 25630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:08,589-Speed 18578.72 samples/sec Loss 9.3446 LearningRate 0.2798 Epoch: 4 Global Step: 25640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:12,986-Speed 18641.05 samples/sec Loss 9.2992 LearningRate 0.2797 Epoch: 4 Global Step: 25650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:17,463-Speed 18303.06 samples/sec Loss 9.2993 LearningRate 0.2796 Epoch: 4 Global Step: 25660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:21,896-Speed 18485.45 samples/sec Loss 9.2812 LearningRate 0.2796 Epoch: 4 Global Step: 25670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:26,355-Speed 18377.25 samples/sec Loss 9.2438 LearningRate 0.2795 Epoch: 4 Global Step: 25680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:30,804-Speed 18418.09 samples/sec Loss 9.2657 LearningRate 0.2794 Epoch: 4 Global Step: 25690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:35,318-Speed 18149.33 samples/sec Loss 9.2690 LearningRate 0.2794 Epoch: 4 Global Step: 25700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:39,756-Speed 18466.19 samples/sec Loss 9.2760 LearningRate 0.2793 Epoch: 
4 Global Step: 25710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:09:44,169-Speed 18567.92 samples/sec Loss 9.2721 LearningRate 0.2792 Epoch: 4 Global Step: 25720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:09:48,597-Speed 18503.19 samples/sec Loss 9.2750 LearningRate 0.2791 Epoch: 4 Global Step: 25730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:09:53,021-Speed 18525.33 samples/sec Loss 9.2686 LearningRate 0.2791 Epoch: 4 Global Step: 25740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:09:57,432-Speed 18577.06 samples/sec Loss 9.2695 LearningRate 0.2790 Epoch: 4 Global Step: 25750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:10:01,856-Speed 18522.71 samples/sec Loss 9.2588 LearningRate 0.2789 Epoch: 4 Global Step: 25760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:10:06,325-Speed 18337.01 samples/sec Loss 9.2524 LearningRate 0.2789 Epoch: 4 Global Step: 25770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:10:10,773-Speed 18424.09 samples/sec Loss 9.2634 LearningRate 0.2788 Epoch: 4 Global Step: 25780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:10:15,177-Speed 18605.40 samples/sec Loss 9.2561 LearningRate 0.2787 Epoch: 4 Global Step: 25790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:10:19,605-Speed 18504.40 samples/sec Loss 9.2248 LearningRate 0.2786 Epoch: 4 Global Step: 25800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:10:24,014-Speed 18587.33 samples/sec Loss 9.2601 LearningRate 0.2786 Epoch: 4 Global Step: 25810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:10:28,453-Speed 18458.26 samples/sec Loss 9.2924 LearningRate 0.2785 Epoch: 4 Global Step: 25820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:10:32,881-Speed 18506.65 samples/sec Loss 9.2614 LearningRate 0.2784 Epoch: 4 Global Step: 25830 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-01-14 01:10:37,297-Speed 18557.82 samples/sec Loss 9.2327 LearningRate 0.2783 Epoch: 4 Global Step: 25840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:10:41,713-Speed 18557.60 samples/sec Loss 9.2200 LearningRate 0.2783 Epoch: 4 Global Step: 25850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:10:46,130-Speed 18548.23 samples/sec Loss 9.2023 LearningRate 0.2782 Epoch: 4 Global Step: 25860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:10:50,566-Speed 18470.34 samples/sec Loss 9.2626 LearningRate 0.2781 Epoch: 4 Global Step: 25870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:10:54,983-Speed 18554.61 samples/sec Loss 9.2825 LearningRate 0.2781 Epoch: 4 Global Step: 25880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:10:59,386-Speed 18609.36 samples/sec Loss 9.2608 LearningRate 0.2780 Epoch: 4 Global Step: 25890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:11:03,814-Speed 18504.03 samples/sec Loss 9.3056 LearningRate 0.2779 Epoch: 4 Global Step: 25900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:11:08,212-Speed 18628.35 samples/sec Loss 9.2214 LearningRate 0.2778 Epoch: 4 Global Step: 25910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:11:12,627-Speed 18561.93 samples/sec Loss 9.2523 LearningRate 0.2778 Epoch: 4 Global Step: 25920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:11:30,978-Speed 4464.45 samples/sec Loss 9.2503 LearningRate 0.2777 Epoch: 5 Global Step: 25930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:11:35,402-Speed 18519.32 samples/sec Loss 9.1782 LearningRate 0.2776 Epoch: 5 Global Step: 25940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:11:39,828-Speed 18515.11 samples/sec Loss 9.1991 LearningRate 0.2776 Epoch: 5 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 
01:11:44,270-Speed 18444.79 samples/sec Loss 9.2430 LearningRate 0.2775 Epoch: 5 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:11:48,748-Speed 18298.42 samples/sec Loss 9.2178 LearningRate 0.2774 Epoch: 5 Global Step: 25970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:11:53,216-Speed 18346.53 samples/sec Loss 9.2250 LearningRate 0.2773 Epoch: 5 Global Step: 25980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:11:57,623-Speed 18601.83 samples/sec Loss 9.2029 LearningRate 0.2773 Epoch: 5 Global Step: 25990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:02,042-Speed 18544.46 samples/sec Loss 9.2719 LearningRate 0.2772 Epoch: 5 Global Step: 26000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:06,427-Speed 18689.15 samples/sec Loss 9.1982 LearningRate 0.2771 Epoch: 5 Global Step: 26010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:10,805-Speed 18725.79 samples/sec Loss 9.1943 LearningRate 0.2771 Epoch: 5 Global Step: 26020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:15,169-Speed 18772.81 samples/sec Loss 9.1750 LearningRate 0.2770 Epoch: 5 Global Step: 26030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:19,572-Speed 18609.38 samples/sec Loss 9.2122 LearningRate 0.2769 Epoch: 5 Global Step: 26040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:23,970-Speed 18633.29 samples/sec Loss 9.1985 LearningRate 0.2768 Epoch: 5 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:28,366-Speed 18640.83 samples/sec Loss 9.1985 LearningRate 0.2768 Epoch: 5 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:12:32,795-Speed 18500.59 samples/sec Loss 9.1731 LearningRate 0.2767 Epoch: 5 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:12:37,216-Speed 18532.33 samples/sec 
Loss 9.2157 LearningRate 0.2766 Epoch: 5 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:12:41,626-Speed 18579.61 samples/sec Loss 9.1905 LearningRate 0.2766 Epoch: 5 Global Step: 26090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:46,047-Speed 18534.55 samples/sec Loss 9.2812 LearningRate 0.2765 Epoch: 5 Global Step: 26100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:50,492-Speed 18433.50 samples/sec Loss 9.1932 LearningRate 0.2764 Epoch: 5 Global Step: 26110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:54,961-Speed 18343.11 samples/sec Loss 9.1434 LearningRate 0.2764 Epoch: 5 Global Step: 26120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:12:59,390-Speed 18500.26 samples/sec Loss 9.1956 LearningRate 0.2763 Epoch: 5 Global Step: 26130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:03,840-Speed 18412.13 samples/sec Loss 9.2068 LearningRate 0.2762 Epoch: 5 Global Step: 26140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:08,255-Speed 18559.89 samples/sec Loss 9.2031 LearningRate 0.2761 Epoch: 5 Global Step: 26150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:12,679-Speed 18522.95 samples/sec Loss 9.1837 LearningRate 0.2761 Epoch: 5 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:17,154-Speed 18311.46 samples/sec Loss 9.1900 LearningRate 0.2760 Epoch: 5 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:21,584-Speed 18498.95 samples/sec Loss 9.2963 LearningRate 0.2759 Epoch: 5 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:26,020-Speed 18472.43 samples/sec Loss 9.2146 LearningRate 0.2759 Epoch: 5 Global Step: 26190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:13:30,414-Speed 18646.96 samples/sec Loss 9.1451 LearningRate 0.2758 Epoch: 5 
Global Step: 26200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:13:34,828-Speed 18564.27 samples/sec Loss 9.1886 LearningRate 0.2757 Epoch: 5 Global Step: 26210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:13:39,208-Speed 18707.91 samples/sec Loss 9.1804 LearningRate 0.2756 Epoch: 5 Global Step: 26220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:13:43,624-Speed 18556.31 samples/sec Loss 9.1947 LearningRate 0.2756 Epoch: 5 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:48,041-Speed 18553.06 samples/sec Loss 9.1810 LearningRate 0.2755 Epoch: 5 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:52,470-Speed 18501.34 samples/sec Loss 9.2612 LearningRate 0.2754 Epoch: 5 Global Step: 26250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:13:56,858-Speed 18674.51 samples/sec Loss 9.2332 LearningRate 0.2754 Epoch: 5 Global Step: 26260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:14:01,280-Speed 18530.74 samples/sec Loss 9.1817 LearningRate 0.2753 Epoch: 5 Global Step: 26270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:14:05,685-Speed 18601.36 samples/sec Loss 9.2222 LearningRate 0.2752 Epoch: 5 Global Step: 26280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:14:10,108-Speed 18530.80 samples/sec Loss 9.1837 LearningRate 0.2751 Epoch: 5 Global Step: 26290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:14:14,531-Speed 18524.30 samples/sec Loss 9.1465 LearningRate 0.2751 Epoch: 5 Global Step: 26300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:14:18,948-Speed 18555.20 samples/sec Loss 9.2168 LearningRate 0.2750 Epoch: 5 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:14:23,357-Speed 18589.40 samples/sec Loss 9.1632 LearningRate 0.2749 Epoch: 5 Global Step: 26320 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-01-14 01:14:27,786-Speed 18500.91 samples/sec Loss 9.1625 LearningRate 0.2749 Epoch: 5 Global Step: 26330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:14:32,239-Speed 18404.71 samples/sec Loss 9.2010 LearningRate 0.2748 Epoch: 5 Global Step: 26340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:14:36,664-Speed 18524.22 samples/sec Loss 9.2222 LearningRate 0.2747 Epoch: 5 Global Step: 26350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:14:41,146-Speed 18281.12 samples/sec Loss 9.1963 LearningRate 0.2746 Epoch: 5 Global Step: 26360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:14:45,527-Speed 18702.63 samples/sec Loss 9.1792 LearningRate 0.2746 Epoch: 5 Global Step: 26370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:14:49,927-Speed 18621.99 samples/sec Loss 9.1116 LearningRate 0.2745 Epoch: 5 Global Step: 26380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:14:54,321-Speed 18648.79 samples/sec Loss 9.1970 LearningRate 0.2744 Epoch: 5 Global Step: 26390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:14:58,724-Speed 18608.25 samples/sec Loss 9.1544 LearningRate 0.2744 Epoch: 5 Global Step: 26400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:15:03,134-Speed 18584.63 samples/sec Loss 9.2088 LearningRate 0.2743 Epoch: 5 Global Step: 26410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:15:07,580-Speed 18428.65 samples/sec Loss 9.1937 LearningRate 0.2742 Epoch: 5 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:15:11,973-Speed 18655.79 samples/sec Loss 9.2214 LearningRate 0.2741 Epoch: 5 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:15:16,396-Speed 18537.91 samples/sec Loss 9.2442 LearningRate 0.2741 Epoch: 5 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-14 01:15:20,827-Speed 18490.46 samples/sec Loss 9.1698 LearningRate 0.2740 Epoch: 5 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:15:25,240-Speed 18572.74 samples/sec Loss 9.1745 LearningRate 0.2739 Epoch: 5 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:15:29,658-Speed 18542.12 samples/sec Loss 9.2182 LearningRate 0.2739 Epoch: 5 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:15:34,058-Speed 18625.41 samples/sec Loss 9.1988 LearningRate 0.2738 Epoch: 5 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:15:38,462-Speed 18604.57 samples/sec Loss 9.2017 LearningRate 0.2737 Epoch: 5 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:15:42,905-Speed 18443.16 samples/sec Loss 9.1263 LearningRate 0.2736 Epoch: 5 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:15:47,354-Speed 18416.43 samples/sec Loss 9.1722 LearningRate 0.2736 Epoch: 5 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:15:51,769-Speed 18561.68 samples/sec Loss 9.1870 LearningRate 0.2735 Epoch: 5 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:15:56,191-Speed 18535.01 samples/sec Loss 9.1515 LearningRate 0.2734 Epoch: 5 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:16:00,608-Speed 18549.85 samples/sec Loss 9.1850 LearningRate 0.2734 Epoch: 5 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:16:05,009-Speed 18620.99 samples/sec Loss 9.1856 LearningRate 0.2733 Epoch: 5 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:16:09,456-Speed 18426.33 samples/sec Loss 9.1489 LearningRate 0.2732 Epoch: 5 Global Step: 26560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:14,022-Speed 
17950.80 samples/sec Loss 9.1938 LearningRate 0.2732 Epoch: 5 Global Step: 26570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:18,452-Speed 18495.23 samples/sec Loss 9.1517 LearningRate 0.2731 Epoch: 5 Global Step: 26580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:22,924-Speed 18323.64 samples/sec Loss 9.1184 LearningRate 0.2730 Epoch: 5 Global Step: 26590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:27,408-Speed 18274.11 samples/sec Loss 9.1287 LearningRate 0.2729 Epoch: 5 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:31,829-Speed 18533.55 samples/sec Loss 9.1390 LearningRate 0.2729 Epoch: 5 Global Step: 26610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:36,243-Speed 18566.87 samples/sec Loss 9.1256 LearningRate 0.2728 Epoch: 5 Global Step: 26620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:40,720-Speed 18299.95 samples/sec Loss 9.1845 LearningRate 0.2727 Epoch: 5 Global Step: 26630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:45,151-Speed 18494.56 samples/sec Loss 9.1344 LearningRate 0.2727 Epoch: 5 Global Step: 26640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:16:49,573-Speed 18529.84 samples/sec Loss 9.1731 LearningRate 0.2726 Epoch: 5 Global Step: 26650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:16:54,058-Speed 18270.92 samples/sec Loss 9.0667 LearningRate 0.2725 Epoch: 5 Global Step: 26660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:16:58,511-Speed 18397.40 samples/sec Loss 9.1527 LearningRate 0.2724 Epoch: 5 Global Step: 26670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:17:02,986-Speed 18314.13 samples/sec Loss 9.1235 LearningRate 0.2724 Epoch: 5 Global Step: 26680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:17:07,411-Speed 18517.00 samples/sec Loss 9.1123 
LearningRate 0.2723 Epoch: 5 Global Step: 26690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:17:11,835-Speed 18523.97 samples/sec Loss 9.1409 LearningRate 0.2722 Epoch: 5 Global Step: 26700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:17:16,262-Speed 18508.74 samples/sec Loss 9.1308 LearningRate 0.2722 Epoch: 5 Global Step: 26710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:17:20,691-Speed 18498.57 samples/sec Loss 9.1398 LearningRate 0.2721 Epoch: 5 Global Step: 26720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:17:25,187-Speed 18227.68 samples/sec Loss 9.1622 LearningRate 0.2720 Epoch: 5 Global Step: 26730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:17:29,604-Speed 18550.37 samples/sec Loss 9.0985 LearningRate 0.2720 Epoch: 5 Global Step: 26740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:17:33,997-Speed 18653.58 samples/sec Loss 9.0802 LearningRate 0.2719 Epoch: 5 Global Step: 26750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:17:38,452-Speed 18393.66 samples/sec Loss 9.1073 LearningRate 0.2718 Epoch: 5 Global Step: 26760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:17:42,884-Speed 18489.47 samples/sec Loss 9.1307 LearningRate 0.2717 Epoch: 5 Global Step: 26770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:17:47,286-Speed 18612.81 samples/sec Loss 9.1136 LearningRate 0.2717 Epoch: 5 Global Step: 26780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:17:51,696-Speed 18582.16 samples/sec Loss 9.1485 LearningRate 0.2716 Epoch: 5 Global Step: 26790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:17:56,127-Speed 18492.97 samples/sec Loss 9.1513 LearningRate 0.2715 Epoch: 5 Global Step: 26800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:00,563-Speed 18473.40 samples/sec Loss 9.1028 LearningRate 0.2715 Epoch: 5 Global Step: 
26810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:04,996-Speed 18484.29 samples/sec Loss 9.1524 LearningRate 0.2714 Epoch: 5 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:09,382-Speed 18682.52 samples/sec Loss 9.1564 LearningRate 0.2713 Epoch: 5 Global Step: 26830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:13,848-Speed 18346.17 samples/sec Loss 9.1107 LearningRate 0.2712 Epoch: 5 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:18,288-Speed 18456.86 samples/sec Loss 9.1479 LearningRate 0.2712 Epoch: 5 Global Step: 26850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:18:22,696-Speed 18591.21 samples/sec Loss 9.1202 LearningRate 0.2711 Epoch: 5 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:27,105-Speed 18584.48 samples/sec Loss 9.1173 LearningRate 0.2710 Epoch: 5 Global Step: 26870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:31,512-Speed 18593.54 samples/sec Loss 9.1288 LearningRate 0.2710 Epoch: 5 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:35,927-Speed 18561.31 samples/sec Loss 9.0802 LearningRate 0.2709 Epoch: 5 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:40,358-Speed 18493.76 samples/sec Loss 9.0850 LearningRate 0.2708 Epoch: 5 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:44,764-Speed 18598.33 samples/sec Loss 9.0558 LearningRate 0.2707 Epoch: 5 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:49,166-Speed 18613.29 samples/sec Loss 9.1483 LearningRate 0.2707 Epoch: 5 Global Step: 26920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:18:53,573-Speed 18595.81 samples/sec Loss 9.0914 LearningRate 0.2706 Epoch: 5 Global Step: 26930 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-01-14 01:18:57,953-Speed 18707.27 samples/sec Loss 9.1535 LearningRate 0.2705 Epoch: 5 Global Step: 26940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:02,379-Speed 18517.50 samples/sec Loss 9.0780 LearningRate 0.2705 Epoch: 5 Global Step: 26950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:06,819-Speed 18454.85 samples/sec Loss 9.0975 LearningRate 0.2704 Epoch: 5 Global Step: 26960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:11,226-Speed 18593.89 samples/sec Loss 9.0765 LearningRate 0.2703 Epoch: 5 Global Step: 26970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:15,668-Speed 18446.49 samples/sec Loss 9.1380 LearningRate 0.2703 Epoch: 5 Global Step: 26980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:20,088-Speed 18536.16 samples/sec Loss 9.1449 LearningRate 0.2702 Epoch: 5 Global Step: 26990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:24,508-Speed 18537.87 samples/sec Loss 9.0853 LearningRate 0.2701 Epoch: 5 Global Step: 27000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:28,982-Speed 18315.95 samples/sec Loss 9.0963 LearningRate 0.2700 Epoch: 5 Global Step: 27010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:33,395-Speed 18573.01 samples/sec Loss 9.1064 LearningRate 0.2700 Epoch: 5 Global Step: 27020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:37,825-Speed 18491.66 samples/sec Loss 9.1074 LearningRate 0.2699 Epoch: 5 Global Step: 27030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-01-14 01:19:42,236-Speed 18577.87 samples/sec Loss 9.1158 LearningRate 0.2698 Epoch: 5 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:19:46,626-Speed 18666.15 samples/sec Loss 9.1015 LearningRate 0.2698 Epoch: 5 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 
01:19:51,027-Speed 18617.68 samples/sec Loss 9.1044 LearningRate 0.2697 Epoch: 5 Global Step: 27060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:19:55,423-Speed 18637.81 samples/sec Loss 9.1100 LearningRate 0.2696 Epoch: 5 Global Step: 27070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:19:59,823-Speed 18621.19 samples/sec Loss 9.0905 LearningRate 0.2696 Epoch: 5 Global Step: 27080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:20:04,265-Speed 18448.65 samples/sec Loss 9.1050 LearningRate 0.2695 Epoch: 5 Global Step: 27090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:20:08,712-Speed 18422.39 samples/sec Loss 9.0942 LearningRate 0.2694 Epoch: 5 Global Step: 27100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:20:13,167-Speed 18392.41 samples/sec Loss 9.1311 LearningRate 0.2693 Epoch: 5 Global Step: 27110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:20:17,598-Speed 18492.18 samples/sec Loss 9.1071 LearningRate 0.2693 Epoch: 5 Global Step: 27120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:20:22,067-Speed 18332.45 samples/sec Loss 9.1259 LearningRate 0.2692 Epoch: 5 Global Step: 27130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:20:26,483-Speed 18555.28 samples/sec Loss 9.1146 LearningRate 0.2691 Epoch: 5 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:20:30,885-Speed 18611.58 samples/sec Loss 9.0837 LearningRate 0.2691 Epoch: 5 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:20:35,323-Speed 18462.92 samples/sec Loss 9.0768 LearningRate 0.2690 Epoch: 5 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:20:39,819-Speed 18227.44 samples/sec Loss 9.0982 LearningRate 0.2689 Epoch: 5 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:20:44,253-Speed 18479.42 samples/sec 
Loss 9.1480 LearningRate 0.2688 Epoch: 5 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:20:48,666-Speed 18568.01 samples/sec Loss 9.0857 LearningRate 0.2688 Epoch: 5 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:20:53,083-Speed 18550.08 samples/sec Loss 9.0521 LearningRate 0.2687 Epoch: 5 Global Step: 27200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:20:57,527-Speed 18441.95 samples/sec Loss 9.1071 LearningRate 0.2686 Epoch: 5 Global Step: 27210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:21:01,904-Speed 18718.53 samples/sec Loss 9.0932 LearningRate 0.2686 Epoch: 5 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:21:06,328-Speed 18523.51 samples/sec Loss 9.0885 LearningRate 0.2685 Epoch: 5 Global Step: 27230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:21:10,741-Speed 18567.41 samples/sec Loss 9.0821 LearningRate 0.2684 Epoch: 5 Global Step: 27240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:21:15,137-Speed 18641.69 samples/sec Loss 9.0491 LearningRate 0.2684 Epoch: 5 Global Step: 27250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:21:19,608-Speed 18324.31 samples/sec Loss 9.1145 LearningRate 0.2683 Epoch: 5 Global Step: 27260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:21:24,107-Speed 18216.77 samples/sec Loss 9.1124 LearningRate 0.2682 Epoch: 5 Global Step: 27270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:21:28,567-Speed 18373.50 samples/sec Loss 9.0925 LearningRate 0.2681 Epoch: 5 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:21:32,985-Speed 18546.20 samples/sec Loss 9.0099 LearningRate 0.2681 Epoch: 5 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:21:37,440-Speed 18395.72 samples/sec Loss 9.1121 LearningRate 0.2680 
Epoch: 5 Global Step: 27300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:21:41,851-Speed 18574.37 samples/sec Loss 9.0490 LearningRate 0.2679 Epoch: 5 Global Step: 27310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:21:46,267-Speed 18556.40 samples/sec Loss 9.0696 LearningRate 0.2679 Epoch: 5 Global Step: 27320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:21:50,692-Speed 18514.42 samples/sec Loss 9.1035 LearningRate 0.2678 Epoch: 5 Global Step: 27330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:21:55,160-Speed 18342.97 samples/sec Loss 9.0954 LearningRate 0.2677 Epoch: 5 Global Step: 27340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:21:59,569-Speed 18583.13 samples/sec Loss 9.0967 LearningRate 0.2677 Epoch: 5 Global Step: 27350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:22:04,026-Speed 18387.01 samples/sec Loss 9.1283 LearningRate 0.2676 Epoch: 5 Global Step: 27360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:22:08,420-Speed 18646.82 samples/sec Loss 9.0891 LearningRate 0.2675 Epoch: 5 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:22:12,831-Speed 18580.55 samples/sec Loss 9.0457 LearningRate 0.2674 Epoch: 5 Global Step: 27380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-14 01:22:17,255-Speed 18522.04 samples/sec Loss 9.0294 LearningRate 0.2674 Epoch: 5 Global Step: 27390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:22:21,738-Speed 18276.98 samples/sec Loss 9.0833 LearningRate 0.2673 Epoch: 5 Global Step: 27400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-14 01:22:26,121-Speed 18696.53 samples/sec Loss 9.0606 LearningRate 0.2672 Epoch: 5 Global Step: 27410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:22:30,564-Speed 18446.60 samples/sec Loss 9.1231 LearningRate 0.2672 Epoch: 5 Global Step: 27420 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-01-14 01:22:35,027-Speed 18356.96 samples/sec Loss 9.1117 LearningRate 0.2671 Epoch: 5 Global Step: 27430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:22:39,424-Speed 18636.52 samples/sec Loss 9.0823 LearningRate 0.2670 Epoch: 5 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:22:43,835-Speed 18579.01 samples/sec Loss 9.0698 LearningRate 0.2670 Epoch: 5 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:22:48,273-Speed 18463.42 samples/sec Loss 9.0245 LearningRate 0.2669 Epoch: 5 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:22:52,677-Speed 18605.57 samples/sec Loss 9.0466 LearningRate 0.2668 Epoch: 5 Global Step: 27470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:22:57,089-Speed 18571.85 samples/sec Loss 9.0166 LearningRate 0.2667 Epoch: 5 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:01,535-Speed 18429.83 samples/sec Loss 9.0701 LearningRate 0.2667 Epoch: 5 Global Step: 27490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:05,983-Speed 18424.91 samples/sec Loss 9.1049 LearningRate 0.2666 Epoch: 5 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:10,390-Speed 18592.91 samples/sec Loss 9.0433 LearningRate 0.2665 Epoch: 5 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:14,777-Speed 18682.57 samples/sec Loss 9.0564 LearningRate 0.2665 Epoch: 5 Global Step: 27520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:19,235-Speed 18378.09 samples/sec Loss 9.0950 LearningRate 0.2664 Epoch: 5 Global Step: 27530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:23,695-Speed 18372.69 samples/sec Loss 9.0431 LearningRate 0.2663 Epoch: 5 Global Step: 27540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 
01:23:28,094-Speed 18629.78 samples/sec Loss 9.0507 LearningRate 0.2663 Epoch: 5 Global Step: 27550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:32,529-Speed 18475.54 samples/sec Loss 9.0527 LearningRate 0.2662 Epoch: 5 Global Step: 27560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:36,915-Speed 18687.46 samples/sec Loss 9.0758 LearningRate 0.2661 Epoch: 5 Global Step: 27570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:41,323-Speed 18587.20 samples/sec Loss 9.0213 LearningRate 0.2660 Epoch: 5 Global Step: 27580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:52,121-Speed 7587.39 samples/sec Loss 9.0737 LearningRate 0.2660 Epoch: 5 Global Step: 27590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:23:56,570-Speed 18419.40 samples/sec Loss 9.0345 LearningRate 0.2659 Epoch: 5 Global Step: 27600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:00,973-Speed 18610.05 samples/sec Loss 8.9993 LearningRate 0.2658 Epoch: 5 Global Step: 27610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:05,444-Speed 18329.81 samples/sec Loss 9.0257 LearningRate 0.2658 Epoch: 5 Global Step: 27620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:09,907-Speed 18357.63 samples/sec Loss 9.0272 LearningRate 0.2657 Epoch: 5 Global Step: 27630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:14,304-Speed 18636.98 samples/sec Loss 9.0359 LearningRate 0.2656 Epoch: 5 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:24:18,747-Speed 18443.93 samples/sec Loss 9.0370 LearningRate 0.2656 Epoch: 5 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:24:23,157-Speed 18577.18 samples/sec Loss 9.0015 LearningRate 0.2655 Epoch: 5 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:24:27,577-Speed 18540.82 samples/sec Loss 8.9988 
LearningRate 0.2654 Epoch: 5 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:24:32,005-Speed 18508.21 samples/sec Loss 9.0211 LearningRate 0.2653 Epoch: 5 Global Step: 27680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:36,420-Speed 18558.04 samples/sec Loss 9.0545 LearningRate 0.2653 Epoch: 5 Global Step: 27690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:40,862-Speed 18446.44 samples/sec Loss 9.0381 LearningRate 0.2652 Epoch: 5 Global Step: 27700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:45,265-Speed 18608.34 samples/sec Loss 9.0517 LearningRate 0.2651 Epoch: 5 Global Step: 27710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:49,687-Speed 18533.05 samples/sec Loss 9.0308 LearningRate 0.2651 Epoch: 5 Global Step: 27720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:54,072-Speed 18688.70 samples/sec Loss 9.0722 LearningRate 0.2650 Epoch: 5 Global Step: 27730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:24:58,506-Speed 18477.07 samples/sec Loss 9.0630 LearningRate 0.2649 Epoch: 5 Global Step: 27740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:25:02,920-Speed 18567.24 samples/sec Loss 9.0546 LearningRate 0.2649 Epoch: 5 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:25:07,320-Speed 18619.15 samples/sec Loss 9.0400 LearningRate 0.2648 Epoch: 5 Global Step: 27760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:25:11,704-Speed 18692.52 samples/sec Loss 9.0775 LearningRate 0.2647 Epoch: 5 Global Step: 27770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:25:16,126-Speed 18531.54 samples/sec Loss 9.0585 LearningRate 0.2646 Epoch: 5 Global Step: 27780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:25:20,537-Speed 18572.78 samples/sec Loss 8.9956 LearningRate 0.2646 Epoch: 5 Global Step: 27790 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:25:24,968-Speed 18494.36 samples/sec Loss 9.0268 LearningRate 0.2645 Epoch: 5 Global Step: 27800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:25:29,430-Speed 18364.37 samples/sec Loss 9.0300 LearningRate 0.2644 Epoch: 5 Global Step: 27810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:25:33,856-Speed 18512.87 samples/sec Loss 8.9972 LearningRate 0.2644 Epoch: 5 Global Step: 27820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:25:38,333-Speed 18303.65 samples/sec Loss 9.0356 LearningRate 0.2643 Epoch: 5 Global Step: 27830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:25:42,828-Speed 18228.95 samples/sec Loss 9.0334 LearningRate 0.2642 Epoch: 5 Global Step: 27840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:25:47,254-Speed 18514.43 samples/sec Loss 9.0080 LearningRate 0.2642 Epoch: 5 Global Step: 27850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:25:51,763-Speed 18170.76 samples/sec Loss 9.0119 LearningRate 0.2641 Epoch: 5 Global Step: 27860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:25:56,197-Speed 18482.91 samples/sec Loss 9.0725 LearningRate 0.2640 Epoch: 5 Global Step: 27870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:26:00,618-Speed 18533.66 samples/sec Loss 9.0223 LearningRate 0.2640 Epoch: 5 Global Step: 27880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:26:05,118-Speed 18208.83 samples/sec Loss 8.9947 LearningRate 0.2639 Epoch: 5 Global Step: 27890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:26:09,594-Speed 18305.53 samples/sec Loss 9.0303 LearningRate 0.2638 Epoch: 5 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:26:14,018-Speed 18522.65 samples/sec Loss 9.0242 LearningRate 0.2637 Epoch: 5 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-01-14 01:26:18,439-Speed 18536.90 samples/sec Loss 8.9773 LearningRate 0.2637 Epoch: 5 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:26:22,859-Speed 18538.29 samples/sec Loss 8.9730 LearningRate 0.2636 Epoch: 5 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:26:27,264-Speed 18602.04 samples/sec Loss 9.0294 LearningRate 0.2635 Epoch: 5 Global Step: 27940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:26:31,681-Speed 18551.29 samples/sec Loss 9.0046 LearningRate 0.2635 Epoch: 5 Global Step: 27950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:26:36,126-Speed 18436.90 samples/sec Loss 9.0431 LearningRate 0.2634 Epoch: 5 Global Step: 27960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:26:40,546-Speed 18539.99 samples/sec Loss 9.0207 LearningRate 0.2633 Epoch: 5 Global Step: 27970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:26:44,940-Speed 18649.17 samples/sec Loss 9.0123 LearningRate 0.2633 Epoch: 5 Global Step: 27980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:26:49,350-Speed 18581.18 samples/sec Loss 9.0159 LearningRate 0.2632 Epoch: 5 Global Step: 27990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:26:53,793-Speed 18442.94 samples/sec Loss 8.9893 LearningRate 0.2631 Epoch: 5 Global Step: 28000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:26:58,220-Speed 18515.59 samples/sec Loss 8.9697 LearningRate 0.2630 Epoch: 5 Global Step: 28010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:27:02,645-Speed 18519.15 samples/sec Loss 9.0265 LearningRate 0.2630 Epoch: 5 Global Step: 28020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:27:07,104-Speed 18372.06 samples/sec Loss 9.0111 LearningRate 0.2629 Epoch: 5 Global Step: 28030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:27:11,591-Speed 18266.54 samples/sec Loss 
8.9860 LearningRate 0.2628 Epoch: 5 Global Step: 28040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:27:16,068-Speed 18301.94 samples/sec Loss 8.9564 LearningRate 0.2628 Epoch: 5 Global Step: 28050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:27:20,535-Speed 18348.07 samples/sec Loss 8.9780 LearningRate 0.2627 Epoch: 5 Global Step: 28060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:27:24,969-Speed 18486.08 samples/sec Loss 9.0174 LearningRate 0.2626 Epoch: 5 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:27:29,357-Speed 18672.64 samples/sec Loss 8.9930 LearningRate 0.2626 Epoch: 5 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:27:33,761-Speed 18607.46 samples/sec Loss 8.9860 LearningRate 0.2625 Epoch: 5 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:27:38,165-Speed 18609.17 samples/sec Loss 9.0084 LearningRate 0.2624 Epoch: 5 Global Step: 28100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:27:42,569-Speed 18606.46 samples/sec Loss 8.9806 LearningRate 0.2624 Epoch: 5 Global Step: 28110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:27:47,025-Speed 18389.66 samples/sec Loss 8.9904 LearningRate 0.2623 Epoch: 5 Global Step: 28120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:27:51,430-Speed 18597.62 samples/sec Loss 8.9658 LearningRate 0.2622 Epoch: 5 Global Step: 28130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:27:55,841-Speed 18579.80 samples/sec Loss 8.9691 LearningRate 0.2621 Epoch: 5 Global Step: 28140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:28:00,299-Speed 18379.24 samples/sec Loss 8.9574 LearningRate 0.2621 Epoch: 5 Global Step: 28150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:28:04,728-Speed 18504.19 samples/sec Loss 8.9769 LearningRate 0.2620 Epoch: 5 Global Step: 28160 
Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:28:09,172-Speed 18444.13 samples/sec Loss 8.9765 LearningRate 0.2619 Epoch: 5 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:28:13,585-Speed 18572.66 samples/sec Loss 8.9474 LearningRate 0.2619 Epoch: 5 Global Step: 28180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:28:17,960-Speed 18724.19 samples/sec Loss 8.9998 LearningRate 0.2618 Epoch: 5 Global Step: 28190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:22,387-Speed 18517.77 samples/sec Loss 9.0336 LearningRate 0.2617 Epoch: 5 Global Step: 28200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:26,824-Speed 18471.97 samples/sec Loss 8.9714 LearningRate 0.2617 Epoch: 5 Global Step: 28210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:31,224-Speed 18623.70 samples/sec Loss 8.9628 LearningRate 0.2616 Epoch: 5 Global Step: 28220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:35,645-Speed 18533.81 samples/sec Loss 8.9507 LearningRate 0.2615 Epoch: 5 Global Step: 28230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:40,061-Speed 18558.61 samples/sec Loss 8.9551 LearningRate 0.2614 Epoch: 5 Global Step: 28240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:44,462-Speed 18626.95 samples/sec Loss 8.9795 LearningRate 0.2614 Epoch: 5 Global Step: 28250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:48,923-Speed 18368.67 samples/sec Loss 9.0038 LearningRate 0.2613 Epoch: 5 Global Step: 28260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:53,325-Speed 18614.11 samples/sec Loss 9.0235 LearningRate 0.2612 Epoch: 5 Global Step: 28270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:28:57,743-Speed 18545.28 samples/sec Loss 9.0229 LearningRate 0.2612 Epoch: 5 Global Step: 28280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-01-14 01:29:02,158-Speed 18560.84 samples/sec Loss 8.9736 LearningRate 0.2611 Epoch: 5 Global Step: 28290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:06,575-Speed 18555.01 samples/sec Loss 9.0150 LearningRate 0.2610 Epoch: 5 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:11,018-Speed 18439.97 samples/sec Loss 8.9937 LearningRate 0.2610 Epoch: 5 Global Step: 28310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:15,445-Speed 18513.92 samples/sec Loss 9.0001 LearningRate 0.2609 Epoch: 5 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:19,858-Speed 18567.23 samples/sec Loss 8.9493 LearningRate 0.2608 Epoch: 5 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:24,303-Speed 18434.13 samples/sec Loss 8.9943 LearningRate 0.2608 Epoch: 5 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:28,742-Speed 18459.82 samples/sec Loss 8.9666 LearningRate 0.2607 Epoch: 5 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:33,147-Speed 18603.80 samples/sec Loss 8.9865 LearningRate 0.2606 Epoch: 5 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:37,602-Speed 18392.89 samples/sec Loss 9.0074 LearningRate 0.2605 Epoch: 5 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:41,984-Speed 18699.57 samples/sec Loss 8.9756 LearningRate 0.2605 Epoch: 5 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:46,412-Speed 18504.86 samples/sec Loss 8.9749 LearningRate 0.2604 Epoch: 5 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:29:50,865-Speed 18402.66 samples/sec Loss 8.9377 LearningRate 0.2603 Epoch: 5 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:29:55,259-Speed 18647.21 samples/sec 
Loss 8.9866 LearningRate 0.2603 Epoch: 5 Global Step: 28410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:29:59,630-Speed 18745.96 samples/sec Loss 8.9528 LearningRate 0.2602 Epoch: 5 Global Step: 28420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:30:04,068-Speed 18463.59 samples/sec Loss 8.9541 LearningRate 0.2601 Epoch: 5 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:30:08,526-Speed 18379.62 samples/sec Loss 8.9532 LearningRate 0.2601 Epoch: 5 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:30:12,995-Speed 18333.47 samples/sec Loss 8.9204 LearningRate 0.2600 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:30:17,396-Speed 18617.31 samples/sec Loss 8.9642 LearningRate 0.2599 Epoch: 5 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:30:21,818-Speed 18529.75 samples/sec Loss 8.9566 LearningRate 0.2599 Epoch: 5 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:30:26,254-Speed 18474.66 samples/sec Loss 8.9638 LearningRate 0.2598 Epoch: 5 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:30:30,707-Speed 18399.43 samples/sec Loss 9.0250 LearningRate 0.2597 Epoch: 5 Global Step: 28490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:30:35,113-Speed 18601.91 samples/sec Loss 8.9771 LearningRate 0.2597 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:30:39,528-Speed 18556.30 samples/sec Loss 8.9616 LearningRate 0.2596 Epoch: 5 Global Step: 28510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:30:43,953-Speed 18519.46 samples/sec Loss 8.9464 LearningRate 0.2595 Epoch: 5 Global Step: 28520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:30:48,387-Speed 18479.69 samples/sec Loss 8.9910 LearningRate 0.2594 Epoch: 5 Global Step: 
28530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:30:52,784-Speed 18632.94 samples/sec Loss 8.9280 LearningRate 0.2594 Epoch: 5 Global Step: 28540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:30:57,205-Speed 18534.63 samples/sec Loss 8.9352 LearningRate 0.2593 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:31:01,608-Speed 18612.12 samples/sec Loss 8.9282 LearningRate 0.2592 Epoch: 5 Global Step: 28560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:31:06,039-Speed 18493.64 samples/sec Loss 8.9265 LearningRate 0.2592 Epoch: 5 Global Step: 28570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:31:10,438-Speed 18628.67 samples/sec Loss 8.9799 LearningRate 0.2591 Epoch: 5 Global Step: 28580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:31:14,827-Speed 18671.64 samples/sec Loss 8.9430 LearningRate 0.2590 Epoch: 5 Global Step: 28590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:31:19,212-Speed 18684.03 samples/sec Loss 8.9514 LearningRate 0.2590 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:31:23,741-Speed 18093.46 samples/sec Loss 8.9220 LearningRate 0.2589 Epoch: 5 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:31:28,183-Speed 18445.68 samples/sec Loss 8.9349 LearningRate 0.2588 Epoch: 5 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:31:32,599-Speed 18560.82 samples/sec Loss 8.9464 LearningRate 0.2588 Epoch: 5 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:31:37,021-Speed 18530.10 samples/sec Loss 8.9500 LearningRate 0.2587 Epoch: 5 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:31:41,443-Speed 18531.94 samples/sec Loss 8.9698 LearningRate 0.2586 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-14 01:31:45,852-Speed 18587.00 samples/sec Loss 8.9496 LearningRate 0.2585 Epoch: 5 Global Step: 28660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:31:50,291-Speed 18457.47 samples/sec Loss 8.9132 LearningRate 0.2585 Epoch: 5 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:31:54,716-Speed 18514.55 samples/sec Loss 8.8958 LearningRate 0.2584 Epoch: 5 Global Step: 28680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:31:59,105-Speed 18670.85 samples/sec Loss 8.9172 LearningRate 0.2583 Epoch: 5 Global Step: 28690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:32:03,500-Speed 18644.36 samples/sec Loss 8.9888 LearningRate 0.2583 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:32:07,909-Speed 18582.94 samples/sec Loss 8.9478 LearningRate 0.2582 Epoch: 5 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:32:12,365-Speed 18389.24 samples/sec Loss 8.9597 LearningRate 0.2581 Epoch: 5 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:32:16,754-Speed 18669.08 samples/sec Loss 8.8858 LearningRate 0.2581 Epoch: 5 Global Step: 28730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:32:21,144-Speed 18662.88 samples/sec Loss 8.9790 LearningRate 0.2580 Epoch: 5 Global Step: 28740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:32:25,551-Speed 18597.66 samples/sec Loss 8.9316 LearningRate 0.2579 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:32:29,946-Speed 18643.03 samples/sec Loss 8.9398 LearningRate 0.2579 Epoch: 5 Global Step: 28760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:32:34,369-Speed 18527.27 samples/sec Loss 8.9279 LearningRate 0.2578 Epoch: 5 Global Step: 28770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:32:38,816-Speed 18431.28 
samples/sec Loss 8.9388 LearningRate 0.2577 Epoch: 5 Global Step: 28780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:32:43,234-Speed 18548.57 samples/sec Loss 8.9416 LearningRate 0.2577 Epoch: 5 Global Step: 28790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:32:47,668-Speed 18479.89 samples/sec Loss 8.9883 LearningRate 0.2576 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:32:52,068-Speed 18620.22 samples/sec Loss 8.9477 LearningRate 0.2575 Epoch: 5 Global Step: 28810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:32:56,461-Speed 18656.19 samples/sec Loss 8.8921 LearningRate 0.2574 Epoch: 5 Global Step: 28820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:33:00,862-Speed 18617.18 samples/sec Loss 8.9872 LearningRate 0.2574 Epoch: 5 Global Step: 28830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:33:05,241-Speed 18714.37 samples/sec Loss 8.9399 LearningRate 0.2573 Epoch: 5 Global Step: 28840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:33:09,608-Speed 18764.02 samples/sec Loss 8.9305 LearningRate 0.2572 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:33:14,006-Speed 18629.81 samples/sec Loss 8.9598 LearningRate 0.2572 Epoch: 5 Global Step: 28860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:33:18,426-Speed 18542.67 samples/sec Loss 8.9175 LearningRate 0.2571 Epoch: 5 Global Step: 28870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:33:22,890-Speed 18354.87 samples/sec Loss 8.9570 LearningRate 0.2570 Epoch: 5 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:33:27,347-Speed 18385.61 samples/sec Loss 8.8851 LearningRate 0.2570 Epoch: 5 Global Step: 28890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:33:31,797-Speed 18416.42 samples/sec Loss 8.9191 LearningRate 0.2569 Epoch: 5 
Global Step: 28900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:33:36,224-Speed 18510.49 samples/sec Loss 8.9868 LearningRate 0.2568 Epoch: 5 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:33:40,654-Speed 18496.84 samples/sec Loss 8.9224 LearningRate 0.2568 Epoch: 5 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:33:45,120-Speed 18348.04 samples/sec Loss 8.9672 LearningRate 0.2567 Epoch: 5 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:33:49,571-Speed 18410.94 samples/sec Loss 8.9231 LearningRate 0.2566 Epoch: 5 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:33:54,013-Speed 18447.76 samples/sec Loss 8.8837 LearningRate 0.2566 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:33:58,410-Speed 18636.33 samples/sec Loss 8.9062 LearningRate 0.2565 Epoch: 5 Global Step: 28960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:34:02,826-Speed 18558.46 samples/sec Loss 8.8470 LearningRate 0.2564 Epoch: 5 Global Step: 28970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:34:07,223-Speed 18641.44 samples/sec Loss 8.9292 LearningRate 0.2563 Epoch: 5 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:11,627-Speed 18612.24 samples/sec Loss 8.9580 LearningRate 0.2563 Epoch: 5 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:16,068-Speed 18452.77 samples/sec Loss 8.9216 LearningRate 0.2562 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:20,479-Speed 18576.77 samples/sec Loss 8.9074 LearningRate 0.2561 Epoch: 5 Global Step: 29010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:29,258-Speed 9332.71 samples/sec Loss 8.9091 LearningRate 0.2561 Epoch: 5 Global Step: 29020 Fp16 Grad Scale: 131072 Required: 
9 hours Training: 2022-01-14 01:34:33,671-Speed 18570.89 samples/sec Loss 8.9088 LearningRate 0.2560 Epoch: 5 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:38,123-Speed 18404.89 samples/sec Loss 8.9095 LearningRate 0.2559 Epoch: 5 Global Step: 29040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:42,511-Speed 18679.48 samples/sec Loss 8.9028 LearningRate 0.2559 Epoch: 5 Global Step: 29050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:46,909-Speed 18634.64 samples/sec Loss 8.9462 LearningRate 0.2558 Epoch: 5 Global Step: 29060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:51,444-Speed 18068.93 samples/sec Loss 8.9068 LearningRate 0.2557 Epoch: 5 Global Step: 29070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:34:55,849-Speed 18603.98 samples/sec Loss 8.9189 LearningRate 0.2557 Epoch: 5 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:00,237-Speed 18672.72 samples/sec Loss 8.8756 LearningRate 0.2556 Epoch: 5 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:04,676-Speed 18459.51 samples/sec Loss 8.9258 LearningRate 0.2555 Epoch: 5 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:09,106-Speed 18496.73 samples/sec Loss 8.8852 LearningRate 0.2555 Epoch: 5 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:13,553-Speed 18429.45 samples/sec Loss 8.8790 LearningRate 0.2554 Epoch: 5 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:17,978-Speed 18519.21 samples/sec Loss 8.8801 LearningRate 0.2553 Epoch: 5 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:22,464-Speed 18264.51 samples/sec Loss 8.8674 LearningRate 0.2552 Epoch: 5 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:26,905-Speed 
18452.88 samples/sec Loss 8.8953 LearningRate 0.2552 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:31,322-Speed 18552.42 samples/sec Loss 8.8360 LearningRate 0.2551 Epoch: 5 Global Step: 29160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:35,717-Speed 18647.07 samples/sec Loss 8.8235 LearningRate 0.2550 Epoch: 5 Global Step: 29170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:35:40,130-Speed 18568.12 samples/sec Loss 8.8468 LearningRate 0.2550 Epoch: 5 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:35:44,571-Speed 18453.04 samples/sec Loss 8.8510 LearningRate 0.2549 Epoch: 5 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:35:48,972-Speed 18615.97 samples/sec Loss 8.8864 LearningRate 0.2548 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:35:53,407-Speed 18473.75 samples/sec Loss 8.9017 LearningRate 0.2548 Epoch: 5 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:35:57,835-Speed 18505.70 samples/sec Loss 8.8620 LearningRate 0.2547 Epoch: 5 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:02,300-Speed 18352.67 samples/sec Loss 8.9116 LearningRate 0.2546 Epoch: 5 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:06,767-Speed 18344.15 samples/sec Loss 8.8645 LearningRate 0.2546 Epoch: 5 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:11,185-Speed 18545.89 samples/sec Loss 8.8730 LearningRate 0.2545 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:15,586-Speed 18617.89 samples/sec Loss 8.8962 LearningRate 0.2544 Epoch: 5 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:20,109-Speed 18117.70 samples/sec Loss 8.9172 LearningRate 
0.2544 Epoch: 5 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:24,539-Speed 18494.59 samples/sec Loss 8.8940 LearningRate 0.2543 Epoch: 5 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:28,964-Speed 18521.07 samples/sec Loss 8.9016 LearningRate 0.2542 Epoch: 5 Global Step: 29290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:33,441-Speed 18301.37 samples/sec Loss 8.9339 LearningRate 0.2542 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:37,840-Speed 18629.18 samples/sec Loss 8.8648 LearningRate 0.2541 Epoch: 5 Global Step: 29310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:42,252-Speed 18573.96 samples/sec Loss 8.8769 LearningRate 0.2540 Epoch: 5 Global Step: 29320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:46,663-Speed 18574.17 samples/sec Loss 8.8887 LearningRate 0.2539 Epoch: 5 Global Step: 29330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:51,106-Speed 18442.98 samples/sec Loss 8.8342 LearningRate 0.2539 Epoch: 5 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:55,541-Speed 18479.44 samples/sec Loss 8.8620 LearningRate 0.2538 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:36:59,964-Speed 18524.85 samples/sec Loss 8.9047 LearningRate 0.2537 Epoch: 5 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:04,397-Speed 18495.13 samples/sec Loss 8.8289 LearningRate 0.2537 Epoch: 5 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:08,843-Speed 18430.01 samples/sec Loss 8.8753 LearningRate 0.2536 Epoch: 5 Global Step: 29380 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-14 01:37:13,239-Speed 18639.05 samples/sec Loss 8.8658 LearningRate 0.2535 Epoch: 5 Global Step: 29390 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:17,645-Speed 18599.74 samples/sec Loss 8.8742 LearningRate 0.2535 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:22,112-Speed 18344.50 samples/sec Loss 8.8512 LearningRate 0.2534 Epoch: 5 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:26,553-Speed 18449.57 samples/sec Loss 8.9138 LearningRate 0.2533 Epoch: 5 Global Step: 29420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:30,978-Speed 18518.19 samples/sec Loss 8.9466 LearningRate 0.2533 Epoch: 5 Global Step: 29430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:35,421-Speed 18451.17 samples/sec Loss 8.8442 LearningRate 0.2532 Epoch: 5 Global Step: 29440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:39,856-Speed 18479.17 samples/sec Loss 8.7973 LearningRate 0.2531 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:44,401-Speed 18033.74 samples/sec Loss 8.8811 LearningRate 0.2531 Epoch: 5 Global Step: 29460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:48,853-Speed 18402.45 samples/sec Loss 8.8496 LearningRate 0.2530 Epoch: 5 Global Step: 29470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:53,283-Speed 18494.99 samples/sec Loss 8.9154 LearningRate 0.2529 Epoch: 5 Global Step: 29480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:37:57,672-Speed 18677.87 samples/sec Loss 8.8884 LearningRate 0.2529 Epoch: 5 Global Step: 29490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:38:02,072-Speed 18622.13 samples/sec Loss 8.8542 LearningRate 0.2528 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:38:06,528-Speed 18391.50 samples/sec Loss 8.8623 LearningRate 0.2527 Epoch: 5 Global Step: 29510 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-14 01:38:10,983-Speed 18392.67 samples/sec Loss 8.8615 LearningRate 0.2527 Epoch: 5 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:38:15,435-Speed 18407.02 samples/sec Loss 8.8403 LearningRate 0.2526 Epoch: 5 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:38:19,894-Speed 18374.48 samples/sec Loss 8.8267 LearningRate 0.2525 Epoch: 5 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:38:24,413-Speed 18134.19 samples/sec Loss 8.8134 LearningRate 0.2524 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:38:28,943-Speed 18088.41 samples/sec Loss 8.8621 LearningRate 0.2524 Epoch: 5 Global Step: 29560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:38:33,389-Speed 18433.31 samples/sec Loss 8.8368 LearningRate 0.2523 Epoch: 5 Global Step: 29570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:38:37,856-Speed 18343.20 samples/sec Loss 8.8264 LearningRate 0.2522 Epoch: 5 Global Step: 29580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:38:42,336-Speed 18294.53 samples/sec Loss 8.8462 LearningRate 0.2522 Epoch: 5 Global Step: 29590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:38:46,778-Speed 18443.17 samples/sec Loss 8.8831 LearningRate 0.2521 Epoch: 5 Global Step: 29600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:38:51,233-Speed 18394.00 samples/sec Loss 8.8389 LearningRate 0.2520 Epoch: 5 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:38:55,660-Speed 18513.06 samples/sec Loss 8.8549 LearningRate 0.2520 Epoch: 5 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:00,127-Speed 18341.34 samples/sec Loss 8.8695 LearningRate 0.2519 Epoch: 5 Global Step: 29630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:04,601-Speed 18324.90 
samples/sec Loss 8.8454 LearningRate 0.2518 Epoch: 5 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:39:08,995-Speed 18654.91 samples/sec Loss 8.8371 LearningRate 0.2518 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:39:13,395-Speed 18624.77 samples/sec Loss 8.8269 LearningRate 0.2517 Epoch: 5 Global Step: 29660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:17,857-Speed 18364.61 samples/sec Loss 8.8694 LearningRate 0.2516 Epoch: 5 Global Step: 29670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:22,297-Speed 18453.51 samples/sec Loss 8.8475 LearningRate 0.2516 Epoch: 5 Global Step: 29680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:26,755-Speed 18385.92 samples/sec Loss 8.8745 LearningRate 0.2515 Epoch: 5 Global Step: 29690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:31,179-Speed 18520.53 samples/sec Loss 8.8221 LearningRate 0.2514 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:35,653-Speed 18313.07 samples/sec Loss 8.8303 LearningRate 0.2514 Epoch: 5 Global Step: 29710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:40,090-Speed 18467.81 samples/sec Loss 8.8293 LearningRate 0.2513 Epoch: 5 Global Step: 29720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:44,527-Speed 18471.18 samples/sec Loss 8.8246 LearningRate 0.2512 Epoch: 5 Global Step: 29730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:48,956-Speed 18499.75 samples/sec Loss 8.8135 LearningRate 0.2512 Epoch: 5 Global Step: 29740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:53,459-Speed 18200.09 samples/sec Loss 8.8293 LearningRate 0.2511 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:39:57,917-Speed 18378.88 samples/sec Loss 8.8192 LearningRate 0.2510 Epoch: 5 
Global Step: 29760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:40:02,383-Speed 18349.42 samples/sec Loss 8.8616 LearningRate 0.2510 Epoch: 5 Global Step: 29770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:40:06,843-Speed 18376.93 samples/sec Loss 8.8396 LearningRate 0.2509 Epoch: 5 Global Step: 29780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:40:11,243-Speed 18620.41 samples/sec Loss 8.8001 LearningRate 0.2508 Epoch: 5 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:40:15,655-Speed 18579.25 samples/sec Loss 8.8906 LearningRate 0.2507 Epoch: 5 Global Step: 29800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:40:20,076-Speed 18538.10 samples/sec Loss 8.8536 LearningRate 0.2507 Epoch: 5 Global Step: 29810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:40:24,518-Speed 18447.21 samples/sec Loss 8.8978 LearningRate 0.2506 Epoch: 5 Global Step: 29820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:40:29,034-Speed 18146.57 samples/sec Loss 8.8716 LearningRate 0.2505 Epoch: 5 Global Step: 29830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:40:33,543-Speed 18173.11 samples/sec Loss 8.8272 LearningRate 0.2505 Epoch: 5 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:40:37,960-Speed 18553.08 samples/sec Loss 8.8023 LearningRate 0.2504 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:40:42,388-Speed 18508.28 samples/sec Loss 8.7849 LearningRate 0.2503 Epoch: 5 Global Step: 29860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:40:46,893-Speed 18187.21 samples/sec Loss 8.7541 LearningRate 0.2503 Epoch: 5 Global Step: 29870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:40:51,327-Speed 18480.38 samples/sec Loss 8.8132 LearningRate 0.2502 Epoch: 5 Global Step: 29880 Fp16 Grad Scale: 65536 Required: 
9 hours Training: 2022-01-14 01:40:55,771-Speed 18442.70 samples/sec Loss 8.8170 LearningRate 0.2501 Epoch: 5 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:00,179-Speed 18589.87 samples/sec Loss 8.7751 LearningRate 0.2501 Epoch: 5 Global Step: 29900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:41:04,595-Speed 18566.20 samples/sec Loss 8.7822 LearningRate 0.2500 Epoch: 5 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:09,058-Speed 18359.03 samples/sec Loss 8.8867 LearningRate 0.2499 Epoch: 5 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:13,513-Speed 18393.90 samples/sec Loss 8.8042 LearningRate 0.2499 Epoch: 5 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:17,938-Speed 18515.95 samples/sec Loss 8.8259 LearningRate 0.2498 Epoch: 5 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:22,359-Speed 18535.47 samples/sec Loss 8.8037 LearningRate 0.2497 Epoch: 5 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:26,818-Speed 18380.77 samples/sec Loss 8.8269 LearningRate 0.2497 Epoch: 5 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:31,232-Speed 18562.08 samples/sec Loss 8.8072 LearningRate 0.2496 Epoch: 5 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:35,650-Speed 18545.29 samples/sec Loss 8.8488 LearningRate 0.2495 Epoch: 5 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:40,175-Speed 18111.26 samples/sec Loss 8.7628 LearningRate 0.2495 Epoch: 5 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:44,617-Speed 18448.74 samples/sec Loss 8.7923 LearningRate 0.2494 Epoch: 5 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:49,009-Speed 
18655.47 samples/sec Loss 8.8577 LearningRate 0.2493 Epoch: 5 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:41:53,464-Speed 18392.20 samples/sec Loss 8.8235 LearningRate 0.2493 Epoch: 5 Global Step: 30020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:41:57,888-Speed 18528.92 samples/sec Loss 8.8089 LearningRate 0.2492 Epoch: 5 Global Step: 30030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:02,364-Speed 18310.70 samples/sec Loss 8.8281 LearningRate 0.2491 Epoch: 5 Global Step: 30040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:06,876-Speed 18162.62 samples/sec Loss 8.7964 LearningRate 0.2491 Epoch: 5 Global Step: 30050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:11,303-Speed 18507.47 samples/sec Loss 8.7796 LearningRate 0.2490 Epoch: 5 Global Step: 30060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:15,692-Speed 18672.18 samples/sec Loss 8.8159 LearningRate 0.2489 Epoch: 5 Global Step: 30070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:20,167-Speed 18315.36 samples/sec Loss 8.8138 LearningRate 0.2489 Epoch: 5 Global Step: 30080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:24,569-Speed 18614.35 samples/sec Loss 8.7772 LearningRate 0.2488 Epoch: 5 Global Step: 30090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:29,039-Speed 18332.79 samples/sec Loss 8.8131 LearningRate 0.2487 Epoch: 5 Global Step: 30100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:33,452-Speed 18565.13 samples/sec Loss 8.7582 LearningRate 0.2486 Epoch: 5 Global Step: 30110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:42:37,895-Speed 18448.16 samples/sec Loss 8.7731 LearningRate 0.2486 Epoch: 5 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:42:42,406-Speed 18166.20 samples/sec Loss 8.7804 LearningRate 0.2485 
Epoch: 5 Global Step: 30130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:42:46,876-Speed 18337.25 samples/sec Loss 8.8047 LearningRate 0.2484 Epoch: 5 Global Step: 30140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:42:51,324-Speed 18420.67 samples/sec Loss 8.7878 LearningRate 0.2484 Epoch: 5 Global Step: 30150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:42:55,771-Speed 18424.82 samples/sec Loss 8.8089 LearningRate 0.2483 Epoch: 5 Global Step: 30160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:43:00,186-Speed 18561.00 samples/sec Loss 8.7984 LearningRate 0.2482 Epoch: 5 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:43:04,622-Speed 18475.54 samples/sec Loss 8.8068 LearningRate 0.2482 Epoch: 5 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:43:09,042-Speed 18537.11 samples/sec Loss 8.7858 LearningRate 0.2481 Epoch: 5 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:43:13,484-Speed 18451.63 samples/sec Loss 8.8088 LearningRate 0.2480 Epoch: 5 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:43:17,930-Speed 18426.29 samples/sec Loss 8.7917 LearningRate 0.2480 Epoch: 5 Global Step: 30210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:43:22,379-Speed 18418.48 samples/sec Loss 8.8001 LearningRate 0.2479 Epoch: 5 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:43:26,827-Speed 18421.72 samples/sec Loss 8.7976 LearningRate 0.2478 Epoch: 5 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:43:31,254-Speed 18510.09 samples/sec Loss 8.8238 LearningRate 0.2478 Epoch: 5 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:43:35,667-Speed 18568.86 samples/sec Loss 8.7557 LearningRate 0.2477 Epoch: 5 Global Step: 30250 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-01-14 01:43:40,085-Speed 18548.15 samples/sec Loss 8.7699 LearningRate 0.2476 Epoch: 5 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:43:44,538-Speed 18401.41 samples/sec Loss 8.8479 LearningRate 0.2476 Epoch: 5 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:43:48,952-Speed 18561.56 samples/sec Loss 8.7545 LearningRate 0.2475 Epoch: 5 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:43:53,443-Speed 18247.39 samples/sec Loss 8.7636 LearningRate 0.2474 Epoch: 5 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:43:57,872-Speed 18503.02 samples/sec Loss 8.7306 LearningRate 0.2474 Epoch: 5 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:44:02,319-Speed 18426.42 samples/sec Loss 8.7566 LearningRate 0.2473 Epoch: 5 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:44:11,325-Speed 9097.27 samples/sec Loss 8.7914 LearningRate 0.2472 Epoch: 5 Global Step: 30320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:15,783-Speed 18382.67 samples/sec Loss 8.7688 LearningRate 0.2472 Epoch: 5 Global Step: 30330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:20,216-Speed 18480.30 samples/sec Loss 8.7713 LearningRate 0.2471 Epoch: 5 Global Step: 30340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:24,625-Speed 18587.87 samples/sec Loss 8.7932 LearningRate 0.2470 Epoch: 5 Global Step: 30350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:29,030-Speed 18603.87 samples/sec Loss 8.8544 LearningRate 0.2470 Epoch: 5 Global Step: 30360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:33,457-Speed 18509.18 samples/sec Loss 8.7817 LearningRate 0.2469 Epoch: 5 Global Step: 30370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 
01:44:37,932-Speed 18312.87 samples/sec Loss 8.8063 LearningRate 0.2468 Epoch: 5 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:42,314-Speed 18699.76 samples/sec Loss 8.7663 LearningRate 0.2468 Epoch: 5 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:46,827-Speed 18159.56 samples/sec Loss 8.7389 LearningRate 0.2467 Epoch: 5 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:51,315-Speed 18255.99 samples/sec Loss 8.7620 LearningRate 0.2466 Epoch: 5 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:44:55,741-Speed 18514.61 samples/sec Loss 8.7715 LearningRate 0.2466 Epoch: 5 Global Step: 30420 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-14 01:45:00,164-Speed 18526.54 samples/sec Loss 8.7183 LearningRate 0.2465 Epoch: 5 Global Step: 30430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:45:04,612-Speed 18423.13 samples/sec Loss 8.7240 LearningRate 0.2464 Epoch: 5 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:45:09,033-Speed 18534.40 samples/sec Loss 8.7326 LearningRate 0.2464 Epoch: 5 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:45:13,464-Speed 18493.27 samples/sec Loss 8.7622 LearningRate 0.2463 Epoch: 5 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:45:17,884-Speed 18541.22 samples/sec Loss 8.7731 LearningRate 0.2462 Epoch: 5 Global Step: 30470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:22,281-Speed 18634.29 samples/sec Loss 8.7222 LearningRate 0.2462 Epoch: 5 Global Step: 30480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:26,691-Speed 18585.79 samples/sec Loss 8.8076 LearningRate 0.2461 Epoch: 5 Global Step: 30490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:31,112-Speed 18531.88 samples/sec Loss 
8.7791 LearningRate 0.2460 Epoch: 5 Global Step: 30500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:35,569-Speed 18387.69 samples/sec Loss 8.8151 LearningRate 0.2460 Epoch: 5 Global Step: 30510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:39,999-Speed 18494.39 samples/sec Loss 8.7433 LearningRate 0.2459 Epoch: 5 Global Step: 30520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:44,458-Speed 18379.17 samples/sec Loss 8.7822 LearningRate 0.2458 Epoch: 5 Global Step: 30530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:48,895-Speed 18463.05 samples/sec Loss 8.7985 LearningRate 0.2458 Epoch: 5 Global Step: 30540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:53,306-Speed 18579.69 samples/sec Loss 8.7511 LearningRate 0.2457 Epoch: 5 Global Step: 30550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:45:57,754-Speed 18420.46 samples/sec Loss 8.7428 LearningRate 0.2456 Epoch: 5 Global Step: 30560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:46:02,213-Speed 18374.27 samples/sec Loss 8.7852 LearningRate 0.2455 Epoch: 5 Global Step: 30570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:06,636-Speed 18529.26 samples/sec Loss 8.7473 LearningRate 0.2455 Epoch: 5 Global Step: 30580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:11,041-Speed 18600.97 samples/sec Loss 8.7630 LearningRate 0.2454 Epoch: 5 Global Step: 30590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:15,536-Speed 18231.85 samples/sec Loss 8.7989 LearningRate 0.2453 Epoch: 5 Global Step: 30600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:19,983-Speed 18421.85 samples/sec Loss 8.7668 LearningRate 0.2453 Epoch: 5 Global Step: 30610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:24,447-Speed 18358.21 samples/sec Loss 8.7282 LearningRate 0.2452 Epoch: 5 Global Step: 30620 
Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:28,878-Speed 18490.78 samples/sec Loss 8.7346 LearningRate 0.2451 Epoch: 5 Global Step: 30630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:33,294-Speed 18555.55 samples/sec Loss 8.7784 LearningRate 0.2451 Epoch: 5 Global Step: 30640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:37,752-Speed 18384.27 samples/sec Loss 8.7210 LearningRate 0.2450 Epoch: 5 Global Step: 30650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:42,198-Speed 18429.78 samples/sec Loss 8.7147 LearningRate 0.2449 Epoch: 5 Global Step: 30660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:46:46,690-Speed 18243.13 samples/sec Loss 8.7410 LearningRate 0.2449 Epoch: 5 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:46:51,142-Speed 18404.30 samples/sec Loss 8.7373 LearningRate 0.2448 Epoch: 5 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:46:55,650-Speed 18177.34 samples/sec Loss 8.7324 LearningRate 0.2447 Epoch: 5 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:47:00,089-Speed 18464.57 samples/sec Loss 8.8020 LearningRate 0.2447 Epoch: 5 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:47:04,599-Speed 18174.91 samples/sec Loss 8.7425 LearningRate 0.2446 Epoch: 5 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:47:09,016-Speed 18549.92 samples/sec Loss 8.7287 LearningRate 0.2445 Epoch: 5 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:47:13,433-Speed 18556.62 samples/sec Loss 8.7448 LearningRate 0.2445 Epoch: 5 Global Step: 30730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:47:17,911-Speed 18301.69 samples/sec Loss 8.6897 LearningRate 0.2444 Epoch: 5 Global Step: 30740 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-14 01:47:22,358-Speed 18425.65 samples/sec Loss 8.7371 LearningRate 0.2443 Epoch: 5 Global Step: 30750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:47:26,831-Speed 18320.71 samples/sec Loss 8.7297 LearningRate 0.2443 Epoch: 5 Global Step: 30760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:47:31,241-Speed 18581.87 samples/sec Loss 8.7695 LearningRate 0.2442 Epoch: 5 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:47:35,731-Speed 18248.47 samples/sec Loss 8.7256 LearningRate 0.2441 Epoch: 5 Global Step: 30780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:47:40,221-Speed 18252.14 samples/sec Loss 8.6861 LearningRate 0.2441 Epoch: 5 Global Step: 30790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:47:44,679-Speed 18380.01 samples/sec Loss 8.7496 LearningRate 0.2440 Epoch: 5 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:47:49,153-Speed 18315.18 samples/sec Loss 8.7861 LearningRate 0.2439 Epoch: 5 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:47:53,567-Speed 18561.50 samples/sec Loss 8.7849 LearningRate 0.2439 Epoch: 5 Global Step: 30820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:47:57,994-Speed 18512.84 samples/sec Loss 8.7875 LearningRate 0.2438 Epoch: 5 Global Step: 30830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:02,406-Speed 18573.30 samples/sec Loss 8.7622 LearningRate 0.2437 Epoch: 5 Global Step: 30840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:06,820-Speed 18565.93 samples/sec Loss 8.6920 LearningRate 0.2437 Epoch: 5 Global Step: 30850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:11,246-Speed 18511.71 samples/sec Loss 8.7224 LearningRate 0.2436 Epoch: 5 Global Step: 30860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:15,678-Speed 18491.00 
samples/sec Loss 8.7345 LearningRate 0.2435 Epoch: 5 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:48:20,093-Speed 18559.59 samples/sec Loss 8.6832 LearningRate 0.2435 Epoch: 5 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:48:24,505-Speed 18569.65 samples/sec Loss 8.7548 LearningRate 0.2434 Epoch: 5 Global Step: 30890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:28,922-Speed 18550.39 samples/sec Loss 8.7322 LearningRate 0.2433 Epoch: 5 Global Step: 30900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:33,348-Speed 18513.87 samples/sec Loss 8.7003 LearningRate 0.2433 Epoch: 5 Global Step: 30910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:37,797-Speed 18422.17 samples/sec Loss 8.6881 LearningRate 0.2432 Epoch: 5 Global Step: 30920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:42,268-Speed 18328.00 samples/sec Loss 8.6735 LearningRate 0.2431 Epoch: 5 Global Step: 30930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:46,704-Speed 18474.76 samples/sec Loss 8.7158 LearningRate 0.2431 Epoch: 5 Global Step: 30940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:51,201-Speed 18218.58 samples/sec Loss 8.7330 LearningRate 0.2430 Epoch: 5 Global Step: 30950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:48:55,640-Speed 18460.92 samples/sec Loss 8.6892 LearningRate 0.2429 Epoch: 5 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:00,083-Speed 18441.24 samples/sec Loss 8.7263 LearningRate 0.2429 Epoch: 5 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:04,501-Speed 18552.87 samples/sec Loss 8.7214 LearningRate 0.2428 Epoch: 5 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:08,938-Speed 18470.49 samples/sec Loss 8.7335 LearningRate 0.2427 Epoch: 5 
Global Step: 30990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:13,370-Speed 18486.54 samples/sec Loss 8.7314 LearningRate 0.2427 Epoch: 5 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:17,844-Speed 18314.64 samples/sec Loss 8.7410 LearningRate 0.2426 Epoch: 5 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:22,266-Speed 18530.90 samples/sec Loss 8.7092 LearningRate 0.2425 Epoch: 5 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:26,751-Speed 18269.28 samples/sec Loss 8.6791 LearningRate 0.2425 Epoch: 5 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:31,178-Speed 18512.11 samples/sec Loss 8.7158 LearningRate 0.2424 Epoch: 5 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:35,624-Speed 18430.83 samples/sec Loss 8.7645 LearningRate 0.2423 Epoch: 5 Global Step: 31050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:40,056-Speed 18492.05 samples/sec Loss 8.7209 LearningRate 0.2423 Epoch: 5 Global Step: 31060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:44,494-Speed 18463.36 samples/sec Loss 8.7077 LearningRate 0.2422 Epoch: 5 Global Step: 31070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:48,905-Speed 18588.38 samples/sec Loss 8.7116 LearningRate 0.2421 Epoch: 5 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:49:53,320-Speed 18562.34 samples/sec Loss 8.7417 LearningRate 0.2421 Epoch: 5 Global Step: 31090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:49:57,747-Speed 18510.78 samples/sec Loss 8.7470 LearningRate 0.2420 Epoch: 5 Global Step: 31100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:50:02,187-Speed 18457.58 samples/sec Loss 8.7492 LearningRate 0.2419 Epoch: 5 Global Step: 31110 Fp16 Grad Scale: 131072 Required: 
9 hours Training: 2022-01-14 01:50:22,393-Speed 4058.51 samples/sec Loss 8.7150 LearningRate 0.2419 Epoch: 6 Global Step: 31120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:50:26,783-Speed 18666.66 samples/sec Loss 8.7020 LearningRate 0.2418 Epoch: 6 Global Step: 31130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:50:31,250-Speed 18343.25 samples/sec Loss 8.6453 LearningRate 0.2417 Epoch: 6 Global Step: 31140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:50:35,678-Speed 18507.59 samples/sec Loss 8.6642 LearningRate 0.2417 Epoch: 6 Global Step: 31150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:50:40,087-Speed 18590.11 samples/sec Loss 8.6733 LearningRate 0.2416 Epoch: 6 Global Step: 31160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:50:44,476-Speed 18683.08 samples/sec Loss 8.6776 LearningRate 0.2415 Epoch: 6 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:50:48,872-Speed 18639.13 samples/sec Loss 8.7494 LearningRate 0.2415 Epoch: 6 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:50:53,415-Speed 18036.12 samples/sec Loss 8.7076 LearningRate 0.2414 Epoch: 6 Global Step: 31190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:50:57,814-Speed 18629.41 samples/sec Loss 8.7111 LearningRate 0.2413 Epoch: 6 Global Step: 31200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:51:02,192-Speed 18713.90 samples/sec Loss 8.6339 LearningRate 0.2413 Epoch: 6 Global Step: 31210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:51:06,583-Speed 18662.97 samples/sec Loss 8.6903 LearningRate 0.2412 Epoch: 6 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:51:11,001-Speed 18546.81 samples/sec Loss 8.6647 LearningRate 0.2411 Epoch: 6 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:51:15,435-Speed 
18485.15 samples/sec Loss 8.6948 LearningRate 0.2411 Epoch: 6 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:51:19,916-Speed 18288.50 samples/sec Loss 8.6888 LearningRate 0.2410 Epoch: 6 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:51:24,324-Speed 18591.08 samples/sec Loss 8.6602 LearningRate 0.2409 Epoch: 6 Global Step: 31260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:51:28,814-Speed 18249.52 samples/sec Loss 8.7321 LearningRate 0.2409 Epoch: 6 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:51:33,205-Speed 18661.56 samples/sec Loss 8.6981 LearningRate 0.2408 Epoch: 6 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:51:37,646-Speed 18452.15 samples/sec Loss 8.6228 LearningRate 0.2407 Epoch: 6 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:51:42,091-Speed 18430.02 samples/sec Loss 8.6692 LearningRate 0.2407 Epoch: 6 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:51:46,547-Speed 18390.71 samples/sec Loss 8.6550 LearningRate 0.2406 Epoch: 6 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:51:51,028-Speed 18287.76 samples/sec Loss 8.6928 LearningRate 0.2405 Epoch: 6 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:51:55,405-Speed 18719.20 samples/sec Loss 8.6674 LearningRate 0.2405 Epoch: 6 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:51:59,797-Speed 18658.35 samples/sec Loss 8.6899 LearningRate 0.2404 Epoch: 6 Global Step: 31340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:52:04,203-Speed 18597.72 samples/sec Loss 8.6922 LearningRate 0.2403 Epoch: 6 Global Step: 31350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:08,699-Speed 18222.35 samples/sec Loss 8.7116 LearningRate 
0.2403 Epoch: 6 Global Step: 31360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:13,161-Speed 18365.29 samples/sec Loss 8.6996 LearningRate 0.2402 Epoch: 6 Global Step: 31370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:17,598-Speed 18469.67 samples/sec Loss 8.7011 LearningRate 0.2401 Epoch: 6 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:21,987-Speed 18670.18 samples/sec Loss 8.7045 LearningRate 0.2401 Epoch: 6 Global Step: 31390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:26,396-Speed 18585.69 samples/sec Loss 8.6627 LearningRate 0.2400 Epoch: 6 Global Step: 31400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:30,807-Speed 18578.07 samples/sec Loss 8.6839 LearningRate 0.2399 Epoch: 6 Global Step: 31410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:35,213-Speed 18597.77 samples/sec Loss 8.6643 LearningRate 0.2399 Epoch: 6 Global Step: 31420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:39,680-Speed 18345.70 samples/sec Loss 8.7117 LearningRate 0.2398 Epoch: 6 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:44,070-Speed 18668.31 samples/sec Loss 8.7016 LearningRate 0.2397 Epoch: 6 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:52:48,508-Speed 18462.29 samples/sec Loss 8.6893 LearningRate 0.2397 Epoch: 6 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:52:52,893-Speed 18686.91 samples/sec Loss 8.6696 LearningRate 0.2396 Epoch: 6 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:52:57,310-Speed 18552.21 samples/sec Loss 8.6474 LearningRate 0.2395 Epoch: 6 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:53:01,738-Speed 18511.72 samples/sec Loss 8.6707 LearningRate 0.2395 Epoch: 6 Global Step: 31480 Fp16 Grad Scale: 
32768 Required: 9 hours Training: 2022-01-14 01:53:06,161-Speed 18531.04 samples/sec Loss 8.6913 LearningRate 0.2394 Epoch: 6 Global Step: 31490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:10,580-Speed 18546.97 samples/sec Loss 8.6487 LearningRate 0.2393 Epoch: 6 Global Step: 31500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:14,994-Speed 18566.24 samples/sec Loss 8.7001 LearningRate 0.2393 Epoch: 6 Global Step: 31510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:19,398-Speed 18609.29 samples/sec Loss 8.6600 LearningRate 0.2392 Epoch: 6 Global Step: 31520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:23,847-Speed 18415.52 samples/sec Loss 8.6857 LearningRate 0.2391 Epoch: 6 Global Step: 31530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:28,280-Speed 18485.32 samples/sec Loss 8.6538 LearningRate 0.2391 Epoch: 6 Global Step: 31540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:32,745-Speed 18351.19 samples/sec Loss 8.6943 LearningRate 0.2390 Epoch: 6 Global Step: 31550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:37,163-Speed 18547.33 samples/sec Loss 8.6577 LearningRate 0.2389 Epoch: 6 Global Step: 31560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:41,574-Speed 18581.22 samples/sec Loss 8.6438 LearningRate 0.2389 Epoch: 6 Global Step: 31570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:53:45,964-Speed 18663.22 samples/sec Loss 8.6285 LearningRate 0.2388 Epoch: 6 Global Step: 31580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:53:50,386-Speed 18530.89 samples/sec Loss 8.6543 LearningRate 0.2387 Epoch: 6 Global Step: 31590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:53:54,807-Speed 18533.11 samples/sec Loss 8.6894 LearningRate 0.2387 Epoch: 6 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 
01:54:04,051-Speed 8863.08 samples/sec Loss 8.7043 LearningRate 0.2386 Epoch: 6 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:54:08,504-Speed 18403.90 samples/sec Loss 8.7200 LearningRate 0.2385 Epoch: 6 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:54:12,936-Speed 18488.25 samples/sec Loss 8.6600 LearningRate 0.2385 Epoch: 6 Global Step: 31630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:54:17,348-Speed 18571.40 samples/sec Loss 8.6464 LearningRate 0.2384 Epoch: 6 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:54:21,802-Speed 18401.31 samples/sec Loss 8.6095 LearningRate 0.2383 Epoch: 6 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:54:26,198-Speed 18637.15 samples/sec Loss 8.6561 LearningRate 0.2383 Epoch: 6 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:54:30,608-Speed 18580.13 samples/sec Loss 8.6705 LearningRate 0.2382 Epoch: 6 Global Step: 31670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:54:35,074-Speed 18350.40 samples/sec Loss 8.6292 LearningRate 0.2381 Epoch: 6 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:54:39,498-Speed 18520.52 samples/sec Loss 8.6433 LearningRate 0.2381 Epoch: 6 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:54:43,902-Speed 18607.85 samples/sec Loss 8.6465 LearningRate 0.2380 Epoch: 6 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:54:48,306-Speed 18607.69 samples/sec Loss 8.6465 LearningRate 0.2380 Epoch: 6 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:54:52,770-Speed 18358.50 samples/sec Loss 8.6530 LearningRate 0.2379 Epoch: 6 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:54:57,189-Speed 18542.43 samples/sec Loss 8.6773 
LearningRate 0.2378 Epoch: 6 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:55:01,629-Speed 18455.37 samples/sec Loss 8.6572 LearningRate 0.2378 Epoch: 6 Global Step: 31740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:55:06,033-Speed 18605.53 samples/sec Loss 8.5957 LearningRate 0.2377 Epoch: 6 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:10,453-Speed 18545.19 samples/sec Loss 8.6214 LearningRate 0.2376 Epoch: 6 Global Step: 31760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:14,906-Speed 18398.45 samples/sec Loss 8.6161 LearningRate 0.2376 Epoch: 6 Global Step: 31770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:19,330-Speed 18522.37 samples/sec Loss 8.6212 LearningRate 0.2375 Epoch: 6 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:23,748-Speed 18546.27 samples/sec Loss 8.6361 LearningRate 0.2374 Epoch: 6 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:28,253-Speed 18188.29 samples/sec Loss 8.6190 LearningRate 0.2374 Epoch: 6 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:32,668-Speed 18560.70 samples/sec Loss 8.6470 LearningRate 0.2373 Epoch: 6 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:37,065-Speed 18638.56 samples/sec Loss 8.6466 LearningRate 0.2372 Epoch: 6 Global Step: 31820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:41,530-Speed 18349.24 samples/sec Loss 8.5663 LearningRate 0.2372 Epoch: 6 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:45,957-Speed 18509.06 samples/sec Loss 8.6118 LearningRate 0.2371 Epoch: 6 Global Step: 31840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:55:50,406-Speed 18419.31 samples/sec Loss 8.6316 LearningRate 0.2370 Epoch: 6 Global Step: 31850 Fp16 
Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:55:54,855-Speed 18417.64 samples/sec Loss 8.7138 LearningRate 0.2370 Epoch: 6 Global Step: 31860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:55:59,289-Speed 18481.48 samples/sec Loss 8.6634 LearningRate 0.2369 Epoch: 6 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:56:03,718-Speed 18498.41 samples/sec Loss 8.6627 LearningRate 0.2368 Epoch: 6 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:56:08,137-Speed 18542.25 samples/sec Loss 8.6017 LearningRate 0.2368 Epoch: 6 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:12,649-Speed 18162.59 samples/sec Loss 8.5812 LearningRate 0.2367 Epoch: 6 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:17,086-Speed 18470.48 samples/sec Loss 8.6326 LearningRate 0.2366 Epoch: 6 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:21,520-Speed 18479.67 samples/sec Loss 8.6310 LearningRate 0.2366 Epoch: 6 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:25,959-Speed 18459.23 samples/sec Loss 8.6137 LearningRate 0.2365 Epoch: 6 Global Step: 31930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:30,405-Speed 18431.21 samples/sec Loss 8.6556 LearningRate 0.2364 Epoch: 6 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:34,956-Speed 18004.69 samples/sec Loss 8.6038 LearningRate 0.2364 Epoch: 6 Global Step: 31950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:39,380-Speed 18522.73 samples/sec Loss 8.5899 LearningRate 0.2363 Epoch: 6 Global Step: 31960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:43,784-Speed 18608.51 samples/sec Loss 8.6496 LearningRate 0.2362 Epoch: 6 Global Step: 31970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-01-14 01:56:48,207-Speed 18527.00 samples/sec Loss 8.6358 LearningRate 0.2362 Epoch: 6 Global Step: 31980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:56:52,631-Speed 18522.33 samples/sec Loss 8.6211 LearningRate 0.2361 Epoch: 6 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:56:57,050-Speed 18541.73 samples/sec Loss 8.6854 LearningRate 0.2360 Epoch: 6 Global Step: 32000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:01,477-Speed 18508.01 samples/sec Loss 8.6487 LearningRate 0.2360 Epoch: 6 Global Step: 32010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:05,924-Speed 18428.69 samples/sec Loss 8.6335 LearningRate 0.2359 Epoch: 6 Global Step: 32020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:10,337-Speed 18566.26 samples/sec Loss 8.6518 LearningRate 0.2358 Epoch: 6 Global Step: 32030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:14,781-Speed 18441.31 samples/sec Loss 8.6066 LearningRate 0.2358 Epoch: 6 Global Step: 32040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:19,214-Speed 18483.03 samples/sec Loss 8.6176 LearningRate 0.2357 Epoch: 6 Global Step: 32050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:23,625-Speed 18577.37 samples/sec Loss 8.6237 LearningRate 0.2356 Epoch: 6 Global Step: 32060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:28,037-Speed 18568.55 samples/sec Loss 8.5460 LearningRate 0.2356 Epoch: 6 Global Step: 32070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:32,439-Speed 18618.55 samples/sec Loss 8.6150 LearningRate 0.2355 Epoch: 6 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:36,834-Speed 18643.54 samples/sec Loss 8.5937 LearningRate 0.2354 Epoch: 6 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:41,226-Speed 18658.68 
samples/sec Loss 8.5892 LearningRate 0.2354 Epoch: 6 Global Step: 32100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:57:45,643-Speed 18550.13 samples/sec Loss 8.5864 LearningRate 0.2353 Epoch: 6 Global Step: 32110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:57:50,075-Speed 18491.50 samples/sec Loss 8.6134 LearningRate 0.2352 Epoch: 6 Global Step: 32120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:57:54,555-Speed 18286.92 samples/sec Loss 8.5903 LearningRate 0.2352 Epoch: 6 Global Step: 32130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:57:58,958-Speed 18610.50 samples/sec Loss 8.6137 LearningRate 0.2351 Epoch: 6 Global Step: 32140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:03,396-Speed 18465.52 samples/sec Loss 8.5887 LearningRate 0.2351 Epoch: 6 Global Step: 32150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:07,831-Speed 18475.74 samples/sec Loss 8.5456 LearningRate 0.2350 Epoch: 6 Global Step: 32160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:12,271-Speed 18455.93 samples/sec Loss 8.5787 LearningRate 0.2349 Epoch: 6 Global Step: 32170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:16,676-Speed 18599.17 samples/sec Loss 8.6261 LearningRate 0.2349 Epoch: 6 Global Step: 32180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:21,133-Speed 18387.35 samples/sec Loss 8.6107 LearningRate 0.2348 Epoch: 6 Global Step: 32190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:25,592-Speed 18377.97 samples/sec Loss 8.6205 LearningRate 0.2347 Epoch: 6 Global Step: 32200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:30,031-Speed 18458.59 samples/sec Loss 8.5392 LearningRate 0.2347 Epoch: 6 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:58:34,463-Speed 18488.05 samples/sec Loss 8.6174 LearningRate 0.2346 Epoch: 6 
Global Step: 32220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:38,889-Speed 18514.78 samples/sec Loss 8.6184 LearningRate 0.2345 Epoch: 6 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:43,291-Speed 18618.14 samples/sec Loss 8.5723 LearningRate 0.2345 Epoch: 6 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:47,692-Speed 18614.79 samples/sec Loss 8.6640 LearningRate 0.2344 Epoch: 6 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:52,094-Speed 18618.08 samples/sec Loss 8.5651 LearningRate 0.2343 Epoch: 6 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:58:56,492-Speed 18631.26 samples/sec Loss 8.5805 LearningRate 0.2343 Epoch: 6 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:59:00,894-Speed 18612.92 samples/sec Loss 8.6105 LearningRate 0.2342 Epoch: 6 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:59:05,313-Speed 18547.65 samples/sec Loss 8.6237 LearningRate 0.2341 Epoch: 6 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:59:09,715-Speed 18609.71 samples/sec Loss 8.5886 LearningRate 0.2341 Epoch: 6 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:59:14,140-Speed 18519.51 samples/sec Loss 8.6290 LearningRate 0.2340 Epoch: 6 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:59:18,630-Speed 18248.92 samples/sec Loss 8.6040 LearningRate 0.2339 Epoch: 6 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 01:59:23,050-Speed 18542.21 samples/sec Loss 8.6250 LearningRate 0.2339 Epoch: 6 Global Step: 32330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:59:27,458-Speed 18588.97 samples/sec Loss 8.6051 LearningRate 0.2338 Epoch: 6 Global Step: 32340 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-01-14 01:59:31,869-Speed 18573.75 samples/sec Loss 8.5695 LearningRate 0.2337 Epoch: 6 Global Step: 32350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:59:36,306-Speed 18469.05 samples/sec Loss 8.5969 LearningRate 0.2337 Epoch: 6 Global Step: 32360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 01:59:40,722-Speed 18556.73 samples/sec Loss 8.6163 LearningRate 0.2336 Epoch: 6 Global Step: 32370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:59:45,154-Speed 18485.92 samples/sec Loss 8.5442 LearningRate 0.2335 Epoch: 6 Global Step: 32380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:59:49,647-Speed 18240.43 samples/sec Loss 8.6112 LearningRate 0.2335 Epoch: 6 Global Step: 32390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:59:54,150-Speed 18199.55 samples/sec Loss 8.6090 LearningRate 0.2334 Epoch: 6 Global Step: 32400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 01:59:58,572-Speed 18526.87 samples/sec Loss 8.6004 LearningRate 0.2333 Epoch: 6 Global Step: 32410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:00:03,008-Speed 18474.07 samples/sec Loss 8.5593 LearningRate 0.2333 Epoch: 6 Global Step: 32420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:00:07,455-Speed 18426.40 samples/sec Loss 8.5480 LearningRate 0.2332 Epoch: 6 Global Step: 32430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:00:11,911-Speed 18389.01 samples/sec Loss 8.5792 LearningRate 0.2331 Epoch: 6 Global Step: 32440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:00:16,328-Speed 18549.84 samples/sec Loss 8.5804 LearningRate 0.2331 Epoch: 6 Global Step: 32450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:00:20,737-Speed 18590.58 samples/sec Loss 8.5627 LearningRate 0.2330 Epoch: 6 Global Step: 32460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:00:25,200-Speed 18359.09 
samples/sec Loss 8.6215 LearningRate 0.2330 Epoch: 6 Global Step: 32470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:00:29,641-Speed 18447.64 samples/sec Loss 8.6068 LearningRate 0.2329 Epoch: 6 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:00:34,065-Speed 18522.94 samples/sec Loss 8.6275 LearningRate 0.2328 Epoch: 6 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:00:38,474-Speed 18585.97 samples/sec Loss 8.5706 LearningRate 0.2328 Epoch: 6 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:00:42,917-Speed 18440.61 samples/sec Loss 8.6141 LearningRate 0.2327 Epoch: 6 Global Step: 32510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:00:47,333-Speed 18557.52 samples/sec Loss 8.6022 LearningRate 0.2326 Epoch: 6 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:00:51,727-Speed 18649.66 samples/sec Loss 8.5901 LearningRate 0.2326 Epoch: 6 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:00:56,179-Speed 18408.61 samples/sec Loss 8.5916 LearningRate 0.2325 Epoch: 6 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:00,589-Speed 18582.03 samples/sec Loss 8.6077 LearningRate 0.2324 Epoch: 6 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:05,010-Speed 18533.53 samples/sec Loss 8.5301 LearningRate 0.2324 Epoch: 6 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:09,407-Speed 18638.68 samples/sec Loss 8.5453 LearningRate 0.2323 Epoch: 6 Global Step: 32570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:13,833-Speed 18512.77 samples/sec Loss 8.5464 LearningRate 0.2322 Epoch: 6 Global Step: 32580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:18,262-Speed 18499.43 samples/sec Loss 8.5956 LearningRate 0.2322 Epoch: 6 
Global Step: 32590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:22,690-Speed 18508.98 samples/sec Loss 8.5997 LearningRate 0.2321 Epoch: 6 Global Step: 32600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:27,114-Speed 18523.20 samples/sec Loss 8.5753 LearningRate 0.2320 Epoch: 6 Global Step: 32610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:31,549-Speed 18473.72 samples/sec Loss 8.6070 LearningRate 0.2320 Epoch: 6 Global Step: 32620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:35,982-Speed 18489.02 samples/sec Loss 8.5740 LearningRate 0.2319 Epoch: 6 Global Step: 32630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:40,374-Speed 18655.56 samples/sec Loss 8.5428 LearningRate 0.2318 Epoch: 6 Global Step: 32640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:44,839-Speed 18352.27 samples/sec Loss 8.5913 LearningRate 0.2318 Epoch: 6 Global Step: 32650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:49,256-Speed 18554.29 samples/sec Loss 8.5870 LearningRate 0.2317 Epoch: 6 Global Step: 32660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:01:53,665-Speed 18592.89 samples/sec Loss 8.5746 LearningRate 0.2316 Epoch: 6 Global Step: 32670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:01:58,081-Speed 18555.02 samples/sec Loss 8.6290 LearningRate 0.2316 Epoch: 6 Global Step: 32680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:02,511-Speed 18496.93 samples/sec Loss 8.5828 LearningRate 0.2315 Epoch: 6 Global Step: 32690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:06,964-Speed 18402.17 samples/sec Loss 8.6118 LearningRate 0.2314 Epoch: 6 Global Step: 32700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:11,385-Speed 18534.49 samples/sec Loss 8.5514 LearningRate 0.2314 Epoch: 6 Global Step: 32710 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-01-14 02:02:15,812-Speed 18509.17 samples/sec Loss 8.5373 LearningRate 0.2313 Epoch: 6 Global Step: 32720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:20,229-Speed 18549.57 samples/sec Loss 8.5385 LearningRate 0.2313 Epoch: 6 Global Step: 32730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:24,647-Speed 18548.75 samples/sec Loss 8.5127 LearningRate 0.2312 Epoch: 6 Global Step: 32740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:29,086-Speed 18455.54 samples/sec Loss 8.5018 LearningRate 0.2311 Epoch: 6 Global Step: 32750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:33,484-Speed 18632.85 samples/sec Loss 8.4826 LearningRate 0.2311 Epoch: 6 Global Step: 32760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:37,889-Speed 18605.84 samples/sec Loss 8.5991 LearningRate 0.2310 Epoch: 6 Global Step: 32770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:02:42,313-Speed 18526.00 samples/sec Loss 8.5363 LearningRate 0.2309 Epoch: 6 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:02:46,748-Speed 18475.11 samples/sec Loss 8.5024 LearningRate 0.2309 Epoch: 6 Global Step: 32790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:02:51,196-Speed 18423.52 samples/sec Loss 8.5343 LearningRate 0.2308 Epoch: 6 Global Step: 32800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:02:55,621-Speed 18516.52 samples/sec Loss 8.5370 LearningRate 0.2307 Epoch: 6 Global Step: 32810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:03:00,047-Speed 18515.61 samples/sec Loss 8.5821 LearningRate 0.2307 Epoch: 6 Global Step: 32820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:03:04,496-Speed 18416.12 samples/sec Loss 8.5575 LearningRate 0.2306 Epoch: 6 Global Step: 32830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:03:08,903-Speed 18597.38 
samples/sec Loss 8.5136 LearningRate 0.2305 Epoch: 6 Global Step: 32840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:03:13,350-Speed 18434.59 samples/sec Loss 8.5736 LearningRate 0.2305 Epoch: 6 Global Step: 32850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:03:17,758-Speed 18587.74 samples/sec Loss 8.5455 LearningRate 0.2304 Epoch: 6 Global Step: 32860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:03:22,201-Speed 18442.57 samples/sec Loss 8.5454 LearningRate 0.2303 Epoch: 6 Global Step: 32870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:03:26,651-Speed 18419.43 samples/sec Loss 8.5757 LearningRate 0.2303 Epoch: 6 Global Step: 32880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:03:31,059-Speed 18591.08 samples/sec Loss 8.5600 LearningRate 0.2302 Epoch: 6 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:03:35,507-Speed 18424.59 samples/sec Loss 8.5835 LearningRate 0.2301 Epoch: 6 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:03:39,926-Speed 18542.08 samples/sec Loss 8.5265 LearningRate 0.2301 Epoch: 6 Global Step: 32910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:03:44,338-Speed 18574.30 samples/sec Loss 8.4841 LearningRate 0.2300 Epoch: 6 Global Step: 32920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:03:48,772-Speed 18479.64 samples/sec Loss 8.5251 LearningRate 0.2300 Epoch: 6 Global Step: 32930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:03:53,203-Speed 18492.92 samples/sec Loss 8.5808 LearningRate 0.2299 Epoch: 6 Global Step: 32940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:00,985-Speed 10528.36 samples/sec Loss 8.5852 LearningRate 0.2298 Epoch: 6 Global Step: 32950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:05,386-Speed 18621.21 samples/sec Loss 8.5049 LearningRate 0.2298 Epoch: 6 
Global Step: 32960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:09,783-Speed 18636.56 samples/sec Loss 8.5427 LearningRate 0.2297 Epoch: 6 Global Step: 32970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:14,178-Speed 18646.93 samples/sec Loss 8.5657 LearningRate 0.2296 Epoch: 6 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:18,600-Speed 18530.36 samples/sec Loss 8.5370 LearningRate 0.2296 Epoch: 6 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:04:23,008-Speed 18590.05 samples/sec Loss 8.5132 LearningRate 0.2295 Epoch: 6 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:04:27,444-Speed 18470.20 samples/sec Loss 8.5238 LearningRate 0.2294 Epoch: 6 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:04:31,838-Speed 18647.51 samples/sec Loss 8.5179 LearningRate 0.2294 Epoch: 6 Global Step: 33020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:36,236-Speed 18635.07 samples/sec Loss 8.4807 LearningRate 0.2293 Epoch: 6 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:40,669-Speed 18481.18 samples/sec Loss 8.5165 LearningRate 0.2292 Epoch: 6 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:45,064-Speed 18647.70 samples/sec Loss 8.5451 LearningRate 0.2292 Epoch: 6 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:49,468-Speed 18610.04 samples/sec Loss 8.5373 LearningRate 0.2291 Epoch: 6 Global Step: 33060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:53,852-Speed 18697.64 samples/sec Loss 8.5052 LearningRate 0.2290 Epoch: 6 Global Step: 33070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:04:58,239-Speed 18677.35 samples/sec Loss 8.5137 LearningRate 0.2290 Epoch: 6 Global Step: 33080 Fp16 Grad Scale: 65536 Required: 
9 hours Training: 2022-01-14 02:05:02,669-Speed 18501.11 samples/sec Loss 8.5489 LearningRate 0.2289 Epoch: 6 Global Step: 33090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:07,055-Speed 18684.22 samples/sec Loss 8.5388 LearningRate 0.2288 Epoch: 6 Global Step: 33100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:11,457-Speed 18615.89 samples/sec Loss 8.5080 LearningRate 0.2288 Epoch: 6 Global Step: 33110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:15,842-Speed 18695.29 samples/sec Loss 8.5329 LearningRate 0.2287 Epoch: 6 Global Step: 33120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:05:20,226-Speed 18691.93 samples/sec Loss 8.5359 LearningRate 0.2287 Epoch: 6 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:05:24,629-Speed 18609.78 samples/sec Loss 8.4959 LearningRate 0.2286 Epoch: 6 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:05:29,014-Speed 18683.91 samples/sec Loss 8.5027 LearningRate 0.2285 Epoch: 6 Global Step: 33150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:33,408-Speed 18650.93 samples/sec Loss 8.5177 LearningRate 0.2285 Epoch: 6 Global Step: 33160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:37,805-Speed 18634.19 samples/sec Loss 8.5298 LearningRate 0.2284 Epoch: 6 Global Step: 33170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:42,200-Speed 18646.99 samples/sec Loss 8.4961 LearningRate 0.2283 Epoch: 6 Global Step: 33180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:46,590-Speed 18662.19 samples/sec Loss 8.5096 LearningRate 0.2283 Epoch: 6 Global Step: 33190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:50,999-Speed 18584.24 samples/sec Loss 8.5238 LearningRate 0.2282 Epoch: 6 Global Step: 33200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:05:55,408-Speed 
18586.48 samples/sec Loss 8.5059 LearningRate 0.2281 Epoch: 6 Global Step: 33210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:05:59,887-Speed 18296.55 samples/sec Loss 8.5248 LearningRate 0.2281 Epoch: 6 Global Step: 33220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:04,330-Speed 18442.12 samples/sec Loss 8.4917 LearningRate 0.2280 Epoch: 6 Global Step: 33230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:08,790-Speed 18372.06 samples/sec Loss 8.4780 LearningRate 0.2279 Epoch: 6 Global Step: 33240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:13,234-Speed 18438.49 samples/sec Loss 8.5123 LearningRate 0.2279 Epoch: 6 Global Step: 33250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:17,695-Speed 18368.93 samples/sec Loss 8.5087 LearningRate 0.2278 Epoch: 6 Global Step: 33260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:22,124-Speed 18500.00 samples/sec Loss 8.4910 LearningRate 0.2277 Epoch: 6 Global Step: 33270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:26,663-Speed 18052.69 samples/sec Loss 8.5054 LearningRate 0.2277 Epoch: 6 Global Step: 33280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:31,111-Speed 18420.08 samples/sec Loss 8.4704 LearningRate 0.2276 Epoch: 6 Global Step: 33290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:35,597-Speed 18264.69 samples/sec Loss 8.4653 LearningRate 0.2276 Epoch: 6 Global Step: 33300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:06:39,996-Speed 18626.57 samples/sec Loss 8.5249 LearningRate 0.2275 Epoch: 6 Global Step: 33310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:06:44,410-Speed 18564.35 samples/sec Loss 8.5241 LearningRate 0.2274 Epoch: 6 Global Step: 33320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:06:48,865-Speed 18393.58 samples/sec Loss 8.4615 LearningRate 0.2274 
Epoch: 6 Global Step: 33330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:06:53,294-Speed 18502.68 samples/sec Loss 8.4948 LearningRate 0.2273 Epoch: 6 Global Step: 33340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:06:57,721-Speed 18508.55 samples/sec Loss 8.5306 LearningRate 0.2272 Epoch: 6 Global Step: 33350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:02,130-Speed 18584.37 samples/sec Loss 8.5181 LearningRate 0.2272 Epoch: 6 Global Step: 33360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:06,541-Speed 18580.41 samples/sec Loss 8.5164 LearningRate 0.2271 Epoch: 6 Global Step: 33370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:10,921-Speed 18707.78 samples/sec Loss 8.5137 LearningRate 0.2270 Epoch: 6 Global Step: 33380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:15,326-Speed 18601.58 samples/sec Loss 8.5477 LearningRate 0.2270 Epoch: 6 Global Step: 33390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:19,757-Speed 18493.90 samples/sec Loss 8.5594 LearningRate 0.2269 Epoch: 6 Global Step: 33400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:24,162-Speed 18603.55 samples/sec Loss 8.5188 LearningRate 0.2268 Epoch: 6 Global Step: 33410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:28,542-Speed 18708.64 samples/sec Loss 8.4575 LearningRate 0.2268 Epoch: 6 Global Step: 33420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:32,967-Speed 18519.63 samples/sec Loss 8.4433 LearningRate 0.2267 Epoch: 6 Global Step: 33430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:37,372-Speed 18600.77 samples/sec Loss 8.5041 LearningRate 0.2266 Epoch: 6 Global Step: 33440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:07:41,770-Speed 18642.21 samples/sec Loss 8.4944 LearningRate 0.2266 Epoch: 6 Global Step: 33450 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-01-14 02:07:46,176-Speed 18594.56 samples/sec Loss 8.5273 LearningRate 0.2265 Epoch: 6 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:07:50,647-Speed 18331.84 samples/sec Loss 8.5036 LearningRate 0.2265 Epoch: 6 Global Step: 33470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:07:55,080-Speed 18480.96 samples/sec Loss 8.4832 LearningRate 0.2264 Epoch: 6 Global Step: 33480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:07:59,522-Speed 18449.28 samples/sec Loss 8.4560 LearningRate 0.2263 Epoch: 6 Global Step: 33490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:03,931-Speed 18588.88 samples/sec Loss 8.4849 LearningRate 0.2263 Epoch: 6 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:08,351-Speed 18536.45 samples/sec Loss 8.5133 LearningRate 0.2262 Epoch: 6 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:12,753-Speed 18616.93 samples/sec Loss 8.5093 LearningRate 0.2261 Epoch: 6 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:17,156-Speed 18611.18 samples/sec Loss 8.4509 LearningRate 0.2261 Epoch: 6 Global Step: 33530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:21,584-Speed 18506.33 samples/sec Loss 8.5056 LearningRate 0.2260 Epoch: 6 Global Step: 33540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:25,984-Speed 18622.97 samples/sec Loss 8.4856 LearningRate 0.2259 Epoch: 6 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:08:30,370-Speed 18683.86 samples/sec Loss 8.5146 LearningRate 0.2259 Epoch: 6 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:08:34,794-Speed 18520.06 samples/sec Loss 8.4686 LearningRate 0.2258 Epoch: 6 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 
02:08:39,208-Speed 18564.60 samples/sec Loss 8.4860 LearningRate 0.2257 Epoch: 6 Global Step: 33580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:08:43,710-Speed 18203.99 samples/sec Loss 8.4589 LearningRate 0.2257 Epoch: 6 Global Step: 33590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:48,180-Speed 18331.04 samples/sec Loss 8.4636 LearningRate 0.2256 Epoch: 6 Global Step: 33600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:52,617-Speed 18469.37 samples/sec Loss 8.5157 LearningRate 0.2256 Epoch: 6 Global Step: 33610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:08:57,018-Speed 18617.51 samples/sec Loss 8.4688 LearningRate 0.2255 Epoch: 6 Global Step: 33620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:01,396-Speed 18720.01 samples/sec Loss 8.5125 LearningRate 0.2254 Epoch: 6 Global Step: 33630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:05,846-Speed 18415.65 samples/sec Loss 8.5026 LearningRate 0.2254 Epoch: 6 Global Step: 33640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:10,281-Speed 18474.27 samples/sec Loss 8.4545 LearningRate 0.2253 Epoch: 6 Global Step: 33650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:14,737-Speed 18392.41 samples/sec Loss 8.5027 LearningRate 0.2252 Epoch: 6 Global Step: 33660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:19,238-Speed 18203.73 samples/sec Loss 8.5170 LearningRate 0.2252 Epoch: 6 Global Step: 33670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:23,645-Speed 18594.15 samples/sec Loss 8.4852 LearningRate 0.2251 Epoch: 6 Global Step: 33680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:28,061-Speed 18555.55 samples/sec Loss 8.4842 LearningRate 0.2250 Epoch: 6 Global Step: 33690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:32,516-Speed 18393.43 samples/sec Loss 8.4774 
LearningRate 0.2250 Epoch: 6 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:37,002-Speed 18270.32 samples/sec Loss 8.4766 LearningRate 0.2249 Epoch: 6 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:41,460-Speed 18380.84 samples/sec Loss 8.5325 LearningRate 0.2248 Epoch: 6 Global Step: 33720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:45,881-Speed 18533.66 samples/sec Loss 8.4949 LearningRate 0.2248 Epoch: 6 Global Step: 33730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:50,286-Speed 18605.08 samples/sec Loss 8.4825 LearningRate 0.2247 Epoch: 6 Global Step: 33740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:54,778-Speed 18242.97 samples/sec Loss 8.4642 LearningRate 0.2247 Epoch: 6 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:09:59,202-Speed 18522.57 samples/sec Loss 8.4452 LearningRate 0.2246 Epoch: 6 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:10:03,636-Speed 18478.76 samples/sec Loss 8.4596 LearningRate 0.2245 Epoch: 6 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:10:08,043-Speed 18595.33 samples/sec Loss 8.5124 LearningRate 0.2245 Epoch: 6 Global Step: 33780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:12,498-Speed 18395.89 samples/sec Loss 8.4894 LearningRate 0.2244 Epoch: 6 Global Step: 33790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:16,934-Speed 18469.73 samples/sec Loss 8.4316 LearningRate 0.2243 Epoch: 6 Global Step: 33800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:21,348-Speed 18567.78 samples/sec Loss 8.4992 LearningRate 0.2243 Epoch: 6 Global Step: 33810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:25,849-Speed 18206.53 samples/sec Loss 8.4239 LearningRate 0.2242 Epoch: 6 Global Step: 33820 Fp16 
Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:30,324-Speed 18306.07 samples/sec Loss 8.4672 LearningRate 0.2241 Epoch: 6 Global Step: 33830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:34,768-Speed 18445.46 samples/sec Loss 8.5018 LearningRate 0.2241 Epoch: 6 Global Step: 33840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:39,207-Speed 18459.58 samples/sec Loss 8.4673 LearningRate 0.2240 Epoch: 6 Global Step: 33850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:43,589-Speed 18699.26 samples/sec Loss 8.4994 LearningRate 0.2239 Epoch: 6 Global Step: 33860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:48,008-Speed 18541.82 samples/sec Loss 8.4520 LearningRate 0.2239 Epoch: 6 Global Step: 33870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:10:52,395-Speed 18680.37 samples/sec Loss 8.4225 LearningRate 0.2238 Epoch: 6 Global Step: 33880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:10:56,823-Speed 18505.02 samples/sec Loss 8.4830 LearningRate 0.2238 Epoch: 6 Global Step: 33890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:01,251-Speed 18504.99 samples/sec Loss 8.4848 LearningRate 0.2237 Epoch: 6 Global Step: 33900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:05,703-Speed 18403.36 samples/sec Loss 8.4194 LearningRate 0.2236 Epoch: 6 Global Step: 33910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:10,226-Speed 18119.04 samples/sec Loss 8.5033 LearningRate 0.2236 Epoch: 6 Global Step: 33920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:14,624-Speed 18632.95 samples/sec Loss 8.4690 LearningRate 0.2235 Epoch: 6 Global Step: 33930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:19,024-Speed 18623.62 samples/sec Loss 8.4776 LearningRate 0.2234 Epoch: 6 Global Step: 33940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-01-14 02:11:23,439-Speed 18559.32 samples/sec Loss 8.4308 LearningRate 0.2234 Epoch: 6 Global Step: 33950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:27,848-Speed 18586.70 samples/sec Loss 8.4828 LearningRate 0.2233 Epoch: 6 Global Step: 33960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:32,254-Speed 18594.77 samples/sec Loss 8.4795 LearningRate 0.2232 Epoch: 6 Global Step: 33970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:36,679-Speed 18520.64 samples/sec Loss 8.4598 LearningRate 0.2232 Epoch: 6 Global Step: 33980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:11:41,120-Speed 18448.22 samples/sec Loss 8.4641 LearningRate 0.2231 Epoch: 6 Global Step: 33990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:11:45,531-Speed 18578.96 samples/sec Loss 8.4620 LearningRate 0.2230 Epoch: 6 Global Step: 34000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:11:50,012-Speed 18287.28 samples/sec Loss 8.5133 LearningRate 0.2230 Epoch: 6 Global Step: 34010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:11:54,423-Speed 18579.39 samples/sec Loss 8.4669 LearningRate 0.2229 Epoch: 6 Global Step: 34020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:11:58,834-Speed 18574.39 samples/sec Loss 8.4274 LearningRate 0.2229 Epoch: 6 Global Step: 34030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:03,277-Speed 18443.21 samples/sec Loss 8.4607 LearningRate 0.2228 Epoch: 6 Global Step: 34040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:07,710-Speed 18488.87 samples/sec Loss 8.4431 LearningRate 0.2227 Epoch: 6 Global Step: 34050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:12,171-Speed 18377.43 samples/sec Loss 8.4938 LearningRate 0.2227 Epoch: 6 Global Step: 34060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:16,598-Speed 18508.22 samples/sec Loss 
8.4312 LearningRate 0.2226 Epoch: 6 Global Step: 34070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:21,001-Speed 18611.48 samples/sec Loss 8.4812 LearningRate 0.2225 Epoch: 6 Global Step: 34080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:25,410-Speed 18586.69 samples/sec Loss 8.4852 LearningRate 0.2225 Epoch: 6 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:12:29,825-Speed 18559.45 samples/sec Loss 8.4370 LearningRate 0.2224 Epoch: 6 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:12:34,243-Speed 18548.23 samples/sec Loss 8.4554 LearningRate 0.2223 Epoch: 6 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:38,674-Speed 18493.83 samples/sec Loss 8.4074 LearningRate 0.2223 Epoch: 6 Global Step: 34120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:43,112-Speed 18467.70 samples/sec Loss 8.3909 LearningRate 0.2222 Epoch: 6 Global Step: 34130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:47,502-Speed 18662.98 samples/sec Loss 8.4046 LearningRate 0.2222 Epoch: 6 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:51,923-Speed 18537.73 samples/sec Loss 8.4380 LearningRate 0.2221 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:12:56,384-Speed 18369.76 samples/sec Loss 8.4503 LearningRate 0.2220 Epoch: 6 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:00,804-Speed 18542.95 samples/sec Loss 8.4181 LearningRate 0.2220 Epoch: 6 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:05,273-Speed 18338.10 samples/sec Loss 8.4350 LearningRate 0.2219 Epoch: 6 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:09,722-Speed 18418.42 samples/sec Loss 8.4571 LearningRate 0.2218 Epoch: 6 Global Step: 
34190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:14,162-Speed 18460.87 samples/sec Loss 8.4484 LearningRate 0.2218 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:18,681-Speed 18132.31 samples/sec Loss 8.3927 LearningRate 0.2217 Epoch: 6 Global Step: 34210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:13:23,122-Speed 18453.25 samples/sec Loss 8.4578 LearningRate 0.2216 Epoch: 6 Global Step: 34220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:27,538-Speed 18557.76 samples/sec Loss 8.4107 LearningRate 0.2216 Epoch: 6 Global Step: 34230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:31,962-Speed 18526.84 samples/sec Loss 8.4420 LearningRate 0.2215 Epoch: 6 Global Step: 34240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:36,413-Speed 18410.93 samples/sec Loss 8.4539 LearningRate 0.2215 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:40,810-Speed 18632.68 samples/sec Loss 8.4913 LearningRate 0.2214 Epoch: 6 Global Step: 34260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:45,225-Speed 18564.28 samples/sec Loss 8.4005 LearningRate 0.2213 Epoch: 6 Global Step: 34270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:49,652-Speed 18511.04 samples/sec Loss 8.4204 LearningRate 0.2213 Epoch: 6 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:13:54,142-Speed 18246.44 samples/sec Loss 8.3936 LearningRate 0.2212 Epoch: 6 Global Step: 34290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:13:58,531-Speed 18669.83 samples/sec Loss 8.4167 LearningRate 0.2211 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:14:02,963-Speed 18490.95 samples/sec Loss 8.3830 LearningRate 0.2211 Epoch: 6 Global Step: 34310 Fp16 Grad Scale: 32768 Required: 9 hours 
Training: 2022-01-14 02:14:07,348-Speed 18683.75 samples/sec Loss 8.3905 LearningRate 0.2210 Epoch: 6 Global Step: 34320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:14:11,733-Speed 18691.41 samples/sec Loss 8.4231 LearningRate 0.2209 Epoch: 6 Global Step: 34330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:14:16,130-Speed 18632.69 samples/sec Loss 8.4280 LearningRate 0.2209 Epoch: 6 Global Step: 34340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:14:20,549-Speed 18545.07 samples/sec Loss 8.4469 LearningRate 0.2208 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:14:24,991-Speed 18443.81 samples/sec Loss 8.4123 LearningRate 0.2208 Epoch: 6 Global Step: 34360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:14:29,387-Speed 18646.15 samples/sec Loss 8.4065 LearningRate 0.2207 Epoch: 6 Global Step: 34370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:14:38,233-Speed 9263.48 samples/sec Loss 8.4205 LearningRate 0.2206 Epoch: 6 Global Step: 34380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-01-14 02:14:42,609-Speed 18727.10 samples/sec Loss 8.4241 LearningRate 0.2206 Epoch: 6 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:14:47,004-Speed 18643.17 samples/sec Loss 8.3928 LearningRate 0.2205 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:14:51,404-Speed 18625.01 samples/sec Loss 8.4002 LearningRate 0.2204 Epoch: 6 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:14:55,853-Speed 18415.82 samples/sec Loss 8.4338 LearningRate 0.2204 Epoch: 6 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:15:00,250-Speed 18638.42 samples/sec Loss 8.4070 LearningRate 0.2203 Epoch: 6 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:15:04,638-Speed 18674.91 
samples/sec Loss 8.4338 LearningRate 0.2202 Epoch: 6 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:15:09,061-Speed 18526.05 samples/sec Loss 8.4202 LearningRate 0.2202 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:15:13,528-Speed 18343.40 samples/sec Loss 8.4076 LearningRate 0.2201 Epoch: 6 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:15:17,972-Speed 18437.29 samples/sec Loss 8.4211 LearningRate 0.2201 Epoch: 6 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:15:22,455-Speed 18281.99 samples/sec Loss 8.4142 LearningRate 0.2200 Epoch: 6 Global Step: 34480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:15:26,912-Speed 18383.68 samples/sec Loss 8.4187 LearningRate 0.2199 Epoch: 6 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:15:31,340-Speed 18503.33 samples/sec Loss 8.4340 LearningRate 0.2199 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:15:35,776-Speed 18476.65 samples/sec Loss 8.3968 LearningRate 0.2198 Epoch: 6 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:15:40,213-Speed 18465.98 samples/sec Loss 8.3940 LearningRate 0.2197 Epoch: 6 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:15:44,656-Speed 18447.74 samples/sec Loss 8.3903 LearningRate 0.2197 Epoch: 6 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:15:49,087-Speed 18494.05 samples/sec Loss 8.3960 LearningRate 0.2196 Epoch: 6 Global Step: 34540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:15:53,492-Speed 18609.33 samples/sec Loss 8.3827 LearningRate 0.2195 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:15:57,929-Speed 18476.01 samples/sec Loss 8.4348 LearningRate 0.2195 
Epoch: 6 Global Step: 34560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:02,354-Speed 18514.88 samples/sec Loss 8.3771 LearningRate 0.2194 Epoch: 6 Global Step: 34570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:06,778-Speed 18524.27 samples/sec Loss 8.3835 LearningRate 0.2194 Epoch: 6 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:11,235-Speed 18390.91 samples/sec Loss 8.4036 LearningRate 0.2193 Epoch: 6 Global Step: 34590 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-14 02:16:15,656-Speed 18538.59 samples/sec Loss 8.4539 LearningRate 0.2192 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-14 02:16:20,062-Speed 18599.47 samples/sec Loss 8.3926 LearningRate 0.2192 Epoch: 6 Global Step: 34610 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-14 02:16:24,465-Speed 18612.73 samples/sec Loss 8.3749 LearningRate 0.2191 Epoch: 6 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:28,938-Speed 18317.15 samples/sec Loss 8.3891 LearningRate 0.2190 Epoch: 6 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:33,366-Speed 18507.63 samples/sec Loss 8.3990 LearningRate 0.2190 Epoch: 6 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:37,812-Speed 18434.19 samples/sec Loss 8.4187 LearningRate 0.2189 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:42,306-Speed 18236.60 samples/sec Loss 8.4186 LearningRate 0.2188 Epoch: 6 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:46,783-Speed 18300.79 samples/sec Loss 8.3960 LearningRate 0.2188 Epoch: 6 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:16:51,205-Speed 18532.19 samples/sec Loss 8.3944 LearningRate 0.2187 Epoch: 6 Global Step: 34680 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-01-14 02:16:55,668-Speed 18359.50 samples/sec Loss 8.3871 LearningRate 0.2187 Epoch: 6 Global Step: 34690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:00,077-Speed 18587.89 samples/sec Loss 8.3895 LearningRate 0.2186 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:04,532-Speed 18395.08 samples/sec Loss 8.4084 LearningRate 0.2185 Epoch: 6 Global Step: 34710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:09,003-Speed 18325.99 samples/sec Loss 8.3756 LearningRate 0.2185 Epoch: 6 Global Step: 34720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:13,430-Speed 18507.58 samples/sec Loss 8.4189 LearningRate 0.2184 Epoch: 6 Global Step: 34730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:17,847-Speed 18553.16 samples/sec Loss 8.4096 LearningRate 0.2183 Epoch: 6 Global Step: 34740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:22,276-Speed 18501.65 samples/sec Loss 8.3847 LearningRate 0.2183 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:26,730-Speed 18394.47 samples/sec Loss 8.3931 LearningRate 0.2182 Epoch: 6 Global Step: 34760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:31,164-Speed 18478.82 samples/sec Loss 8.4185 LearningRate 0.2181 Epoch: 6 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:17:35,630-Speed 18352.67 samples/sec Loss 8.3356 LearningRate 0.2181 Epoch: 6 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:17:40,100-Speed 18330.37 samples/sec Loss 8.4176 LearningRate 0.2180 Epoch: 6 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:17:44,567-Speed 18346.59 samples/sec Loss 8.3839 LearningRate 0.2180 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 
02:17:49,007-Speed 18473.75 samples/sec Loss 8.4103 LearningRate 0.2179 Epoch: 6 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:17:53,484-Speed 18301.17 samples/sec Loss 8.3864 LearningRate 0.2178 Epoch: 6 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:17:57,900-Speed 18559.70 samples/sec Loss 8.4123 LearningRate 0.2178 Epoch: 6 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:02,337-Speed 18465.55 samples/sec Loss 8.3504 LearningRate 0.2177 Epoch: 6 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:06,759-Speed 18530.76 samples/sec Loss 8.3582 LearningRate 0.2176 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:11,207-Speed 18419.48 samples/sec Loss 8.3488 LearningRate 0.2176 Epoch: 6 Global Step: 34860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:15,691-Speed 18276.05 samples/sec Loss 8.3959 LearningRate 0.2175 Epoch: 6 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:20,106-Speed 18557.19 samples/sec Loss 8.3950 LearningRate 0.2175 Epoch: 6 Global Step: 34880 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-14 02:18:24,512-Speed 18600.13 samples/sec Loss 8.3823 LearningRate 0.2174 Epoch: 6 Global Step: 34890 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-14 02:18:28,910-Speed 18628.62 samples/sec Loss 8.3365 LearningRate 0.2173 Epoch: 6 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:33,331-Speed 18535.86 samples/sec Loss 8.3315 LearningRate 0.2173 Epoch: 6 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:37,782-Speed 18412.21 samples/sec Loss 8.3427 LearningRate 0.2172 Epoch: 6 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:42,252-Speed 18330.29 samples/sec 
Loss 8.3508 LearningRate 0.2171 Epoch: 6 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:46,640-Speed 18673.35 samples/sec Loss 8.4117 LearningRate 0.2171 Epoch: 6 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:51,151-Speed 18164.97 samples/sec Loss 8.3627 LearningRate 0.2170 Epoch: 6 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:55,537-Speed 18681.51 samples/sec Loss 8.4128 LearningRate 0.2169 Epoch: 6 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:18:59,934-Speed 18641.52 samples/sec Loss 8.3955 LearningRate 0.2169 Epoch: 6 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:19:04,360-Speed 18518.05 samples/sec Loss 8.3262 LearningRate 0.2168 Epoch: 6 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:19:08,784-Speed 18536.34 samples/sec Loss 8.3255 LearningRate 0.2168 Epoch: 6 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:19:13,246-Speed 18366.11 samples/sec Loss 8.3935 LearningRate 0.2167 Epoch: 6 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:17,662-Speed 18554.09 samples/sec Loss 8.3392 LearningRate 0.2166 Epoch: 6 Global Step: 35010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:22,081-Speed 18553.15 samples/sec Loss 8.3730 LearningRate 0.2166 Epoch: 6 Global Step: 35020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:26,497-Speed 18560.95 samples/sec Loss 8.3701 LearningRate 0.2165 Epoch: 6 Global Step: 35030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:30,965-Speed 18336.15 samples/sec Loss 8.3218 LearningRate 0.2164 Epoch: 6 Global Step: 35040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:35,408-Speed 18446.42 samples/sec Loss 8.3088 LearningRate 0.2164 Epoch: 6 Global 
Step: 35050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:39,788-Speed 18707.33 samples/sec Loss 8.3376 LearningRate 0.2163 Epoch: 6 Global Step: 35060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:44,193-Speed 18603.23 samples/sec Loss 8.4027 LearningRate 0.2163 Epoch: 6 Global Step: 35070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:48,603-Speed 18578.95 samples/sec Loss 8.3604 LearningRate 0.2162 Epoch: 6 Global Step: 35080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:53,026-Speed 18533.03 samples/sec Loss 8.4067 LearningRate 0.2161 Epoch: 6 Global Step: 35090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:19:57,459-Speed 18486.85 samples/sec Loss 8.3734 LearningRate 0.2161 Epoch: 6 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:20:01,848-Speed 18670.09 samples/sec Loss 8.3140 LearningRate 0.2160 Epoch: 6 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:20:06,247-Speed 18627.46 samples/sec Loss 8.3785 LearningRate 0.2159 Epoch: 6 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:20:10,675-Speed 18505.79 samples/sec Loss 8.3994 LearningRate 0.2159 Epoch: 6 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:20:15,109-Speed 18485.64 samples/sec Loss 8.3870 LearningRate 0.2158 Epoch: 6 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:19,568-Speed 18380.83 samples/sec Loss 8.3567 LearningRate 0.2157 Epoch: 6 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:24,016-Speed 18421.55 samples/sec Loss 8.3518 LearningRate 0.2157 Epoch: 6 Global Step: 35160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:28,426-Speed 18577.10 samples/sec Loss 8.3369 LearningRate 0.2156 Epoch: 6 Global Step: 35170 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-01-14 02:20:32,899-Speed 18321.51 samples/sec Loss 8.3985 LearningRate 0.2156 Epoch: 6 Global Step: 35180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:37,359-Speed 18370.13 samples/sec Loss 8.3885 LearningRate 0.2155 Epoch: 6 Global Step: 35190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:41,763-Speed 18604.98 samples/sec Loss 8.3239 LearningRate 0.2154 Epoch: 6 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:46,190-Speed 18508.20 samples/sec Loss 8.3000 LearningRate 0.2154 Epoch: 6 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:50,629-Speed 18459.32 samples/sec Loss 8.2988 LearningRate 0.2153 Epoch: 6 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:55,033-Speed 18608.38 samples/sec Loss 8.3667 LearningRate 0.2152 Epoch: 6 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:20:59,425-Speed 18654.50 samples/sec Loss 8.3415 LearningRate 0.2152 Epoch: 6 Global Step: 35240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:21:03,874-Speed 18420.44 samples/sec Loss 8.3279 LearningRate 0.2151 Epoch: 6 Global Step: 35250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:21:08,319-Speed 18439.50 samples/sec Loss 8.3729 LearningRate 0.2151 Epoch: 6 Global Step: 35260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:21:12,748-Speed 18502.83 samples/sec Loss 8.3506 LearningRate 0.2150 Epoch: 6 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:17,180-Speed 18487.96 samples/sec Loss 8.3596 LearningRate 0.2149 Epoch: 6 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:21,600-Speed 18538.21 samples/sec Loss 8.3527 LearningRate 0.2149 Epoch: 6 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:26,021-Speed 
18531.44 samples/sec Loss 8.2583 LearningRate 0.2148 Epoch: 6 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:30,446-Speed 18518.20 samples/sec Loss 8.3048 LearningRate 0.2147 Epoch: 6 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:34,881-Speed 18476.30 samples/sec Loss 8.3456 LearningRate 0.2147 Epoch: 6 Global Step: 35320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:39,275-Speed 18648.55 samples/sec Loss 8.3650 LearningRate 0.2146 Epoch: 6 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:43,660-Speed 18688.23 samples/sec Loss 8.3579 LearningRate 0.2146 Epoch: 6 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:48,094-Speed 18482.39 samples/sec Loss 8.3070 LearningRate 0.2145 Epoch: 6 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:52,533-Speed 18461.91 samples/sec Loss 8.3303 LearningRate 0.2144 Epoch: 6 Global Step: 35360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-14 02:21:56,977-Speed 18435.55 samples/sec Loss 8.3374 LearningRate 0.2144 Epoch: 6 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:22:01,415-Speed 18463.38 samples/sec Loss 8.3131 LearningRate 0.2143 Epoch: 6 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:22:05,821-Speed 18600.95 samples/sec Loss 8.4054 LearningRate 0.2142 Epoch: 6 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:22:10,246-Speed 18515.70 samples/sec Loss 8.2923 LearningRate 0.2142 Epoch: 6 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:22:14,660-Speed 18564.65 samples/sec Loss 8.3975 LearningRate 0.2141 Epoch: 6 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:22:19,065-Speed 18601.97 samples/sec Loss 8.3156 LearningRate 
0.2141 Epoch: 6 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:22:23,488-Speed 18527.64 samples/sec Loss 8.3882 LearningRate 0.2140 Epoch: 6 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-14 02:22:27,923-Speed 18475.54 samples/sec Loss 8.3835 LearningRate 0.2139 Epoch: 6 Global Step: 35440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:22:32,346-Speed 18525.47 samples/sec Loss 8.3469 LearningRate 0.2139 Epoch: 6 Global Step: 35450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:22:36,785-Speed 18462.10 samples/sec Loss 8.2873 LearningRate 0.2138 Epoch: 6 Global Step: 35460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:22:41,183-Speed 18630.76 samples/sec Loss 8.3121 LearningRate 0.2137 Epoch: 6 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:22:45,625-Speed 18450.56 samples/sec Loss 8.3373 LearningRate 0.2137 Epoch: 6 Global Step: 35480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:22:50,031-Speed 18596.55 samples/sec Loss 8.3176 LearningRate 0.2136 Epoch: 6 Global Step: 35490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:22:54,468-Speed 18465.67 samples/sec Loss 8.3265 LearningRate 0.2135 Epoch: 6 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:22:58,912-Speed 18441.00 samples/sec Loss 8.3128 LearningRate 0.2135 Epoch: 6 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:03,370-Speed 18380.31 samples/sec Loss 8.2968 LearningRate 0.2134 Epoch: 6 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:07,827-Speed 18384.35 samples/sec Loss 8.2624 LearningRate 0.2134 Epoch: 6 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:12,228-Speed 18619.93 samples/sec Loss 8.3018 LearningRate 0.2133 Epoch: 6 Global Step: 35540 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-01-14 02:23:16,681-Speed 18403.18 samples/sec Loss 8.3267 LearningRate 0.2132 Epoch: 6 Global Step: 35550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:21,099-Speed 18546.27 samples/sec Loss 8.3333 LearningRate 0.2132 Epoch: 6 Global Step: 35560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:25,586-Speed 18260.44 samples/sec Loss 8.4062 LearningRate 0.2131 Epoch: 6 Global Step: 35570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:30,011-Speed 18516.83 samples/sec Loss 8.3137 LearningRate 0.2130 Epoch: 6 Global Step: 35580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:34,460-Speed 18418.46 samples/sec Loss 8.3381 LearningRate 0.2130 Epoch: 6 Global Step: 35590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:38,908-Speed 18432.26 samples/sec Loss 8.2796 LearningRate 0.2129 Epoch: 6 Global Step: 35600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:43,316-Speed 18594.47 samples/sec Loss 8.3044 LearningRate 0.2129 Epoch: 6 Global Step: 35610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:47,766-Speed 18414.75 samples/sec Loss 8.3098 LearningRate 0.2128 Epoch: 6 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:52,178-Speed 18570.60 samples/sec Loss 8.3418 LearningRate 0.2127 Epoch: 6 Global Step: 35630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:23:56,596-Speed 18548.16 samples/sec Loss 8.3214 LearningRate 0.2127 Epoch: 6 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:24:01,018-Speed 18529.28 samples/sec Loss 8.3292 LearningRate 0.2126 Epoch: 6 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:24:05,440-Speed 18535.13 samples/sec Loss 8.3770 LearningRate 0.2125 Epoch: 6 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 
02:24:09,842-Speed 18611.61 samples/sec Loss 8.3475 LearningRate 0.2125 Epoch: 6 Global Step: 35670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:24:14,221-Speed 18714.88 samples/sec Loss 8.3006 LearningRate 0.2124 Epoch: 6 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:18,626-Speed 18602.85 samples/sec Loss 8.3278 LearningRate 0.2124 Epoch: 6 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:23,062-Speed 18473.43 samples/sec Loss 8.2954 LearningRate 0.2123 Epoch: 6 Global Step: 35700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:27,464-Speed 18615.82 samples/sec Loss 8.2990 LearningRate 0.2122 Epoch: 6 Global Step: 35710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:31,905-Speed 18447.87 samples/sec Loss 8.2509 LearningRate 0.2122 Epoch: 6 Global Step: 35720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:39,754-Speed 10440.43 samples/sec Loss 8.2729 LearningRate 0.2121 Epoch: 6 Global Step: 35730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:44,157-Speed 18609.98 samples/sec Loss 8.2930 LearningRate 0.2120 Epoch: 6 Global Step: 35740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:48,653-Speed 18230.45 samples/sec Loss 8.2938 LearningRate 0.2120 Epoch: 6 Global Step: 35750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:53,041-Speed 18676.90 samples/sec Loss 8.3155 LearningRate 0.2119 Epoch: 6 Global Step: 35760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:24:57,432-Speed 18662.21 samples/sec Loss 8.3230 LearningRate 0.2119 Epoch: 6 Global Step: 35770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:01,813-Speed 18705.21 samples/sec Loss 8.3227 LearningRate 0.2118 Epoch: 6 Global Step: 35780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:06,288-Speed 18312.54 samples/sec Loss 8.3003 
LearningRate 0.2117 Epoch: 6 Global Step: 35790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:10,755-Speed 18345.29 samples/sec Loss 8.3020 LearningRate 0.2117 Epoch: 6 Global Step: 35800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:15,190-Speed 18476.44 samples/sec Loss 8.3007 LearningRate 0.2116 Epoch: 6 Global Step: 35810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:19,573-Speed 18691.03 samples/sec Loss 8.2621 LearningRate 0.2115 Epoch: 6 Global Step: 35820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:23,940-Speed 18764.31 samples/sec Loss 8.2934 LearningRate 0.2115 Epoch: 6 Global Step: 35830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:28,355-Speed 18561.04 samples/sec Loss 8.3410 LearningRate 0.2114 Epoch: 6 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:32,762-Speed 18591.12 samples/sec Loss 8.3206 LearningRate 0.2114 Epoch: 6 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:37,148-Speed 18688.16 samples/sec Loss 8.2328 LearningRate 0.2113 Epoch: 6 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:41,557-Speed 18584.60 samples/sec Loss 8.2943 LearningRate 0.2112 Epoch: 6 Global Step: 35870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:45,955-Speed 18634.41 samples/sec Loss 8.3120 LearningRate 0.2112 Epoch: 6 Global Step: 35880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:50,451-Speed 18225.78 samples/sec Loss 8.2839 LearningRate 0.2111 Epoch: 6 Global Step: 35890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:54,892-Speed 18451.89 samples/sec Loss 8.3101 LearningRate 0.2111 Epoch: 6 Global Step: 35900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:25:59,334-Speed 18446.41 samples/sec Loss 8.2848 LearningRate 0.2110 Epoch: 6 Global Step: 35910 Fp16 
Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:26:03,777-Speed 18442.92 samples/sec Loss 8.3430 LearningRate 0.2109 Epoch: 6 Global Step: 35920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:26:08,200-Speed 18527.77 samples/sec Loss 8.2850 LearningRate 0.2109 Epoch: 6 Global Step: 35930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:26:12,637-Speed 18469.95 samples/sec Loss 8.2714 LearningRate 0.2108 Epoch: 6 Global Step: 35940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:26:17,074-Speed 18467.14 samples/sec Loss 8.2603 LearningRate 0.2107 Epoch: 6 Global Step: 35950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:26:21,506-Speed 18486.89 samples/sec Loss 8.3084 LearningRate 0.2107 Epoch: 6 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:26:25,904-Speed 18635.03 samples/sec Loss 8.2953 LearningRate 0.2106 Epoch: 6 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:26:30,347-Speed 18441.79 samples/sec Loss 8.3039 LearningRate 0.2106 Epoch: 6 Global Step: 35980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:26:34,799-Speed 18405.55 samples/sec Loss 8.2984 LearningRate 0.2105 Epoch: 6 Global Step: 35990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:26:39,220-Speed 18531.44 samples/sec Loss 8.2782 LearningRate 0.2104 Epoch: 6 Global Step: 36000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:26:43,626-Speed 18599.34 samples/sec Loss 8.2844 LearningRate 0.2104 Epoch: 6 Global Step: 36010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:26:48,108-Speed 18281.38 samples/sec Loss 8.3162 LearningRate 0.2103 Epoch: 6 Global Step: 36020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:26:52,587-Speed 18293.83 samples/sec Loss 8.2917 LearningRate 0.2102 Epoch: 6 Global Step: 36030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-14 02:26:57,011-Speed 18521.80 samples/sec Loss 8.2458 LearningRate 0.2102 Epoch: 6 Global Step: 36040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:27:01,419-Speed 18590.53 samples/sec Loss 8.2765 LearningRate 0.2101 Epoch: 6 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:27:05,822-Speed 18608.67 samples/sec Loss 8.2727 LearningRate 0.2101 Epoch: 6 Global Step: 36060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:10,280-Speed 18377.31 samples/sec Loss 8.2497 LearningRate 0.2100 Epoch: 6 Global Step: 36070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:14,701-Speed 18538.74 samples/sec Loss 8.2934 LearningRate 0.2099 Epoch: 6 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:19,148-Speed 18421.67 samples/sec Loss 8.2888 LearningRate 0.2099 Epoch: 6 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:23,629-Speed 18290.62 samples/sec Loss 8.2847 LearningRate 0.2098 Epoch: 6 Global Step: 36100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:28,084-Speed 18390.17 samples/sec Loss 8.2653 LearningRate 0.2097 Epoch: 6 Global Step: 36110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:32,484-Speed 18624.07 samples/sec Loss 8.3191 LearningRate 0.2097 Epoch: 6 Global Step: 36120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:36,892-Speed 18591.28 samples/sec Loss 8.2675 LearningRate 0.2096 Epoch: 6 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:41,312-Speed 18533.24 samples/sec Loss 8.3532 LearningRate 0.2096 Epoch: 6 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:45,712-Speed 18624.73 samples/sec Loss 8.2123 LearningRate 0.2095 Epoch: 6 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:27:50,122-Speed 18588.91 samples/sec 
Loss 8.2997 LearningRate 0.2094 Epoch: 6 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:27:54,523-Speed 18623.93 samples/sec Loss 8.3341 LearningRate 0.2094 Epoch: 6 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:27:58,927-Speed 18608.64 samples/sec Loss 8.3405 LearningRate 0.2093 Epoch: 6 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:28:03,360-Speed 18483.35 samples/sec Loss 8.2611 LearningRate 0.2092 Epoch: 6 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:07,772-Speed 18570.54 samples/sec Loss 8.2690 LearningRate 0.2092 Epoch: 6 Global Step: 36200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:12,172-Speed 18620.14 samples/sec Loss 8.2440 LearningRate 0.2091 Epoch: 6 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:16,590-Speed 18549.45 samples/sec Loss 8.3225 LearningRate 0.2091 Epoch: 6 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:20,989-Speed 18638.49 samples/sec Loss 8.2918 LearningRate 0.2090 Epoch: 6 Global Step: 36230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:25,376-Speed 18682.38 samples/sec Loss 8.3039 LearningRate 0.2089 Epoch: 6 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:29,757-Speed 18701.87 samples/sec Loss 8.2764 LearningRate 0.2089 Epoch: 6 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:34,142-Speed 18688.01 samples/sec Loss 8.2686 LearningRate 0.2088 Epoch: 6 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:38,521-Speed 18712.42 samples/sec Loss 8.2804 LearningRate 0.2088 Epoch: 6 Global Step: 36270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:42,932-Speed 18577.22 samples/sec Loss 8.2372 LearningRate 0.2087 Epoch: 6 Global 
Step: 36280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:28:47,326-Speed 18649.80 samples/sec Loss 8.3007 LearningRate 0.2086 Epoch: 6 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:29:06,065-Speed 4371.66 samples/sec Loss 8.2667 LearningRate 0.2086 Epoch: 7 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:29:10,464-Speed 18628.60 samples/sec Loss 8.2055 LearningRate 0.2085 Epoch: 7 Global Step: 36310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:14,912-Speed 18422.23 samples/sec Loss 8.2350 LearningRate 0.2084 Epoch: 7 Global Step: 36320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:19,299-Speed 18680.68 samples/sec Loss 8.2796 LearningRate 0.2084 Epoch: 7 Global Step: 36330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:23,728-Speed 18502.86 samples/sec Loss 8.2251 LearningRate 0.2083 Epoch: 7 Global Step: 36340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:28,150-Speed 18527.94 samples/sec Loss 8.2620 LearningRate 0.2083 Epoch: 7 Global Step: 36350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:32,574-Speed 18521.83 samples/sec Loss 8.2359 LearningRate 0.2082 Epoch: 7 Global Step: 36360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:36,959-Speed 18687.39 samples/sec Loss 8.2463 LearningRate 0.2081 Epoch: 7 Global Step: 36370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:41,349-Speed 18661.45 samples/sec Loss 8.2318 LearningRate 0.2081 Epoch: 7 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:45,740-Speed 18664.51 samples/sec Loss 8.1983 LearningRate 0.2080 Epoch: 7 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:29:50,127-Speed 18677.06 samples/sec Loss 8.2098 LearningRate 0.2079 Epoch: 7 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-01-14 02:29:54,527-Speed 18626.26 samples/sec Loss 8.2893 LearningRate 0.2079 Epoch: 7 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:29:58,933-Speed 18596.58 samples/sec Loss 8.2589 LearningRate 0.2078 Epoch: 7 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:30:03,374-Speed 18450.82 samples/sec Loss 8.2066 LearningRate 0.2078 Epoch: 7 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:30:07,778-Speed 18605.59 samples/sec Loss 8.2272 LearningRate 0.2077 Epoch: 7 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:30:12,221-Speed 18441.36 samples/sec Loss 8.2144 LearningRate 0.2076 Epoch: 7 Global Step: 36450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:16,635-Speed 18566.15 samples/sec Loss 8.2398 LearningRate 0.2076 Epoch: 7 Global Step: 36460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:21,090-Speed 18392.99 samples/sec Loss 8.2159 LearningRate 0.2075 Epoch: 7 Global Step: 36470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:25,505-Speed 18561.12 samples/sec Loss 8.2224 LearningRate 0.2075 Epoch: 7 Global Step: 36480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:29,914-Speed 18584.51 samples/sec Loss 8.2463 LearningRate 0.2074 Epoch: 7 Global Step: 36490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:34,381-Speed 18344.03 samples/sec Loss 8.2218 LearningRate 0.2073 Epoch: 7 Global Step: 36500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:38,794-Speed 18569.94 samples/sec Loss 8.2799 LearningRate 0.2073 Epoch: 7 Global Step: 36510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:43,192-Speed 18633.39 samples/sec Loss 8.2111 LearningRate 0.2072 Epoch: 7 Global Step: 36520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:47,621-Speed 18500.51 
samples/sec Loss 8.2611 LearningRate 0.2071 Epoch: 7 Global Step: 36530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:52,060-Speed 18458.91 samples/sec Loss 8.2673 LearningRate 0.2071 Epoch: 7 Global Step: 36540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:30:56,454-Speed 18648.04 samples/sec Loss 8.2277 LearningRate 0.2070 Epoch: 7 Global Step: 36550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:00,863-Speed 18585.66 samples/sec Loss 8.2232 LearningRate 0.2070 Epoch: 7 Global Step: 36560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:05,313-Speed 18411.85 samples/sec Loss 8.2120 LearningRate 0.2069 Epoch: 7 Global Step: 36570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:09,715-Speed 18617.59 samples/sec Loss 8.2144 LearningRate 0.2068 Epoch: 7 Global Step: 36580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:14,107-Speed 18658.85 samples/sec Loss 8.2204 LearningRate 0.2068 Epoch: 7 Global Step: 36590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:18,497-Speed 18663.81 samples/sec Loss 8.2300 LearningRate 0.2067 Epoch: 7 Global Step: 36600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:22,952-Speed 18393.16 samples/sec Loss 8.2355 LearningRate 0.2067 Epoch: 7 Global Step: 36610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:27,325-Speed 18736.53 samples/sec Loss 8.2225 LearningRate 0.2066 Epoch: 7 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:31,761-Speed 18470.81 samples/sec Loss 8.2378 LearningRate 0.2065 Epoch: 7 Global Step: 36630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:36,177-Speed 18556.99 samples/sec Loss 8.2077 LearningRate 0.2065 Epoch: 7 Global Step: 36640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:31:40,592-Speed 18559.73 samples/sec Loss 8.2700 LearningRate 0.2064 Epoch: 7 
Global Step: 36650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:31:45,002-Speed 18578.24 samples/sec Loss 8.2283 LearningRate 0.2063 Epoch: 7 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:31:49,443-Speed 18453.76 samples/sec Loss 8.2660 LearningRate 0.2063 Epoch: 7 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:31:53,847-Speed 18607.23 samples/sec Loss 8.1991 LearningRate 0.2062 Epoch: 7 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:31:58,272-Speed 18514.11 samples/sec Loss 8.2498 LearningRate 0.2062 Epoch: 7 Global Step: 36690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:32:02,681-Speed 18585.52 samples/sec Loss 8.2473 LearningRate 0.2061 Epoch: 7 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:32:07,078-Speed 18636.56 samples/sec Loss 8.2289 LearningRate 0.2060 Epoch: 7 Global Step: 36710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:11,469-Speed 18667.92 samples/sec Loss 8.2721 LearningRate 0.2060 Epoch: 7 Global Step: 36720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:15,885-Speed 18562.13 samples/sec Loss 8.2213 LearningRate 0.2059 Epoch: 7 Global Step: 36730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:20,310-Speed 18519.47 samples/sec Loss 8.2453 LearningRate 0.2059 Epoch: 7 Global Step: 36740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:24,750-Speed 18457.20 samples/sec Loss 8.2985 LearningRate 0.2058 Epoch: 7 Global Step: 36750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:29,163-Speed 18570.25 samples/sec Loss 8.2463 LearningRate 0.2057 Epoch: 7 Global Step: 36760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:33,580-Speed 18553.57 samples/sec Loss 8.1988 LearningRate 0.2057 Epoch: 7 Global Step: 36770 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-14 02:32:38,008-Speed 18506.06 samples/sec Loss 8.2231 LearningRate 0.2056 Epoch: 7 Global Step: 36780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:42,451-Speed 18444.19 samples/sec Loss 8.1612 LearningRate 0.2055 Epoch: 7 Global Step: 36790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:46,845-Speed 18651.34 samples/sec Loss 8.2299 LearningRate 0.2055 Epoch: 7 Global Step: 36800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:32:51,270-Speed 18521.00 samples/sec Loss 8.2041 LearningRate 0.2054 Epoch: 7 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:32:55,698-Speed 18505.07 samples/sec Loss 8.2274 LearningRate 0.2054 Epoch: 7 Global Step: 36820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:33:00,124-Speed 18514.07 samples/sec Loss 8.2411 LearningRate 0.2053 Epoch: 7 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:33:04,579-Speed 18394.21 samples/sec Loss 8.2491 LearningRate 0.2052 Epoch: 7 Global Step: 36840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:33:09,007-Speed 18506.27 samples/sec Loss 8.2078 LearningRate 0.2052 Epoch: 7 Global Step: 36850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:33:13,494-Speed 18262.52 samples/sec Loss 8.2338 LearningRate 0.2051 Epoch: 7 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:17,963-Speed 18334.01 samples/sec Loss 8.1905 LearningRate 0.2051 Epoch: 7 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:22,403-Speed 18455.51 samples/sec Loss 8.1984 LearningRate 0.2050 Epoch: 7 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:26,809-Speed 18597.76 samples/sec Loss 8.2129 LearningRate 0.2049 Epoch: 7 Global Step: 36890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 
02:33:31,275-Speed 18345.31 samples/sec Loss 8.1864 LearningRate 0.2049 Epoch: 7 Global Step: 36900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:35,702-Speed 18516.27 samples/sec Loss 8.1828 LearningRate 0.2048 Epoch: 7 Global Step: 36910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:40,197-Speed 18227.87 samples/sec Loss 8.1851 LearningRate 0.2047 Epoch: 7 Global Step: 36920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:44,653-Speed 18385.50 samples/sec Loss 8.2109 LearningRate 0.2047 Epoch: 7 Global Step: 36930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:49,103-Speed 18415.39 samples/sec Loss 8.1911 LearningRate 0.2046 Epoch: 7 Global Step: 36940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:53,568-Speed 18349.97 samples/sec Loss 8.2192 LearningRate 0.2046 Epoch: 7 Global Step: 36950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:33:58,006-Speed 18479.79 samples/sec Loss 8.2066 LearningRate 0.2045 Epoch: 7 Global Step: 36960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:34:02,436-Speed 18496.46 samples/sec Loss 8.2040 LearningRate 0.2044 Epoch: 7 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:34:06,840-Speed 18606.83 samples/sec Loss 8.2372 LearningRate 0.2044 Epoch: 7 Global Step: 36980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:34:11,266-Speed 18512.76 samples/sec Loss 8.2301 LearningRate 0.2043 Epoch: 7 Global Step: 36990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:34:15,719-Speed 18400.18 samples/sec Loss 8.1809 LearningRate 0.2043 Epoch: 7 Global Step: 37000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:34:20,139-Speed 18540.00 samples/sec Loss 8.1825 LearningRate 0.2042 Epoch: 7 Global Step: 37010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:34:24,600-Speed 18373.29 samples/sec Loss 8.2131 
LearningRate 0.2041 Epoch: 7 Global Step: 37020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:34:29,041-Speed 18450.02 samples/sec Loss 8.2305 LearningRate 0.2041 Epoch: 7 Global Step: 37030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:34:33,508-Speed 18345.13 samples/sec Loss 8.1214 LearningRate 0.2040 Epoch: 7 Global Step: 37040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:34:37,964-Speed 18391.48 samples/sec Loss 8.1625 LearningRate 0.2040 Epoch: 7 Global Step: 37050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:34:42,366-Speed 18616.00 samples/sec Loss 8.1951 LearningRate 0.2039 Epoch: 7 Global Step: 37060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:34:46,775-Speed 18585.10 samples/sec Loss 8.2117 LearningRate 0.2038 Epoch: 7 Global Step: 37070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:34:52,043-Speed 15558.06 samples/sec Loss 8.2069 LearningRate 0.2038 Epoch: 7 Global Step: 37080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:34:56,444-Speed 18623.61 samples/sec Loss 8.1436 LearningRate 0.2037 Epoch: 7 Global Step: 37090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:35:00,871-Speed 18506.86 samples/sec Loss 8.1818 LearningRate 0.2036 Epoch: 7 Global Step: 37100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:35:05,293-Speed 18535.66 samples/sec Loss 8.1328 LearningRate 0.2036 Epoch: 7 Global Step: 37110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:09,696-Speed 18608.64 samples/sec Loss 8.1682 LearningRate 0.2035 Epoch: 7 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:14,084-Speed 18672.78 samples/sec Loss 8.2163 LearningRate 0.2035 Epoch: 7 Global Step: 37130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:18,498-Speed 18570.78 samples/sec Loss 8.1853 LearningRate 0.2034 Epoch: 7 Global Step: 37140 Fp16 
Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:22,941-Speed 18440.58 samples/sec Loss 8.1540 LearningRate 0.2033 Epoch: 7 Global Step: 37150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:27,359-Speed 18548.62 samples/sec Loss 8.1831 LearningRate 0.2033 Epoch: 7 Global Step: 37160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:31,786-Speed 18510.82 samples/sec Loss 8.1692 LearningRate 0.2032 Epoch: 7 Global Step: 37170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:36,237-Speed 18410.15 samples/sec Loss 8.2491 LearningRate 0.2032 Epoch: 7 Global Step: 37180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:40,641-Speed 18605.87 samples/sec Loss 8.1943 LearningRate 0.2031 Epoch: 7 Global Step: 37190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:45,070-Speed 18506.25 samples/sec Loss 8.1790 LearningRate 0.2030 Epoch: 7 Global Step: 37200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:35:49,520-Speed 18417.23 samples/sec Loss 8.2155 LearningRate 0.2030 Epoch: 7 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:35:53,888-Speed 18759.81 samples/sec Loss 8.1692 LearningRate 0.2029 Epoch: 7 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:35:58,281-Speed 18654.35 samples/sec Loss 8.1696 LearningRate 0.2028 Epoch: 7 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:02,673-Speed 18658.54 samples/sec Loss 8.1575 LearningRate 0.2028 Epoch: 7 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:07,087-Speed 18562.97 samples/sec Loss 8.1584 LearningRate 0.2027 Epoch: 7 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:11,511-Speed 18522.70 samples/sec Loss 8.1402 LearningRate 0.2027 Epoch: 7 Global Step: 37260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-14 02:36:15,943-Speed 18497.53 samples/sec Loss 8.1755 LearningRate 0.2026 Epoch: 7 Global Step: 37270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:20,416-Speed 18322.06 samples/sec Loss 8.1692 LearningRate 0.2025 Epoch: 7 Global Step: 37280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:24,810-Speed 18649.06 samples/sec Loss 8.2169 LearningRate 0.2025 Epoch: 7 Global Step: 37290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:29,314-Speed 18195.48 samples/sec Loss 8.2160 LearningRate 0.2024 Epoch: 7 Global Step: 37300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:33,784-Speed 18331.73 samples/sec Loss 8.1436 LearningRate 0.2024 Epoch: 7 Global Step: 37310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:38,273-Speed 18259.16 samples/sec Loss 8.1985 LearningRate 0.2023 Epoch: 7 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:42,676-Speed 18619.38 samples/sec Loss 8.1542 LearningRate 0.2022 Epoch: 7 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:36:47,062-Speed 18680.28 samples/sec Loss 8.1621 LearningRate 0.2022 Epoch: 7 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:36:51,448-Speed 18687.41 samples/sec Loss 8.1620 LearningRate 0.2021 Epoch: 7 Global Step: 37350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:36:55,863-Speed 18564.58 samples/sec Loss 8.1919 LearningRate 0.2021 Epoch: 7 Global Step: 37360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:00,282-Speed 18544.27 samples/sec Loss 8.1564 LearningRate 0.2020 Epoch: 7 Global Step: 37370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:04,702-Speed 18543.54 samples/sec Loss 8.1138 LearningRate 0.2019 Epoch: 7 Global Step: 37380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:09,116-Speed 18565.18 
samples/sec Loss 8.1549 LearningRate 0.2019 Epoch: 7 Global Step: 37390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:13,538-Speed 18530.31 samples/sec Loss 8.1358 LearningRate 0.2018 Epoch: 7 Global Step: 37400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:17,964-Speed 18518.53 samples/sec Loss 8.1569 LearningRate 0.2018 Epoch: 7 Global Step: 37410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:22,387-Speed 18525.49 samples/sec Loss 8.1850 LearningRate 0.2017 Epoch: 7 Global Step: 37420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:26,845-Speed 18381.72 samples/sec Loss 8.1692 LearningRate 0.2016 Epoch: 7 Global Step: 37430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:31,316-Speed 18326.24 samples/sec Loss 8.1512 LearningRate 0.2016 Epoch: 7 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:37:35,726-Speed 18582.41 samples/sec Loss 8.2074 LearningRate 0.2015 Epoch: 7 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:37:40,146-Speed 18540.88 samples/sec Loss 8.1665 LearningRate 0.2014 Epoch: 7 Global Step: 37460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:44,560-Speed 18560.69 samples/sec Loss 8.1927 LearningRate 0.2014 Epoch: 7 Global Step: 37470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:37:48,953-Speed 18662.48 samples/sec Loss 8.1537 LearningRate 0.2013 Epoch: 7 Global Step: 37480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:37:53,374-Speed 18531.25 samples/sec Loss 8.1970 LearningRate 0.2013 Epoch: 7 Global Step: 37490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:37:57,803-Speed 18500.46 samples/sec Loss 8.1490 LearningRate 0.2012 Epoch: 7 Global Step: 37500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:38:02,217-Speed 18567.98 samples/sec Loss 8.1741 LearningRate 0.2011 Epoch: 7 
Global Step: 37510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:38:06,668-Speed 18409.81 samples/sec Loss 8.1866 LearningRate 0.2011 Epoch: 7 Global Step: 37520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:38:11,067-Speed 18626.26 samples/sec Loss 8.1682 LearningRate 0.2010 Epoch: 7 Global Step: 37530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:38:15,507-Speed 18456.99 samples/sec Loss 8.1943 LearningRate 0.2010 Epoch: 7 Global Step: 37540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:38:19,909-Speed 18611.95 samples/sec Loss 8.2029 LearningRate 0.2009 Epoch: 7 Global Step: 37550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:38:24,365-Speed 18391.16 samples/sec Loss 8.1386 LearningRate 0.2008 Epoch: 7 Global Step: 37560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:38:28,753-Speed 18674.61 samples/sec Loss 8.1427 LearningRate 0.2008 Epoch: 7 Global Step: 37570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:38:33,195-Speed 18447.04 samples/sec Loss 8.2002 LearningRate 0.2007 Epoch: 7 Global Step: 37580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:38:37,643-Speed 18421.38 samples/sec Loss 8.1177 LearningRate 0.2007 Epoch: 7 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:38:42,087-Speed 18434.66 samples/sec Loss 8.1871 LearningRate 0.2006 Epoch: 7 Global Step: 37600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:38:46,489-Speed 18620.92 samples/sec Loss 8.1430 LearningRate 0.2005 Epoch: 7 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:38:50,899-Speed 18579.71 samples/sec Loss 8.1560 LearningRate 0.2005 Epoch: 7 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:38:55,333-Speed 18480.28 samples/sec Loss 8.1588 LearningRate 0.2004 Epoch: 7 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-01-14 02:38:59,754-Speed 18533.51 samples/sec Loss 8.1603 LearningRate 0.2004 Epoch: 7 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:04,217-Speed 18370.00 samples/sec Loss 8.1877 LearningRate 0.2003 Epoch: 7 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:08,621-Speed 18604.97 samples/sec Loss 8.1309 LearningRate 0.2002 Epoch: 7 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:13,036-Speed 18561.31 samples/sec Loss 8.1665 LearningRate 0.2002 Epoch: 7 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:17,462-Speed 18518.68 samples/sec Loss 8.1901 LearningRate 0.2001 Epoch: 7 Global Step: 37680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:21,906-Speed 18446.06 samples/sec Loss 8.1853 LearningRate 0.2001 Epoch: 7 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:26,366-Speed 18376.61 samples/sec Loss 8.1915 LearningRate 0.2000 Epoch: 7 Global Step: 37700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:30,799-Speed 18483.18 samples/sec Loss 8.1918 LearningRate 0.1999 Epoch: 7 Global Step: 37710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:35,209-Speed 18581.15 samples/sec Loss 8.1051 LearningRate 0.1999 Epoch: 7 Global Step: 37720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:39,641-Speed 18486.87 samples/sec Loss 8.1305 LearningRate 0.1998 Epoch: 7 Global Step: 37730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:44,065-Speed 18522.79 samples/sec Loss 8.1562 LearningRate 0.1997 Epoch: 7 Global Step: 37740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:48,501-Speed 18472.77 samples/sec Loss 8.1329 LearningRate 0.1997 Epoch: 7 Global Step: 37750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:52,928-Speed 18510.70 
samples/sec Loss 8.1601 LearningRate 0.1996 Epoch: 7 Global Step: 37760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:39:57,332-Speed 18613.84 samples/sec Loss 8.1384 LearningRate 0.1996 Epoch: 7 Global Step: 37770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:01,825-Speed 18241.73 samples/sec Loss 8.1109 LearningRate 0.1995 Epoch: 7 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:40:06,301-Speed 18309.97 samples/sec Loss 8.1388 LearningRate 0.1994 Epoch: 7 Global Step: 37790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:10,723-Speed 18532.59 samples/sec Loss 8.1526 LearningRate 0.1994 Epoch: 7 Global Step: 37800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:15,158-Speed 18481.88 samples/sec Loss 8.2226 LearningRate 0.1993 Epoch: 7 Global Step: 37810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:19,541-Speed 18695.73 samples/sec Loss 8.1175 LearningRate 0.1993 Epoch: 7 Global Step: 37820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:23,954-Speed 18565.08 samples/sec Loss 8.1650 LearningRate 0.1992 Epoch: 7 Global Step: 37830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:28,386-Speed 18488.87 samples/sec Loss 8.1416 LearningRate 0.1991 Epoch: 7 Global Step: 37840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:32,859-Speed 18320.16 samples/sec Loss 8.1151 LearningRate 0.1991 Epoch: 7 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:37,312-Speed 18406.39 samples/sec Loss 8.1259 LearningRate 0.1990 Epoch: 7 Global Step: 37860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:41,727-Speed 18561.24 samples/sec Loss 8.1567 LearningRate 0.1990 Epoch: 7 Global Step: 37870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:46,191-Speed 18356.82 samples/sec Loss 8.1356 LearningRate 0.1989 Epoch: 7 
Global Step: 37880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:40:50,654-Speed 18368.78 samples/sec Loss 8.1381 LearningRate 0.1988 Epoch: 7 Global Step: 37890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:40:55,084-Speed 18500.83 samples/sec Loss 8.1455 LearningRate 0.1988 Epoch: 7 Global Step: 37900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:40:59,509-Speed 18517.92 samples/sec Loss 8.1339 LearningRate 0.1987 Epoch: 7 Global Step: 37910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:41:03,955-Speed 18429.09 samples/sec Loss 8.1330 LearningRate 0.1987 Epoch: 7 Global Step: 37920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:41:08,372-Speed 18552.67 samples/sec Loss 8.1170 LearningRate 0.1986 Epoch: 7 Global Step: 37930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:41:12,794-Speed 18531.75 samples/sec Loss 8.1341 LearningRate 0.1985 Epoch: 7 Global Step: 37940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:41:17,194-Speed 18626.89 samples/sec Loss 8.1146 LearningRate 0.1985 Epoch: 7 Global Step: 37950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:41:21,603-Speed 18583.86 samples/sec Loss 8.1288 LearningRate 0.1984 Epoch: 7 Global Step: 37960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:41:26,016-Speed 18570.68 samples/sec Loss 8.1260 LearningRate 0.1984 Epoch: 7 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:41:30,436-Speed 18536.78 samples/sec Loss 8.1462 LearningRate 0.1983 Epoch: 7 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:41:34,844-Speed 18589.71 samples/sec Loss 8.1238 LearningRate 0.1982 Epoch: 7 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:41:39,266-Speed 18529.75 samples/sec Loss 8.1182 LearningRate 0.1982 Epoch: 7 Global Step: 38000 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-14 02:41:43,671-Speed 18599.79 samples/sec Loss 8.1292 LearningRate 0.1981 Epoch: 7 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:41:48,081-Speed 18583.38 samples/sec Loss 8.1342 LearningRate 0.1981 Epoch: 7 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:41:52,490-Speed 18585.41 samples/sec Loss 8.0976 LearningRate 0.1980 Epoch: 7 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:41:56,926-Speed 18471.44 samples/sec Loss 8.1055 LearningRate 0.1979 Epoch: 7 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:01,374-Speed 18420.88 samples/sec Loss 8.0921 LearningRate 0.1979 Epoch: 7 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:05,820-Speed 18435.03 samples/sec Loss 8.1104 LearningRate 0.1978 Epoch: 7 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:10,232-Speed 18574.60 samples/sec Loss 8.1034 LearningRate 0.1978 Epoch: 7 Global Step: 38070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:42:14,719-Speed 18259.31 samples/sec Loss 8.1534 LearningRate 0.1977 Epoch: 7 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:42:19,125-Speed 18601.88 samples/sec Loss 8.1013 LearningRate 0.1976 Epoch: 7 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:42:23,582-Speed 18384.45 samples/sec Loss 8.1032 LearningRate 0.1976 Epoch: 7 Global Step: 38100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:42:27,994-Speed 18573.66 samples/sec Loss 8.1106 LearningRate 0.1975 Epoch: 7 Global Step: 38110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:32,425-Speed 18492.53 samples/sec Loss 8.1326 LearningRate 0.1975 Epoch: 7 Global Step: 38120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 
02:42:36,812-Speed 18680.70 samples/sec Loss 8.1410 LearningRate 0.1974 Epoch: 7 Global Step: 38130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:41,257-Speed 18434.47 samples/sec Loss 8.1461 LearningRate 0.1973 Epoch: 7 Global Step: 38140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:45,653-Speed 18637.91 samples/sec Loss 8.1232 LearningRate 0.1973 Epoch: 7 Global Step: 38150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:50,068-Speed 18559.60 samples/sec Loss 8.0974 LearningRate 0.1972 Epoch: 7 Global Step: 38160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:54,562-Speed 18241.21 samples/sec Loss 8.1139 LearningRate 0.1972 Epoch: 7 Global Step: 38170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:42:58,949-Speed 18680.23 samples/sec Loss 8.1295 LearningRate 0.1971 Epoch: 7 Global Step: 38180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:43:03,421-Speed 18328.88 samples/sec Loss 8.0900 LearningRate 0.1970 Epoch: 7 Global Step: 38190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:43:07,813-Speed 18657.42 samples/sec Loss 8.0976 LearningRate 0.1970 Epoch: 7 Global Step: 38200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:43:12,256-Speed 18439.94 samples/sec Loss 8.0934 LearningRate 0.1969 Epoch: 7 Global Step: 38210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:43:16,712-Speed 18392.32 samples/sec Loss 8.0939 LearningRate 0.1969 Epoch: 7 Global Step: 38220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:43:21,186-Speed 18318.28 samples/sec Loss 8.1299 LearningRate 0.1968 Epoch: 7 Global Step: 38230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:43:25,650-Speed 18357.27 samples/sec Loss 8.1117 LearningRate 0.1967 Epoch: 7 Global Step: 38240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:43:30,088-Speed 18461.40 samples/sec Loss 8.0452 
LearningRate 0.1967 Epoch: 7 Global Step: 38250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:43:34,513-Speed 18520.43 samples/sec Loss 8.0424 LearningRate 0.1966 Epoch: 7 Global Step: 38260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:43:38,932-Speed 18547.27 samples/sec Loss 8.1021 LearningRate 0.1966 Epoch: 7 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:43:43,387-Speed 18398.40 samples/sec Loss 8.0591 LearningRate 0.1965 Epoch: 7 Global Step: 38280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:43:47,779-Speed 18659.06 samples/sec Loss 8.1193 LearningRate 0.1964 Epoch: 7 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:43:52,179-Speed 18621.09 samples/sec Loss 8.0881 LearningRate 0.1964 Epoch: 7 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:43:56,575-Speed 18642.65 samples/sec Loss 8.0780 LearningRate 0.1963 Epoch: 7 Global Step: 38310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:01,043-Speed 18342.63 samples/sec Loss 8.0690 LearningRate 0.1962 Epoch: 7 Global Step: 38320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:05,568-Speed 18110.91 samples/sec Loss 8.0863 LearningRate 0.1962 Epoch: 7 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:10,056-Speed 18259.34 samples/sec Loss 8.1604 LearningRate 0.1961 Epoch: 7 Global Step: 38340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:14,599-Speed 18038.97 samples/sec Loss 8.1142 LearningRate 0.1961 Epoch: 7 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:19,108-Speed 18175.49 samples/sec Loss 8.1394 LearningRate 0.1960 Epoch: 7 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:44:23,643-Speed 18069.47 samples/sec Loss 8.0898 LearningRate 0.1959 Epoch: 7 Global Step: 38370 Fp16 
Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:44:28,181-Speed 18058.08 samples/sec Loss 8.0862 LearningRate 0.1959 Epoch: 7 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:44:32,652-Speed 18328.38 samples/sec Loss 8.0999 LearningRate 0.1958 Epoch: 7 Global Step: 38390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:37,118-Speed 18348.72 samples/sec Loss 8.1191 LearningRate 0.1958 Epoch: 7 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:41,574-Speed 18388.50 samples/sec Loss 8.0635 LearningRate 0.1957 Epoch: 7 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:46,030-Speed 18384.73 samples/sec Loss 8.0622 LearningRate 0.1956 Epoch: 7 Global Step: 38420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:50,475-Speed 18436.98 samples/sec Loss 8.1018 LearningRate 0.1956 Epoch: 7 Global Step: 38430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:54,918-Speed 18442.29 samples/sec Loss 8.0342 LearningRate 0.1955 Epoch: 7 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:44:59,358-Speed 18455.27 samples/sec Loss 8.0773 LearningRate 0.1955 Epoch: 7 Global Step: 38450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:45:03,798-Speed 18457.85 samples/sec Loss 8.0598 LearningRate 0.1954 Epoch: 7 Global Step: 38460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:45:08,224-Speed 18509.69 samples/sec Loss 8.0799 LearningRate 0.1953 Epoch: 7 Global Step: 38470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:45:12,688-Speed 18356.93 samples/sec Loss 8.0464 LearningRate 0.1953 Epoch: 7 Global Step: 38480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:45:17,144-Speed 18388.42 samples/sec Loss 8.0621 LearningRate 0.1952 Epoch: 7 Global Step: 38490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-01-14 02:45:21,603-Speed 18377.52 samples/sec Loss 8.0778 LearningRate 0.1952 Epoch: 7 Global Step: 38500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:45:26,064-Speed 18367.27 samples/sec Loss 8.0594 LearningRate 0.1951 Epoch: 7 Global Step: 38510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:45:30,506-Speed 18446.96 samples/sec Loss 8.0939 LearningRate 0.1951 Epoch: 7 Global Step: 38520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:45:34,941-Speed 18486.25 samples/sec Loss 8.0998 LearningRate 0.1950 Epoch: 7 Global Step: 38530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:45:39,494-Speed 17996.69 samples/sec Loss 8.1059 LearningRate 0.1949 Epoch: 7 Global Step: 38540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:45:43,923-Speed 18501.73 samples/sec Loss 8.1309 LearningRate 0.1949 Epoch: 7 Global Step: 38550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:45:48,384-Speed 18370.06 samples/sec Loss 8.0728 LearningRate 0.1948 Epoch: 7 Global Step: 38560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:45:52,840-Speed 18390.42 samples/sec Loss 8.1049 LearningRate 0.1948 Epoch: 7 Global Step: 38570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:45:57,278-Speed 18467.74 samples/sec Loss 8.0586 LearningRate 0.1947 Epoch: 7 Global Step: 38580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:46:01,709-Speed 18492.40 samples/sec Loss 8.1186 LearningRate 0.1946 Epoch: 7 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:06,136-Speed 18510.49 samples/sec Loss 8.0568 LearningRate 0.1946 Epoch: 7 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:10,594-Speed 18383.12 samples/sec Loss 8.0436 LearningRate 0.1945 Epoch: 7 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:15,119-Speed 18106.75 samples/sec Loss 
8.0989 LearningRate 0.1945 Epoch: 7 Global Step: 38620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:19,565-Speed 18431.65 samples/sec Loss 8.1099 LearningRate 0.1944 Epoch: 7 Global Step: 38630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:24,006-Speed 18451.61 samples/sec Loss 8.1065 LearningRate 0.1943 Epoch: 7 Global Step: 38640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:28,428-Speed 18531.87 samples/sec Loss 8.0809 LearningRate 0.1943 Epoch: 7 Global Step: 38650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:32,867-Speed 18460.21 samples/sec Loss 8.0622 LearningRate 0.1942 Epoch: 7 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:37,373-Speed 18183.93 samples/sec Loss 8.0331 LearningRate 0.1942 Epoch: 7 Global Step: 38670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:41,819-Speed 18431.45 samples/sec Loss 8.1129 LearningRate 0.1941 Epoch: 7 Global Step: 38680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:46:49,166-Speed 11151.63 samples/sec Loss 8.0914 LearningRate 0.1940 Epoch: 7 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:46:53,614-Speed 18422.16 samples/sec Loss 8.0823 LearningRate 0.1940 Epoch: 7 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:46:58,076-Speed 18365.07 samples/sec Loss 8.0661 LearningRate 0.1939 Epoch: 7 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:47:02,488-Speed 18571.84 samples/sec Loss 8.0448 LearningRate 0.1939 Epoch: 7 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:47:06,955-Speed 18347.75 samples/sec Loss 8.1138 LearningRate 0.1938 Epoch: 7 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:47:11,413-Speed 18379.20 samples/sec Loss 8.0485 LearningRate 0.1937 Epoch: 7 Global Step: 
38740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:15,829-Speed 18551.84 samples/sec Loss 8.0218 LearningRate 0.1937 Epoch: 7 Global Step: 38750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:20,304-Speed 18315.66 samples/sec Loss 8.0374 LearningRate 0.1936 Epoch: 7 Global Step: 38760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:24,726-Speed 18528.39 samples/sec Loss 8.0003 LearningRate 0.1936 Epoch: 7 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:29,153-Speed 18511.19 samples/sec Loss 8.0871 LearningRate 0.1935 Epoch: 7 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:33,617-Speed 18354.42 samples/sec Loss 8.0472 LearningRate 0.1934 Epoch: 7 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:38,027-Speed 18581.99 samples/sec Loss 8.0259 LearningRate 0.1934 Epoch: 7 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:45,900-Speed 10407.35 samples/sec Loss 8.0739 LearningRate 0.1933 Epoch: 7 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:50,368-Speed 18340.08 samples/sec Loss 8.0331 LearningRate 0.1933 Epoch: 7 Global Step: 38820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:54,848-Speed 18301.84 samples/sec Loss 8.0527 LearningRate 0.1932 Epoch: 7 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:47:59,286-Speed 18471.30 samples/sec Loss 8.0494 LearningRate 0.1931 Epoch: 7 Global Step: 38840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:48:03,726-Speed 18453.79 samples/sec Loss 8.0725 LearningRate 0.1931 Epoch: 7 Global Step: 38850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:48:08,160-Speed 18481.65 samples/sec Loss 8.0513 LearningRate 0.1930 Epoch: 7 Global Step: 38860 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-01-14 02:48:12,601-Speed 18449.52 samples/sec Loss 8.0572 LearningRate 0.1930 Epoch: 7 Global Step: 38870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:48:17,025-Speed 18522.39 samples/sec Loss 8.0825 LearningRate 0.1929 Epoch: 7 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:48:21,484-Speed 18378.03 samples/sec Loss 8.0743 LearningRate 0.1928 Epoch: 7 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:48:25,925-Speed 18452.61 samples/sec Loss 8.0848 LearningRate 0.1928 Epoch: 7 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:48:30,333-Speed 18588.69 samples/sec Loss 8.0684 LearningRate 0.1927 Epoch: 7 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:48:34,791-Speed 18384.75 samples/sec Loss 8.0908 LearningRate 0.1927 Epoch: 7 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:48:39,233-Speed 18444.97 samples/sec Loss 8.0714 LearningRate 0.1926 Epoch: 7 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:48:43,672-Speed 18459.57 samples/sec Loss 8.0081 LearningRate 0.1925 Epoch: 7 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:48:48,105-Speed 18487.58 samples/sec Loss 8.0239 LearningRate 0.1925 Epoch: 7 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:48:52,565-Speed 18368.89 samples/sec Loss 8.0657 LearningRate 0.1924 Epoch: 7 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:48:57,023-Speed 18383.12 samples/sec Loss 8.0952 LearningRate 0.1924 Epoch: 7 Global Step: 38970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:01,456-Speed 18484.12 samples/sec Loss 8.0805 LearningRate 0.1923 Epoch: 7 Global Step: 38980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:05,895-Speed 18468.99 
samples/sec Loss 8.0525 LearningRate 0.1922 Epoch: 7 Global Step: 38990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:49:10,358-Speed 18360.94 samples/sec Loss 8.0567 LearningRate 0.1922 Epoch: 7 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:14,880-Speed 18119.76 samples/sec Loss 8.0302 LearningRate 0.1921 Epoch: 7 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:19,386-Speed 18184.86 samples/sec Loss 7.9711 LearningRate 0.1921 Epoch: 7 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:23,827-Speed 18454.35 samples/sec Loss 8.0389 LearningRate 0.1920 Epoch: 7 Global Step: 39030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:28,284-Speed 18385.83 samples/sec Loss 8.0301 LearningRate 0.1919 Epoch: 7 Global Step: 39040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:32,717-Speed 18483.63 samples/sec Loss 8.0340 LearningRate 0.1919 Epoch: 7 Global Step: 39050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:37,149-Speed 18489.13 samples/sec Loss 7.9957 LearningRate 0.1918 Epoch: 7 Global Step: 39060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:41,585-Speed 18469.39 samples/sec Loss 8.0875 LearningRate 0.1918 Epoch: 7 Global Step: 39070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:46,040-Speed 18395.59 samples/sec Loss 8.0994 LearningRate 0.1917 Epoch: 7 Global Step: 39080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:50,511-Speed 18328.84 samples/sec Loss 8.0533 LearningRate 0.1917 Epoch: 7 Global Step: 39090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:49:54,995-Speed 18273.97 samples/sec Loss 8.0369 LearningRate 0.1916 Epoch: 7 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:49:59,408-Speed 18569.71 samples/sec Loss 8.0671 LearningRate 0.1915 Epoch: 7 
Global Step: 39110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:50:03,800-Speed 18657.77 samples/sec Loss 8.0676 LearningRate 0.1915 Epoch: 7 Global Step: 39120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:50:08,209-Speed 18586.96 samples/sec Loss 8.0361 LearningRate 0.1914 Epoch: 7 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:50:12,661-Speed 18400.93 samples/sec Loss 8.0845 LearningRate 0.1914 Epoch: 7 Global Step: 39140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:50:17,100-Speed 18459.28 samples/sec Loss 8.0881 LearningRate 0.1913 Epoch: 7 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:50:21,523-Speed 18526.16 samples/sec Loss 8.0817 LearningRate 0.1912 Epoch: 7 Global Step: 39160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:50:25,989-Speed 18350.85 samples/sec Loss 8.0386 LearningRate 0.1912 Epoch: 7 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:50:30,410-Speed 18539.41 samples/sec Loss 8.0404 LearningRate 0.1911 Epoch: 7 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:50:34,895-Speed 18271.45 samples/sec Loss 8.0239 LearningRate 0.1911 Epoch: 7 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:50:39,362-Speed 18342.90 samples/sec Loss 8.0491 LearningRate 0.1910 Epoch: 7 Global Step: 39200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:50:43,783-Speed 18534.19 samples/sec Loss 8.0254 LearningRate 0.1909 Epoch: 7 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:50:48,216-Speed 18487.48 samples/sec Loss 8.0668 LearningRate 0.1909 Epoch: 7 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:50:52,633-Speed 18556.37 samples/sec Loss 7.9973 LearningRate 0.1908 Epoch: 7 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 
8 hours Training: 2022-01-14 02:50:57,025-Speed 18661.41 samples/sec Loss 8.0079 LearningRate 0.1908 Epoch: 7 Global Step: 39240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:01,428-Speed 18607.63 samples/sec Loss 8.0470 LearningRate 0.1907 Epoch: 7 Global Step: 39250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:51:05,844-Speed 18570.89 samples/sec Loss 8.0186 LearningRate 0.1906 Epoch: 7 Global Step: 39260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:51:10,246-Speed 18615.01 samples/sec Loss 8.0762 LearningRate 0.1906 Epoch: 7 Global Step: 39270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:51:14,665-Speed 18543.30 samples/sec Loss 8.0421 LearningRate 0.1905 Epoch: 7 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:51:19,097-Speed 18488.91 samples/sec Loss 8.0520 LearningRate 0.1905 Epoch: 7 Global Step: 39290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:23,527-Speed 18501.20 samples/sec Loss 8.0502 LearningRate 0.1904 Epoch: 7 Global Step: 39300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:27,962-Speed 18478.14 samples/sec Loss 8.0029 LearningRate 0.1903 Epoch: 7 Global Step: 39310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:32,342-Speed 18704.54 samples/sec Loss 8.0077 LearningRate 0.1903 Epoch: 7 Global Step: 39320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:36,747-Speed 18602.21 samples/sec Loss 7.9887 LearningRate 0.1902 Epoch: 7 Global Step: 39330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:41,163-Speed 18557.58 samples/sec Loss 8.0354 LearningRate 0.1902 Epoch: 7 Global Step: 39340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:45,598-Speed 18475.15 samples/sec Loss 8.0156 LearningRate 0.1901 Epoch: 7 Global Step: 39350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:49,994-Speed 
18642.79 samples/sec Loss 8.0209 LearningRate 0.1901 Epoch: 7 Global Step: 39360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:54,381-Speed 18679.06 samples/sec Loss 8.0298 LearningRate 0.1900 Epoch: 7 Global Step: 39370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:51:58,811-Speed 18495.91 samples/sec Loss 8.0227 LearningRate 0.1899 Epoch: 7 Global Step: 39380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:03,199-Speed 18678.44 samples/sec Loss 8.0278 LearningRate 0.1899 Epoch: 7 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:52:07,596-Speed 18635.87 samples/sec Loss 8.0186 LearningRate 0.1898 Epoch: 7 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:52:11,995-Speed 18627.06 samples/sec Loss 7.9951 LearningRate 0.1898 Epoch: 7 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:52:16,393-Speed 18631.87 samples/sec Loss 8.0200 LearningRate 0.1897 Epoch: 7 Global Step: 39420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:20,864-Speed 18329.54 samples/sec Loss 7.9981 LearningRate 0.1896 Epoch: 7 Global Step: 39430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:25,391-Speed 18101.64 samples/sec Loss 8.0259 LearningRate 0.1896 Epoch: 7 Global Step: 39440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:29,841-Speed 18418.76 samples/sec Loss 8.0199 LearningRate 0.1895 Epoch: 7 Global Step: 39450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:34,271-Speed 18495.40 samples/sec Loss 8.0198 LearningRate 0.1895 Epoch: 7 Global Step: 39460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:38,678-Speed 18593.62 samples/sec Loss 8.0386 LearningRate 0.1894 Epoch: 7 Global Step: 39470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:43,104-Speed 18512.91 samples/sec Loss 8.0036 LearningRate 0.1893 
Epoch: 7 Global Step: 39480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:47,600-Speed 18226.16 samples/sec Loss 8.0188 LearningRate 0.1893 Epoch: 7 Global Step: 39490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:52,012-Speed 18576.27 samples/sec Loss 8.0256 LearningRate 0.1892 Epoch: 7 Global Step: 39500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:52:56,413-Speed 18617.99 samples/sec Loss 7.9754 LearningRate 0.1892 Epoch: 7 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:53:00,827-Speed 18561.86 samples/sec Loss 8.0105 LearningRate 0.1891 Epoch: 7 Global Step: 39520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:53:05,225-Speed 18633.77 samples/sec Loss 7.9961 LearningRate 0.1891 Epoch: 7 Global Step: 39530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:53:09,624-Speed 18627.04 samples/sec Loss 7.9831 LearningRate 0.1890 Epoch: 7 Global Step: 39540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:53:14,037-Speed 18567.08 samples/sec Loss 7.9579 LearningRate 0.1889 Epoch: 7 Global Step: 39550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:53:18,458-Speed 18539.86 samples/sec Loss 8.0172 LearningRate 0.1889 Epoch: 7 Global Step: 39560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:53:22,888-Speed 18497.07 samples/sec Loss 7.9762 LearningRate 0.1888 Epoch: 7 Global Step: 39570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:53:27,304-Speed 18558.32 samples/sec Loss 8.0121 LearningRate 0.1888 Epoch: 7 Global Step: 39580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:53:31,808-Speed 18193.90 samples/sec Loss 7.9988 LearningRate 0.1887 Epoch: 7 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:53:36,242-Speed 18478.87 samples/sec Loss 8.0167 LearningRate 0.1886 Epoch: 7 Global Step: 39600 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-14 02:53:40,625-Speed 18695.67 samples/sec Loss 7.9400 LearningRate 0.1886 Epoch: 7 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:53:45,037-Speed 18577.21 samples/sec Loss 7.9555 LearningRate 0.1885 Epoch: 7 Global Step: 39620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:53:49,438-Speed 18615.74 samples/sec Loss 7.9944 LearningRate 0.1885 Epoch: 7 Global Step: 39630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:53:53,914-Speed 18306.82 samples/sec Loss 8.0066 LearningRate 0.1884 Epoch: 7 Global Step: 39640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:53:58,357-Speed 18447.03 samples/sec Loss 7.9905 LearningRate 0.1883 Epoch: 7 Global Step: 39650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:54:02,808-Speed 18406.64 samples/sec Loss 7.9435 LearningRate 0.1883 Epoch: 7 Global Step: 39660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:54:07,349-Speed 18048.23 samples/sec Loss 7.9972 LearningRate 0.1882 Epoch: 7 Global Step: 39670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:54:11,836-Speed 18258.99 samples/sec Loss 8.0008 LearningRate 0.1882 Epoch: 7 Global Step: 39680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:54:16,225-Speed 18668.64 samples/sec Loss 7.9300 LearningRate 0.1881 Epoch: 7 Global Step: 39690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:54:20,695-Speed 18334.23 samples/sec Loss 8.0185 LearningRate 0.1881 Epoch: 7 Global Step: 39700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:54:25,137-Speed 18447.75 samples/sec Loss 8.0160 LearningRate 0.1880 Epoch: 7 Global Step: 39710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:54:29,528-Speed 18662.99 samples/sec Loss 7.9977 LearningRate 0.1879 Epoch: 7 Global Step: 39720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 
02:54:33,957-Speed 18498.39 samples/sec Loss 7.9752 LearningRate 0.1879 Epoch: 7 Global Step: 39730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:54:38,379-Speed 18536.88 samples/sec Loss 7.9731 LearningRate 0.1878 Epoch: 7 Global Step: 39740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:54:42,765-Speed 18684.13 samples/sec Loss 7.9715 LearningRate 0.1878 Epoch: 7 Global Step: 39750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:54:47,234-Speed 18331.90 samples/sec Loss 7.9681 LearningRate 0.1877 Epoch: 7 Global Step: 39760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:54:51,638-Speed 18610.90 samples/sec Loss 8.0208 LearningRate 0.1876 Epoch: 7 Global Step: 39770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:54:56,110-Speed 18326.71 samples/sec Loss 7.9624 LearningRate 0.1876 Epoch: 7 Global Step: 39780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:00,576-Speed 18349.79 samples/sec Loss 8.0353 LearningRate 0.1875 Epoch: 7 Global Step: 39790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:05,071-Speed 18228.20 samples/sec Loss 8.0142 LearningRate 0.1875 Epoch: 7 Global Step: 39800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:09,508-Speed 18469.52 samples/sec Loss 7.9764 LearningRate 0.1874 Epoch: 7 Global Step: 39810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:14,035-Speed 18097.14 samples/sec Loss 7.9850 LearningRate 0.1873 Epoch: 7 Global Step: 39820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:18,487-Speed 18407.19 samples/sec Loss 7.9724 LearningRate 0.1873 Epoch: 7 Global Step: 39830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:22,933-Speed 18428.26 samples/sec Loss 7.9429 LearningRate 0.1872 Epoch: 7 Global Step: 39840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:27,346-Speed 18570.57 samples/sec Loss 7.9594 
LearningRate 0.1872 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:31,752-Speed 18592.88 samples/sec Loss 8.0070 LearningRate 0.1871 Epoch: 7 Global Step: 39860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:36,160-Speed 18595.49 samples/sec Loss 7.9925 LearningRate 0.1871 Epoch: 7 Global Step: 39870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:55:40,590-Speed 18495.69 samples/sec Loss 7.9923 LearningRate 0.1870 Epoch: 7 Global Step: 39880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:45,010-Speed 18538.20 samples/sec Loss 8.0116 LearningRate 0.1869 Epoch: 7 Global Step: 39890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:49,452-Speed 18451.38 samples/sec Loss 7.9562 LearningRate 0.1869 Epoch: 7 Global Step: 39900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:53,870-Speed 18542.71 samples/sec Loss 7.9637 LearningRate 0.1868 Epoch: 7 Global Step: 39910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:55:58,351-Speed 18286.31 samples/sec Loss 7.9996 LearningRate 0.1868 Epoch: 7 Global Step: 39920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:56:02,826-Speed 18312.35 samples/sec Loss 7.9465 LearningRate 0.1867 Epoch: 7 Global Step: 39930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:56:07,247-Speed 18536.15 samples/sec Loss 7.9316 LearningRate 0.1866 Epoch: 7 Global Step: 39940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:56:11,661-Speed 18565.17 samples/sec Loss 7.9751 LearningRate 0.1866 Epoch: 7 Global Step: 39950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:56:16,072-Speed 18573.86 samples/sec Loss 8.0036 LearningRate 0.1865 Epoch: 7 Global Step: 39960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:56:20,517-Speed 18437.62 samples/sec Loss 7.9669 LearningRate 0.1865 Epoch: 7 Global Step: 39970 Fp16 
Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:56:24,933-Speed 18552.92 samples/sec Loss 7.9603 LearningRate 0.1864 Epoch: 7 Global Step: 39980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:56:29,344-Speed 18579.51 samples/sec Loss 8.0084 LearningRate 0.1863 Epoch: 7 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:56:33,757-Speed 18564.90 samples/sec Loss 7.9499 LearningRate 0.1863 Epoch: 7 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:56:38,176-Speed 18544.03 samples/sec Loss 7.9857 LearningRate 0.1862 Epoch: 7 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:56:42,572-Speed 18641.91 samples/sec Loss 7.9204 LearningRate 0.1862 Epoch: 7 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:56:46,976-Speed 18605.09 samples/sec Loss 7.9907 LearningRate 0.1861 Epoch: 7 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:56:51,363-Speed 18681.85 samples/sec Loss 7.9774 LearningRate 0.1861 Epoch: 7 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:56:55,779-Speed 18553.46 samples/sec Loss 7.9924 LearningRate 0.1860 Epoch: 7 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:57:00,183-Speed 18606.54 samples/sec Loss 7.9265 LearningRate 0.1859 Epoch: 7 Global Step: 40060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:57:04,572-Speed 18670.72 samples/sec Loss 7.9214 LearningRate 0.1859 Epoch: 7 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:57:08,966-Speed 18647.39 samples/sec Loss 7.9058 LearningRate 0.1858 Epoch: 7 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:57:13,359-Speed 18654.60 samples/sec Loss 7.9236 LearningRate 0.1858 Epoch: 7 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-14 02:57:17,785-Speed 18512.70 samples/sec Loss 7.9722 LearningRate 0.1857 Epoch: 7 Global Step: 40100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:57:22,210-Speed 18518.47 samples/sec Loss 7.9898 LearningRate 0.1856 Epoch: 7 Global Step: 40110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:57:26,680-Speed 18333.61 samples/sec Loss 7.9772 LearningRate 0.1856 Epoch: 7 Global Step: 40120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:57:31,057-Speed 18720.80 samples/sec Loss 8.0047 LearningRate 0.1855 Epoch: 7 Global Step: 40130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:57:35,469-Speed 18574.15 samples/sec Loss 7.9454 LearningRate 0.1855 Epoch: 7 Global Step: 40140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:57:39,865-Speed 18640.09 samples/sec Loss 7.9586 LearningRate 0.1854 Epoch: 7 Global Step: 40150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:57:44,285-Speed 18540.47 samples/sec Loss 7.9621 LearningRate 0.1854 Epoch: 7 Global Step: 40160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:57:48,692-Speed 18593.30 samples/sec Loss 7.9313 LearningRate 0.1853 Epoch: 7 Global Step: 40170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:57:53,086-Speed 18644.41 samples/sec Loss 7.9650 LearningRate 0.1852 Epoch: 7 Global Step: 40180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:57:57,549-Speed 18362.53 samples/sec Loss 7.9351 LearningRate 0.1852 Epoch: 7 Global Step: 40190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:58:02,015-Speed 18349.88 samples/sec Loss 7.9488 LearningRate 0.1851 Epoch: 7 Global Step: 40200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 02:58:06,451-Speed 18471.77 samples/sec Loss 7.9370 LearningRate 0.1851 Epoch: 7 Global Step: 40210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:10,870-Speed 18541.91 samples/sec Loss 
7.9738 LearningRate 0.1850 Epoch: 7 Global Step: 40220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:15,281-Speed 18576.80 samples/sec Loss 7.9505 LearningRate 0.1849 Epoch: 7 Global Step: 40230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:19,684-Speed 18613.44 samples/sec Loss 7.9390 LearningRate 0.1849 Epoch: 7 Global Step: 40240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:24,141-Speed 18383.58 samples/sec Loss 7.9740 LearningRate 0.1848 Epoch: 7 Global Step: 40250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:28,625-Speed 18271.93 samples/sec Loss 7.9675 LearningRate 0.1848 Epoch: 7 Global Step: 40260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:33,046-Speed 18534.26 samples/sec Loss 7.9343 LearningRate 0.1847 Epoch: 7 Global Step: 40270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:37,497-Speed 18411.25 samples/sec Loss 7.9455 LearningRate 0.1847 Epoch: 7 Global Step: 40280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:41,917-Speed 18536.95 samples/sec Loss 7.9225 LearningRate 0.1846 Epoch: 7 Global Step: 40290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:46,343-Speed 18514.25 samples/sec Loss 7.9722 LearningRate 0.1845 Epoch: 7 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:58:50,781-Speed 18466.55 samples/sec Loss 7.9763 LearningRate 0.1845 Epoch: 7 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:58:55,167-Speed 18683.06 samples/sec Loss 7.9493 LearningRate 0.1844 Epoch: 7 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 02:58:59,567-Speed 18624.73 samples/sec Loss 7.9539 LearningRate 0.1844 Epoch: 7 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:03,982-Speed 18562.77 samples/sec Loss 7.9376 LearningRate 0.1843 Epoch: 7 Global Step: 
40340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:08,368-Speed 18686.14 samples/sec Loss 7.9339 LearningRate 0.1842 Epoch: 7 Global Step: 40350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:12,799-Speed 18491.12 samples/sec Loss 7.9682 LearningRate 0.1842 Epoch: 7 Global Step: 40360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:17,205-Speed 18598.17 samples/sec Loss 7.9585 LearningRate 0.1841 Epoch: 7 Global Step: 40370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:21,632-Speed 18510.86 samples/sec Loss 7.9873 LearningRate 0.1841 Epoch: 7 Global Step: 40380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:26,035-Speed 18614.98 samples/sec Loss 7.9462 LearningRate 0.1840 Epoch: 7 Global Step: 40390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:30,426-Speed 18663.17 samples/sec Loss 7.9502 LearningRate 0.1840 Epoch: 7 Global Step: 40400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:34,813-Speed 18679.13 samples/sec Loss 7.9659 LearningRate 0.1839 Epoch: 7 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:39,223-Speed 18583.16 samples/sec Loss 7.9651 LearningRate 0.1838 Epoch: 7 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:43,660-Speed 18467.07 samples/sec Loss 7.9412 LearningRate 0.1838 Epoch: 7 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:48,069-Speed 18586.78 samples/sec Loss 7.9663 LearningRate 0.1837 Epoch: 7 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:52,479-Speed 18581.33 samples/sec Loss 7.9439 LearningRate 0.1837 Epoch: 7 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 02:59:56,896-Speed 18549.73 samples/sec Loss 7.9436 LearningRate 0.1836 Epoch: 7 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-01-14 03:00:01,293-Speed 18638.66 samples/sec Loss 7.9331 LearningRate 0.1836 Epoch: 7 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:05,751-Speed 18382.42 samples/sec Loss 7.9049 LearningRate 0.1835 Epoch: 7 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:10,173-Speed 18528.80 samples/sec Loss 7.9315 LearningRate 0.1834 Epoch: 7 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:14,604-Speed 18491.91 samples/sec Loss 7.9621 LearningRate 0.1834 Epoch: 7 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:19,048-Speed 18436.91 samples/sec Loss 7.9305 LearningRate 0.1833 Epoch: 7 Global Step: 40510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:23,474-Speed 18517.33 samples/sec Loss 7.9558 LearningRate 0.1833 Epoch: 7 Global Step: 40520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:27,853-Speed 18712.61 samples/sec Loss 7.9468 LearningRate 0.1832 Epoch: 7 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:00:32,273-Speed 18536.61 samples/sec Loss 7.9393 LearningRate 0.1831 Epoch: 7 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:00:36,699-Speed 18514.52 samples/sec Loss 7.9417 LearningRate 0.1831 Epoch: 7 Global Step: 40550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:41,137-Speed 18462.42 samples/sec Loss 7.9711 LearningRate 0.1830 Epoch: 7 Global Step: 40560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:45,603-Speed 18349.01 samples/sec Loss 7.8704 LearningRate 0.1830 Epoch: 7 Global Step: 40570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:49,999-Speed 18640.21 samples/sec Loss 7.9816 LearningRate 0.1829 Epoch: 7 Global Step: 40580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:54,391-Speed 18656.64 
samples/sec Loss 7.9775 LearningRate 0.1829 Epoch: 7 Global Step: 40590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:00:58,818-Speed 18514.87 samples/sec Loss 7.9314 LearningRate 0.1828 Epoch: 7 Global Step: 40600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:03,238-Speed 18536.56 samples/sec Loss 7.9029 LearningRate 0.1827 Epoch: 7 Global Step: 40610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:07,620-Speed 18699.24 samples/sec Loss 7.9205 LearningRate 0.1827 Epoch: 7 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:12,069-Speed 18418.65 samples/sec Loss 7.9348 LearningRate 0.1826 Epoch: 7 Global Step: 40630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:16,499-Speed 18508.23 samples/sec Loss 7.9287 LearningRate 0.1826 Epoch: 7 Global Step: 40640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:20,993-Speed 18231.59 samples/sec Loss 7.8670 LearningRate 0.1825 Epoch: 7 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:01:25,414-Speed 18535.28 samples/sec Loss 7.8875 LearningRate 0.1824 Epoch: 7 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:01:29,834-Speed 18541.82 samples/sec Loss 7.9041 LearningRate 0.1824 Epoch: 7 Global Step: 40670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:34,263-Speed 18497.98 samples/sec Loss 7.9605 LearningRate 0.1823 Epoch: 7 Global Step: 40680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:38,678-Speed 18563.58 samples/sec Loss 7.9687 LearningRate 0.1823 Epoch: 7 Global Step: 40690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:43,097-Speed 18539.90 samples/sec Loss 7.9048 LearningRate 0.1822 Epoch: 7 Global Step: 40700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:47,537-Speed 18455.02 samples/sec Loss 7.9013 LearningRate 0.1822 Epoch: 7 
Global Step: 40710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:51,983-Speed 18431.93 samples/sec Loss 7.9196 LearningRate 0.1821 Epoch: 7 Global Step: 40720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:01:56,389-Speed 18599.99 samples/sec Loss 7.8210 LearningRate 0.1820 Epoch: 7 Global Step: 40730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:00,815-Speed 18513.93 samples/sec Loss 7.9364 LearningRate 0.1820 Epoch: 7 Global Step: 40740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:05,229-Speed 18564.61 samples/sec Loss 7.8784 LearningRate 0.1819 Epoch: 7 Global Step: 40750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:09,628-Speed 18628.45 samples/sec Loss 7.9001 LearningRate 0.1819 Epoch: 7 Global Step: 40760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:14,141-Speed 18157.72 samples/sec Loss 7.9190 LearningRate 0.1818 Epoch: 7 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:02:18,538-Speed 18637.75 samples/sec Loss 7.8990 LearningRate 0.1818 Epoch: 7 Global Step: 40780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:22,937-Speed 18624.92 samples/sec Loss 7.9140 LearningRate 0.1817 Epoch: 7 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:27,336-Speed 18628.04 samples/sec Loss 7.8983 LearningRate 0.1816 Epoch: 7 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:31,735-Speed 18623.96 samples/sec Loss 7.9021 LearningRate 0.1816 Epoch: 7 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:36,172-Speed 18469.91 samples/sec Loss 7.8831 LearningRate 0.1815 Epoch: 7 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:40,593-Speed 18532.02 samples/sec Loss 7.8941 LearningRate 0.1815 Epoch: 7 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-01-14 03:02:44,989-Speed 18639.24 samples/sec Loss 7.9096 LearningRate 0.1814 Epoch: 7 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:49,408-Speed 18547.40 samples/sec Loss 7.8913 LearningRate 0.1814 Epoch: 7 Global Step: 40850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:53,827-Speed 18537.66 samples/sec Loss 7.9369 LearningRate 0.1813 Epoch: 7 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:02:58,218-Speed 18664.13 samples/sec Loss 7.9636 LearningRate 0.1812 Epoch: 7 Global Step: 40870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:03:02,636-Speed 18546.24 samples/sec Loss 7.9037 LearningRate 0.1812 Epoch: 7 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:03:07,048-Speed 18570.99 samples/sec Loss 7.9140 LearningRate 0.1811 Epoch: 7 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:03:11,495-Speed 18428.03 samples/sec Loss 7.9289 LearningRate 0.1811 Epoch: 7 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:03:15,877-Speed 18697.75 samples/sec Loss 7.9175 LearningRate 0.1810 Epoch: 7 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:03:20,365-Speed 18262.59 samples/sec Loss 7.9330 LearningRate 0.1809 Epoch: 7 Global Step: 40920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:03:24,834-Speed 18330.02 samples/sec Loss 7.8520 LearningRate 0.1809 Epoch: 7 Global Step: 40930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:03:29,239-Speed 18607.32 samples/sec Loss 7.9209 LearningRate 0.1808 Epoch: 7 Global Step: 40940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:03:33,647-Speed 18588.94 samples/sec Loss 7.9263 LearningRate 0.1808 Epoch: 7 Global Step: 40950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:03:38,029-Speed 
18700.82 samples/sec Loss 7.8888 LearningRate 0.1807 Epoch: 7 Global Step: 40960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:03:42,434-Speed 18604.23 samples/sec Loss 7.9186 LearningRate 0.1807 Epoch: 7 Global Step: 40970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:03:46,847-Speed 18565.68 samples/sec Loss 7.9317 LearningRate 0.1806 Epoch: 7 Global Step: 40980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:03:51,266-Speed 18541.67 samples/sec Loss 7.8786 LearningRate 0.1805 Epoch: 7 Global Step: 40990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:03:55,689-Speed 18527.73 samples/sec Loss 7.8689 LearningRate 0.1805 Epoch: 7 Global Step: 41000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:04:00,099-Speed 18582.90 samples/sec Loss 7.9069 LearningRate 0.1804 Epoch: 7 Global Step: 41010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:04:04,488-Speed 18667.42 samples/sec Loss 7.8683 LearningRate 0.1804 Epoch: 7 Global Step: 41020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:04:08,874-Speed 18685.29 samples/sec Loss 7.8865 LearningRate 0.1803 Epoch: 7 Global Step: 41030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:04:13,279-Speed 18598.97 samples/sec Loss 7.9208 LearningRate 0.1803 Epoch: 7 Global Step: 41040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:17,687-Speed 18586.89 samples/sec Loss 7.9318 LearningRate 0.1802 Epoch: 7 Global Step: 41050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:22,091-Speed 18618.93 samples/sec Loss 7.9286 LearningRate 0.1801 Epoch: 7 Global Step: 41060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:26,499-Speed 18590.81 samples/sec Loss 7.9233 LearningRate 0.1801 Epoch: 7 Global Step: 41070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:30,890-Speed 18663.40 samples/sec Loss 7.9232 LearningRate 0.1800 
Epoch: 7 Global Step: 41080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:35,273-Speed 18696.00 samples/sec Loss 7.8371 LearningRate 0.1800 Epoch: 7 Global Step: 41090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:39,671-Speed 18633.58 samples/sec Loss 7.8713 LearningRate 0.1799 Epoch: 7 Global Step: 41100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:44,076-Speed 18600.94 samples/sec Loss 7.8562 LearningRate 0.1799 Epoch: 7 Global Step: 41110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:48,476-Speed 18624.55 samples/sec Loss 7.8484 LearningRate 0.1798 Epoch: 7 Global Step: 41120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:52,868-Speed 18655.17 samples/sec Loss 7.8636 LearningRate 0.1797 Epoch: 7 Global Step: 41130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:04:57,258-Speed 18667.74 samples/sec Loss 7.8839 LearningRate 0.1797 Epoch: 7 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:05:01,656-Speed 18628.13 samples/sec Loss 7.8677 LearningRate 0.1796 Epoch: 7 Global Step: 41150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:05:06,050-Speed 18653.41 samples/sec Loss 7.8552 LearningRate 0.1796 Epoch: 7 Global Step: 41160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:05:10,474-Speed 18520.04 samples/sec Loss 7.8736 LearningRate 0.1795 Epoch: 7 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:14,850-Speed 18722.44 samples/sec Loss 7.8665 LearningRate 0.1795 Epoch: 7 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:19,280-Speed 18500.44 samples/sec Loss 7.8770 LearningRate 0.1794 Epoch: 7 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:23,692-Speed 18570.98 samples/sec Loss 7.8775 LearningRate 0.1793 Epoch: 7 Global Step: 41200 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-14 03:05:28,190-Speed 18215.00 samples/sec Loss 7.8658 LearningRate 0.1793 Epoch: 7 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:32,612-Speed 18530.28 samples/sec Loss 7.8196 LearningRate 0.1792 Epoch: 7 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:37,056-Speed 18438.07 samples/sec Loss 7.9230 LearningRate 0.1792 Epoch: 7 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:41,483-Speed 18510.73 samples/sec Loss 7.8604 LearningRate 0.1791 Epoch: 7 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:45,938-Speed 18392.31 samples/sec Loss 7.8786 LearningRate 0.1790 Epoch: 7 Global Step: 41250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:50,351-Speed 18569.39 samples/sec Loss 7.9197 LearningRate 0.1790 Epoch: 7 Global Step: 41260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:05:54,841-Speed 18251.40 samples/sec Loss 7.9064 LearningRate 0.1789 Epoch: 7 Global Step: 41270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:05:59,262-Speed 18536.44 samples/sec Loss 7.8559 LearningRate 0.1789 Epoch: 7 Global Step: 41280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:06:03,702-Speed 18455.98 samples/sec Loss 7.8878 LearningRate 0.1788 Epoch: 7 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:08,100-Speed 18630.37 samples/sec Loss 7.8653 LearningRate 0.1788 Epoch: 7 Global Step: 41300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:12,540-Speed 18452.38 samples/sec Loss 7.8982 LearningRate 0.1787 Epoch: 7 Global Step: 41310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:17,004-Speed 18356.62 samples/sec Loss 7.8554 LearningRate 0.1786 Epoch: 7 Global Step: 41320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 
03:06:21,420-Speed 18557.02 samples/sec Loss 7.8572 LearningRate 0.1786 Epoch: 7 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:25,878-Speed 18378.76 samples/sec Loss 7.9145 LearningRate 0.1785 Epoch: 7 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:30,302-Speed 18525.54 samples/sec Loss 7.8367 LearningRate 0.1785 Epoch: 7 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:34,714-Speed 18574.58 samples/sec Loss 7.8512 LearningRate 0.1784 Epoch: 7 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:39,171-Speed 18385.83 samples/sec Loss 7.8291 LearningRate 0.1784 Epoch: 7 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:43,575-Speed 18607.92 samples/sec Loss 7.9031 LearningRate 0.1783 Epoch: 7 Global Step: 41380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:06:48,025-Speed 18418.86 samples/sec Loss 7.8875 LearningRate 0.1782 Epoch: 7 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:06:52,441-Speed 18567.96 samples/sec Loss 7.8600 LearningRate 0.1782 Epoch: 7 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:06:56,901-Speed 18375.86 samples/sec Loss 7.8678 LearningRate 0.1781 Epoch: 7 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:07:01,362-Speed 18367.80 samples/sec Loss 7.8834 LearningRate 0.1781 Epoch: 7 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:07:05,780-Speed 18547.15 samples/sec Loss 7.8639 LearningRate 0.1780 Epoch: 7 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:07:10,225-Speed 18435.64 samples/sec Loss 7.8636 LearningRate 0.1780 Epoch: 7 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:07:14,662-Speed 18467.34 samples/sec Loss 
7.8521 LearningRate 0.1779 Epoch: 7 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:07:19,115-Speed 18400.60 samples/sec Loss 7.8659 LearningRate 0.1778 Epoch: 7 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:07:23,535-Speed 18538.82 samples/sec Loss 7.8389 LearningRate 0.1778 Epoch: 7 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:07:28,021-Speed 18266.30 samples/sec Loss 7.8335 LearningRate 0.1777 Epoch: 7 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:07:46,471-Speed 4440.46 samples/sec Loss 7.8624 LearningRate 0.1777 Epoch: 8 Global Step: 41490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:07:50,869-Speed 18633.32 samples/sec Loss 7.8359 LearningRate 0.1776 Epoch: 8 Global Step: 41500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:07:55,287-Speed 18548.01 samples/sec Loss 7.8733 LearningRate 0.1776 Epoch: 8 Global Step: 41510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:07:59,704-Speed 18561.57 samples/sec Loss 7.8207 LearningRate 0.1775 Epoch: 8 Global Step: 41520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:04,148-Speed 18441.95 samples/sec Loss 7.8512 LearningRate 0.1774 Epoch: 8 Global Step: 41530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:08,525-Speed 18719.58 samples/sec Loss 7.8590 LearningRate 0.1774 Epoch: 8 Global Step: 41540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:12,932-Speed 18591.69 samples/sec Loss 7.8769 LearningRate 0.1773 Epoch: 8 Global Step: 41550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:17,334-Speed 18617.98 samples/sec Loss 7.8597 LearningRate 0.1773 Epoch: 8 Global Step: 41560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:21,808-Speed 18317.44 samples/sec Loss 7.8370 LearningRate 0.1772 Epoch: 8 Global Step: 
41570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:26,240-Speed 18487.76 samples/sec Loss 7.8348 LearningRate 0.1772 Epoch: 8 Global Step: 41580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:30,687-Speed 18428.21 samples/sec Loss 7.8415 LearningRate 0.1771 Epoch: 8 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:08:35,095-Speed 18596.24 samples/sec Loss 7.8234 LearningRate 0.1770 Epoch: 8 Global Step: 41600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:08:39,584-Speed 18251.09 samples/sec Loss 7.8079 LearningRate 0.1770 Epoch: 8 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:08:43,969-Speed 18688.11 samples/sec Loss 7.8504 LearningRate 0.1769 Epoch: 8 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:08:48,409-Speed 18452.62 samples/sec Loss 7.8258 LearningRate 0.1769 Epoch: 8 Global Step: 41630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:52,851-Speed 18448.20 samples/sec Loss 7.8331 LearningRate 0.1768 Epoch: 8 Global Step: 41640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:08:57,260-Speed 18584.19 samples/sec Loss 7.8532 LearningRate 0.1768 Epoch: 8 Global Step: 41650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:09:01,684-Speed 18523.14 samples/sec Loss 7.8203 LearningRate 0.1767 Epoch: 8 Global Step: 41660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:09:06,071-Speed 18680.44 samples/sec Loss 7.8212 LearningRate 0.1766 Epoch: 8 Global Step: 41670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:09:10,484-Speed 18567.69 samples/sec Loss 7.8479 LearningRate 0.1766 Epoch: 8 Global Step: 41680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:09:14,919-Speed 18475.59 samples/sec Loss 7.8142 LearningRate 0.1765 Epoch: 8 Global Step: 41690 Fp16 Grad Scale: 32768 Required: 8 hours 
Training: 2022-01-14 03:09:19,320-Speed 18619.70 samples/sec Loss 7.8754 LearningRate 0.1765 Epoch: 8 Global Step: 41700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:23,737-Speed 18553.61 samples/sec Loss 7.8125 LearningRate 0.1764 Epoch: 8 Global Step: 41710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:28,160-Speed 18527.91 samples/sec Loss 7.8692 LearningRate 0.1764 Epoch: 8 Global Step: 41720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:32,535-Speed 18732.17 samples/sec Loss 7.8391 LearningRate 0.1763 Epoch: 8 Global Step: 41730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:36,979-Speed 18440.69 samples/sec Loss 7.8450 LearningRate 0.1762 Epoch: 8 Global Step: 41740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:41,372-Speed 18658.56 samples/sec Loss 7.8526 LearningRate 0.1762 Epoch: 8 Global Step: 41750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:45,752-Speed 18714.07 samples/sec Loss 7.8434 LearningRate 0.1761 Epoch: 8 Global Step: 41760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:50,200-Speed 18427.08 samples/sec Loss 7.8131 LearningRate 0.1761 Epoch: 8 Global Step: 41770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:54,587-Speed 18679.49 samples/sec Loss 7.8137 LearningRate 0.1760 Epoch: 8 Global Step: 41780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:09:59,053-Speed 18348.06 samples/sec Loss 7.8636 LearningRate 0.1760 Epoch: 8 Global Step: 41790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:03,434-Speed 18712.32 samples/sec Loss 7.7760 LearningRate 0.1759 Epoch: 8 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:07,875-Speed 18449.64 samples/sec Loss 7.8280 LearningRate 0.1759 Epoch: 8 Global Step: 41810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:12,335-Speed 18374.87 
samples/sec Loss 7.8688 LearningRate 0.1758 Epoch: 8 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:16,762-Speed 18508.93 samples/sec Loss 7.8225 LearningRate 0.1757 Epoch: 8 Global Step: 41830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:21,192-Speed 18499.36 samples/sec Loss 7.8319 LearningRate 0.1757 Epoch: 8 Global Step: 41840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:25,655-Speed 18361.77 samples/sec Loss 7.8386 LearningRate 0.1756 Epoch: 8 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:30,083-Speed 18504.72 samples/sec Loss 7.8310 LearningRate 0.1756 Epoch: 8 Global Step: 41860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:34,592-Speed 18171.44 samples/sec Loss 7.8331 LearningRate 0.1755 Epoch: 8 Global Step: 41870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:39,027-Speed 18473.71 samples/sec Loss 7.7876 LearningRate 0.1755 Epoch: 8 Global Step: 41880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:43,483-Speed 18390.21 samples/sec Loss 7.8403 LearningRate 0.1754 Epoch: 8 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:10:47,921-Speed 18462.90 samples/sec Loss 7.8334 LearningRate 0.1753 Epoch: 8 Global Step: 41900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:52,347-Speed 18513.37 samples/sec Loss 7.8728 LearningRate 0.1753 Epoch: 8 Global Step: 41910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:10:56,818-Speed 18329.79 samples/sec Loss 7.8222 LearningRate 0.1752 Epoch: 8 Global Step: 41920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:11:01,269-Speed 18408.54 samples/sec Loss 7.8049 LearningRate 0.1752 Epoch: 8 Global Step: 41930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:11:05,711-Speed 18448.91 samples/sec Loss 7.8487 LearningRate 0.1751 Epoch: 8 
Global Step: 41940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:11:10,166-Speed 18392.67 samples/sec Loss 7.8748 LearningRate 0.1751 Epoch: 8 Global Step: 41950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:11:14,673-Speed 18184.50 samples/sec Loss 7.8496 LearningRate 0.1750 Epoch: 8 Global Step: 41960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:11:19,112-Speed 18458.36 samples/sec Loss 7.8343 LearningRate 0.1749 Epoch: 8 Global Step: 41970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:11:23,515-Speed 18608.63 samples/sec Loss 7.8476 LearningRate 0.1749 Epoch: 8 Global Step: 41980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:11:27,929-Speed 18565.92 samples/sec Loss 7.8615 LearningRate 0.1748 Epoch: 8 Global Step: 41990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:11:32,365-Speed 18476.34 samples/sec Loss 7.8331 LearningRate 0.1748 Epoch: 8 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:11:36,776-Speed 18573.55 samples/sec Loss 7.8379 LearningRate 0.1747 Epoch: 8 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:11:41,226-Speed 18417.88 samples/sec Loss 7.8240 LearningRate 0.1747 Epoch: 8 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:11:45,674-Speed 18424.28 samples/sec Loss 7.8403 LearningRate 0.1746 Epoch: 8 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:11:50,579-Speed 16706.63 samples/sec Loss 7.8026 LearningRate 0.1745 Epoch: 8 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:11:55,024-Speed 18438.97 samples/sec Loss 7.8436 LearningRate 0.1745 Epoch: 8 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:11:59,484-Speed 18368.59 samples/sec Loss 7.7789 LearningRate 0.1744 Epoch: 8 Global Step: 42060 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-14 03:12:03,926-Speed 18452.13 samples/sec Loss 7.7949 LearningRate 0.1744 Epoch: 8 Global Step: 42070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:08,301-Speed 18729.40 samples/sec Loss 7.8647 LearningRate 0.1743 Epoch: 8 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:12,691-Speed 18665.06 samples/sec Loss 7.8271 LearningRate 0.1743 Epoch: 8 Global Step: 42090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:17,099-Speed 18586.31 samples/sec Loss 7.7888 LearningRate 0.1742 Epoch: 8 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:21,502-Speed 18609.89 samples/sec Loss 7.8388 LearningRate 0.1741 Epoch: 8 Global Step: 42110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:25,967-Speed 18357.32 samples/sec Loss 7.7908 LearningRate 0.1741 Epoch: 8 Global Step: 42120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:30,347-Speed 18704.34 samples/sec Loss 7.8123 LearningRate 0.1740 Epoch: 8 Global Step: 42130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:34,754-Speed 18604.66 samples/sec Loss 7.7900 LearningRate 0.1740 Epoch: 8 Global Step: 42140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:39,163-Speed 18585.12 samples/sec Loss 7.7632 LearningRate 0.1739 Epoch: 8 Global Step: 42150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:12:43,631-Speed 18337.70 samples/sec Loss 7.7862 LearningRate 0.1739 Epoch: 8 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:12:48,068-Speed 18470.52 samples/sec Loss 7.7716 LearningRate 0.1738 Epoch: 8 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:12:52,518-Speed 18413.81 samples/sec Loss 7.7781 LearningRate 0.1738 Epoch: 8 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 
03:12:56,937-Speed 18545.11 samples/sec Loss 7.8238 LearningRate 0.1737 Epoch: 8 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:13:01,372-Speed 18475.03 samples/sec Loss 7.7984 LearningRate 0.1736 Epoch: 8 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:13:05,816-Speed 18440.59 samples/sec Loss 7.8107 LearningRate 0.1736 Epoch: 8 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:13:10,252-Speed 18471.84 samples/sec Loss 7.7516 LearningRate 0.1735 Epoch: 8 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:13:14,648-Speed 18641.24 samples/sec Loss 7.8107 LearningRate 0.1735 Epoch: 8 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:13:19,083-Speed 18473.69 samples/sec Loss 7.8115 LearningRate 0.1734 Epoch: 8 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:13:23,498-Speed 18561.16 samples/sec Loss 7.8018 LearningRate 0.1734 Epoch: 8 Global Step: 42250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:13:27,907-Speed 18585.25 samples/sec Loss 7.7887 LearningRate 0.1733 Epoch: 8 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:13:32,347-Speed 18452.84 samples/sec Loss 7.7941 LearningRate 0.1732 Epoch: 8 Global Step: 42270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:13:36,739-Speed 18659.76 samples/sec Loss 7.7699 LearningRate 0.1732 Epoch: 8 Global Step: 42280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:13:41,139-Speed 18619.38 samples/sec Loss 7.8106 LearningRate 0.1731 Epoch: 8 Global Step: 42290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:13:45,611-Speed 18324.69 samples/sec Loss 7.8198 LearningRate 0.1731 Epoch: 8 Global Step: 42300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:13:50,047-Speed 18472.32 samples/sec Loss 
7.7916 LearningRate 0.1730 Epoch: 8 Global Step: 42310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:13:54,445-Speed 18628.47 samples/sec Loss 7.8052 LearningRate 0.1730 Epoch: 8 Global Step: 42320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:13:58,847-Speed 18613.85 samples/sec Loss 7.7887 LearningRate 0.1729 Epoch: 8 Global Step: 42330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:14:03,264-Speed 18554.19 samples/sec Loss 7.8151 LearningRate 0.1729 Epoch: 8 Global Step: 42340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:14:07,741-Speed 18299.67 samples/sec Loss 7.7644 LearningRate 0.1728 Epoch: 8 Global Step: 42350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:14:12,158-Speed 18556.70 samples/sec Loss 7.8196 LearningRate 0.1727 Epoch: 8 Global Step: 42360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:14:16,568-Speed 18579.94 samples/sec Loss 7.8112 LearningRate 0.1727 Epoch: 8 Global Step: 42370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:14:20,965-Speed 18635.31 samples/sec Loss 7.8045 LearningRate 0.1726 Epoch: 8 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:14:25,384-Speed 18546.56 samples/sec Loss 7.7705 LearningRate 0.1726 Epoch: 8 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:14:29,802-Speed 18545.53 samples/sec Loss 7.8208 LearningRate 0.1725 Epoch: 8 Global Step: 42400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:14:34,274-Speed 18322.00 samples/sec Loss 7.7974 LearningRate 0.1725 Epoch: 8 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:14:38,707-Speed 18486.46 samples/sec Loss 7.7620 LearningRate 0.1724 Epoch: 8 Global Step: 42420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:14:43,106-Speed 18630.11 samples/sec Loss 7.8109 LearningRate 0.1723 Epoch: 8 Global Step: 42430 
Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:14:47,528-Speed 18530.90 samples/sec Loss 7.7748 LearningRate 0.1723 Epoch: 8 Global Step: 42440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:14:51,936-Speed 18587.74 samples/sec Loss 7.7999 LearningRate 0.1722 Epoch: 8 Global Step: 42450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:14:56,326-Speed 18666.92 samples/sec Loss 7.7770 LearningRate 0.1722 Epoch: 8 Global Step: 42460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:15:00,749-Speed 18528.82 samples/sec Loss 7.7525 LearningRate 0.1721 Epoch: 8 Global Step: 42470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:15:05,131-Speed 18703.08 samples/sec Loss 7.7891 LearningRate 0.1721 Epoch: 8 Global Step: 42480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:15:09,557-Speed 18510.31 samples/sec Loss 7.7750 LearningRate 0.1720 Epoch: 8 Global Step: 42490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:13,981-Speed 18525.40 samples/sec Loss 7.8190 LearningRate 0.1720 Epoch: 8 Global Step: 42500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:18,392-Speed 18572.13 samples/sec Loss 7.7641 LearningRate 0.1719 Epoch: 8 Global Step: 42510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:22,862-Speed 18332.28 samples/sec Loss 7.8039 LearningRate 0.1718 Epoch: 8 Global Step: 42520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:27,285-Speed 18529.96 samples/sec Loss 7.7554 LearningRate 0.1718 Epoch: 8 Global Step: 42530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:31,702-Speed 18553.04 samples/sec Loss 7.7665 LearningRate 0.1717 Epoch: 8 Global Step: 42540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:36,109-Speed 18593.33 samples/sec Loss 7.7649 LearningRate 0.1717 Epoch: 8 Global Step: 42550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-01-14 03:15:40,514-Speed 18601.05 samples/sec Loss 7.7461 LearningRate 0.1716 Epoch: 8 Global Step: 42560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:44,977-Speed 18358.77 samples/sec Loss 7.7846 LearningRate 0.1716 Epoch: 8 Global Step: 42570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:49,428-Speed 18409.51 samples/sec Loss 7.7735 LearningRate 0.1715 Epoch: 8 Global Step: 42580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:15:53,843-Speed 18559.06 samples/sec Loss 7.7647 LearningRate 0.1714 Epoch: 8 Global Step: 42590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:15:58,265-Speed 18531.15 samples/sec Loss 7.7808 LearningRate 0.1714 Epoch: 8 Global Step: 42600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:16:02,688-Speed 18526.75 samples/sec Loss 7.7547 LearningRate 0.1713 Epoch: 8 Global Step: 42610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:16:07,088-Speed 18624.32 samples/sec Loss 7.8252 LearningRate 0.1713 Epoch: 8 Global Step: 42620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:11,557-Speed 18336.46 samples/sec Loss 7.7900 LearningRate 0.1712 Epoch: 8 Global Step: 42630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:15,988-Speed 18492.56 samples/sec Loss 7.8208 LearningRate 0.1712 Epoch: 8 Global Step: 42640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:20,422-Speed 18484.54 samples/sec Loss 7.8250 LearningRate 0.1711 Epoch: 8 Global Step: 42650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:24,856-Speed 18487.16 samples/sec Loss 7.7602 LearningRate 0.1711 Epoch: 8 Global Step: 42660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:29,250-Speed 18646.05 samples/sec Loss 7.7898 LearningRate 0.1710 Epoch: 8 Global Step: 42670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:33,707-Speed 18390.70 samples/sec Loss 
7.7453 LearningRate 0.1709 Epoch: 8 Global Step: 42680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:38,105-Speed 18630.41 samples/sec Loss 7.7714 LearningRate 0.1709 Epoch: 8 Global Step: 42690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:42,596-Speed 18249.96 samples/sec Loss 7.7449 LearningRate 0.1708 Epoch: 8 Global Step: 42700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:47,006-Speed 18577.94 samples/sec Loss 7.7526 LearningRate 0.1708 Epoch: 8 Global Step: 42710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:51,408-Speed 18613.08 samples/sec Loss 7.8046 LearningRate 0.1707 Epoch: 8 Global Step: 42720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:16:55,802-Speed 18650.36 samples/sec Loss 7.7863 LearningRate 0.1707 Epoch: 8 Global Step: 42730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:00,249-Speed 18424.46 samples/sec Loss 7.7388 LearningRate 0.1706 Epoch: 8 Global Step: 42740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:04,657-Speed 18592.97 samples/sec Loss 7.7440 LearningRate 0.1705 Epoch: 8 Global Step: 42750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:09,075-Speed 18543.56 samples/sec Loss 7.7867 LearningRate 0.1705 Epoch: 8 Global Step: 42760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:13,485-Speed 18584.14 samples/sec Loss 7.7703 LearningRate 0.1704 Epoch: 8 Global Step: 42770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:17,980-Speed 18231.05 samples/sec Loss 7.7777 LearningRate 0.1704 Epoch: 8 Global Step: 42780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:22,423-Speed 18445.23 samples/sec Loss 7.7546 LearningRate 0.1703 Epoch: 8 Global Step: 42790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:26,833-Speed 18583.76 samples/sec Loss 7.7153 LearningRate 0.1703 Epoch: 8 Global Step: 42800 
Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:31,245-Speed 18569.98 samples/sec Loss 7.7618 LearningRate 0.1702 Epoch: 8 Global Step: 42810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:17:35,646-Speed 18620.12 samples/sec Loss 7.7264 LearningRate 0.1702 Epoch: 8 Global Step: 42820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:17:40,063-Speed 18552.83 samples/sec Loss 7.7213 LearningRate 0.1701 Epoch: 8 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:17:44,507-Speed 18438.78 samples/sec Loss 7.7398 LearningRate 0.1700 Epoch: 8 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:17:48,934-Speed 18510.37 samples/sec Loss 7.7720 LearningRate 0.1700 Epoch: 8 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:17:53,358-Speed 18519.94 samples/sec Loss 7.7130 LearningRate 0.1699 Epoch: 8 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:17:57,779-Speed 18536.86 samples/sec Loss 7.7827 LearningRate 0.1699 Epoch: 8 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:02,204-Speed 18513.34 samples/sec Loss 7.7686 LearningRate 0.1698 Epoch: 8 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:06,627-Speed 18528.71 samples/sec Loss 7.7928 LearningRate 0.1698 Epoch: 8 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:11,058-Speed 18489.68 samples/sec Loss 7.7268 LearningRate 0.1697 Epoch: 8 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:15,513-Speed 18394.20 samples/sec Loss 7.7343 LearningRate 0.1697 Epoch: 8 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:20,015-Speed 18199.50 samples/sec Loss 7.7886 LearningRate 0.1696 Epoch: 8 Global Step: 42920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-01-14 03:18:24,454-Speed 18460.78 samples/sec Loss 7.8135 LearningRate 0.1695 Epoch: 8 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:28,887-Speed 18484.46 samples/sec Loss 7.7517 LearningRate 0.1695 Epoch: 8 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:33,293-Speed 18598.59 samples/sec Loss 7.7629 LearningRate 0.1694 Epoch: 8 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:37,702-Speed 18586.91 samples/sec Loss 7.7331 LearningRate 0.1694 Epoch: 8 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:42,096-Speed 18647.87 samples/sec Loss 7.7853 LearningRate 0.1693 Epoch: 8 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:46,515-Speed 18541.40 samples/sec Loss 7.7541 LearningRate 0.1693 Epoch: 8 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:50,955-Speed 18454.88 samples/sec Loss 7.7582 LearningRate 0.1692 Epoch: 8 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:55,387-Speed 18489.53 samples/sec Loss 7.7918 LearningRate 0.1692 Epoch: 8 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:18:59,798-Speed 18576.57 samples/sec Loss 7.7259 LearningRate 0.1691 Epoch: 8 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:04,192-Speed 18651.61 samples/sec Loss 7.7784 LearningRate 0.1690 Epoch: 8 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:08,647-Speed 18394.17 samples/sec Loss 7.7491 LearningRate 0.1690 Epoch: 8 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:19:13,123-Speed 18307.29 samples/sec Loss 7.7102 LearningRate 0.1689 Epoch: 8 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:17,567-Speed 18438.81 samples/sec 
Loss 7.7461 LearningRate 0.1689 Epoch: 8 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:22,015-Speed 18417.58 samples/sec Loss 7.7187 LearningRate 0.1688 Epoch: 8 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:26,406-Speed 18662.37 samples/sec Loss 7.7503 LearningRate 0.1688 Epoch: 8 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:30,849-Speed 18443.83 samples/sec Loss 7.7733 LearningRate 0.1687 Epoch: 8 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:35,276-Speed 18509.40 samples/sec Loss 7.6795 LearningRate 0.1687 Epoch: 8 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:39,669-Speed 18657.46 samples/sec Loss 7.7410 LearningRate 0.1686 Epoch: 8 Global Step: 43100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:44,082-Speed 18578.16 samples/sec Loss 7.7663 LearningRate 0.1685 Epoch: 8 Global Step: 43110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:48,481-Speed 18628.27 samples/sec Loss 7.6868 LearningRate 0.1685 Epoch: 8 Global Step: 43120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:52,890-Speed 18583.43 samples/sec Loss 7.7207 LearningRate 0.1684 Epoch: 8 Global Step: 43130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:19:57,307-Speed 18553.23 samples/sec Loss 7.6992 LearningRate 0.1684 Epoch: 8 Global Step: 43140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:01,723-Speed 18556.49 samples/sec Loss 7.7195 LearningRate 0.1683 Epoch: 8 Global Step: 43150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:06,149-Speed 18514.97 samples/sec Loss 7.6837 LearningRate 0.1683 Epoch: 8 Global Step: 43160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:10,541-Speed 18653.83 samples/sec Loss 7.7419 LearningRate 0.1682 Epoch: 8 Global Step: 
43170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:14,933-Speed 18658.70 samples/sec Loss 7.7210 LearningRate 0.1681 Epoch: 8 Global Step: 43180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:19,333-Speed 18621.65 samples/sec Loss 7.7141 LearningRate 0.1681 Epoch: 8 Global Step: 43190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:23,756-Speed 18525.65 samples/sec Loss 7.7299 LearningRate 0.1680 Epoch: 8 Global Step: 43200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:28,167-Speed 18578.15 samples/sec Loss 7.7259 LearningRate 0.1680 Epoch: 8 Global Step: 43210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:32,591-Speed 18523.91 samples/sec Loss 7.7387 LearningRate 0.1679 Epoch: 8 Global Step: 43220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:37,065-Speed 18313.80 samples/sec Loss 7.7529 LearningRate 0.1679 Epoch: 8 Global Step: 43230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-14 03:20:41,455-Speed 18664.73 samples/sec Loss 7.7297 LearningRate 0.1678 Epoch: 8 Global Step: 43240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:20:45,852-Speed 18639.41 samples/sec Loss 7.7522 LearningRate 0.1678 Epoch: 8 Global Step: 43250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:20:50,266-Speed 18562.75 samples/sec Loss 7.7202 LearningRate 0.1677 Epoch: 8 Global Step: 43260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:20:54,677-Speed 18575.65 samples/sec Loss 7.7337 LearningRate 0.1676 Epoch: 8 Global Step: 43270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:20:59,112-Speed 18477.04 samples/sec Loss 7.7706 LearningRate 0.1676 Epoch: 8 Global Step: 43280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:03,556-Speed 18441.94 samples/sec Loss 7.7009 LearningRate 0.1675 Epoch: 8 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-01-14 03:21:08,037-Speed 18285.50 samples/sec Loss 7.7499 LearningRate 0.1675 Epoch: 8 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:12,441-Speed 18608.32 samples/sec Loss 7.7128 LearningRate 0.1674 Epoch: 8 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:16,885-Speed 18436.45 samples/sec Loss 7.7356 LearningRate 0.1674 Epoch: 8 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:21,267-Speed 18701.46 samples/sec Loss 7.7234 LearningRate 0.1673 Epoch: 8 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:25,654-Speed 18675.27 samples/sec Loss 7.7627 LearningRate 0.1673 Epoch: 8 Global Step: 43340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-14 03:21:30,053-Speed 18627.59 samples/sec Loss 7.7305 LearningRate 0.1672 Epoch: 8 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:34,424-Speed 18747.45 samples/sec Loss 7.7067 LearningRate 0.1672 Epoch: 8 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:38,848-Speed 18520.65 samples/sec Loss 7.6887 LearningRate 0.1671 Epoch: 8 Global Step: 43370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:43,346-Speed 18218.99 samples/sec Loss 7.7442 LearningRate 0.1670 Epoch: 8 Global Step: 43380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:47,789-Speed 18444.04 samples/sec Loss 7.7393 LearningRate 0.1670 Epoch: 8 Global Step: 43390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:52,210-Speed 18536.07 samples/sec Loss 7.6964 LearningRate 0.1669 Epoch: 8 Global Step: 43400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:21:56,659-Speed 18421.06 samples/sec Loss 7.6896 LearningRate 0.1669 Epoch: 8 Global Step: 43410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:22:01,062-Speed 18610.31 
samples/sec Loss 7.7270 LearningRate 0.1668 Epoch: 8 Global Step: 43420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:22:05,477-Speed 18563.09 samples/sec Loss 7.6766 LearningRate 0.1668 Epoch: 8 Global Step: 43430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:22:09,881-Speed 18602.87 samples/sec Loss 7.7108 LearningRate 0.1667 Epoch: 8 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-14 03:22:14,276-Speed 18646.08 samples/sec Loss 7.7136 LearningRate 0.1667 Epoch: 8 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:22:18,673-Speed 18635.53 samples/sec Loss 7.6895 LearningRate 0.1666 Epoch: 8 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:22:23,139-Speed 18354.17 samples/sec Loss 7.6986 LearningRate 0.1665 Epoch: 8 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:22:27,534-Speed 18649.06 samples/sec Loss 7.7287 LearningRate 0.1665 Epoch: 8 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:22:31,985-Speed 18410.75 samples/sec Loss 7.6653 LearningRate 0.1664 Epoch: 8 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:22:36,405-Speed 18541.99 samples/sec Loss 7.7085 LearningRate 0.1664 Epoch: 8 Global Step: 43500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:22:40,844-Speed 18463.63 samples/sec Loss 7.7449 LearningRate 0.1663 Epoch: 8 Global Step: 43510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:22:45,248-Speed 18606.97 samples/sec Loss 7.7437 LearningRate 0.1663 Epoch: 8 Global Step: 43520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:22:49,675-Speed 18508.78 samples/sec Loss 7.6734 LearningRate 0.1662 Epoch: 8 Global Step: 43530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:22:54,114-Speed 18459.26 samples/sec Loss 7.7235 LearningRate 0.1662 Epoch: 8 
Global Step: 43540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:22:58,525-Speed 18579.14 samples/sec Loss 7.6673 LearningRate 0.1661 Epoch: 8 Global Step: 43550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:23:02,928-Speed 18613.12 samples/sec Loss 7.6901 LearningRate 0.1660 Epoch: 8 Global Step: 43560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:23:07,336-Speed 18596.55 samples/sec Loss 7.6523 LearningRate 0.1660 Epoch: 8 Global Step: 43570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:23:11,755-Speed 18546.21 samples/sec Loss 7.6964 LearningRate 0.1659 Epoch: 8 Global Step: 43580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:23:16,171-Speed 18554.11 samples/sec Loss 7.6988 LearningRate 0.1659 Epoch: 8 Global Step: 43590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:23:20,548-Speed 18727.46 samples/sec Loss 7.6453 LearningRate 0.1658 Epoch: 8 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:23:24,988-Speed 18456.95 samples/sec Loss 7.7408 LearningRate 0.1658 Epoch: 8 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:23:29,533-Speed 18030.48 samples/sec Loss 7.6776 LearningRate 0.1657 Epoch: 8 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:23:33,982-Speed 18416.02 samples/sec Loss 7.6412 LearningRate 0.1657 Epoch: 8 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:23:38,397-Speed 18560.59 samples/sec Loss 7.7192 LearningRate 0.1656 Epoch: 8 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:23:42,790-Speed 18653.56 samples/sec Loss 7.6854 LearningRate 0.1655 Epoch: 8 Global Step: 43650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:23:47,205-Speed 18557.82 samples/sec Loss 7.6868 LearningRate 0.1655 Epoch: 8 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-01-14 03:23:51,605-Speed 18619.68 samples/sec Loss 7.7496 LearningRate 0.1654 Epoch: 8 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:23:56,002-Speed 18637.98 samples/sec Loss 7.6670 LearningRate 0.1654 Epoch: 8 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:24:00,427-Speed 18517.37 samples/sec Loss 7.7127 LearningRate 0.1653 Epoch: 8 Global Step: 43690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:04,869-Speed 18448.58 samples/sec Loss 7.6780 LearningRate 0.1653 Epoch: 8 Global Step: 43700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:09,305-Speed 18473.15 samples/sec Loss 7.6719 LearningRate 0.1652 Epoch: 8 Global Step: 43710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:13,775-Speed 18329.98 samples/sec Loss 7.6863 LearningRate 0.1652 Epoch: 8 Global Step: 43720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:18,204-Speed 18497.77 samples/sec Loss 7.6969 LearningRate 0.1651 Epoch: 8 Global Step: 43730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:22,618-Speed 18566.36 samples/sec Loss 7.6966 LearningRate 0.1651 Epoch: 8 Global Step: 43740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:27,063-Speed 18433.43 samples/sec Loss 7.6967 LearningRate 0.1650 Epoch: 8 Global Step: 43750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:31,537-Speed 18314.24 samples/sec Loss 7.7337 LearningRate 0.1649 Epoch: 8 Global Step: 43760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:41,768-Speed 8008.57 samples/sec Loss 7.7323 LearningRate 0.1649 Epoch: 8 Global Step: 43770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:46,149-Speed 18706.09 samples/sec Loss 7.6866 LearningRate 0.1648 Epoch: 8 Global Step: 43780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:24:50,541-Speed 18655.19 
samples/sec Loss 7.6442 LearningRate 0.1648 Epoch: 8 Global Step: 43790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:24:54,947-Speed 18598.20 samples/sec Loss 7.7190 LearningRate 0.1647 Epoch: 8 Global Step: 43800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:24:59,342-Speed 18644.54 samples/sec Loss 7.6820 LearningRate 0.1647 Epoch: 8 Global Step: 43810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:03,797-Speed 18393.81 samples/sec Loss 7.6575 LearningRate 0.1646 Epoch: 8 Global Step: 43820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:08,183-Speed 18682.32 samples/sec Loss 7.7109 LearningRate 0.1646 Epoch: 8 Global Step: 43830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:12,594-Speed 18578.24 samples/sec Loss 7.6980 LearningRate 0.1645 Epoch: 8 Global Step: 43840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:16,997-Speed 18610.71 samples/sec Loss 7.6395 LearningRate 0.1644 Epoch: 8 Global Step: 43850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:21,471-Speed 18314.01 samples/sec Loss 7.6984 LearningRate 0.1644 Epoch: 8 Global Step: 43860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:25,880-Speed 18587.98 samples/sec Loss 7.6578 LearningRate 0.1643 Epoch: 8 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:30,301-Speed 18533.60 samples/sec Loss 7.6557 LearningRate 0.1643 Epoch: 8 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:34,767-Speed 18347.41 samples/sec Loss 7.6679 LearningRate 0.1642 Epoch: 8 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:25:39,222-Speed 18391.29 samples/sec Loss 7.7040 LearningRate 0.1642 Epoch: 8 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:25:43,629-Speed 18594.60 samples/sec Loss 7.6864 LearningRate 0.1641 Epoch: 8 
Global Step: 43910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:48,082-Speed 18424.56 samples/sec Loss 7.7150 LearningRate 0.1641 Epoch: 8 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:52,499-Speed 18553.61 samples/sec Loss 7.6708 LearningRate 0.1640 Epoch: 8 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:25:56,902-Speed 18613.06 samples/sec Loss 7.6734 LearningRate 0.1640 Epoch: 8 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:26:01,282-Speed 18707.19 samples/sec Loss 7.6501 LearningRate 0.1639 Epoch: 8 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:26:05,678-Speed 18638.34 samples/sec Loss 7.6522 LearningRate 0.1638 Epoch: 8 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:26:10,065-Speed 18678.68 samples/sec Loss 7.6514 LearningRate 0.1638 Epoch: 8 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:26:14,508-Speed 18441.12 samples/sec Loss 7.6841 LearningRate 0.1637 Epoch: 8 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:26:18,916-Speed 18591.03 samples/sec Loss 7.6835 LearningRate 0.1637 Epoch: 8 Global Step: 43990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:26:23,318-Speed 18614.68 samples/sec Loss 7.6687 LearningRate 0.1636 Epoch: 8 Global Step: 44000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:26:27,731-Speed 18570.35 samples/sec Loss 7.6439 LearningRate 0.1636 Epoch: 8 Global Step: 44010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:26:32,119-Speed 18674.65 samples/sec Loss 7.7084 LearningRate 0.1635 Epoch: 8 Global Step: 44020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:26:36,499-Speed 18705.94 samples/sec Loss 7.7040 LearningRate 0.1635 Epoch: 8 Global Step: 44030 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-14 03:26:40,899-Speed 18622.61 samples/sec Loss 7.6615 LearningRate 0.1634 Epoch: 8 Global Step: 44040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:26:45,284-Speed 18688.23 samples/sec Loss 7.6823 LearningRate 0.1633 Epoch: 8 Global Step: 44050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:26:49,659-Speed 18733.26 samples/sec Loss 7.7071 LearningRate 0.1633 Epoch: 8 Global Step: 44060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:26:54,048-Speed 18665.95 samples/sec Loss 7.6776 LearningRate 0.1632 Epoch: 8 Global Step: 44070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:26:58,452-Speed 18612.09 samples/sec Loss 7.6661 LearningRate 0.1632 Epoch: 8 Global Step: 44080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:27:02,874-Speed 18529.57 samples/sec Loss 7.6785 LearningRate 0.1631 Epoch: 8 Global Step: 44090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:07,338-Speed 18358.44 samples/sec Loss 7.7006 LearningRate 0.1631 Epoch: 8 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:11,724-Speed 18678.97 samples/sec Loss 7.6708 LearningRate 0.1630 Epoch: 8 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:16,109-Speed 18690.50 samples/sec Loss 7.6417 LearningRate 0.1630 Epoch: 8 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:20,537-Speed 18505.34 samples/sec Loss 7.6641 LearningRate 0.1629 Epoch: 8 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:24,926-Speed 18668.06 samples/sec Loss 7.6586 LearningRate 0.1629 Epoch: 8 Global Step: 44140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:29,322-Speed 18642.49 samples/sec Loss 7.6360 LearningRate 0.1628 Epoch: 8 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:33,728-Speed 18598.92 
samples/sec Loss 7.6882 LearningRate 0.1627 Epoch: 8 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:38,177-Speed 18416.97 samples/sec Loss 7.6458 LearningRate 0.1627 Epoch: 8 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:42,600-Speed 18528.58 samples/sec Loss 7.6446 LearningRate 0.1626 Epoch: 8 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:47,023-Speed 18528.38 samples/sec Loss 7.6483 LearningRate 0.1626 Epoch: 8 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:51,451-Speed 18505.19 samples/sec Loss 7.6274 LearningRate 0.1625 Epoch: 8 Global Step: 44200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:27:55,959-Speed 18178.67 samples/sec Loss 7.6467 LearningRate 0.1625 Epoch: 8 Global Step: 44210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:00,391-Speed 18486.46 samples/sec Loss 7.6559 LearningRate 0.1624 Epoch: 8 Global Step: 44220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:04,824-Speed 18484.79 samples/sec Loss 7.6987 LearningRate 0.1624 Epoch: 8 Global Step: 44230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:09,248-Speed 18521.62 samples/sec Loss 7.6593 LearningRate 0.1623 Epoch: 8 Global Step: 44240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:13,650-Speed 18618.09 samples/sec Loss 7.6352 LearningRate 0.1623 Epoch: 8 Global Step: 44250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:18,051-Speed 18616.26 samples/sec Loss 7.6948 LearningRate 0.1622 Epoch: 8 Global Step: 44260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:22,546-Speed 18229.13 samples/sec Loss 7.6478 LearningRate 0.1621 Epoch: 8 Global Step: 44270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:26,954-Speed 18590.70 samples/sec Loss 7.6694 LearningRate 0.1621 Epoch: 8 
Global Step: 44280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:31,368-Speed 18564.81 samples/sec Loss 7.6255 LearningRate 0.1620 Epoch: 8 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:28:35,801-Speed 18485.01 samples/sec Loss 7.6592 LearningRate 0.1620 Epoch: 8 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:28:40,228-Speed 18511.70 samples/sec Loss 7.6683 LearningRate 0.1619 Epoch: 8 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:28:44,711-Speed 18275.34 samples/sec Loss 7.6743 LearningRate 0.1619 Epoch: 8 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:28:49,156-Speed 18435.23 samples/sec Loss 7.6936 LearningRate 0.1618 Epoch: 8 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:28:53,576-Speed 18540.46 samples/sec Loss 7.6444 LearningRate 0.1618 Epoch: 8 Global Step: 44340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:28:57,982-Speed 18595.17 samples/sec Loss 7.6534 LearningRate 0.1617 Epoch: 8 Global Step: 44350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:02,373-Speed 18663.31 samples/sec Loss 7.6812 LearningRate 0.1617 Epoch: 8 Global Step: 44360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:06,818-Speed 18432.71 samples/sec Loss 7.6021 LearningRate 0.1616 Epoch: 8 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:11,277-Speed 18375.51 samples/sec Loss 7.6688 LearningRate 0.1615 Epoch: 8 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:15,734-Speed 18389.37 samples/sec Loss 7.6732 LearningRate 0.1615 Epoch: 8 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:20,216-Speed 18281.10 samples/sec Loss 7.6393 LearningRate 0.1614 Epoch: 8 Global Step: 44400 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-14 03:29:24,687-Speed 18324.08 samples/sec Loss 7.6672 LearningRate 0.1614 Epoch: 8 Global Step: 44410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:29,123-Speed 18475.81 samples/sec Loss 7.6791 LearningRate 0.1613 Epoch: 8 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:33,524-Speed 18614.95 samples/sec Loss 7.6154 LearningRate 0.1613 Epoch: 8 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:37,947-Speed 18527.96 samples/sec Loss 7.6136 LearningRate 0.1612 Epoch: 8 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:42,384-Speed 18469.13 samples/sec Loss 7.6976 LearningRate 0.1612 Epoch: 8 Global Step: 44450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:46,791-Speed 18594.07 samples/sec Loss 7.6451 LearningRate 0.1611 Epoch: 8 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:51,321-Speed 18086.93 samples/sec Loss 7.6705 LearningRate 0.1611 Epoch: 8 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:29:55,736-Speed 18560.28 samples/sec Loss 7.6778 LearningRate 0.1610 Epoch: 8 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:00,183-Speed 18428.18 samples/sec Loss 7.6173 LearningRate 0.1609 Epoch: 8 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:04,585-Speed 18614.55 samples/sec Loss 7.6231 LearningRate 0.1609 Epoch: 8 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:09,036-Speed 18408.07 samples/sec Loss 7.6530 LearningRate 0.1608 Epoch: 8 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:13,538-Speed 18204.86 samples/sec Loss 7.6223 LearningRate 0.1608 Epoch: 8 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 
03:30:17,958-Speed 18533.13 samples/sec Loss 7.6432 LearningRate 0.1607 Epoch: 8 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:22,359-Speed 18617.40 samples/sec Loss 7.6298 LearningRate 0.1607 Epoch: 8 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:26,781-Speed 18530.09 samples/sec Loss 7.6437 LearningRate 0.1606 Epoch: 8 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:31,193-Speed 18572.28 samples/sec Loss 7.6110 LearningRate 0.1606 Epoch: 8 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:35,627-Speed 18483.42 samples/sec Loss 7.6294 LearningRate 0.1605 Epoch: 8 Global Step: 44570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:40,018-Speed 18660.40 samples/sec Loss 7.6110 LearningRate 0.1605 Epoch: 8 Global Step: 44580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:44,457-Speed 18459.99 samples/sec Loss 7.6717 LearningRate 0.1604 Epoch: 8 Global Step: 44590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:30:48,889-Speed 18488.89 samples/sec Loss 7.6341 LearningRate 0.1603 Epoch: 8 Global Step: 44600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:30:53,278-Speed 18669.37 samples/sec Loss 7.6229 LearningRate 0.1603 Epoch: 8 Global Step: 44610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:30:57,675-Speed 18636.78 samples/sec Loss 7.6309 LearningRate 0.1602 Epoch: 8 Global Step: 44620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:31:02,103-Speed 18508.78 samples/sec Loss 7.6219 LearningRate 0.1602 Epoch: 8 Global Step: 44630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:31:06,519-Speed 18556.61 samples/sec Loss 7.6513 LearningRate 0.1601 Epoch: 8 Global Step: 44640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:31:10,930-Speed 18574.19 samples/sec Loss 7.5936 
LearningRate 0.1601 Epoch: 8 Global Step: 44650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:31:15,343-Speed 18570.85 samples/sec Loss 7.6308 LearningRate 0.1600 Epoch: 8 Global Step: 44660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:31:19,770-Speed 18511.74 samples/sec Loss 7.5949 LearningRate 0.1600 Epoch: 8 Global Step: 44670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:31:24,158-Speed 18670.57 samples/sec Loss 7.5835 LearningRate 0.1599 Epoch: 8 Global Step: 44680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:31:28,561-Speed 18614.49 samples/sec Loss 7.6640 LearningRate 0.1599 Epoch: 8 Global Step: 44690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:31:32,962-Speed 18616.78 samples/sec Loss 7.6376 LearningRate 0.1598 Epoch: 8 Global Step: 44700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:31:37,398-Speed 18474.74 samples/sec Loss 7.5997 LearningRate 0.1598 Epoch: 8 Global Step: 44710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:31:41,793-Speed 18643.01 samples/sec Loss 7.6127 LearningRate 0.1597 Epoch: 8 Global Step: 44720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:31:46,193-Speed 18624.85 samples/sec Loss 7.5842 LearningRate 0.1596 Epoch: 8 Global Step: 44730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:31:50,602-Speed 18582.59 samples/sec Loss 7.6023 LearningRate 0.1596 Epoch: 8 Global Step: 44740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:31:55,010-Speed 18592.16 samples/sec Loss 7.5915 LearningRate 0.1595 Epoch: 8 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:31:59,407-Speed 18637.58 samples/sec Loss 7.6397 LearningRate 0.1595 Epoch: 8 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:03,841-Speed 18480.92 samples/sec Loss 7.5881 LearningRate 0.1594 Epoch: 8 Global Step: 44770 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:08,283-Speed 18445.41 samples/sec Loss 7.5932 LearningRate 0.1594 Epoch: 8 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:12,704-Speed 18535.40 samples/sec Loss 7.5839 LearningRate 0.1593 Epoch: 8 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:17,098-Speed 18653.12 samples/sec Loss 7.5940 LearningRate 0.1593 Epoch: 8 Global Step: 44800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:32:21,525-Speed 18507.17 samples/sec Loss 7.6194 LearningRate 0.1592 Epoch: 8 Global Step: 44810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:25,992-Speed 18344.97 samples/sec Loss 7.6396 LearningRate 0.1592 Epoch: 8 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:30,443-Speed 18410.15 samples/sec Loss 7.6006 LearningRate 0.1591 Epoch: 8 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:34,855-Speed 18582.28 samples/sec Loss 7.5979 LearningRate 0.1590 Epoch: 8 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:39,324-Speed 18337.52 samples/sec Loss 7.5975 LearningRate 0.1590 Epoch: 8 Global Step: 44850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:43,706-Speed 18702.68 samples/sec Loss 7.6194 LearningRate 0.1589 Epoch: 8 Global Step: 44860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:48,147-Speed 18448.26 samples/sec Loss 7.5520 LearningRate 0.1589 Epoch: 8 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:52,565-Speed 18549.87 samples/sec Loss 7.6240 LearningRate 0.1588 Epoch: 8 Global Step: 44880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:32:56,988-Speed 18531.02 samples/sec Loss 7.6142 LearningRate 0.1588 Epoch: 8 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-14 03:33:01,403-Speed 18561.70 samples/sec Loss 7.6923 LearningRate 0.1587 Epoch: 8 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:05,802-Speed 18629.28 samples/sec Loss 7.5940 LearningRate 0.1587 Epoch: 8 Global Step: 44910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:33:10,229-Speed 18510.26 samples/sec Loss 7.6029 LearningRate 0.1586 Epoch: 8 Global Step: 44920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:33:14,654-Speed 18518.37 samples/sec Loss 7.6258 LearningRate 0.1586 Epoch: 8 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:19,050-Speed 18641.58 samples/sec Loss 7.6139 LearningRate 0.1585 Epoch: 8 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:23,523-Speed 18318.28 samples/sec Loss 7.6314 LearningRate 0.1585 Epoch: 8 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:28,066-Speed 18037.76 samples/sec Loss 7.5808 LearningRate 0.1584 Epoch: 8 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:32,481-Speed 18560.49 samples/sec Loss 7.6045 LearningRate 0.1583 Epoch: 8 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:36,904-Speed 18526.47 samples/sec Loss 7.6075 LearningRate 0.1583 Epoch: 8 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:41,321-Speed 18549.71 samples/sec Loss 7.6071 LearningRate 0.1582 Epoch: 8 Global Step: 44990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:45,748-Speed 18513.79 samples/sec Loss 7.6257 LearningRate 0.1582 Epoch: 8 Global Step: 45000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:50,186-Speed 18461.67 samples/sec Loss 7.5901 LearningRate 0.1581 Epoch: 8 Global Step: 45010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:54,626-Speed 18456.44 samples/sec 
Loss 7.5843 LearningRate 0.1581 Epoch: 8 Global Step: 45020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:33:59,052-Speed 18511.70 samples/sec Loss 7.5969 LearningRate 0.1580 Epoch: 8 Global Step: 45030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:34:03,450-Speed 18632.47 samples/sec Loss 7.5993 LearningRate 0.1580 Epoch: 8 Global Step: 45040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:07,867-Speed 18561.01 samples/sec Loss 7.6021 LearningRate 0.1579 Epoch: 8 Global Step: 45050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:12,322-Speed 18391.05 samples/sec Loss 7.6718 LearningRate 0.1579 Epoch: 8 Global Step: 45060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:16,770-Speed 18424.65 samples/sec Loss 7.5822 LearningRate 0.1578 Epoch: 8 Global Step: 45070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:21,246-Speed 18309.42 samples/sec Loss 7.6391 LearningRate 0.1578 Epoch: 8 Global Step: 45080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:25,690-Speed 18437.63 samples/sec Loss 7.5940 LearningRate 0.1577 Epoch: 8 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:30,114-Speed 18522.04 samples/sec Loss 7.6201 LearningRate 0.1576 Epoch: 8 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:34,534-Speed 18542.57 samples/sec Loss 7.5854 LearningRate 0.1576 Epoch: 8 Global Step: 45110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:38,950-Speed 18554.58 samples/sec Loss 7.6062 LearningRate 0.1575 Epoch: 8 Global Step: 45120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:43,393-Speed 18445.26 samples/sec Loss 7.5801 LearningRate 0.1575 Epoch: 8 Global Step: 45130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:47,823-Speed 18496.05 samples/sec Loss 7.5238 LearningRate 0.1574 Epoch: 8 Global Step: 
45140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:52,280-Speed 18383.93 samples/sec Loss 7.6202 LearningRate 0.1574 Epoch: 8 Global Step: 45150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:34:56,706-Speed 18518.26 samples/sec Loss 7.6035 LearningRate 0.1573 Epoch: 8 Global Step: 45160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:01,108-Speed 18616.52 samples/sec Loss 7.5757 LearningRate 0.1573 Epoch: 8 Global Step: 45170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:05,495-Speed 18685.88 samples/sec Loss 7.6379 LearningRate 0.1572 Epoch: 8 Global Step: 45180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:09,909-Speed 18565.03 samples/sec Loss 7.6090 LearningRate 0.1572 Epoch: 8 Global Step: 45190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:14,341-Speed 18489.45 samples/sec Loss 7.6420 LearningRate 0.1571 Epoch: 8 Global Step: 45200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:18,758-Speed 18552.39 samples/sec Loss 7.5844 LearningRate 0.1571 Epoch: 8 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:23,197-Speed 18458.37 samples/sec Loss 7.5858 LearningRate 0.1570 Epoch: 8 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:27,612-Speed 18561.99 samples/sec Loss 7.6137 LearningRate 0.1569 Epoch: 8 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:32,033-Speed 18535.27 samples/sec Loss 7.6029 LearningRate 0.1569 Epoch: 8 Global Step: 45240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:36,462-Speed 18502.80 samples/sec Loss 7.5975 LearningRate 0.1568 Epoch: 8 Global Step: 45250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:40,850-Speed 18671.90 samples/sec Loss 7.5122 LearningRate 0.1568 Epoch: 8 Global Step: 45260 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-01-14 03:35:45,242-Speed 18657.08 samples/sec Loss 7.5518 LearningRate 0.1567 Epoch: 8 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:35:49,645-Speed 18613.73 samples/sec Loss 7.5752 LearningRate 0.1567 Epoch: 8 Global Step: 45280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:35:54,071-Speed 18514.06 samples/sec Loss 7.5659 LearningRate 0.1566 Epoch: 8 Global Step: 45290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:35:58,474-Speed 18616.88 samples/sec Loss 7.5582 LearningRate 0.1566 Epoch: 8 Global Step: 45300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:36:02,917-Speed 18445.25 samples/sec Loss 7.5694 LearningRate 0.1565 Epoch: 8 Global Step: 45310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:36:07,334-Speed 18552.17 samples/sec Loss 7.6051 LearningRate 0.1565 Epoch: 8 Global Step: 45320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:36:11,735-Speed 18616.51 samples/sec Loss 7.5670 LearningRate 0.1564 Epoch: 8 Global Step: 45330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:36:16,150-Speed 18560.35 samples/sec Loss 7.6101 LearningRate 0.1564 Epoch: 8 Global Step: 45340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:36:20,554-Speed 18608.34 samples/sec Loss 7.5273 LearningRate 0.1563 Epoch: 8 Global Step: 45350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:36:24,951-Speed 18632.97 samples/sec Loss 7.5429 LearningRate 0.1562 Epoch: 8 Global Step: 45360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:36:29,324-Speed 18737.78 samples/sec Loss 7.6481 LearningRate 0.1562 Epoch: 8 Global Step: 45370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:36:33,694-Speed 18750.79 samples/sec Loss 7.5739 LearningRate 0.1561 Epoch: 8 Global Step: 45380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:36:38,092-Speed 18633.29 
samples/sec Loss 7.5884 LearningRate 0.1561 Epoch: 8 Global Step: 45390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:36:42,534-Speed 18448.30 samples/sec Loss 7.5520 LearningRate 0.1560 Epoch: 8 Global Step: 45400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:36:46,948-Speed 18567.75 samples/sec Loss 7.5574 LearningRate 0.1560 Epoch: 8 Global Step: 45410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:36:51,364-Speed 18554.45 samples/sec Loss 7.5568 LearningRate 0.1559 Epoch: 8 Global Step: 45420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:36:55,894-Speed 18086.94 samples/sec Loss 7.5735 LearningRate 0.1559 Epoch: 8 Global Step: 45430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:00,299-Speed 18603.53 samples/sec Loss 7.5576 LearningRate 0.1558 Epoch: 8 Global Step: 45440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:04,765-Speed 18347.64 samples/sec Loss 7.5868 LearningRate 0.1558 Epoch: 8 Global Step: 45450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:09,186-Speed 18539.06 samples/sec Loss 7.5470 LearningRate 0.1557 Epoch: 8 Global Step: 45460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:13,588-Speed 18614.19 samples/sec Loss 7.5739 LearningRate 0.1557 Epoch: 8 Global Step: 45470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:18,006-Speed 18545.45 samples/sec Loss 7.5867 LearningRate 0.1556 Epoch: 8 Global Step: 45480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:22,417-Speed 18574.72 samples/sec Loss 7.5697 LearningRate 0.1556 Epoch: 8 Global Step: 45490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:26,819-Speed 18614.71 samples/sec Loss 7.6083 LearningRate 0.1555 Epoch: 8 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:31,212-Speed 18653.07 samples/sec Loss 7.5878 LearningRate 0.1554 Epoch: 8 
Global Step: 45510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:35,633-Speed 18536.96 samples/sec Loss 7.5606 LearningRate 0.1554 Epoch: 8 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:40,072-Speed 18456.60 samples/sec Loss 7.5739 LearningRate 0.1553 Epoch: 8 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:44,513-Speed 18454.87 samples/sec Loss 7.5573 LearningRate 0.1553 Epoch: 8 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:48,998-Speed 18273.42 samples/sec Loss 7.6004 LearningRate 0.1552 Epoch: 8 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:53,456-Speed 18384.63 samples/sec Loss 7.5900 LearningRate 0.1552 Epoch: 8 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:37:57,852-Speed 18646.42 samples/sec Loss 7.5754 LearningRate 0.1551 Epoch: 8 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:38:02,262-Speed 18578.61 samples/sec Loss 7.5299 LearningRate 0.1551 Epoch: 8 Global Step: 45580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:06,736-Speed 18317.44 samples/sec Loss 7.5895 LearningRate 0.1550 Epoch: 8 Global Step: 45590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:11,123-Speed 18680.18 samples/sec Loss 7.5716 LearningRate 0.1550 Epoch: 8 Global Step: 45600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:15,552-Speed 18507.70 samples/sec Loss 7.5470 LearningRate 0.1549 Epoch: 8 Global Step: 45610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:19,949-Speed 18636.85 samples/sec Loss 7.5792 LearningRate 0.1549 Epoch: 8 Global Step: 45620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:24,367-Speed 18549.84 samples/sec Loss 7.5833 LearningRate 0.1548 Epoch: 8 Global Step: 45630 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-14 03:38:28,797-Speed 18497.30 samples/sec Loss 7.5611 LearningRate 0.1548 Epoch: 8 Global Step: 45640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:33,211-Speed 18563.63 samples/sec Loss 7.5479 LearningRate 0.1547 Epoch: 8 Global Step: 45650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:37,609-Speed 18637.75 samples/sec Loss 7.5502 LearningRate 0.1546 Epoch: 8 Global Step: 45660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:42,031-Speed 18526.99 samples/sec Loss 7.5145 LearningRate 0.1546 Epoch: 8 Global Step: 45670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:38:46,440-Speed 18586.69 samples/sec Loss 7.4932 LearningRate 0.1545 Epoch: 8 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:38:50,872-Speed 18485.48 samples/sec Loss 7.5720 LearningRate 0.1545 Epoch: 8 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:38:55,314-Speed 18449.06 samples/sec Loss 7.5897 LearningRate 0.1544 Epoch: 8 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:38:59,767-Speed 18401.29 samples/sec Loss 7.5757 LearningRate 0.1544 Epoch: 8 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:04,165-Speed 18631.62 samples/sec Loss 7.5117 LearningRate 0.1543 Epoch: 8 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:08,578-Speed 18570.82 samples/sec Loss 7.5792 LearningRate 0.1543 Epoch: 8 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:12,999-Speed 18532.26 samples/sec Loss 7.5611 LearningRate 0.1542 Epoch: 8 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:17,425-Speed 18514.80 samples/sec Loss 7.5738 LearningRate 0.1542 Epoch: 8 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:21,819-Speed 18649.37 
samples/sec Loss 7.5524 LearningRate 0.1541 Epoch: 8 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:26,276-Speed 18384.09 samples/sec Loss 7.4927 LearningRate 0.1541 Epoch: 8 Global Step: 45770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:30,733-Speed 18387.64 samples/sec Loss 7.5313 LearningRate 0.1540 Epoch: 8 Global Step: 45780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:35,179-Speed 18432.67 samples/sec Loss 7.4678 LearningRate 0.1540 Epoch: 8 Global Step: 45790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:39,621-Speed 18447.62 samples/sec Loss 7.5688 LearningRate 0.1539 Epoch: 8 Global Step: 45800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:44,100-Speed 18294.05 samples/sec Loss 7.6099 LearningRate 0.1538 Epoch: 8 Global Step: 45810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:48,543-Speed 18442.81 samples/sec Loss 7.5471 LearningRate 0.1538 Epoch: 8 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:52,983-Speed 18455.08 samples/sec Loss 7.5146 LearningRate 0.1537 Epoch: 8 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:39:57,432-Speed 18418.76 samples/sec Loss 7.5263 LearningRate 0.1537 Epoch: 8 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:40:01,877-Speed 18431.74 samples/sec Loss 7.5385 LearningRate 0.1536 Epoch: 8 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:40:06,290-Speed 18574.25 samples/sec Loss 7.5571 LearningRate 0.1536 Epoch: 8 Global Step: 45860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:40:10,750-Speed 18370.68 samples/sec Loss 7.5560 LearningRate 0.1535 Epoch: 8 Global Step: 45870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:40:15,179-Speed 18501.24 samples/sec Loss 7.5224 LearningRate 0.1535 Epoch: 8 
Global Step: 45880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:40:19,610-Speed 18491.95 samples/sec Loss 7.5753 LearningRate 0.1534 Epoch: 8 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:40:24,056-Speed 18428.99 samples/sec Loss 7.5482 LearningRate 0.1534 Epoch: 8 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:40:28,507-Speed 18411.96 samples/sec Loss 7.5373 LearningRate 0.1533 Epoch: 8 Global Step: 45910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:40:32,936-Speed 18502.35 samples/sec Loss 7.5271 LearningRate 0.1533 Epoch: 8 Global Step: 45920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:40:37,384-Speed 18423.14 samples/sec Loss 7.4987 LearningRate 0.1532 Epoch: 8 Global Step: 45930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:40:41,822-Speed 18467.91 samples/sec Loss 7.5054 LearningRate 0.1532 Epoch: 8 Global Step: 45940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:40:46,265-Speed 18447.77 samples/sec Loss 7.5492 LearningRate 0.1531 Epoch: 8 Global Step: 45950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:40:50,696-Speed 18491.62 samples/sec Loss 7.5427 LearningRate 0.1531 Epoch: 8 Global Step: 45960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:40:55,150-Speed 18398.20 samples/sec Loss 7.5466 LearningRate 0.1530 Epoch: 8 Global Step: 45970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:40:59,528-Speed 18723.00 samples/sec Loss 7.5670 LearningRate 0.1529 Epoch: 8 Global Step: 45980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:03,962-Speed 18479.56 samples/sec Loss 7.5188 LearningRate 0.1529 Epoch: 8 Global Step: 45990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:08,353-Speed 18664.34 samples/sec Loss 7.5570 LearningRate 0.1528 Epoch: 8 Global Step: 46000 Fp16 Grad Scale: 16384 Required: 
7 hours Training: 2022-01-14 03:41:12,786-Speed 18483.58 samples/sec Loss 7.5177 LearningRate 0.1528 Epoch: 8 Global Step: 46010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:17,200-Speed 18564.84 samples/sec Loss 7.5233 LearningRate 0.1527 Epoch: 8 Global Step: 46020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:21,578-Speed 18717.05 samples/sec Loss 7.5118 LearningRate 0.1527 Epoch: 8 Global Step: 46030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:25,967-Speed 18672.38 samples/sec Loss 7.5233 LearningRate 0.1526 Epoch: 8 Global Step: 46040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:30,398-Speed 18495.53 samples/sec Loss 7.5144 LearningRate 0.1526 Epoch: 8 Global Step: 46050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:34,841-Speed 18439.31 samples/sec Loss 7.5733 LearningRate 0.1525 Epoch: 8 Global Step: 46060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:39,267-Speed 18515.79 samples/sec Loss 7.5052 LearningRate 0.1525 Epoch: 8 Global Step: 46070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 03:41:43,687-Speed 18538.80 samples/sec Loss 7.5292 LearningRate 0.1524 Epoch: 8 Global Step: 46080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:41:48,133-Speed 18432.11 samples/sec Loss 7.5344 LearningRate 0.1524 Epoch: 8 Global Step: 46090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:41:52,539-Speed 18598.97 samples/sec Loss 7.5458 LearningRate 0.1523 Epoch: 8 Global Step: 46100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:41:56,957-Speed 18544.08 samples/sec Loss 7.4786 LearningRate 0.1523 Epoch: 8 Global Step: 46110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:42:01,373-Speed 18560.69 samples/sec Loss 7.5457 LearningRate 0.1522 Epoch: 8 Global Step: 46120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:42:05,803-Speed 
18497.73 samples/sec Loss 7.5622 LearningRate 0.1522 Epoch: 8 Global Step: 46130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:42:10,201-Speed 18629.47 samples/sec Loss 7.5559 LearningRate 0.1521 Epoch: 8 Global Step: 46140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:42:14,647-Speed 18432.58 samples/sec Loss 7.5359 LearningRate 0.1520 Epoch: 8 Global Step: 46150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:42:19,058-Speed 18575.88 samples/sec Loss 7.5498 LearningRate 0.1520 Epoch: 8 Global Step: 46160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:42:23,468-Speed 18579.84 samples/sec Loss 7.5151 LearningRate 0.1519 Epoch: 8 Global Step: 46170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:42:27,844-Speed 18725.06 samples/sec Loss 7.5590 LearningRate 0.1519 Epoch: 8 Global Step: 46180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:42:32,276-Speed 18489.22 samples/sec Loss 7.5470 LearningRate 0.1518 Epoch: 8 Global Step: 46190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:42:36,670-Speed 18651.15 samples/sec Loss 7.5205 LearningRate 0.1518 Epoch: 8 Global Step: 46200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:42:41,073-Speed 18611.06 samples/sec Loss 7.5120 LearningRate 0.1517 Epoch: 8 Global Step: 46210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:42:45,465-Speed 18659.70 samples/sec Loss 7.4957 LearningRate 0.1517 Epoch: 8 Global Step: 46220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:42:49,868-Speed 18611.77 samples/sec Loss 7.5175 LearningRate 0.1516 Epoch: 8 Global Step: 46230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:42:54,315-Speed 18426.39 samples/sec Loss 7.5024 LearningRate 0.1516 Epoch: 8 Global Step: 46240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:42:58,721-Speed 18595.95 samples/sec Loss 7.4756 LearningRate 0.1515 
Epoch: 8 Global Step: 46250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:03,111-Speed 18668.82 samples/sec Loss 7.5056 LearningRate 0.1515 Epoch: 8 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:07,582-Speed 18329.15 samples/sec Loss 7.4963 LearningRate 0.1514 Epoch: 8 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:12,019-Speed 18464.04 samples/sec Loss 7.5201 LearningRate 0.1514 Epoch: 8 Global Step: 46280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:43:16,431-Speed 18574.48 samples/sec Loss 7.4920 LearningRate 0.1513 Epoch: 8 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:20,844-Speed 18567.25 samples/sec Loss 7.4877 LearningRate 0.1513 Epoch: 8 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:25,247-Speed 18612.85 samples/sec Loss 7.5092 LearningRate 0.1512 Epoch: 8 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:29,729-Speed 18287.87 samples/sec Loss 7.5039 LearningRate 0.1511 Epoch: 8 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:34,163-Speed 18479.05 samples/sec Loss 7.4997 LearningRate 0.1511 Epoch: 8 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:38,583-Speed 18538.05 samples/sec Loss 7.4855 LearningRate 0.1510 Epoch: 8 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:42,995-Speed 18575.25 samples/sec Loss 7.5221 LearningRate 0.1510 Epoch: 8 Global Step: 46350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:47,438-Speed 18443.84 samples/sec Loss 7.4970 LearningRate 0.1509 Epoch: 8 Global Step: 46360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:43:51,832-Speed 18648.54 samples/sec Loss 7.4915 LearningRate 0.1509 Epoch: 8 Global Step: 46370 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-14 03:43:56,229-Speed 18636.39 samples/sec Loss 7.5644 LearningRate 0.1508 Epoch: 8 Global Step: 46380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:44:00,656-Speed 18516.83 samples/sec Loss 7.5053 LearningRate 0.1508 Epoch: 8 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:44:05,079-Speed 18525.23 samples/sec Loss 7.5014 LearningRate 0.1507 Epoch: 8 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:44:09,493-Speed 18565.19 samples/sec Loss 7.5117 LearningRate 0.1507 Epoch: 8 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:44:13,889-Speed 18641.85 samples/sec Loss 7.4527 LearningRate 0.1506 Epoch: 8 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:44:18,297-Speed 18593.66 samples/sec Loss 7.5342 LearningRate 0.1506 Epoch: 8 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:44:22,671-Speed 18733.68 samples/sec Loss 7.5025 LearningRate 0.1505 Epoch: 8 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:44:27,045-Speed 18732.66 samples/sec Loss 7.4976 LearningRate 0.1505 Epoch: 8 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:44:31,442-Speed 18636.59 samples/sec Loss 7.5044 LearningRate 0.1504 Epoch: 8 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:44:35,826-Speed 18692.93 samples/sec Loss 7.5009 LearningRate 0.1504 Epoch: 8 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:44:40,206-Speed 18707.40 samples/sec Loss 7.4876 LearningRate 0.1503 Epoch: 8 Global Step: 46480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:44:44,616-Speed 18581.72 samples/sec Loss 7.5009 LearningRate 0.1503 Epoch: 8 Global Step: 46490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 
03:44:49,018-Speed 18609.17 samples/sec Loss 7.5272 LearningRate 0.1502 Epoch: 8 Global Step: 46500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:44:53,410-Speed 18654.46 samples/sec Loss 7.4966 LearningRate 0.1501 Epoch: 8 Global Step: 46510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:44:57,811-Speed 18618.49 samples/sec Loss 7.5134 LearningRate 0.1501 Epoch: 8 Global Step: 46520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:45:02,245-Speed 18483.68 samples/sec Loss 7.5219 LearningRate 0.1500 Epoch: 8 Global Step: 46530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:45:06,646-Speed 18621.39 samples/sec Loss 7.5206 LearningRate 0.1500 Epoch: 8 Global Step: 46540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:45:11,050-Speed 18603.97 samples/sec Loss 7.4691 LearningRate 0.1499 Epoch: 8 Global Step: 46550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:45:15,494-Speed 18441.48 samples/sec Loss 7.4694 LearningRate 0.1499 Epoch: 8 Global Step: 46560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:45:19,907-Speed 18568.41 samples/sec Loss 7.5006 LearningRate 0.1498 Epoch: 8 Global Step: 46570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:45:24,319-Speed 18568.69 samples/sec Loss 7.4648 LearningRate 0.1498 Epoch: 8 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:45:28,724-Speed 18605.17 samples/sec Loss 7.5181 LearningRate 0.1497 Epoch: 8 Global Step: 46590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:45:33,152-Speed 18501.55 samples/sec Loss 7.4821 LearningRate 0.1497 Epoch: 8 Global Step: 46600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:45:37,580-Speed 18506.44 samples/sec Loss 7.4706 LearningRate 0.1496 Epoch: 8 Global Step: 46610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:45:41,991-Speed 18581.41 samples/sec Loss 7.5418 
LearningRate 0.1496 Epoch: 8 Global Step: 46620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:45:46,406-Speed 18560.43 samples/sec Loss 7.4842 LearningRate 0.1495 Epoch: 8 Global Step: 46630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:45:50,867-Speed 18366.78 samples/sec Loss 7.4987 LearningRate 0.1495 Epoch: 8 Global Step: 46640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:45:55,274-Speed 18595.95 samples/sec Loss 7.4694 LearningRate 0.1494 Epoch: 8 Global Step: 46650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:45:59,700-Speed 18514.37 samples/sec Loss 7.4799 LearningRate 0.1494 Epoch: 8 Global Step: 46660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:18,319-Speed 4400.26 samples/sec Loss 7.5364 LearningRate 0.1493 Epoch: 9 Global Step: 46670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:22,739-Speed 18539.05 samples/sec Loss 7.4788 LearningRate 0.1493 Epoch: 9 Global Step: 46680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:27,200-Speed 18368.65 samples/sec Loss 7.4926 LearningRate 0.1492 Epoch: 9 Global Step: 46690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:31,608-Speed 18591.07 samples/sec Loss 7.4311 LearningRate 0.1492 Epoch: 9 Global Step: 46700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:36,018-Speed 18583.14 samples/sec Loss 7.4637 LearningRate 0.1491 Epoch: 9 Global Step: 46710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:40,455-Speed 18467.49 samples/sec Loss 7.4800 LearningRate 0.1490 Epoch: 9 Global Step: 46720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:44,922-Speed 18344.92 samples/sec Loss 7.4611 LearningRate 0.1490 Epoch: 9 Global Step: 46730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:49,401-Speed 18294.90 samples/sec Loss 7.5125 LearningRate 0.1489 Epoch: 9 Global Step: 46740 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:53,841-Speed 18457.48 samples/sec Loss 7.4916 LearningRate 0.1489 Epoch: 9 Global Step: 46750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:46:58,251-Speed 18581.39 samples/sec Loss 7.4483 LearningRate 0.1488 Epoch: 9 Global Step: 46760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:47:02,663-Speed 18571.24 samples/sec Loss 7.4569 LearningRate 0.1488 Epoch: 9 Global Step: 46770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:47:07,071-Speed 18592.38 samples/sec Loss 7.5108 LearningRate 0.1487 Epoch: 9 Global Step: 46780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:47:11,470-Speed 18626.56 samples/sec Loss 7.4232 LearningRate 0.1487 Epoch: 9 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:47:15,912-Speed 18448.38 samples/sec Loss 7.4179 LearningRate 0.1486 Epoch: 9 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:47:20,338-Speed 18512.51 samples/sec Loss 7.4197 LearningRate 0.1486 Epoch: 9 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:47:24,764-Speed 18513.71 samples/sec Loss 7.4455 LearningRate 0.1485 Epoch: 9 Global Step: 46820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:47:29,180-Speed 18557.72 samples/sec Loss 7.4820 LearningRate 0.1485 Epoch: 9 Global Step: 46830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:47:33,594-Speed 18562.04 samples/sec Loss 7.4381 LearningRate 0.1484 Epoch: 9 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:47:37,990-Speed 18640.46 samples/sec Loss 7.5044 LearningRate 0.1484 Epoch: 9 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:47:42,395-Speed 18600.39 samples/sec Loss 7.4593 LearningRate 0.1483 Epoch: 9 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-14 03:47:46,836-Speed 18456.57 samples/sec Loss 7.5158 LearningRate 0.1483 Epoch: 9 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:47:51,248-Speed 18569.92 samples/sec Loss 7.4621 LearningRate 0.1482 Epoch: 9 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:47:55,665-Speed 18550.58 samples/sec Loss 7.4883 LearningRate 0.1482 Epoch: 9 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:48:00,075-Speed 18581.67 samples/sec Loss 7.4570 LearningRate 0.1481 Epoch: 9 Global Step: 46900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:04,484-Speed 18582.86 samples/sec Loss 7.4542 LearningRate 0.1481 Epoch: 9 Global Step: 46910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:08,885-Speed 18622.78 samples/sec Loss 7.4448 LearningRate 0.1480 Epoch: 9 Global Step: 46920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:13,295-Speed 18576.67 samples/sec Loss 7.5227 LearningRate 0.1480 Epoch: 9 Global Step: 46930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:17,702-Speed 18594.59 samples/sec Loss 7.4973 LearningRate 0.1479 Epoch: 9 Global Step: 46940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:22,114-Speed 18572.43 samples/sec Loss 7.5157 LearningRate 0.1478 Epoch: 9 Global Step: 46950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:26,546-Speed 18489.53 samples/sec Loss 7.4861 LearningRate 0.1478 Epoch: 9 Global Step: 46960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:30,947-Speed 18617.36 samples/sec Loss 7.4438 LearningRate 0.1477 Epoch: 9 Global Step: 46970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:35,350-Speed 18611.57 samples/sec Loss 7.4622 LearningRate 0.1477 Epoch: 9 Global Step: 46980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:39,764-Speed 18565.47 samples/sec Loss 
7.4311 LearningRate 0.1476 Epoch: 9 Global Step: 46990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:48:44,136-Speed 18741.19 samples/sec Loss 7.4828 LearningRate 0.1476 Epoch: 9 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:48:48,573-Speed 18465.83 samples/sec Loss 7.4603 LearningRate 0.1475 Epoch: 9 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:48:52,994-Speed 18533.17 samples/sec Loss 7.4071 LearningRate 0.1475 Epoch: 9 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:48:57,404-Speed 18583.66 samples/sec Loss 7.4975 LearningRate 0.1474 Epoch: 9 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:49:01,824-Speed 18538.29 samples/sec Loss 7.4820 LearningRate 0.1474 Epoch: 9 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:49:06,228-Speed 18610.60 samples/sec Loss 7.4719 LearningRate 0.1473 Epoch: 9 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:49:10,644-Speed 18557.44 samples/sec Loss 7.4402 LearningRate 0.1473 Epoch: 9 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:49:15,027-Speed 18697.32 samples/sec Loss 7.4318 LearningRate 0.1472 Epoch: 9 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:49:19,478-Speed 18411.24 samples/sec Loss 7.4989 LearningRate 0.1472 Epoch: 9 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:49:23,945-Speed 18343.84 samples/sec Loss 7.4894 LearningRate 0.1471 Epoch: 9 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:49:28,350-Speed 18604.15 samples/sec Loss 7.4310 LearningRate 0.1471 Epoch: 9 Global Step: 47100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:49:32,746-Speed 18640.15 samples/sec Loss 7.4491 LearningRate 0.1470 Epoch: 9 Global Step: 47110 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:49:37,138-Speed 18658.64 samples/sec Loss 7.4640 LearningRate 0.1470 Epoch: 9 Global Step: 47120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:49:41,552-Speed 18563.87 samples/sec Loss 7.4256 LearningRate 0.1469 Epoch: 9 Global Step: 47130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:49:45,946-Speed 18651.30 samples/sec Loss 7.4316 LearningRate 0.1469 Epoch: 9 Global Step: 47140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:49:50,321-Speed 18729.28 samples/sec Loss 7.4995 LearningRate 0.1468 Epoch: 9 Global Step: 47150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:49:54,736-Speed 18560.20 samples/sec Loss 7.4605 LearningRate 0.1468 Epoch: 9 Global Step: 47160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:49:59,171-Speed 18476.14 samples/sec Loss 7.4080 LearningRate 0.1467 Epoch: 9 Global Step: 47170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:50:03,600-Speed 18501.32 samples/sec Loss 7.4294 LearningRate 0.1466 Epoch: 9 Global Step: 47180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:50:08,097-Speed 18223.34 samples/sec Loss 7.4150 LearningRate 0.1466 Epoch: 9 Global Step: 47190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:50:12,510-Speed 18565.36 samples/sec Loss 7.5123 LearningRate 0.1465 Epoch: 9 Global Step: 47200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:50:16,911-Speed 18624.23 samples/sec Loss 7.4475 LearningRate 0.1465 Epoch: 9 Global Step: 47210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:50:21,315-Speed 18605.03 samples/sec Loss 7.4095 LearningRate 0.1464 Epoch: 9 Global Step: 47220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:50:25,769-Speed 18395.20 samples/sec Loss 7.4429 LearningRate 0.1464 Epoch: 9 Global Step: 47230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 
2022-01-14 03:50:30,205-Speed 18471.69 samples/sec Loss 7.4486 LearningRate 0.1463 Epoch: 9 Global Step: 47240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:50:34,614-Speed 18584.66 samples/sec Loss 7.4274 LearningRate 0.1463 Epoch: 9 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:50:39,043-Speed 18502.86 samples/sec Loss 7.4817 LearningRate 0.1462 Epoch: 9 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:50:43,501-Speed 18379.30 samples/sec Loss 7.4708 LearningRate 0.1462 Epoch: 9 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:50:47,931-Speed 18500.91 samples/sec Loss 7.4595 LearningRate 0.1461 Epoch: 9 Global Step: 47280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:50:52,353-Speed 18528.31 samples/sec Loss 7.4444 LearningRate 0.1461 Epoch: 9 Global Step: 47290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:50:56,832-Speed 18293.77 samples/sec Loss 7.4149 LearningRate 0.1460 Epoch: 9 Global Step: 47300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:51:01,231-Speed 18629.89 samples/sec Loss 7.4556 LearningRate 0.1460 Epoch: 9 Global Step: 47310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:05,668-Speed 18470.02 samples/sec Loss 7.4583 LearningRate 0.1459 Epoch: 9 Global Step: 47320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:10,066-Speed 18628.95 samples/sec Loss 7.4008 LearningRate 0.1459 Epoch: 9 Global Step: 47330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:14,473-Speed 18595.65 samples/sec Loss 7.4358 LearningRate 0.1458 Epoch: 9 Global Step: 47340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:18,916-Speed 18441.26 samples/sec Loss 7.4563 LearningRate 0.1458 Epoch: 9 Global Step: 47350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:23,368-Speed 18405.25 samples/sec Loss 
7.4234 LearningRate 0.1457 Epoch: 9 Global Step: 47360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:27,871-Speed 18197.05 samples/sec Loss 7.4317 LearningRate 0.1457 Epoch: 9 Global Step: 47370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:32,263-Speed 18658.84 samples/sec Loss 7.4442 LearningRate 0.1456 Epoch: 9 Global Step: 47380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:36,661-Speed 18629.00 samples/sec Loss 7.4128 LearningRate 0.1456 Epoch: 9 Global Step: 47390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:41,075-Speed 18563.09 samples/sec Loss 7.4582 LearningRate 0.1455 Epoch: 9 Global Step: 47400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:51:45,498-Speed 18529.99 samples/sec Loss 7.4148 LearningRate 0.1455 Epoch: 9 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:51:49,914-Speed 18551.14 samples/sec Loss 7.3955 LearningRate 0.1454 Epoch: 9 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:51:59,418-Speed 8620.88 samples/sec Loss 7.5070 LearningRate 0.1454 Epoch: 9 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:03,862-Speed 18438.84 samples/sec Loss 7.4379 LearningRate 0.1453 Epoch: 9 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:08,261-Speed 18625.87 samples/sec Loss 7.4301 LearningRate 0.1453 Epoch: 9 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:12,701-Speed 18456.61 samples/sec Loss 7.4044 LearningRate 0.1452 Epoch: 9 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:17,136-Speed 18476.39 samples/sec Loss 7.4678 LearningRate 0.1451 Epoch: 9 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:21,534-Speed 18631.33 samples/sec Loss 7.4614 LearningRate 0.1451 Epoch: 9 Global Step: 47480 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:25,948-Speed 18562.44 samples/sec Loss 7.4146 LearningRate 0.1450 Epoch: 9 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:30,398-Speed 18414.97 samples/sec Loss 7.3842 LearningRate 0.1450 Epoch: 9 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:34,803-Speed 18604.39 samples/sec Loss 7.4222 LearningRate 0.1449 Epoch: 9 Global Step: 47510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:52:39,210-Speed 18592.14 samples/sec Loss 7.4351 LearningRate 0.1449 Epoch: 9 Global Step: 47520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:52:43,660-Speed 18416.82 samples/sec Loss 7.3426 LearningRate 0.1448 Epoch: 9 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:48,061-Speed 18620.07 samples/sec Loss 7.4335 LearningRate 0.1448 Epoch: 9 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:52,490-Speed 18500.60 samples/sec Loss 7.3826 LearningRate 0.1447 Epoch: 9 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:52:56,914-Speed 18520.22 samples/sec Loss 7.4319 LearningRate 0.1447 Epoch: 9 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:53:01,317-Speed 18610.96 samples/sec Loss 7.4188 LearningRate 0.1446 Epoch: 9 Global Step: 47570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:05,736-Speed 18545.63 samples/sec Loss 7.4326 LearningRate 0.1446 Epoch: 9 Global Step: 47580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:10,203-Speed 18347.17 samples/sec Loss 7.3972 LearningRate 0.1445 Epoch: 9 Global Step: 47590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:14,621-Speed 18551.51 samples/sec Loss 7.4551 LearningRate 0.1445 Epoch: 9 Global Step: 47600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 
2022-01-14 03:53:19,017-Speed 18641.01 samples/sec Loss 7.4243 LearningRate 0.1444 Epoch: 9 Global Step: 47610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:23,469-Speed 18403.15 samples/sec Loss 7.4113 LearningRate 0.1444 Epoch: 9 Global Step: 47620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:27,902-Speed 18485.91 samples/sec Loss 7.3599 LearningRate 0.1443 Epoch: 9 Global Step: 47630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:32,362-Speed 18376.34 samples/sec Loss 7.4334 LearningRate 0.1443 Epoch: 9 Global Step: 47640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:36,795-Speed 18489.06 samples/sec Loss 7.4092 LearningRate 0.1442 Epoch: 9 Global Step: 47650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:41,193-Speed 18628.93 samples/sec Loss 7.4399 LearningRate 0.1442 Epoch: 9 Global Step: 47660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:53:45,579-Speed 18681.71 samples/sec Loss 7.3737 LearningRate 0.1441 Epoch: 9 Global Step: 47670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:53:50,029-Speed 18412.96 samples/sec Loss 7.4451 LearningRate 0.1441 Epoch: 9 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:53:54,481-Speed 18405.02 samples/sec Loss 7.3858 LearningRate 0.1440 Epoch: 9 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:53:58,904-Speed 18530.04 samples/sec Loss 7.3513 LearningRate 0.1440 Epoch: 9 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:54:03,324-Speed 18537.79 samples/sec Loss 7.4494 LearningRate 0.1439 Epoch: 9 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:54:07,735-Speed 18575.63 samples/sec Loss 7.3863 LearningRate 0.1439 Epoch: 9 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:54:12,120-Speed 18684.66 samples/sec Loss 
7.3900 LearningRate 0.1438 Epoch: 9 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:54:16,507-Speed 18680.19 samples/sec Loss 7.4334 LearningRate 0.1438 Epoch: 9 Global Step: 47740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:20,918-Speed 18579.11 samples/sec Loss 7.4338 LearningRate 0.1437 Epoch: 9 Global Step: 47750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:25,332-Speed 18561.38 samples/sec Loss 7.3642 LearningRate 0.1437 Epoch: 9 Global Step: 47760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:29,770-Speed 18469.23 samples/sec Loss 7.4009 LearningRate 0.1436 Epoch: 9 Global Step: 47770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:34,176-Speed 18594.86 samples/sec Loss 7.4290 LearningRate 0.1436 Epoch: 9 Global Step: 47780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:38,608-Speed 18490.62 samples/sec Loss 7.4247 LearningRate 0.1435 Epoch: 9 Global Step: 47790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:43,034-Speed 18512.37 samples/sec Loss 7.4666 LearningRate 0.1434 Epoch: 9 Global Step: 47800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:47,422-Speed 18677.73 samples/sec Loss 7.4200 LearningRate 0.1434 Epoch: 9 Global Step: 47810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:51,861-Speed 18456.78 samples/sec Loss 7.3935 LearningRate 0.1433 Epoch: 9 Global Step: 47820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:54:56,251-Speed 18661.39 samples/sec Loss 7.4168 LearningRate 0.1433 Epoch: 9 Global Step: 47830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:55:00,705-Speed 18400.23 samples/sec Loss 7.3842 LearningRate 0.1432 Epoch: 9 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:05,188-Speed 18277.18 samples/sec Loss 7.4324 LearningRate 0.1432 Epoch: 9 Global Step: 47850 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:09,589-Speed 18620.51 samples/sec Loss 7.4363 LearningRate 0.1431 Epoch: 9 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:13,997-Speed 18589.68 samples/sec Loss 7.3921 LearningRate 0.1431 Epoch: 9 Global Step: 47870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:18,426-Speed 18501.17 samples/sec Loss 7.3365 LearningRate 0.1430 Epoch: 9 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:22,801-Speed 18727.14 samples/sec Loss 7.4101 LearningRate 0.1430 Epoch: 9 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:27,271-Speed 18331.08 samples/sec Loss 7.4294 LearningRate 0.1429 Epoch: 9 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:31,679-Speed 18593.10 samples/sec Loss 7.4085 LearningRate 0.1429 Epoch: 9 Global Step: 47910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:36,098-Speed 18543.18 samples/sec Loss 7.4212 LearningRate 0.1428 Epoch: 9 Global Step: 47920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:40,572-Speed 18316.25 samples/sec Loss 7.3936 LearningRate 0.1428 Epoch: 9 Global Step: 47930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:44,989-Speed 18552.36 samples/sec Loss 7.3791 LearningRate 0.1427 Epoch: 9 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:55:49,398-Speed 18584.03 samples/sec Loss 7.3897 LearningRate 0.1427 Epoch: 9 Global Step: 47950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:53,804-Speed 18597.91 samples/sec Loss 7.4594 LearningRate 0.1426 Epoch: 9 Global Step: 47960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:55:58,260-Speed 18389.59 samples/sec Loss 7.3620 LearningRate 0.1426 Epoch: 9 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-01-14 03:56:02,738-Speed 18298.80 samples/sec Loss 7.3980 LearningRate 0.1425 Epoch: 9 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:07,204-Speed 18348.79 samples/sec Loss 7.4209 LearningRate 0.1425 Epoch: 9 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:11,625-Speed 18534.63 samples/sec Loss 7.3587 LearningRate 0.1424 Epoch: 9 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:16,032-Speed 18594.05 samples/sec Loss 7.3874 LearningRate 0.1424 Epoch: 9 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:20,443-Speed 18576.14 samples/sec Loss 7.3723 LearningRate 0.1423 Epoch: 9 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:24,842-Speed 18623.64 samples/sec Loss 7.3876 LearningRate 0.1423 Epoch: 9 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:29,246-Speed 18612.08 samples/sec Loss 7.3897 LearningRate 0.1422 Epoch: 9 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:33,662-Speed 18551.33 samples/sec Loss 7.4085 LearningRate 0.1422 Epoch: 9 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:38,092-Speed 18501.44 samples/sec Loss 7.4116 LearningRate 0.1421 Epoch: 9 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:42,494-Speed 18614.59 samples/sec Loss 7.4402 LearningRate 0.1421 Epoch: 9 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:46,892-Speed 18632.27 samples/sec Loss 7.3657 LearningRate 0.1420 Epoch: 9 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:51,286-Speed 18649.87 samples/sec Loss 7.4262 LearningRate 0.1420 Epoch: 9 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:56:55,711-Speed 18517.35 samples/sec Loss 
7.3855 LearningRate 0.1419 Epoch: 9 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:57:00,131-Speed 18540.46 samples/sec Loss 7.3824 LearningRate 0.1419 Epoch: 9 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:57:04,603-Speed 18322.21 samples/sec Loss 7.3524 LearningRate 0.1418 Epoch: 9 Global Step: 48120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:57:09,033-Speed 18501.04 samples/sec Loss 7.4108 LearningRate 0.1418 Epoch: 9 Global Step: 48130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:57:13,441-Speed 18587.46 samples/sec Loss 7.3842 LearningRate 0.1417 Epoch: 9 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:57:17,881-Speed 18454.88 samples/sec Loss 7.3912 LearningRate 0.1417 Epoch: 9 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:57:22,348-Speed 18347.05 samples/sec Loss 7.3774 LearningRate 0.1416 Epoch: 9 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 03:57:26,833-Speed 18269.26 samples/sec Loss 7.4247 LearningRate 0.1416 Epoch: 9 Global Step: 48170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:57:31,316-Speed 18279.04 samples/sec Loss 7.3352 LearningRate 0.1415 Epoch: 9 Global Step: 48180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:57:35,743-Speed 18506.63 samples/sec Loss 7.3992 LearningRate 0.1415 Epoch: 9 Global Step: 48190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:57:40,146-Speed 18615.23 samples/sec Loss 7.3959 LearningRate 0.1414 Epoch: 9 Global Step: 48200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:57:44,545-Speed 18625.71 samples/sec Loss 7.3841 LearningRate 0.1414 Epoch: 9 Global Step: 48210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:57:48,938-Speed 18655.01 samples/sec Loss 7.4107 LearningRate 0.1413 Epoch: 9 Global Step: 
48220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:57:53,365-Speed 18510.70 samples/sec Loss 7.3609 LearningRate 0.1412 Epoch: 9 Global Step: 48230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:57:57,751-Speed 18680.85 samples/sec Loss 7.3649 LearningRate 0.1412 Epoch: 9 Global Step: 48240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:58:02,162-Speed 18579.20 samples/sec Loss 7.4251 LearningRate 0.1411 Epoch: 9 Global Step: 48250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:58:06,573-Speed 18574.90 samples/sec Loss 7.3584 LearningRate 0.1411 Epoch: 9 Global Step: 48260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:58:11,034-Speed 18370.62 samples/sec Loss 7.3729 LearningRate 0.1410 Epoch: 9 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:58:15,496-Speed 18364.07 samples/sec Loss 7.3794 LearningRate 0.1410 Epoch: 9 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:58:19,901-Speed 18600.65 samples/sec Loss 7.3755 LearningRate 0.1409 Epoch: 9 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:58:24,321-Speed 18539.35 samples/sec Loss 7.3684 LearningRate 0.1409 Epoch: 9 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:58:28,744-Speed 18526.40 samples/sec Loss 7.3609 LearningRate 0.1408 Epoch: 9 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:58:33,179-Speed 18476.22 samples/sec Loss 7.3736 LearningRate 0.1408 Epoch: 9 Global Step: 48320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:58:37,610-Speed 18498.36 samples/sec Loss 7.3761 LearningRate 0.1407 Epoch: 9 Global Step: 48330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:58:42,058-Speed 18419.20 samples/sec Loss 7.3591 LearningRate 0.1407 Epoch: 9 Global Step: 48340 Fp16 Grad Scale: 32768 Required: 7 hours 
Training: 2022-01-14 03:58:46,495-Speed 18470.54 samples/sec Loss 7.3701 LearningRate 0.1406 Epoch: 9 Global Step: 48350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:58:50,920-Speed 18518.12 samples/sec Loss 7.3417 LearningRate 0.1406 Epoch: 9 Global Step: 48360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:58:55,342-Speed 18538.23 samples/sec Loss 7.3727 LearningRate 0.1405 Epoch: 9 Global Step: 48370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:58:59,766-Speed 18524.24 samples/sec Loss 7.3533 LearningRate 0.1405 Epoch: 9 Global Step: 48380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:04,167-Speed 18618.06 samples/sec Loss 7.3719 LearningRate 0.1404 Epoch: 9 Global Step: 48390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:08,593-Speed 18518.00 samples/sec Loss 7.3878 LearningRate 0.1404 Epoch: 9 Global Step: 48400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:12,997-Speed 18607.57 samples/sec Loss 7.3407 LearningRate 0.1403 Epoch: 9 Global Step: 48410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:17,452-Speed 18393.50 samples/sec Loss 7.3375 LearningRate 0.1403 Epoch: 9 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:59:21,891-Speed 18461.32 samples/sec Loss 7.3447 LearningRate 0.1402 Epoch: 9 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:59:26,311-Speed 18538.90 samples/sec Loss 7.3841 LearningRate 0.1402 Epoch: 9 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 03:59:30,710-Speed 18628.89 samples/sec Loss 7.3262 LearningRate 0.1401 Epoch: 9 Global Step: 48450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:35,117-Speed 18597.18 samples/sec Loss 7.3693 LearningRate 0.1401 Epoch: 9 Global Step: 48460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:39,500-Speed 18703.12 
samples/sec Loss 7.3054 LearningRate 0.1400 Epoch: 9 Global Step: 48470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:43,927-Speed 18507.56 samples/sec Loss 7.3617 LearningRate 0.1400 Epoch: 9 Global Step: 48480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:48,339-Speed 18578.77 samples/sec Loss 7.3480 LearningRate 0.1399 Epoch: 9 Global Step: 48490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:52,784-Speed 18437.87 samples/sec Loss 7.3876 LearningRate 0.1399 Epoch: 9 Global Step: 48500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 03:59:57,194-Speed 18581.69 samples/sec Loss 7.3657 LearningRate 0.1398 Epoch: 9 Global Step: 48510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:00:01,606-Speed 18578.74 samples/sec Loss 7.3635 LearningRate 0.1398 Epoch: 9 Global Step: 48520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:00:06,016-Speed 18583.02 samples/sec Loss 7.3612 LearningRate 0.1397 Epoch: 9 Global Step: 48530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:00:10,406-Speed 18666.50 samples/sec Loss 7.3606 LearningRate 0.1397 Epoch: 9 Global Step: 48540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:00:14,853-Speed 18426.59 samples/sec Loss 7.3719 LearningRate 0.1396 Epoch: 9 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:19,258-Speed 18603.07 samples/sec Loss 7.3431 LearningRate 0.1396 Epoch: 9 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:23,652-Speed 18647.26 samples/sec Loss 7.3436 LearningRate 0.1395 Epoch: 9 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:28,096-Speed 18441.60 samples/sec Loss 7.3471 LearningRate 0.1395 Epoch: 9 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:32,520-Speed 18522.59 samples/sec Loss 7.3585 LearningRate 0.1394 Epoch: 9 
Global Step: 48590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:36,913-Speed 18653.16 samples/sec Loss 7.3406 LearningRate 0.1394 Epoch: 9 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:41,325-Speed 18571.55 samples/sec Loss 7.2833 LearningRate 0.1393 Epoch: 9 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:45,724-Speed 18632.26 samples/sec Loss 7.3729 LearningRate 0.1393 Epoch: 9 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:50,132-Speed 18588.71 samples/sec Loss 7.3647 LearningRate 0.1392 Epoch: 9 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:00:54,537-Speed 18601.02 samples/sec Loss 7.3877 LearningRate 0.1392 Epoch: 9 Global Step: 48640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:00:58,986-Speed 18419.08 samples/sec Loss 7.3802 LearningRate 0.1391 Epoch: 9 Global Step: 48650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:01:03,399-Speed 18567.78 samples/sec Loss 7.3506 LearningRate 0.1391 Epoch: 9 Global Step: 48660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:01:07,821-Speed 18532.53 samples/sec Loss 7.3771 LearningRate 0.1390 Epoch: 9 Global Step: 48670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:01:12,224-Speed 18609.34 samples/sec Loss 7.3278 LearningRate 0.1390 Epoch: 9 Global Step: 48680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:01:16,613-Speed 18670.44 samples/sec Loss 7.3033 LearningRate 0.1389 Epoch: 9 Global Step: 48690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:01:21,037-Speed 18520.07 samples/sec Loss 7.3489 LearningRate 0.1389 Epoch: 9 Global Step: 48700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:01:25,471-Speed 18481.99 samples/sec Loss 7.3124 LearningRate 0.1388 Epoch: 9 Global Step: 48710 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-14 04:01:29,855-Speed 18689.54 samples/sec Loss 7.3609 LearningRate 0.1388 Epoch: 9 Global Step: 48720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:01:34,268-Speed 18571.21 samples/sec Loss 7.3593 LearningRate 0.1387 Epoch: 9 Global Step: 48730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:01:38,708-Speed 18455.92 samples/sec Loss 7.3152 LearningRate 0.1387 Epoch: 9 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:01:43,110-Speed 18612.98 samples/sec Loss 7.3137 LearningRate 0.1386 Epoch: 9 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:01:47,503-Speed 18652.99 samples/sec Loss 7.2987 LearningRate 0.1386 Epoch: 9 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:01:51,942-Speed 18459.81 samples/sec Loss 7.3228 LearningRate 0.1385 Epoch: 9 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:01:56,328-Speed 18683.52 samples/sec Loss 7.3352 LearningRate 0.1385 Epoch: 9 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:00,728-Speed 18622.53 samples/sec Loss 7.2921 LearningRate 0.1384 Epoch: 9 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:05,126-Speed 18632.13 samples/sec Loss 7.3388 LearningRate 0.1384 Epoch: 9 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:09,560-Speed 18483.82 samples/sec Loss 7.3301 LearningRate 0.1383 Epoch: 9 Global Step: 48810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:14,011-Speed 18408.42 samples/sec Loss 7.3360 LearningRate 0.1383 Epoch: 9 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:18,413-Speed 18618.95 samples/sec Loss 7.3196 LearningRate 0.1382 Epoch: 9 Global Step: 48830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:22,864-Speed 18407.66 
samples/sec Loss 7.2932 LearningRate 0.1382 Epoch: 9 Global Step: 48840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:02:27,324-Speed 18373.30 samples/sec Loss 7.3648 LearningRate 0.1381 Epoch: 9 Global Step: 48850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:31,744-Speed 18541.72 samples/sec Loss 7.3315 LearningRate 0.1381 Epoch: 9 Global Step: 48860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:36,149-Speed 18603.67 samples/sec Loss 7.3169 LearningRate 0.1380 Epoch: 9 Global Step: 48870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:40,654-Speed 18191.68 samples/sec Loss 7.3151 LearningRate 0.1380 Epoch: 9 Global Step: 48880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:45,076-Speed 18528.93 samples/sec Loss 7.3068 LearningRate 0.1379 Epoch: 9 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:49,544-Speed 18340.57 samples/sec Loss 7.3346 LearningRate 0.1379 Epoch: 9 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:53,933-Speed 18672.14 samples/sec Loss 7.3676 LearningRate 0.1378 Epoch: 9 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:02:58,355-Speed 18527.45 samples/sec Loss 7.3368 LearningRate 0.1378 Epoch: 9 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:03:02,761-Speed 18599.10 samples/sec Loss 7.3024 LearningRate 0.1377 Epoch: 9 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:03:07,184-Speed 18528.57 samples/sec Loss 7.3479 LearningRate 0.1377 Epoch: 9 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:03:11,562-Speed 18722.57 samples/sec Loss 7.3198 LearningRate 0.1376 Epoch: 9 Global Step: 48950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:03:15,973-Speed 18578.06 samples/sec Loss 7.3361 LearningRate 0.1376 Epoch: 9 
Global Step: 48960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:03:20,409-Speed 18475.90 samples/sec Loss 7.3260 LearningRate 0.1375 Epoch: 9 Global Step: 48970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:03:24,797-Speed 18671.65 samples/sec Loss 7.2854 LearningRate 0.1375 Epoch: 9 Global Step: 48980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:03:29,211-Speed 18568.06 samples/sec Loss 7.3323 LearningRate 0.1374 Epoch: 9 Global Step: 48990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:03:33,647-Speed 18469.77 samples/sec Loss 7.3838 LearningRate 0.1374 Epoch: 9 Global Step: 49000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:03:38,053-Speed 18602.11 samples/sec Loss 7.3307 LearningRate 0.1373 Epoch: 9 Global Step: 49010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:03:42,458-Speed 18598.16 samples/sec Loss 7.3594 LearningRate 0.1373 Epoch: 9 Global Step: 49020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:03:46,858-Speed 18627.92 samples/sec Loss 7.3553 LearningRate 0.1372 Epoch: 9 Global Step: 49030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:03:51,228-Speed 18748.11 samples/sec Loss 7.2692 LearningRate 0.1372 Epoch: 9 Global Step: 49040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:03:55,648-Speed 18540.78 samples/sec Loss 7.2976 LearningRate 0.1371 Epoch: 9 Global Step: 49050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:00,051-Speed 18607.76 samples/sec Loss 7.3373 LearningRate 0.1371 Epoch: 9 Global Step: 49060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:04,477-Speed 18516.17 samples/sec Loss 7.3195 LearningRate 0.1370 Epoch: 9 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:04:08,889-Speed 18573.22 samples/sec Loss 7.3334 LearningRate 0.1370 Epoch: 9 Global Step: 49080 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-14 04:04:13,278-Speed 18669.94 samples/sec Loss 7.3304 LearningRate 0.1369 Epoch: 9 Global Step: 49090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:17,652-Speed 18737.39 samples/sec Loss 7.2653 LearningRate 0.1369 Epoch: 9 Global Step: 49100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:22,013-Speed 18788.28 samples/sec Loss 7.3064 LearningRate 0.1368 Epoch: 9 Global Step: 49110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:26,385-Speed 18744.96 samples/sec Loss 7.3063 LearningRate 0.1368 Epoch: 9 Global Step: 49120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:30,776-Speed 18664.41 samples/sec Loss 7.2852 LearningRate 0.1367 Epoch: 9 Global Step: 49130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:35,173-Speed 18639.35 samples/sec Loss 7.3087 LearningRate 0.1367 Epoch: 9 Global Step: 49140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:39,574-Speed 18621.62 samples/sec Loss 7.3368 LearningRate 0.1366 Epoch: 9 Global Step: 49150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:43,941-Speed 18763.21 samples/sec Loss 7.3648 LearningRate 0.1366 Epoch: 9 Global Step: 49160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:48,325-Speed 18693.28 samples/sec Loss 7.2903 LearningRate 0.1365 Epoch: 9 Global Step: 49170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:04:52,722-Speed 18635.84 samples/sec Loss 7.2780 LearningRate 0.1365 Epoch: 9 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:04:57,102-Speed 18709.87 samples/sec Loss 7.3238 LearningRate 0.1364 Epoch: 9 Global Step: 49190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:05:01,550-Speed 18424.35 samples/sec Loss 7.2935 LearningRate 0.1364 Epoch: 9 Global Step: 49200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:05:05,987-Speed 18465.03 
samples/sec Loss 7.3256 LearningRate 0.1363 Epoch: 9 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:05:10,374-Speed 18682.92 samples/sec Loss 7.3166 LearningRate 0.1363 Epoch: 9 Global Step: 49220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:05:14,759-Speed 18682.75 samples/sec Loss 7.2942 LearningRate 0.1362 Epoch: 9 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:05:19,178-Speed 18546.29 samples/sec Loss 7.3248 LearningRate 0.1362 Epoch: 9 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:05:23,715-Speed 18059.46 samples/sec Loss 7.3310 LearningRate 0.1361 Epoch: 9 Global Step: 49250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:05:28,136-Speed 18533.92 samples/sec Loss 7.3787 LearningRate 0.1361 Epoch: 9 Global Step: 49260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:05:32,533-Speed 18632.84 samples/sec Loss 7.2834 LearningRate 0.1360 Epoch: 9 Global Step: 49270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:05:37,006-Speed 18319.50 samples/sec Loss 7.2936 LearningRate 0.1360 Epoch: 9 Global Step: 49280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:05:41,417-Speed 18579.56 samples/sec Loss 7.3046 LearningRate 0.1359 Epoch: 9 Global Step: 49290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:05:45,862-Speed 18432.18 samples/sec Loss 7.3033 LearningRate 0.1359 Epoch: 9 Global Step: 49300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:05:50,312-Speed 18414.26 samples/sec Loss 7.2851 LearningRate 0.1358 Epoch: 9 Global Step: 49310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:05:54,717-Speed 18599.44 samples/sec Loss 7.3209 LearningRate 0.1358 Epoch: 9 Global Step: 49320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:05:59,128-Speed 18573.54 samples/sec Loss 7.2687 LearningRate 0.1357 Epoch: 9 
Global Step: 49330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:03,514-Speed 18685.13 samples/sec Loss 7.3413 LearningRate 0.1357 Epoch: 9 Global Step: 49340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:07,919-Speed 18603.92 samples/sec Loss 7.3507 LearningRate 0.1356 Epoch: 9 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:06:12,352-Speed 18483.12 samples/sec Loss 7.3093 LearningRate 0.1356 Epoch: 9 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:06:16,769-Speed 18557.34 samples/sec Loss 7.3127 LearningRate 0.1355 Epoch: 9 Global Step: 49370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:21,161-Speed 18656.45 samples/sec Loss 7.2835 LearningRate 0.1355 Epoch: 9 Global Step: 49380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:25,627-Speed 18345.48 samples/sec Loss 7.2806 LearningRate 0.1354 Epoch: 9 Global Step: 49390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:30,016-Speed 18669.62 samples/sec Loss 7.2787 LearningRate 0.1354 Epoch: 9 Global Step: 49400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:34,426-Speed 18580.21 samples/sec Loss 7.2750 LearningRate 0.1353 Epoch: 9 Global Step: 49410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:38,889-Speed 18362.42 samples/sec Loss 7.3105 LearningRate 0.1353 Epoch: 9 Global Step: 49420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:43,293-Speed 18603.66 samples/sec Loss 7.3249 LearningRate 0.1352 Epoch: 9 Global Step: 49430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:47,667-Speed 18742.58 samples/sec Loss 7.2977 LearningRate 0.1352 Epoch: 9 Global Step: 49440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:06:52,079-Speed 18574.97 samples/sec Loss 7.2478 LearningRate 0.1351 Epoch: 9 Global Step: 49450 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-14 04:06:56,512-Speed 18483.22 samples/sec Loss 7.2726 LearningRate 0.1351 Epoch: 9 Global Step: 49460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:07:00,905-Speed 18658.56 samples/sec Loss 7.3478 LearningRate 0.1350 Epoch: 9 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:05,347-Speed 18450.57 samples/sec Loss 7.3258 LearningRate 0.1350 Epoch: 9 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:09,767-Speed 18540.99 samples/sec Loss 7.2916 LearningRate 0.1349 Epoch: 9 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:14,175-Speed 18588.18 samples/sec Loss 7.2851 LearningRate 0.1349 Epoch: 9 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:18,583-Speed 18589.24 samples/sec Loss 7.2480 LearningRate 0.1348 Epoch: 9 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:22,988-Speed 18600.90 samples/sec Loss 7.3391 LearningRate 0.1348 Epoch: 9 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:27,438-Speed 18417.34 samples/sec Loss 7.3093 LearningRate 0.1347 Epoch: 9 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:31,840-Speed 18615.97 samples/sec Loss 7.2746 LearningRate 0.1347 Epoch: 9 Global Step: 49540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:36,289-Speed 18421.40 samples/sec Loss 7.2662 LearningRate 0.1346 Epoch: 9 Global Step: 49550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:40,718-Speed 18501.85 samples/sec Loss 7.2485 LearningRate 0.1346 Epoch: 9 Global Step: 49560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:45,143-Speed 18518.06 samples/sec Loss 7.2928 LearningRate 0.1345 Epoch: 9 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:07:49,555-Speed 18575.90 
samples/sec Loss 7.3155 LearningRate 0.1345 Epoch: 9 Global Step: 49580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:07:53,935-Speed 18706.88 samples/sec Loss 7.2819 LearningRate 0.1344 Epoch: 9 Global Step: 49590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:07:58,344-Speed 18585.98 samples/sec Loss 7.2966 LearningRate 0.1344 Epoch: 9 Global Step: 49600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:02,746-Speed 18617.68 samples/sec Loss 7.3324 LearningRate 0.1343 Epoch: 9 Global Step: 49610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:07,137-Speed 18661.23 samples/sec Loss 7.2856 LearningRate 0.1343 Epoch: 9 Global Step: 49620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:11,546-Speed 18585.66 samples/sec Loss 7.3195 LearningRate 0.1342 Epoch: 9 Global Step: 49630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:15,941-Speed 18640.71 samples/sec Loss 7.2599 LearningRate 0.1342 Epoch: 9 Global Step: 49640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:20,332-Speed 18660.91 samples/sec Loss 7.3003 LearningRate 0.1341 Epoch: 9 Global Step: 49650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:24,738-Speed 18600.80 samples/sec Loss 7.2354 LearningRate 0.1341 Epoch: 9 Global Step: 49660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:29,285-Speed 18019.04 samples/sec Loss 7.2789 LearningRate 0.1340 Epoch: 9 Global Step: 49670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:33,730-Speed 18438.13 samples/sec Loss 7.2579 LearningRate 0.1340 Epoch: 9 Global Step: 49680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:08:38,243-Speed 18159.83 samples/sec Loss 7.2852 LearningRate 0.1339 Epoch: 9 Global Step: 49690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:08:42,711-Speed 18339.79 samples/sec Loss 7.2896 LearningRate 0.1339 Epoch: 9 
Global Step: 49700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:08:47,098-Speed 18676.50 samples/sec Loss 7.2650 LearningRate 0.1338 Epoch: 9 Global Step: 49710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:08:51,475-Speed 18720.20 samples/sec Loss 7.3259 LearningRate 0.1338 Epoch: 9 Global Step: 49720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:08:55,894-Speed 18545.39 samples/sec Loss 7.2683 LearningRate 0.1337 Epoch: 9 Global Step: 49730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:00,298-Speed 18605.03 samples/sec Loss 7.2924 LearningRate 0.1337 Epoch: 9 Global Step: 49740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:04,709-Speed 18577.75 samples/sec Loss 7.2734 LearningRate 0.1336 Epoch: 9 Global Step: 49750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:09,118-Speed 18581.70 samples/sec Loss 7.3038 LearningRate 0.1336 Epoch: 9 Global Step: 49760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:13,540-Speed 18531.13 samples/sec Loss 7.2470 LearningRate 0.1335 Epoch: 9 Global Step: 49770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:17,952-Speed 18576.81 samples/sec Loss 7.3160 LearningRate 0.1335 Epoch: 9 Global Step: 49780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:22,377-Speed 18516.19 samples/sec Loss 7.2576 LearningRate 0.1334 Epoch: 9 Global Step: 49790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:09:26,819-Speed 18448.32 samples/sec Loss 7.2827 LearningRate 0.1334 Epoch: 9 Global Step: 49800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:31,275-Speed 18386.89 samples/sec Loss 7.2229 LearningRate 0.1333 Epoch: 9 Global Step: 49810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:35,658-Speed 18697.44 samples/sec Loss 7.2273 LearningRate 0.1333 Epoch: 9 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-01-14 04:09:40,048-Speed 18664.09 samples/sec Loss 7.2846 LearningRate 0.1332 Epoch: 9 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:44,467-Speed 18545.65 samples/sec Loss 7.2738 LearningRate 0.1332 Epoch: 9 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:09:48,885-Speed 18554.56 samples/sec Loss 7.2383 LearningRate 0.1331 Epoch: 9 Global Step: 49850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:09:53,334-Speed 18419.31 samples/sec Loss 7.2252 LearningRate 0.1331 Epoch: 9 Global Step: 49860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:09:57,796-Speed 18364.24 samples/sec Loss 7.2298 LearningRate 0.1330 Epoch: 9 Global Step: 49870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:02,256-Speed 18374.34 samples/sec Loss 7.2544 LearningRate 0.1330 Epoch: 9 Global Step: 49880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:06,667-Speed 18575.07 samples/sec Loss 7.2654 LearningRate 0.1329 Epoch: 9 Global Step: 49890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:11,078-Speed 18581.88 samples/sec Loss 7.2410 LearningRate 0.1329 Epoch: 9 Global Step: 49900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:15,484-Speed 18600.75 samples/sec Loss 7.2788 LearningRate 0.1328 Epoch: 9 Global Step: 49910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:19,893-Speed 18584.84 samples/sec Loss 7.1891 LearningRate 0.1328 Epoch: 9 Global Step: 49920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:24,349-Speed 18389.65 samples/sec Loss 7.2642 LearningRate 0.1327 Epoch: 9 Global Step: 49930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:28,828-Speed 18294.58 samples/sec Loss 7.2533 LearningRate 0.1327 Epoch: 9 Global Step: 49940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:33,304-Speed 18309.59 
samples/sec Loss 7.1997 LearningRate 0.1326 Epoch: 9 Global Step: 49950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:10:37,721-Speed 18549.71 samples/sec Loss 7.2664 LearningRate 0.1326 Epoch: 9 Global Step: 49960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:42,145-Speed 18526.59 samples/sec Loss 7.3230 LearningRate 0.1325 Epoch: 9 Global Step: 49970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:46,581-Speed 18468.68 samples/sec Loss 7.2637 LearningRate 0.1325 Epoch: 9 Global Step: 49980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:51,027-Speed 18432.16 samples/sec Loss 7.2527 LearningRate 0.1324 Epoch: 9 Global Step: 49990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:10:55,529-Speed 18201.95 samples/sec Loss 7.2122 LearningRate 0.1324 Epoch: 9 Global Step: 50000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:11:00,010-Speed 18289.52 samples/sec Loss 7.2223 LearningRate 0.1323 Epoch: 9 Global Step: 50010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:11:04,419-Speed 18587.91 samples/sec Loss 7.2348 LearningRate 0.1323 Epoch: 9 Global Step: 50020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:11:08,827-Speed 18590.17 samples/sec Loss 7.2358 LearningRate 0.1322 Epoch: 9 Global Step: 50030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:11:13,284-Speed 18383.43 samples/sec Loss 7.2441 LearningRate 0.1322 Epoch: 9 Global Step: 50040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:11:17,730-Speed 18433.10 samples/sec Loss 7.2557 LearningRate 0.1321 Epoch: 9 Global Step: 50050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:11:22,141-Speed 18573.77 samples/sec Loss 7.2304 LearningRate 0.1321 Epoch: 9 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:11:26,572-Speed 18493.40 samples/sec Loss 7.2830 LearningRate 0.1320 Epoch: 9 
Global Step: 50070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:11:31,002-Speed 18502.60 samples/sec Loss 7.2843 LearningRate 0.1320 Epoch: 9 Global Step: 50080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:11:35,411-Speed 18588.44 samples/sec Loss 7.2836 LearningRate 0.1319 Epoch: 9 Global Step: 50090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:11:39,839-Speed 18509.20 samples/sec Loss 7.2345 LearningRate 0.1319 Epoch: 9 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:11:44,251-Speed 18572.20 samples/sec Loss 7.2898 LearningRate 0.1318 Epoch: 9 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:11:48,661-Speed 18594.37 samples/sec Loss 7.2417 LearningRate 0.1318 Epoch: 9 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:11:53,080-Speed 18548.31 samples/sec Loss 7.2666 LearningRate 0.1317 Epoch: 9 Global Step: 50130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:11:57,493-Speed 18568.51 samples/sec Loss 7.2233 LearningRate 0.1317 Epoch: 9 Global Step: 50140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:01,948-Speed 18393.81 samples/sec Loss 7.2252 LearningRate 0.1316 Epoch: 9 Global Step: 50150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:06,384-Speed 18474.77 samples/sec Loss 7.2219 LearningRate 0.1316 Epoch: 9 Global Step: 50160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:10,775-Speed 18663.21 samples/sec Loss 7.2082 LearningRate 0.1315 Epoch: 9 Global Step: 50170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:15,197-Speed 18532.70 samples/sec Loss 7.2744 LearningRate 0.1315 Epoch: 9 Global Step: 50180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:19,614-Speed 18548.33 samples/sec Loss 7.2472 LearningRate 0.1314 Epoch: 9 Global Step: 50190 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-01-14 04:12:24,043-Speed 18501.15 samples/sec Loss 7.2021 LearningRate 0.1314 Epoch: 9 Global Step: 50200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:28,433-Speed 18667.60 samples/sec Loss 7.2228 LearningRate 0.1313 Epoch: 9 Global Step: 50210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:32,851-Speed 18549.13 samples/sec Loss 7.2680 LearningRate 0.1313 Epoch: 9 Global Step: 50220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:37,316-Speed 18351.07 samples/sec Loss 7.2352 LearningRate 0.1312 Epoch: 9 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:12:41,802-Speed 18272.47 samples/sec Loss 7.2850 LearningRate 0.1312 Epoch: 9 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:12:46,204-Speed 18611.85 samples/sec Loss 7.2704 LearningRate 0.1311 Epoch: 9 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:12:50,652-Speed 18426.21 samples/sec Loss 7.2418 LearningRate 0.1311 Epoch: 9 Global Step: 50260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:55,101-Speed 18418.02 samples/sec Loss 7.2714 LearningRate 0.1310 Epoch: 9 Global Step: 50270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:12:59,515-Speed 18566.18 samples/sec Loss 7.2463 LearningRate 0.1310 Epoch: 9 Global Step: 50280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:13:03,928-Speed 18570.26 samples/sec Loss 7.2623 LearningRate 0.1309 Epoch: 9 Global Step: 50290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:13:08,300-Speed 18742.46 samples/sec Loss 7.2773 LearningRate 0.1309 Epoch: 9 Global Step: 50300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:13:12,723-Speed 18526.80 samples/sec Loss 7.2192 LearningRate 0.1309 Epoch: 9 Global Step: 50310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:13:17,116-Speed 18652.22 
samples/sec Loss 7.2531 LearningRate 0.1308 Epoch: 9 Global Step: 50320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:13:21,608-Speed 18241.31 samples/sec Loss 7.1968 LearningRate 0.1308 Epoch: 9 Global Step: 50330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:13:26,016-Speed 18590.40 samples/sec Loss 7.2099 LearningRate 0.1307 Epoch: 9 Global Step: 50340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:13:30,427-Speed 18577.89 samples/sec Loss 7.2371 LearningRate 0.1307 Epoch: 9 Global Step: 50350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:13:34,914-Speed 18259.22 samples/sec Loss 7.2127 LearningRate 0.1306 Epoch: 9 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:13:39,303-Speed 18677.72 samples/sec Loss 7.2229 LearningRate 0.1306 Epoch: 9 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:13:43,692-Speed 18676.53 samples/sec Loss 7.1963 LearningRate 0.1305 Epoch: 9 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:13:48,161-Speed 18340.12 samples/sec Loss 7.2366 LearningRate 0.1305 Epoch: 9 Global Step: 50390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:13:52,617-Speed 18395.03 samples/sec Loss 7.1998 LearningRate 0.1304 Epoch: 9 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:13:57,003-Speed 18681.93 samples/sec Loss 7.2251 LearningRate 0.1304 Epoch: 9 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:14:01,398-Speed 18648.58 samples/sec Loss 7.2547 LearningRate 0.1303 Epoch: 9 Global Step: 50420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:14:05,810-Speed 18572.83 samples/sec Loss 7.2473 LearningRate 0.1303 Epoch: 9 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:14:10,229-Speed 18538.40 samples/sec Loss 7.2289 LearningRate 0.1302 Epoch: 9 
Global Step: 50440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:14:14,642-Speed 18572.78 samples/sec Loss 7.2406 LearningRate 0.1302 Epoch: 9 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:14:19,059-Speed 18555.07 samples/sec Loss 7.2303 LearningRate 0.1301 Epoch: 9 Global Step: 50460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:23,464-Speed 18601.75 samples/sec Loss 7.2207 LearningRate 0.1301 Epoch: 9 Global Step: 50470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:27,919-Speed 18396.12 samples/sec Loss 7.2441 LearningRate 0.1300 Epoch: 9 Global Step: 50480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:32,374-Speed 18393.87 samples/sec Loss 7.2114 LearningRate 0.1300 Epoch: 9 Global Step: 50490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:36,818-Speed 18437.30 samples/sec Loss 7.1901 LearningRate 0.1299 Epoch: 9 Global Step: 50500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:41,302-Speed 18271.94 samples/sec Loss 7.2264 LearningRate 0.1299 Epoch: 9 Global Step: 50510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:45,720-Speed 18550.43 samples/sec Loss 7.2011 LearningRate 0.1298 Epoch: 9 Global Step: 50520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:50,162-Speed 18447.21 samples/sec Loss 7.2059 LearningRate 0.1298 Epoch: 9 Global Step: 50530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:54,551-Speed 18669.12 samples/sec Loss 7.1859 LearningRate 0.1297 Epoch: 9 Global Step: 50540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:14:58,971-Speed 18542.18 samples/sec Loss 7.1954 LearningRate 0.1297 Epoch: 9 Global Step: 50550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:03,405-Speed 18479.88 samples/sec Loss 7.1945 LearningRate 0.1296 Epoch: 9 Global Step: 50560 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2022-01-14 04:15:07,823-Speed 18548.36 samples/sec Loss 7.2122 LearningRate 0.1296 Epoch: 9 Global Step: 50570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:12,292-Speed 18333.53 samples/sec Loss 7.1934 LearningRate 0.1295 Epoch: 9 Global Step: 50580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:16,715-Speed 18524.95 samples/sec Loss 7.1932 LearningRate 0.1295 Epoch: 9 Global Step: 50590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:21,190-Speed 18312.50 samples/sec Loss 7.2424 LearningRate 0.1294 Epoch: 9 Global Step: 50600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:25,599-Speed 18585.88 samples/sec Loss 7.2093 LearningRate 0.1294 Epoch: 9 Global Step: 50610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:29,985-Speed 18682.62 samples/sec Loss 7.2453 LearningRate 0.1293 Epoch: 9 Global Step: 50620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:34,420-Speed 18479.54 samples/sec Loss 7.2367 LearningRate 0.1293 Epoch: 9 Global Step: 50630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:38,829-Speed 18581.75 samples/sec Loss 7.2276 LearningRate 0.1292 Epoch: 9 Global Step: 50640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-01-14 04:15:43,264-Speed 18480.79 samples/sec Loss 7.2343 LearningRate 0.1292 Epoch: 9 Global Step: 50650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:15:47,671-Speed 18595.45 samples/sec Loss 7.2074 LearningRate 0.1291 Epoch: 9 Global Step: 50660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:15:52,123-Speed 18405.50 samples/sec Loss 7.2470 LearningRate 0.1291 Epoch: 9 Global Step: 50670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:15:56,546-Speed 18527.32 samples/sec Loss 7.2228 LearningRate 0.1290 Epoch: 9 Global Step: 50680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:16:00,980-Speed 18481.14 
samples/sec Loss 7.2192 LearningRate 0.1290 Epoch: 9 Global Step: 50690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:16:05,375-Speed 18644.32 samples/sec Loss 7.2049 LearningRate 0.1289 Epoch: 9 Global Step: 50700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:16:09,777-Speed 18611.92 samples/sec Loss 7.2015 LearningRate 0.1289 Epoch: 9 Global Step: 50710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:16:14,212-Speed 18477.06 samples/sec Loss 7.2246 LearningRate 0.1288 Epoch: 9 Global Step: 50720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:16:18,671-Speed 18377.76 samples/sec Loss 7.2098 LearningRate 0.1288 Epoch: 9 Global Step: 50730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:16:23,110-Speed 18468.40 samples/sec Loss 7.2093 LearningRate 0.1288 Epoch: 9 Global Step: 50740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:16:27,531-Speed 18540.60 samples/sec Loss 7.2080 LearningRate 0.1287 Epoch: 9 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:16:31,951-Speed 18538.66 samples/sec Loss 7.1885 LearningRate 0.1287 Epoch: 9 Global Step: 50760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:16:36,354-Speed 18609.69 samples/sec Loss 7.2293 LearningRate 0.1286 Epoch: 9 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:16:40,751-Speed 18638.46 samples/sec Loss 7.2168 LearningRate 0.1286 Epoch: 9 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:16:45,149-Speed 18628.95 samples/sec Loss 7.1744 LearningRate 0.1285 Epoch: 9 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:16:49,571-Speed 18535.45 samples/sec Loss 7.2058 LearningRate 0.1285 Epoch: 9 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:16:54,023-Speed 18407.30 samples/sec Loss 7.2210 LearningRate 0.1284 Epoch: 9 
Global Step: 50810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:16:58,429-Speed 18598.99 samples/sec Loss 7.2231 LearningRate 0.1284 Epoch: 9 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:02,831-Speed 18613.18 samples/sec Loss 7.1688 LearningRate 0.1283 Epoch: 9 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:07,221-Speed 18669.05 samples/sec Loss 7.2270 LearningRate 0.1283 Epoch: 9 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:11,624-Speed 18613.38 samples/sec Loss 7.1884 LearningRate 0.1282 Epoch: 9 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:16,078-Speed 18399.95 samples/sec Loss 7.1881 LearningRate 0.1282 Epoch: 9 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:20,485-Speed 18594.41 samples/sec Loss 7.2021 LearningRate 0.1281 Epoch: 9 Global Step: 50870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:24,874-Speed 18672.13 samples/sec Loss 7.1991 LearningRate 0.1281 Epoch: 9 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:29,272-Speed 18633.07 samples/sec Loss 7.1519 LearningRate 0.1280 Epoch: 9 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:33,732-Speed 18373.10 samples/sec Loss 7.2117 LearningRate 0.1280 Epoch: 9 Global Step: 50900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:38,207-Speed 18310.11 samples/sec Loss 7.1895 LearningRate 0.1279 Epoch: 9 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:42,633-Speed 18516.86 samples/sec Loss 7.2356 LearningRate 0.1279 Epoch: 9 Global Step: 50920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:47,076-Speed 18442.99 samples/sec Loss 7.1742 LearningRate 0.1278 Epoch: 9 Global Step: 50930 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-01-14 04:17:51,523-Speed 18427.37 samples/sec Loss 7.1624 LearningRate 0.1278 Epoch: 9 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:17:55,993-Speed 18334.35 samples/sec Loss 7.1739 LearningRate 0.1277 Epoch: 9 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:18:00,397-Speed 18602.89 samples/sec Loss 7.2045 LearningRate 0.1277 Epoch: 9 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:18:04,833-Speed 18474.23 samples/sec Loss 7.1719 LearningRate 0.1276 Epoch: 9 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:09,267-Speed 18479.65 samples/sec Loss 7.1674 LearningRate 0.1276 Epoch: 9 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:13,672-Speed 18601.05 samples/sec Loss 7.1775 LearningRate 0.1275 Epoch: 9 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:18,078-Speed 18600.43 samples/sec Loss 7.1768 LearningRate 0.1275 Epoch: 9 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:22,481-Speed 18608.11 samples/sec Loss 7.2009 LearningRate 0.1274 Epoch: 9 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:26,990-Speed 18178.70 samples/sec Loss 7.1989 LearningRate 0.1274 Epoch: 9 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:31,444-Speed 18405.89 samples/sec Loss 7.1730 LearningRate 0.1273 Epoch: 9 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:35,878-Speed 18480.71 samples/sec Loss 7.2497 LearningRate 0.1273 Epoch: 9 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:40,300-Speed 18536.00 samples/sec Loss 7.1829 LearningRate 0.1272 Epoch: 9 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:44,719-Speed 
18541.63 samples/sec Loss 7.1904 LearningRate 0.1272 Epoch: 9 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:18:49,187-Speed 18342.38 samples/sec Loss 7.2105 LearningRate 0.1272 Epoch: 9 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:18:53,650-Speed 18359.59 samples/sec Loss 7.2570 LearningRate 0.1271 Epoch: 9 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:18:58,079-Speed 18505.54 samples/sec Loss 7.1796 LearningRate 0.1271 Epoch: 9 Global Step: 51090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:02,503-Speed 18520.33 samples/sec Loss 7.1533 LearningRate 0.1270 Epoch: 9 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:06,929-Speed 18512.00 samples/sec Loss 7.1994 LearningRate 0.1270 Epoch: 9 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:11,360-Speed 18492.88 samples/sec Loss 7.2295 LearningRate 0.1269 Epoch: 9 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:15,764-Speed 18606.64 samples/sec Loss 7.1847 LearningRate 0.1269 Epoch: 9 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:20,199-Speed 18473.73 samples/sec Loss 7.1647 LearningRate 0.1268 Epoch: 9 Global Step: 51140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:24,598-Speed 18628.56 samples/sec Loss 7.1715 LearningRate 0.1268 Epoch: 9 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:29,022-Speed 18523.51 samples/sec Loss 7.1959 LearningRate 0.1267 Epoch: 9 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:33,442-Speed 18536.05 samples/sec Loss 7.1857 LearningRate 0.1267 Epoch: 9 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:37,861-Speed 18539.88 samples/sec Loss 7.2036 LearningRate 0.1266 
Epoch: 9 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:19:42,305-Speed 18437.29 samples/sec Loss 7.1835 LearningRate 0.1266 Epoch: 9 Global Step: 51190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:19:46,734-Speed 18503.85 samples/sec Loss 7.1781 LearningRate 0.1265 Epoch: 9 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:19:51,130-Speed 18642.76 samples/sec Loss 7.1697 LearningRate 0.1265 Epoch: 9 Global Step: 51210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:19:55,589-Speed 18378.01 samples/sec Loss 7.1683 LearningRate 0.1264 Epoch: 9 Global Step: 51220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:20:00,010-Speed 18534.11 samples/sec Loss 7.1512 LearningRate 0.1264 Epoch: 9 Global Step: 51230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-14 04:20:04,415-Speed 18606.27 samples/sec Loss 7.1476 LearningRate 0.1263 Epoch: 9 Global Step: 51240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:08,784-Speed 18752.72 samples/sec Loss 7.1634 LearningRate 0.1263 Epoch: 9 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:13,173-Speed 18670.37 samples/sec Loss 7.1436 LearningRate 0.1262 Epoch: 9 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:17,584-Speed 18577.85 samples/sec Loss 7.2022 LearningRate 0.1262 Epoch: 9 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:22,020-Speed 18471.55 samples/sec Loss 7.1796 LearningRate 0.1261 Epoch: 9 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:26,488-Speed 18337.92 samples/sec Loss 7.2062 LearningRate 0.1261 Epoch: 9 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:30,887-Speed 18635.07 samples/sec Loss 7.1839 LearningRate 0.1260 Epoch: 9 Global Step: 51300 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-14 04:20:35,368-Speed 18287.95 samples/sec Loss 7.1636 LearningRate 0.1260 Epoch: 9 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:39,817-Speed 18417.13 samples/sec Loss 7.1610 LearningRate 0.1259 Epoch: 9 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:44,244-Speed 18507.65 samples/sec Loss 7.1828 LearningRate 0.1259 Epoch: 9 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:48,636-Speed 18685.78 samples/sec Loss 7.2005 LearningRate 0.1258 Epoch: 9 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:53,141-Speed 18188.60 samples/sec Loss 7.2114 LearningRate 0.1258 Epoch: 9 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:20:57,588-Speed 18424.88 samples/sec Loss 7.1664 LearningRate 0.1258 Epoch: 9 Global Step: 51360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-14 04:21:02,013-Speed 18518.13 samples/sec Loss 7.1294 LearningRate 0.1257 Epoch: 9 Global Step: 51370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:21:06,409-Speed 18643.61 samples/sec Loss 7.1160 LearningRate 0.1257 Epoch: 9 Global Step: 51380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:21:10,820-Speed 18575.61 samples/sec Loss 7.1434 LearningRate 0.1256 Epoch: 9 Global Step: 51390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:21:15,223-Speed 18612.28 samples/sec Loss 7.1556 LearningRate 0.1256 Epoch: 9 Global Step: 51400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:21:19,638-Speed 18559.41 samples/sec Loss 7.1549 LearningRate 0.1255 Epoch: 9 Global Step: 51410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 04:21:24,087-Speed 18420.86 samples/sec Loss 7.1851 LearningRate 0.1255 Epoch: 9 Global Step: 51420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-14 
04:21:28,550-Speed 18358.14 samples/sec Loss 7.1284 LearningRate 0.1254 Epoch: 9 Global Step: 51430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:21:32,956-Speed 18600.04 samples/sec Loss 7.0975 LearningRate 0.1254 Epoch: 9 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:21:37,432-Speed 18308.68 samples/sec Loss 7.1211 LearningRate 0.1253 Epoch: 9 Global Step: 51450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:21:41,833-Speed 18618.14 samples/sec Loss 7.1386 LearningRate 0.1253 Epoch: 9 Global Step: 51460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:21:46,283-Speed 18412.96 samples/sec Loss 7.1478 LearningRate 0.1252 Epoch: 9 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:21:50,723-Speed 18454.78 samples/sec Loss 7.1784 LearningRate 0.1252 Epoch: 9 Global Step: 51480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:21:55,176-Speed 18406.41 samples/sec Loss 7.1780 LearningRate 0.1251 Epoch: 9 Global Step: 51490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:21:59,626-Speed 18409.59 samples/sec Loss 7.1785 LearningRate 0.1251 Epoch: 9 Global Step: 51500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:04,067-Speed 18453.35 samples/sec Loss 7.1382 LearningRate 0.1250 Epoch: 9 Global Step: 51510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:08,468-Speed 18617.87 samples/sec Loss 7.1397 LearningRate 0.1250 Epoch: 9 Global Step: 51520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:12,879-Speed 18576.87 samples/sec Loss 7.1666 LearningRate 0.1249 Epoch: 9 Global Step: 51530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:17,270-Speed 18666.44 samples/sec Loss 7.1634 LearningRate 0.1249 Epoch: 9 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:21,678-Speed 18585.25 samples/sec Loss 7.1270 
LearningRate 0.1248 Epoch: 9 Global Step: 51550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:26,153-Speed 18315.30 samples/sec Loss 7.1694 LearningRate 0.1248 Epoch: 9 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:30,613-Speed 18375.34 samples/sec Loss 7.1820 LearningRate 0.1247 Epoch: 9 Global Step: 51570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:22:35,071-Speed 18381.11 samples/sec Loss 7.1563 LearningRate 0.1247 Epoch: 9 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:39,527-Speed 18389.01 samples/sec Loss 7.1495 LearningRate 0.1247 Epoch: 9 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:43,968-Speed 18452.18 samples/sec Loss 7.1970 LearningRate 0.1246 Epoch: 9 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:48,413-Speed 18438.12 samples/sec Loss 7.1672 LearningRate 0.1246 Epoch: 9 Global Step: 51610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:52,829-Speed 18554.61 samples/sec Loss 7.1065 LearningRate 0.1245 Epoch: 9 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:22:57,304-Speed 18310.73 samples/sec Loss 7.1417 LearningRate 0.1245 Epoch: 9 Global Step: 51630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:01,748-Speed 18440.99 samples/sec Loss 7.1752 LearningRate 0.1244 Epoch: 9 Global Step: 51640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:06,219-Speed 18327.63 samples/sec Loss 7.1310 LearningRate 0.1244 Epoch: 9 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:10,645-Speed 18512.27 samples/sec Loss 7.1470 LearningRate 0.1243 Epoch: 9 Global Step: 51660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:15,050-Speed 18602.86 samples/sec Loss 7.1530 LearningRate 0.1243 Epoch: 9 Global Step: 51670 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:19,463-Speed 18567.64 samples/sec Loss 7.1649 LearningRate 0.1242 Epoch: 9 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:23:23,873-Speed 18579.43 samples/sec Loss 7.1884 LearningRate 0.1242 Epoch: 9 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:23:28,283-Speed 18586.24 samples/sec Loss 7.1485 LearningRate 0.1241 Epoch: 9 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:23:32,677-Speed 18646.46 samples/sec Loss 7.1029 LearningRate 0.1241 Epoch: 9 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:37,140-Speed 18360.15 samples/sec Loss 7.1275 LearningRate 0.1240 Epoch: 9 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:41,556-Speed 18556.08 samples/sec Loss 7.1209 LearningRate 0.1240 Epoch: 9 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:45,981-Speed 18516.76 samples/sec Loss 7.1179 LearningRate 0.1239 Epoch: 9 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:50,380-Speed 18630.39 samples/sec Loss 7.1326 LearningRate 0.1239 Epoch: 9 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:54,765-Speed 18690.60 samples/sec Loss 7.1738 LearningRate 0.1238 Epoch: 9 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:23:59,157-Speed 18655.85 samples/sec Loss 7.1514 LearningRate 0.1238 Epoch: 9 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:03,586-Speed 18501.02 samples/sec Loss 7.1371 LearningRate 0.1237 Epoch: 9 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:07,983-Speed 18635.29 samples/sec Loss 7.1576 LearningRate 0.1237 Epoch: 9 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-14 04:24:12,416-Speed 18487.20 samples/sec Loss 7.1790 LearningRate 0.1236 Epoch: 9 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:16,864-Speed 18424.79 samples/sec Loss 7.0887 LearningRate 0.1236 Epoch: 9 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:21,286-Speed 18528.94 samples/sec Loss 7.1277 LearningRate 0.1236 Epoch: 9 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:25,716-Speed 18502.04 samples/sec Loss 7.1423 LearningRate 0.1235 Epoch: 9 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:30,174-Speed 18378.02 samples/sec Loss 7.1424 LearningRate 0.1235 Epoch: 9 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:48,394-Speed 4496.82 samples/sec Loss 7.1307 LearningRate 0.1234 Epoch: 10 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:52,819-Speed 18515.76 samples/sec Loss 7.1418 LearningRate 0.1234 Epoch: 10 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:24:57,226-Speed 18608.97 samples/sec Loss 7.1220 LearningRate 0.1233 Epoch: 10 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:25:01,636-Speed 18578.61 samples/sec Loss 7.1122 LearningRate 0.1233 Epoch: 10 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:25:06,094-Speed 18382.75 samples/sec Loss 7.1403 LearningRate 0.1232 Epoch: 10 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:25:10,491-Speed 18633.41 samples/sec Loss 7.1369 LearningRate 0.1232 Epoch: 10 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:25:14,968-Speed 18302.00 samples/sec Loss 7.1040 LearningRate 0.1231 Epoch: 10 Global Step: 51910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:25:19,474-Speed 18185.61 
samples/sec Loss 7.1339 LearningRate 0.1231 Epoch: 10 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:25:23,869-Speed 18643.72 samples/sec Loss 7.0855 LearningRate 0.1230 Epoch: 10 Global Step: 51930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:25:28,270-Speed 18626.69 samples/sec Loss 7.1420 LearningRate 0.1230 Epoch: 10 Global Step: 51940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:25:32,659-Speed 18674.85 samples/sec Loss 7.1001 LearningRate 0.1229 Epoch: 10 Global Step: 51950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:25:37,056-Speed 18637.74 samples/sec Loss 7.1009 LearningRate 0.1229 Epoch: 10 Global Step: 51960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:25:41,471-Speed 18561.02 samples/sec Loss 7.0937 LearningRate 0.1228 Epoch: 10 Global Step: 51970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:25:45,910-Speed 18459.68 samples/sec Loss 7.1061 LearningRate 0.1228 Epoch: 10 Global Step: 51980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:25:50,353-Speed 18446.65 samples/sec Loss 7.1469 LearningRate 0.1227 Epoch: 10 Global Step: 51990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:25:54,768-Speed 18562.26 samples/sec Loss 7.0905 LearningRate 0.1227 Epoch: 10 Global Step: 52000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:25:59,218-Speed 18414.86 samples/sec Loss 7.1329 LearningRate 0.1226 Epoch: 10 Global Step: 52010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:26:09,164-Speed 8237.81 samples/sec Loss 7.1232 LearningRate 0.1226 Epoch: 10 Global Step: 52020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:26:13,595-Speed 18493.42 samples/sec Loss 7.0870 LearningRate 0.1226 Epoch: 10 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:18,015-Speed 18541.18 samples/sec Loss 7.1198 LearningRate 0.1225 
Epoch: 10 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:22,519-Speed 18194.54 samples/sec Loss 7.1019 LearningRate 0.1225 Epoch: 10 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:26,976-Speed 18386.51 samples/sec Loss 7.1176 LearningRate 0.1224 Epoch: 10 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:31,387-Speed 18579.84 samples/sec Loss 7.1624 LearningRate 0.1224 Epoch: 10 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:35,799-Speed 18577.29 samples/sec Loss 7.0695 LearningRate 0.1223 Epoch: 10 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:40,279-Speed 18288.75 samples/sec Loss 7.0533 LearningRate 0.1223 Epoch: 10 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:44,694-Speed 18558.61 samples/sec Loss 7.1351 LearningRate 0.1222 Epoch: 10 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:49,111-Speed 18554.95 samples/sec Loss 7.1338 LearningRate 0.1222 Epoch: 10 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:53,561-Speed 18418.86 samples/sec Loss 7.1279 LearningRate 0.1221 Epoch: 10 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:26:58,016-Speed 18396.39 samples/sec Loss 7.1343 LearningRate 0.1221 Epoch: 10 Global Step: 52130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:27:02,486-Speed 18332.59 samples/sec Loss 7.1096 LearningRate 0.1220 Epoch: 10 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:06,891-Speed 18602.91 samples/sec Loss 7.1145 LearningRate 0.1220 Epoch: 10 Global Step: 52150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:11,271-Speed 18710.80 samples/sec Loss 7.0776 LearningRate 0.1219 Epoch: 10 Global Step: 52160 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:15,703-Speed 18501.32 samples/sec Loss 7.0854 LearningRate 0.1219 Epoch: 10 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:20,122-Speed 18548.75 samples/sec Loss 7.1250 LearningRate 0.1218 Epoch: 10 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:24,568-Speed 18433.14 samples/sec Loss 7.1051 LearningRate 0.1218 Epoch: 10 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:28,982-Speed 18561.93 samples/sec Loss 7.1027 LearningRate 0.1217 Epoch: 10 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:33,405-Speed 18544.10 samples/sec Loss 7.0796 LearningRate 0.1217 Epoch: 10 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:37,902-Speed 18220.28 samples/sec Loss 7.1257 LearningRate 0.1217 Epoch: 10 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:42,359-Speed 18385.58 samples/sec Loss 7.1118 LearningRate 0.1216 Epoch: 10 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:46,822-Speed 18362.52 samples/sec Loss 7.1009 LearningRate 0.1216 Epoch: 10 Global Step: 52240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:27:51,237-Speed 18558.46 samples/sec Loss 7.0932 LearningRate 0.1215 Epoch: 10 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:27:55,642-Speed 18607.99 samples/sec Loss 7.0967 LearningRate 0.1215 Epoch: 10 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:28:00,097-Speed 18393.46 samples/sec Loss 7.0960 LearningRate 0.1214 Epoch: 10 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:28:04,586-Speed 18259.59 samples/sec Loss 7.0850 LearningRate 0.1214 Epoch: 10 Global Step: 52280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-14 04:28:09,029-Speed 18438.17 samples/sec Loss 7.1078 LearningRate 0.1213 Epoch: 10 Global Step: 52290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:28:13,445-Speed 18556.99 samples/sec Loss 7.0966 LearningRate 0.1213 Epoch: 10 Global Step: 52300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:28:17,882-Speed 18474.48 samples/sec Loss 7.1186 LearningRate 0.1212 Epoch: 10 Global Step: 52310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:28:22,341-Speed 18373.76 samples/sec Loss 7.1640 LearningRate 0.1212 Epoch: 10 Global Step: 52320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:28:26,838-Speed 18219.77 samples/sec Loss 7.0926 LearningRate 0.1211 Epoch: 10 Global Step: 52330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:28:31,257-Speed 18547.36 samples/sec Loss 7.1174 LearningRate 0.1211 Epoch: 10 Global Step: 52340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:28:35,688-Speed 18491.73 samples/sec Loss 7.1281 LearningRate 0.1210 Epoch: 10 Global Step: 52350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:28:40,206-Speed 18142.54 samples/sec Loss 7.0900 LearningRate 0.1210 Epoch: 10 Global Step: 52360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:28:44,633-Speed 18508.44 samples/sec Loss 7.0883 LearningRate 0.1209 Epoch: 10 Global Step: 52370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:28:49,096-Speed 18358.08 samples/sec Loss 7.0984 LearningRate 0.1209 Epoch: 10 Global Step: 52380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:28:53,536-Speed 18454.94 samples/sec Loss 7.1033 LearningRate 0.1209 Epoch: 10 Global Step: 52390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:28:57,985-Speed 18417.55 samples/sec Loss 7.1084 LearningRate 0.1208 Epoch: 10 Global Step: 52400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:29:02,493-Speed 18179.96 
samples/sec Loss 7.1020 LearningRate 0.1208 Epoch: 10 Global Step: 52410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:29:06,973-Speed 18293.38 samples/sec Loss 7.1040 LearningRate 0.1207 Epoch: 10 Global Step: 52420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:29:11,418-Speed 18440.31 samples/sec Loss 7.0413 LearningRate 0.1207 Epoch: 10 Global Step: 52430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:29:15,898-Speed 18295.24 samples/sec Loss 7.0293 LearningRate 0.1206 Epoch: 10 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:29:20,341-Speed 18443.29 samples/sec Loss 7.1151 LearningRate 0.1206 Epoch: 10 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:29:24,809-Speed 18339.30 samples/sec Loss 7.0498 LearningRate 0.1205 Epoch: 10 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:29:29,294-Speed 18272.65 samples/sec Loss 7.1045 LearningRate 0.1205 Epoch: 10 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:29:33,776-Speed 18281.01 samples/sec Loss 7.0846 LearningRate 0.1204 Epoch: 10 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:29:38,250-Speed 18319.23 samples/sec Loss 7.0870 LearningRate 0.1204 Epoch: 10 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:29:42,730-Speed 18291.14 samples/sec Loss 7.0511 LearningRate 0.1203 Epoch: 10 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:29:47,224-Speed 18240.14 samples/sec Loss 7.0786 LearningRate 0.1203 Epoch: 10 Global Step: 52510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:29:51,687-Speed 18360.85 samples/sec Loss 7.0568 LearningRate 0.1202 Epoch: 10 Global Step: 52520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:29:56,106-Speed 18538.66 samples/sec Loss 7.0845 LearningRate 0.1202 
Epoch: 10 Global Step: 52530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:00,512-Speed 18599.23 samples/sec Loss 7.0787 LearningRate 0.1201 Epoch: 10 Global Step: 52540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:04,935-Speed 18530.27 samples/sec Loss 7.0966 LearningRate 0.1201 Epoch: 10 Global Step: 52550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:09,428-Speed 18240.56 samples/sec Loss 7.0799 LearningRate 0.1201 Epoch: 10 Global Step: 52560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:13,832-Speed 18609.67 samples/sec Loss 7.0792 LearningRate 0.1200 Epoch: 10 Global Step: 52570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:18,293-Speed 18370.96 samples/sec Loss 7.0878 LearningRate 0.1200 Epoch: 10 Global Step: 52580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:22,747-Speed 18399.24 samples/sec Loss 7.0670 LearningRate 0.1199 Epoch: 10 Global Step: 52590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:27,151-Speed 18607.83 samples/sec Loss 7.0520 LearningRate 0.1199 Epoch: 10 Global Step: 52600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:31,646-Speed 18228.09 samples/sec Loss 7.1300 LearningRate 0.1198 Epoch: 10 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:30:36,081-Speed 18475.43 samples/sec Loss 7.0253 LearningRate 0.1198 Epoch: 10 Global Step: 52620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:40,481-Speed 18627.03 samples/sec Loss 7.0591 LearningRate 0.1197 Epoch: 10 Global Step: 52630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:44,922-Speed 18451.30 samples/sec Loss 7.0581 LearningRate 0.1197 Epoch: 10 Global Step: 52640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:49,329-Speed 18592.52 samples/sec Loss 7.1009 LearningRate 0.1196 Epoch: 10 Global Step: 52650 Fp16 Grad 
Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:53,746-Speed 18550.45 samples/sec Loss 7.0676 LearningRate 0.1196 Epoch: 10 Global Step: 52660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:30:58,154-Speed 18592.48 samples/sec Loss 7.0976 LearningRate 0.1195 Epoch: 10 Global Step: 52670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:31:02,564-Speed 18580.80 samples/sec Loss 7.0541 LearningRate 0.1195 Epoch: 10 Global Step: 52680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:31:06,998-Speed 18481.78 samples/sec Loss 7.1214 LearningRate 0.1194 Epoch: 10 Global Step: 52690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:31:11,406-Speed 18590.41 samples/sec Loss 7.0615 LearningRate 0.1194 Epoch: 10 Global Step: 52700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:31:15,858-Speed 18405.83 samples/sec Loss 7.0792 LearningRate 0.1193 Epoch: 10 Global Step: 52710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:31:20,302-Speed 18440.03 samples/sec Loss 7.0668 LearningRate 0.1193 Epoch: 10 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:31:24,740-Speed 18459.46 samples/sec Loss 7.0788 LearningRate 0.1193 Epoch: 10 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:31:29,190-Speed 18417.26 samples/sec Loss 7.0718 LearningRate 0.1192 Epoch: 10 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:31:33,656-Speed 18344.89 samples/sec Loss 7.0563 LearningRate 0.1192 Epoch: 10 Global Step: 52750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:31:38,106-Speed 18414.40 samples/sec Loss 7.0511 LearningRate 0.1191 Epoch: 10 Global Step: 52760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:31:42,530-Speed 18524.08 samples/sec Loss 7.0846 LearningRate 0.1191 Epoch: 10 Global Step: 52770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-14 04:31:46,956-Speed 18513.96 samples/sec Loss 7.0733 LearningRate 0.1190 Epoch: 10 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:31:51,404-Speed 18422.08 samples/sec Loss 7.0470 LearningRate 0.1190 Epoch: 10 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:31:55,823-Speed 18541.90 samples/sec Loss 7.0531 LearningRate 0.1189 Epoch: 10 Global Step: 52800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:32:00,264-Speed 18455.42 samples/sec Loss 7.0924 LearningRate 0.1189 Epoch: 10 Global Step: 52810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:32:04,661-Speed 18632.22 samples/sec Loss 7.0683 LearningRate 0.1188 Epoch: 10 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:32:09,111-Speed 18419.09 samples/sec Loss 7.0243 LearningRate 0.1188 Epoch: 10 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:32:13,614-Speed 18195.14 samples/sec Loss 7.0901 LearningRate 0.1187 Epoch: 10 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:32:18,103-Speed 18255.01 samples/sec Loss 7.0500 LearningRate 0.1187 Epoch: 10 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:32:22,498-Speed 18644.45 samples/sec Loss 7.0528 LearningRate 0.1186 Epoch: 10 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:32:26,913-Speed 18560.35 samples/sec Loss 7.0221 LearningRate 0.1186 Epoch: 10 Global Step: 52870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:32:31,331-Speed 18547.64 samples/sec Loss 7.0737 LearningRate 0.1186 Epoch: 10 Global Step: 52880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:32:35,736-Speed 18607.31 samples/sec Loss 7.0573 LearningRate 0.1185 Epoch: 10 Global Step: 52890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:32:40,116-Speed 18714.15 
samples/sec Loss 7.0229 LearningRate 0.1185 Epoch: 10 Global Step: 52900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:32:44,569-Speed 18406.08 samples/sec Loss 7.0614 LearningRate 0.1184 Epoch: 10 Global Step: 52910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:32:48,967-Speed 18632.47 samples/sec Loss 7.0749 LearningRate 0.1184 Epoch: 10 Global Step: 52920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:32:53,391-Speed 18530.56 samples/sec Loss 7.0260 LearningRate 0.1183 Epoch: 10 Global Step: 52930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:32:57,813-Speed 18534.28 samples/sec Loss 7.0711 LearningRate 0.1183 Epoch: 10 Global Step: 52940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:33:02,220-Speed 18590.89 samples/sec Loss 7.0863 LearningRate 0.1182 Epoch: 10 Global Step: 52950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:33:06,655-Speed 18476.79 samples/sec Loss 7.1076 LearningRate 0.1182 Epoch: 10 Global Step: 52960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:33:11,118-Speed 18362.76 samples/sec Loss 7.0720 LearningRate 0.1181 Epoch: 10 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:33:15,519-Speed 18621.33 samples/sec Loss 7.0691 LearningRate 0.1181 Epoch: 10 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:33:19,952-Speed 18482.74 samples/sec Loss 7.0642 LearningRate 0.1180 Epoch: 10 Global Step: 52990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:33:24,412-Speed 18371.69 samples/sec Loss 6.9962 LearningRate 0.1180 Epoch: 10 Global Step: 53000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:33:28,870-Speed 18380.74 samples/sec Loss 7.0747 LearningRate 0.1179 Epoch: 10 Global Step: 53010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:33:33,301-Speed 18494.52 samples/sec Loss 7.0904 LearningRate 0.1179 
Epoch: 10 Global Step: 53020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:33:37,761-Speed 18373.14 samples/sec Loss 7.0494 LearningRate 0.1179 Epoch: 10 Global Step: 53030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:33:42,195-Speed 18479.59 samples/sec Loss 7.0555 LearningRate 0.1178 Epoch: 10 Global Step: 53040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:33:46,610-Speed 18559.87 samples/sec Loss 7.0561 LearningRate 0.1178 Epoch: 10 Global Step: 53050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:33:51,003-Speed 18649.42 samples/sec Loss 7.0459 LearningRate 0.1177 Epoch: 10 Global Step: 53060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:33:55,396-Speed 18653.90 samples/sec Loss 7.0296 LearningRate 0.1177 Epoch: 10 Global Step: 53070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:33:59,799-Speed 18611.73 samples/sec Loss 7.0422 LearningRate 0.1176 Epoch: 10 Global Step: 53080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:34:04,210-Speed 18577.56 samples/sec Loss 7.0548 LearningRate 0.1176 Epoch: 10 Global Step: 53090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:34:08,640-Speed 18494.67 samples/sec Loss 7.0142 LearningRate 0.1175 Epoch: 10 Global Step: 53100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:13,090-Speed 18411.09 samples/sec Loss 7.0662 LearningRate 0.1175 Epoch: 10 Global Step: 53110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:17,513-Speed 18532.51 samples/sec Loss 7.0094 LearningRate 0.1174 Epoch: 10 Global Step: 53120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:21,929-Speed 18552.76 samples/sec Loss 7.0789 LearningRate 0.1174 Epoch: 10 Global Step: 53130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:26,342-Speed 18568.19 samples/sec Loss 7.0594 LearningRate 0.1173 Epoch: 10 Global Step: 53140 Fp16 Grad 
Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:30,867-Speed 18108.85 samples/sec Loss 7.0481 LearningRate 0.1173 Epoch: 10 Global Step: 53150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:35,349-Speed 18285.14 samples/sec Loss 7.0049 LearningRate 0.1172 Epoch: 10 Global Step: 53160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:39,793-Speed 18436.60 samples/sec Loss 7.0534 LearningRate 0.1172 Epoch: 10 Global Step: 53170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:44,204-Speed 18580.34 samples/sec Loss 7.0418 LearningRate 0.1172 Epoch: 10 Global Step: 53180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:48,641-Speed 18468.95 samples/sec Loss 7.0408 LearningRate 0.1171 Epoch: 10 Global Step: 53190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:34:53,117-Speed 18305.35 samples/sec Loss 7.0243 LearningRate 0.1171 Epoch: 10 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:34:57,495-Speed 18719.68 samples/sec Loss 7.0161 LearningRate 0.1170 Epoch: 10 Global Step: 53210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:35:01,985-Speed 18251.27 samples/sec Loss 7.0255 LearningRate 0.1170 Epoch: 10 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:35:06,421-Speed 18473.93 samples/sec Loss 7.0589 LearningRate 0.1169 Epoch: 10 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:35:10,815-Speed 18646.41 samples/sec Loss 7.0478 LearningRate 0.1169 Epoch: 10 Global Step: 53240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:15,220-Speed 18604.09 samples/sec Loss 7.0436 LearningRate 0.1168 Epoch: 10 Global Step: 53250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:19,638-Speed 18543.84 samples/sec Loss 7.0548 LearningRate 0.1168 Epoch: 10 Global Step: 53260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-01-14 04:35:24,086-Speed 18420.53 samples/sec Loss 7.0464 LearningRate 0.1167 Epoch: 10 Global Step: 53270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:28,494-Speed 18592.09 samples/sec Loss 7.0857 LearningRate 0.1167 Epoch: 10 Global Step: 53280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:32,918-Speed 18527.26 samples/sec Loss 7.0558 LearningRate 0.1166 Epoch: 10 Global Step: 53290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:37,334-Speed 18555.53 samples/sec Loss 7.0543 LearningRate 0.1166 Epoch: 10 Global Step: 53300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:41,785-Speed 18410.28 samples/sec Loss 7.0184 LearningRate 0.1166 Epoch: 10 Global Step: 53310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:46,262-Speed 18303.46 samples/sec Loss 7.0412 LearningRate 0.1165 Epoch: 10 Global Step: 53320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:50,699-Speed 18469.38 samples/sec Loss 7.0225 LearningRate 0.1165 Epoch: 10 Global Step: 53330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:35:55,154-Speed 18391.28 samples/sec Loss 7.0547 LearningRate 0.1164 Epoch: 10 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:35:59,574-Speed 18538.55 samples/sec Loss 7.0478 LearningRate 0.1164 Epoch: 10 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:04,052-Speed 18297.61 samples/sec Loss 7.0390 LearningRate 0.1163 Epoch: 10 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:08,480-Speed 18507.25 samples/sec Loss 7.0417 LearningRate 0.1163 Epoch: 10 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:18,040-Speed 8570.87 samples/sec Loss 7.0188 LearningRate 0.1162 Epoch: 10 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:22,501-Speed 18372.28 
samples/sec Loss 6.9932 LearningRate 0.1162 Epoch: 10 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:26,994-Speed 18236.96 samples/sec Loss 7.0382 LearningRate 0.1161 Epoch: 10 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:31,457-Speed 18361.28 samples/sec Loss 7.0559 LearningRate 0.1161 Epoch: 10 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:35,881-Speed 18523.66 samples/sec Loss 7.0647 LearningRate 0.1160 Epoch: 10 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:40,318-Speed 18465.57 samples/sec Loss 7.0237 LearningRate 0.1160 Epoch: 10 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:44,739-Speed 18537.13 samples/sec Loss 7.0325 LearningRate 0.1160 Epoch: 10 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:36:49,144-Speed 18600.74 samples/sec Loss 7.0285 LearningRate 0.1159 Epoch: 10 Global Step: 53450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:36:53,560-Speed 18557.95 samples/sec Loss 7.0674 LearningRate 0.1159 Epoch: 10 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:36:57,962-Speed 18615.81 samples/sec Loss 7.0186 LearningRate 0.1158 Epoch: 10 Global Step: 53470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:37:02,388-Speed 18514.38 samples/sec Loss 7.0349 LearningRate 0.1158 Epoch: 10 Global Step: 53480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:37:06,852-Speed 18356.18 samples/sec Loss 7.0306 LearningRate 0.1157 Epoch: 10 Global Step: 53490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:37:11,294-Speed 18447.10 samples/sec Loss 7.0423 LearningRate 0.1157 Epoch: 10 Global Step: 53500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:37:15,771-Speed 18305.03 samples/sec Loss 6.9872 LearningRate 
0.1156 Epoch: 10 Global Step: 53510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:37:20,239-Speed 18338.43 samples/sec Loss 6.9985 LearningRate 0.1156 Epoch: 10 Global Step: 53520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:37:24,724-Speed 18269.85 samples/sec Loss 7.0107 LearningRate 0.1155 Epoch: 10 Global Step: 53530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:37:29,155-Speed 18494.43 samples/sec Loss 7.0307 LearningRate 0.1155 Epoch: 10 Global Step: 53540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:37:33,572-Speed 18549.45 samples/sec Loss 7.0003 LearningRate 0.1154 Epoch: 10 Global Step: 53550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:37:37,990-Speed 18549.23 samples/sec Loss 7.0357 LearningRate 0.1154 Epoch: 10 Global Step: 53560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:37:42,476-Speed 18264.92 samples/sec Loss 6.9936 LearningRate 0.1154 Epoch: 10 Global Step: 53570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:37:46,919-Speed 18448.95 samples/sec Loss 6.9706 LearningRate 0.1153 Epoch: 10 Global Step: 53580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:37:51,349-Speed 18501.85 samples/sec Loss 7.0377 LearningRate 0.1153 Epoch: 10 Global Step: 53590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:37:55,779-Speed 18496.38 samples/sec Loss 6.9980 LearningRate 0.1152 Epoch: 10 Global Step: 53600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:38:00,203-Speed 18523.75 samples/sec Loss 6.9679 LearningRate 0.1152 Epoch: 10 Global Step: 53610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:38:04,598-Speed 18644.67 samples/sec Loss 6.9699 LearningRate 0.1151 Epoch: 10 Global Step: 53620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:38:08,991-Speed 18654.67 samples/sec Loss 7.0215 LearningRate 0.1151 Epoch: 10 Global Step: 53630 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:13,376-Speed 18688.10 samples/sec Loss 6.9802 LearningRate 0.1150 Epoch: 10 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:17,764-Speed 18676.42 samples/sec Loss 6.9671 LearningRate 0.1150 Epoch: 10 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:22,195-Speed 18490.98 samples/sec Loss 7.0297 LearningRate 0.1149 Epoch: 10 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:26,620-Speed 18518.18 samples/sec Loss 6.9968 LearningRate 0.1149 Epoch: 10 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:31,056-Speed 18468.79 samples/sec Loss 6.9721 LearningRate 0.1148 Epoch: 10 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:35,464-Speed 18592.62 samples/sec Loss 7.0122 LearningRate 0.1148 Epoch: 10 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:39,879-Speed 18563.41 samples/sec Loss 7.0278 LearningRate 0.1148 Epoch: 10 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:44,319-Speed 18453.71 samples/sec Loss 7.0349 LearningRate 0.1147 Epoch: 10 Global Step: 53710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:48,769-Speed 18415.51 samples/sec Loss 6.9881 LearningRate 0.1147 Epoch: 10 Global Step: 53720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:53,198-Speed 18502.12 samples/sec Loss 6.9995 LearningRate 0.1146 Epoch: 10 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:38:57,633-Speed 18475.39 samples/sec Loss 6.9788 LearningRate 0.1146 Epoch: 10 Global Step: 53740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:39:02,027-Speed 18653.35 samples/sec Loss 6.9668 LearningRate 0.1145 Epoch: 10 Global Step: 53750 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-14 04:39:06,435-Speed 18589.86 samples/sec Loss 7.0320 LearningRate 0.1145 Epoch: 10 Global Step: 53760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:39:10,865-Speed 18494.25 samples/sec Loss 6.9854 LearningRate 0.1144 Epoch: 10 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:39:15,268-Speed 18614.52 samples/sec Loss 7.0081 LearningRate 0.1144 Epoch: 10 Global Step: 53780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:39:19,703-Speed 18472.99 samples/sec Loss 7.0142 LearningRate 0.1143 Epoch: 10 Global Step: 53790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:39:24,165-Speed 18371.56 samples/sec Loss 6.9395 LearningRate 0.1143 Epoch: 10 Global Step: 53800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:39:28,601-Speed 18472.02 samples/sec Loss 6.9966 LearningRate 0.1143 Epoch: 10 Global Step: 53810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:39:33,052-Speed 18412.57 samples/sec Loss 7.0114 LearningRate 0.1142 Epoch: 10 Global Step: 53820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:39:37,495-Speed 18441.84 samples/sec Loss 7.0012 LearningRate 0.1142 Epoch: 10 Global Step: 53830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:39:41,922-Speed 18511.37 samples/sec Loss 6.9901 LearningRate 0.1141 Epoch: 10 Global Step: 53840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:39:46,340-Speed 18550.42 samples/sec Loss 6.9979 LearningRate 0.1141 Epoch: 10 Global Step: 53850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:39:50,742-Speed 18614.89 samples/sec Loss 6.9857 LearningRate 0.1140 Epoch: 10 Global Step: 53860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:39:55,143-Speed 18618.82 samples/sec Loss 7.0153 LearningRate 0.1140 Epoch: 10 Global Step: 53870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:39:59,565-Speed 
18529.01 samples/sec Loss 6.9927 LearningRate 0.1139 Epoch: 10 Global Step: 53880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:04,062-Speed 18222.53 samples/sec Loss 7.0268 LearningRate 0.1139 Epoch: 10 Global Step: 53890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:08,479-Speed 18552.74 samples/sec Loss 6.9931 LearningRate 0.1138 Epoch: 10 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:40:12,889-Speed 18585.35 samples/sec Loss 7.0042 LearningRate 0.1138 Epoch: 10 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:40:17,330-Speed 18455.43 samples/sec Loss 6.9470 LearningRate 0.1137 Epoch: 10 Global Step: 53920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:40:21,753-Speed 18525.98 samples/sec Loss 6.9870 LearningRate 0.1137 Epoch: 10 Global Step: 53930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:40:26,168-Speed 18562.34 samples/sec Loss 6.9717 LearningRate 0.1137 Epoch: 10 Global Step: 53940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:30,586-Speed 18552.14 samples/sec Loss 6.9531 LearningRate 0.1136 Epoch: 10 Global Step: 53950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:35,037-Speed 18409.15 samples/sec Loss 6.9503 LearningRate 0.1136 Epoch: 10 Global Step: 53960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:39,490-Speed 18407.43 samples/sec Loss 7.0013 LearningRate 0.1135 Epoch: 10 Global Step: 53970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:43,910-Speed 18541.17 samples/sec Loss 7.0007 LearningRate 0.1135 Epoch: 10 Global Step: 53980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:48,370-Speed 18375.49 samples/sec Loss 7.0134 LearningRate 0.1134 Epoch: 10 Global Step: 53990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:52,756-Speed 18679.80 samples/sec Loss 6.9960 
LearningRate 0.1134 Epoch: 10 Global Step: 54000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:40:57,200-Speed 18441.02 samples/sec Loss 6.9562 LearningRate 0.1133 Epoch: 10 Global Step: 54010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:01,674-Speed 18317.11 samples/sec Loss 6.9673 LearningRate 0.1133 Epoch: 10 Global Step: 54020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:06,088-Speed 18577.76 samples/sec Loss 6.9704 LearningRate 0.1132 Epoch: 10 Global Step: 54030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:10,509-Speed 18534.47 samples/sec Loss 6.9878 LearningRate 0.1132 Epoch: 10 Global Step: 54040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:41:14,953-Speed 18438.03 samples/sec Loss 6.9602 LearningRate 0.1132 Epoch: 10 Global Step: 54050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:41:19,386-Speed 18482.74 samples/sec Loss 7.0105 LearningRate 0.1131 Epoch: 10 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:41:23,818-Speed 18487.22 samples/sec Loss 7.0072 LearningRate 0.1131 Epoch: 10 Global Step: 54070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:28,341-Speed 18118.00 samples/sec Loss 6.9616 LearningRate 0.1130 Epoch: 10 Global Step: 54080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:32,765-Speed 18517.95 samples/sec Loss 6.9926 LearningRate 0.1130 Epoch: 10 Global Step: 54090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:37,171-Speed 18601.61 samples/sec Loss 6.9477 LearningRate 0.1129 Epoch: 10 Global Step: 54100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:41,615-Speed 18439.31 samples/sec Loss 7.0127 LearningRate 0.1129 Epoch: 10 Global Step: 54110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:46,023-Speed 18591.47 samples/sec Loss 7.0055 LearningRate 0.1128 Epoch: 10 Global Step: 
54120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:50,432-Speed 18586.34 samples/sec Loss 7.0089 LearningRate 0.1128 Epoch: 10 Global Step: 54130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:54,816-Speed 18687.29 samples/sec Loss 6.9609 LearningRate 0.1127 Epoch: 10 Global Step: 54140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:41:59,239-Speed 18532.44 samples/sec Loss 6.9792 LearningRate 0.1127 Epoch: 10 Global Step: 54150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:42:03,691-Speed 18402.39 samples/sec Loss 6.9597 LearningRate 0.1127 Epoch: 10 Global Step: 54160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:42:08,115-Speed 18521.40 samples/sec Loss 6.9920 LearningRate 0.1126 Epoch: 10 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:12,524-Speed 18585.72 samples/sec Loss 7.0409 LearningRate 0.1126 Epoch: 10 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:16,986-Speed 18366.23 samples/sec Loss 6.9439 LearningRate 0.1125 Epoch: 10 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:21,430-Speed 18440.43 samples/sec Loss 6.9641 LearningRate 0.1125 Epoch: 10 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:25,887-Speed 18384.16 samples/sec Loss 6.9727 LearningRate 0.1124 Epoch: 10 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:30,300-Speed 18569.08 samples/sec Loss 6.9466 LearningRate 0.1124 Epoch: 10 Global Step: 54220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:34,771-Speed 18326.53 samples/sec Loss 6.9550 LearningRate 0.1123 Epoch: 10 Global Step: 54230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:39,219-Speed 18420.16 samples/sec Loss 6.9711 LearningRate 0.1123 Epoch: 10 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-01-14 04:42:43,679-Speed 18373.34 samples/sec Loss 6.9601 LearningRate 0.1122 Epoch: 10 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:48,116-Speed 18478.35 samples/sec Loss 6.9894 LearningRate 0.1122 Epoch: 10 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:42:52,557-Speed 18456.76 samples/sec Loss 6.9804 LearningRate 0.1122 Epoch: 10 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:42:57,020-Speed 18359.60 samples/sec Loss 6.9633 LearningRate 0.1121 Epoch: 10 Global Step: 54280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:01,478-Speed 18381.62 samples/sec Loss 6.9938 LearningRate 0.1121 Epoch: 10 Global Step: 54290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:05,918-Speed 18457.66 samples/sec Loss 6.9500 LearningRate 0.1120 Epoch: 10 Global Step: 54300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:10,339-Speed 18533.60 samples/sec Loss 6.9511 LearningRate 0.1120 Epoch: 10 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:14,748-Speed 18588.96 samples/sec Loss 6.9477 LearningRate 0.1119 Epoch: 10 Global Step: 54320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:19,178-Speed 18497.76 samples/sec Loss 6.9486 LearningRate 0.1119 Epoch: 10 Global Step: 54330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:23,596-Speed 18547.01 samples/sec Loss 6.9646 LearningRate 0.1118 Epoch: 10 Global Step: 54340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:28,049-Speed 18400.99 samples/sec Loss 6.9410 LearningRate 0.1118 Epoch: 10 Global Step: 54350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:32,516-Speed 18347.19 samples/sec Loss 6.9532 LearningRate 0.1117 Epoch: 10 Global Step: 54360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 
04:43:36,994-Speed 18298.43 samples/sec Loss 6.9986 LearningRate 0.1117 Epoch: 10 Global Step: 54370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:41,464-Speed 18340.25 samples/sec Loss 6.9686 LearningRate 0.1117 Epoch: 10 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:45,942-Speed 18298.44 samples/sec Loss 6.9998 LearningRate 0.1116 Epoch: 10 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:43:50,427-Speed 18269.90 samples/sec Loss 6.9440 LearningRate 0.1116 Epoch: 10 Global Step: 54400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:43:54,863-Speed 18467.57 samples/sec Loss 6.9716 LearningRate 0.1115 Epoch: 10 Global Step: 54410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:43:59,270-Speed 18596.56 samples/sec Loss 6.9598 LearningRate 0.1115 Epoch: 10 Global Step: 54420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:44:03,744-Speed 18311.94 samples/sec Loss 6.9759 LearningRate 0.1114 Epoch: 10 Global Step: 54430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:44:08,180-Speed 18478.29 samples/sec Loss 6.9921 LearningRate 0.1114 Epoch: 10 Global Step: 54440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:44:12,556-Speed 18728.97 samples/sec Loss 6.9792 LearningRate 0.1113 Epoch: 10 Global Step: 54450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:44:16,953-Speed 18638.68 samples/sec Loss 6.9313 LearningRate 0.1113 Epoch: 10 Global Step: 54460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:44:21,349-Speed 18638.85 samples/sec Loss 6.9957 LearningRate 0.1112 Epoch: 10 Global Step: 54470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:44:25,756-Speed 18593.62 samples/sec Loss 6.9184 LearningRate 0.1112 Epoch: 10 Global Step: 54480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:44:30,186-Speed 18502.57 samples/sec 
Loss 6.9096 LearningRate 0.1112 Epoch: 10 Global Step: 54490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:44:34,571-Speed 18692.32 samples/sec Loss 6.9316 LearningRate 0.1111 Epoch: 10 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:44:38,971-Speed 18623.14 samples/sec Loss 6.9495 LearningRate 0.1111 Epoch: 10 Global Step: 54510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:44:43,391-Speed 18538.53 samples/sec Loss 6.9613 LearningRate 0.1110 Epoch: 10 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:44:47,824-Speed 18489.65 samples/sec Loss 6.9847 LearningRate 0.1110 Epoch: 10 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:44:52,290-Speed 18348.88 samples/sec Loss 6.9465 LearningRate 0.1109 Epoch: 10 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:44:56,685-Speed 18642.69 samples/sec Loss 6.8635 LearningRate 0.1109 Epoch: 10 Global Step: 54550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:01,096-Speed 18575.44 samples/sec Loss 6.9080 LearningRate 0.1108 Epoch: 10 Global Step: 54560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:05,494-Speed 18634.08 samples/sec Loss 6.9596 LearningRate 0.1108 Epoch: 10 Global Step: 54570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:09,871-Speed 18724.42 samples/sec Loss 6.9296 LearningRate 0.1108 Epoch: 10 Global Step: 54580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:14,293-Speed 18533.80 samples/sec Loss 6.9629 LearningRate 0.1107 Epoch: 10 Global Step: 54590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:18,760-Speed 18343.14 samples/sec Loss 6.9907 LearningRate 0.1107 Epoch: 10 Global Step: 54600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:23,162-Speed 18612.66 samples/sec Loss 6.9406 LearningRate 0.1106 Epoch: 10 
Global Step: 54610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:27,626-Speed 18360.91 samples/sec Loss 6.9684 LearningRate 0.1106 Epoch: 10 Global Step: 54620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:32,028-Speed 18617.28 samples/sec Loss 6.9879 LearningRate 0.1105 Epoch: 10 Global Step: 54630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:36,425-Speed 18636.30 samples/sec Loss 6.9612 LearningRate 0.1105 Epoch: 10 Global Step: 54640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:45:40,847-Speed 18530.25 samples/sec Loss 6.9456 LearningRate 0.1104 Epoch: 10 Global Step: 54650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:45:45,259-Speed 18582.67 samples/sec Loss 6.9281 LearningRate 0.1104 Epoch: 10 Global Step: 54660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:45:49,702-Speed 18447.55 samples/sec Loss 6.9457 LearningRate 0.1103 Epoch: 10 Global Step: 54670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:45:57,393-Speed 10652.21 samples/sec Loss 6.9909 LearningRate 0.1103 Epoch: 10 Global Step: 54680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:01,826-Speed 18486.18 samples/sec Loss 6.9494 LearningRate 0.1103 Epoch: 10 Global Step: 54690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:06,285-Speed 18379.16 samples/sec Loss 6.9628 LearningRate 0.1102 Epoch: 10 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:10,669-Speed 18690.12 samples/sec Loss 6.9390 LearningRate 0.1102 Epoch: 10 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:15,069-Speed 18624.74 samples/sec Loss 6.9724 LearningRate 0.1101 Epoch: 10 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:19,498-Speed 18499.72 samples/sec Loss 6.9269 LearningRate 0.1101 Epoch: 10 Global Step: 54730 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-14 04:46:23,889-Speed 18662.44 samples/sec Loss 6.9686 LearningRate 0.1100 Epoch: 10 Global Step: 54740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:28,306-Speed 18556.27 samples/sec Loss 6.9316 LearningRate 0.1100 Epoch: 10 Global Step: 54750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:32,725-Speed 18541.39 samples/sec Loss 6.9198 LearningRate 0.1099 Epoch: 10 Global Step: 54760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:37,135-Speed 18582.17 samples/sec Loss 6.9517 LearningRate 0.1099 Epoch: 10 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:41,545-Speed 18580.35 samples/sec Loss 6.9549 LearningRate 0.1099 Epoch: 10 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:46,006-Speed 18370.58 samples/sec Loss 6.9682 LearningRate 0.1098 Epoch: 10 Global Step: 54790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:50,561-Speed 17992.70 samples/sec Loss 6.9066 LearningRate 0.1098 Epoch: 10 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:54,981-Speed 18537.18 samples/sec Loss 6.8951 LearningRate 0.1097 Epoch: 10 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:46:59,448-Speed 18346.14 samples/sec Loss 6.9100 LearningRate 0.1097 Epoch: 10 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:47:03,918-Speed 18336.74 samples/sec Loss 6.9431 LearningRate 0.1096 Epoch: 10 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:47:08,330-Speed 18571.64 samples/sec Loss 6.8964 LearningRate 0.1096 Epoch: 10 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:47:12,747-Speed 18551.88 samples/sec Loss 6.9837 LearningRate 0.1095 Epoch: 10 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 
04:47:17,200-Speed 18399.90 samples/sec Loss 6.9389 LearningRate 0.1095 Epoch: 10 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:47:21,632-Speed 18485.60 samples/sec Loss 6.9127 LearningRate 0.1094 Epoch: 10 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:47:26,034-Speed 18617.63 samples/sec Loss 6.9167 LearningRate 0.1094 Epoch: 10 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:47:30,436-Speed 18613.98 samples/sec Loss 6.9779 LearningRate 0.1094 Epoch: 10 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:47:34,838-Speed 18617.62 samples/sec Loss 6.9385 LearningRate 0.1093 Epoch: 10 Global Step: 54900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:47:39,244-Speed 18597.16 samples/sec Loss 6.9617 LearningRate 0.1093 Epoch: 10 Global Step: 54910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:47:43,642-Speed 18629.89 samples/sec Loss 6.9477 LearningRate 0.1092 Epoch: 10 Global Step: 54920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:47:48,082-Speed 18457.44 samples/sec Loss 6.9346 LearningRate 0.1092 Epoch: 10 Global Step: 54930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:47:52,497-Speed 18561.48 samples/sec Loss 6.9210 LearningRate 0.1091 Epoch: 10 Global Step: 54940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:47:56,954-Speed 18383.78 samples/sec Loss 6.8711 LearningRate 0.1091 Epoch: 10 Global Step: 54950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:48:01,361-Speed 18596.90 samples/sec Loss 6.9322 LearningRate 0.1090 Epoch: 10 Global Step: 54960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:48:05,795-Speed 18480.98 samples/sec Loss 6.9288 LearningRate 0.1090 Epoch: 10 Global Step: 54970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:48:10,187-Speed 18657.26 samples/sec 
Loss 6.9172 LearningRate 0.1090 Epoch: 10 Global Step: 54980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:48:14,580-Speed 18648.21 samples/sec Loss 6.9012 LearningRate 0.1089 Epoch: 10 Global Step: 54990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:48:19,073-Speed 18238.74 samples/sec Loss 6.9558 LearningRate 0.1089 Epoch: 10 Global Step: 55000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:48:23,487-Speed 18562.90 samples/sec Loss 6.9163 LearningRate 0.1088 Epoch: 10 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:48:27,912-Speed 18518.82 samples/sec Loss 6.8956 LearningRate 0.1088 Epoch: 10 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:48:32,353-Speed 18450.52 samples/sec Loss 6.9034 LearningRate 0.1087 Epoch: 10 Global Step: 55030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:48:36,786-Speed 18485.84 samples/sec Loss 6.8923 LearningRate 0.1087 Epoch: 10 Global Step: 55040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:48:41,184-Speed 18627.99 samples/sec Loss 6.9655 LearningRate 0.1086 Epoch: 10 Global Step: 55050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:48:45,646-Speed 18368.37 samples/sec Loss 6.9292 LearningRate 0.1086 Epoch: 10 Global Step: 55060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:48:50,091-Speed 18431.98 samples/sec Loss 6.9645 LearningRate 0.1086 Epoch: 10 Global Step: 55070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:48:54,494-Speed 18610.63 samples/sec Loss 6.9022 LearningRate 0.1085 Epoch: 10 Global Step: 55080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:48:58,934-Speed 18457.39 samples/sec Loss 6.9695 LearningRate 0.1085 Epoch: 10 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:49:03,352-Speed 18545.38 samples/sec Loss 6.9469 LearningRate 0.1084 Epoch: 10 
Global Step: 55100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:49:07,768-Speed 18557.86 samples/sec Loss 6.9219 LearningRate 0.1084 Epoch: 10 Global Step: 55110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:49:12,187-Speed 18540.69 samples/sec Loss 6.8952 LearningRate 0.1083 Epoch: 10 Global Step: 55120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:16,560-Speed 18741.72 samples/sec Loss 6.9146 LearningRate 0.1083 Epoch: 10 Global Step: 55130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:20,956-Speed 18645.09 samples/sec Loss 6.8505 LearningRate 0.1082 Epoch: 10 Global Step: 55140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:25,374-Speed 18552.12 samples/sec Loss 6.9166 LearningRate 0.1082 Epoch: 10 Global Step: 55150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:29,819-Speed 18433.99 samples/sec Loss 6.8951 LearningRate 0.1082 Epoch: 10 Global Step: 55160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:34,231-Speed 18574.88 samples/sec Loss 6.8694 LearningRate 0.1081 Epoch: 10 Global Step: 55170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:38,653-Speed 18527.41 samples/sec Loss 6.9003 LearningRate 0.1081 Epoch: 10 Global Step: 55180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:43,093-Speed 18461.60 samples/sec Loss 6.9088 LearningRate 0.1080 Epoch: 10 Global Step: 55190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:47,515-Speed 18536.06 samples/sec Loss 6.8823 LearningRate 0.1080 Epoch: 10 Global Step: 55200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:51,913-Speed 18629.22 samples/sec Loss 6.8596 LearningRate 0.1079 Epoch: 10 Global Step: 55210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:49:56,307-Speed 18648.58 samples/sec Loss 6.9175 LearningRate 0.1079 Epoch: 10 Global Step: 55220 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-14 04:50:00,759-Speed 18407.30 samples/sec Loss 6.8835 LearningRate 0.1078 Epoch: 10 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:05,212-Speed 18402.82 samples/sec Loss 6.9362 LearningRate 0.1078 Epoch: 10 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:09,631-Speed 18539.93 samples/sec Loss 6.9571 LearningRate 0.1077 Epoch: 10 Global Step: 55250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:14,084-Speed 18400.18 samples/sec Loss 6.9032 LearningRate 0.1077 Epoch: 10 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:18,524-Speed 18457.69 samples/sec Loss 6.9155 LearningRate 0.1077 Epoch: 10 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:22,936-Speed 18572.04 samples/sec Loss 6.8531 LearningRate 0.1076 Epoch: 10 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:27,460-Speed 18109.58 samples/sec Loss 6.8593 LearningRate 0.1076 Epoch: 10 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:31,959-Speed 18215.02 samples/sec Loss 6.8598 LearningRate 0.1075 Epoch: 10 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:36,388-Speed 18501.07 samples/sec Loss 6.8870 LearningRate 0.1075 Epoch: 10 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:40,801-Speed 18566.85 samples/sec Loss 6.9131 LearningRate 0.1074 Epoch: 10 Global Step: 55320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:50:45,201-Speed 18625.55 samples/sec Loss 6.8926 LearningRate 0.1074 Epoch: 10 Global Step: 55330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:50:49,596-Speed 18646.30 samples/sec Loss 6.9210 LearningRate 0.1073 Epoch: 10 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 
04:50:54,006-Speed 18579.74 samples/sec Loss 6.8624 LearningRate 0.1073 Epoch: 10 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:50:58,410-Speed 18606.13 samples/sec Loss 6.8719 LearningRate 0.1073 Epoch: 10 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:02,814-Speed 18604.37 samples/sec Loss 6.9105 LearningRate 0.1072 Epoch: 10 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:07,227-Speed 18574.19 samples/sec Loss 6.8803 LearningRate 0.1072 Epoch: 10 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:11,639-Speed 18570.05 samples/sec Loss 6.8806 LearningRate 0.1071 Epoch: 10 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:16,038-Speed 18629.10 samples/sec Loss 6.8740 LearningRate 0.1071 Epoch: 10 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:20,411-Speed 18737.63 samples/sec Loss 6.9016 LearningRate 0.1070 Epoch: 10 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:24,827-Speed 18554.18 samples/sec Loss 6.9332 LearningRate 0.1070 Epoch: 10 Global Step: 55420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:29,263-Speed 18473.69 samples/sec Loss 6.8814 LearningRate 0.1069 Epoch: 10 Global Step: 55430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:33,656-Speed 18649.29 samples/sec Loss 6.8634 LearningRate 0.1069 Epoch: 10 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:51:38,057-Speed 18623.24 samples/sec Loss 6.8958 LearningRate 0.1069 Epoch: 10 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:51:42,488-Speed 18490.58 samples/sec Loss 6.8969 LearningRate 0.1068 Epoch: 10 Global Step: 55460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:51:46,929-Speed 18456.05 samples/sec 
Loss 6.9022 LearningRate 0.1068 Epoch: 10 Global Step: 55470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:51:51,317-Speed 18677.22 samples/sec Loss 6.9095 LearningRate 0.1067 Epoch: 10 Global Step: 55480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:51:55,703-Speed 18681.43 samples/sec Loss 6.8857 LearningRate 0.1067 Epoch: 10 Global Step: 55490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:52:00,096-Speed 18656.95 samples/sec Loss 6.8887 LearningRate 0.1066 Epoch: 10 Global Step: 55500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:52:04,523-Speed 18508.03 samples/sec Loss 6.8790 LearningRate 0.1066 Epoch: 10 Global Step: 55510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:52:08,946-Speed 18529.79 samples/sec Loss 6.8583 LearningRate 0.1066 Epoch: 10 Global Step: 55520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:52:13,383-Speed 18470.16 samples/sec Loss 6.8839 LearningRate 0.1065 Epoch: 10 Global Step: 55530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:52:17,796-Speed 18568.53 samples/sec Loss 6.8518 LearningRate 0.1065 Epoch: 10 Global Step: 55540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:52:22,298-Speed 18204.57 samples/sec Loss 6.8742 LearningRate 0.1064 Epoch: 10 Global Step: 55550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:52:26,712-Speed 18562.22 samples/sec Loss 6.8882 LearningRate 0.1064 Epoch: 10 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:52:31,118-Speed 18600.47 samples/sec Loss 6.8528 LearningRate 0.1063 Epoch: 10 Global Step: 55570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:52:35,494-Speed 18729.77 samples/sec Loss 6.8695 LearningRate 0.1063 Epoch: 10 Global Step: 55580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:52:39,868-Speed 18735.66 samples/sec Loss 6.8776 LearningRate 0.1062 Epoch: 10 
Global Step: 55590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:52:44,348-Speed 18290.09 samples/sec Loss 6.8631 LearningRate 0.1062 Epoch: 10 Global Step: 55600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:52:48,752-Speed 18610.66 samples/sec Loss 6.8777 LearningRate 0.1062 Epoch: 10 Global Step: 55610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:52:53,153-Speed 18616.84 samples/sec Loss 6.9278 LearningRate 0.1061 Epoch: 10 Global Step: 55620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:52:57,530-Speed 18720.30 samples/sec Loss 6.8555 LearningRate 0.1061 Epoch: 10 Global Step: 55630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:53:01,970-Speed 18455.40 samples/sec Loss 6.8975 LearningRate 0.1060 Epoch: 10 Global Step: 55640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:06,396-Speed 18514.63 samples/sec Loss 6.9073 LearningRate 0.1060 Epoch: 10 Global Step: 55650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:10,806-Speed 18582.51 samples/sec Loss 6.8857 LearningRate 0.1059 Epoch: 10 Global Step: 55660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:15,206-Speed 18625.06 samples/sec Loss 6.8349 LearningRate 0.1059 Epoch: 10 Global Step: 55670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:19,640-Speed 18478.92 samples/sec Loss 6.8700 LearningRate 0.1058 Epoch: 10 Global Step: 55680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:24,072-Speed 18486.62 samples/sec Loss 6.8979 LearningRate 0.1058 Epoch: 10 Global Step: 55690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:28,506-Speed 18478.93 samples/sec Loss 6.8950 LearningRate 0.1058 Epoch: 10 Global Step: 55700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:32,994-Speed 18260.36 samples/sec Loss 6.8632 LearningRate 0.1057 Epoch: 10 Global Step: 55710 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-14 04:53:37,421-Speed 18510.66 samples/sec Loss 6.7753 LearningRate 0.1057 Epoch: 10 Global Step: 55720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:41,862-Speed 18450.46 samples/sec Loss 6.8153 LearningRate 0.1056 Epoch: 10 Global Step: 55730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:53:46,285-Speed 18527.68 samples/sec Loss 6.8898 LearningRate 0.1056 Epoch: 10 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:53:50,674-Speed 18670.72 samples/sec Loss 6.8732 LearningRate 0.1055 Epoch: 10 Global Step: 55750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:53:55,083-Speed 18583.75 samples/sec Loss 6.8907 LearningRate 0.1055 Epoch: 10 Global Step: 55760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:53:59,539-Speed 18387.14 samples/sec Loss 6.8337 LearningRate 0.1054 Epoch: 10 Global Step: 55770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:54:03,967-Speed 18506.88 samples/sec Loss 6.8696 LearningRate 0.1054 Epoch: 10 Global Step: 55780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:54:08,448-Speed 18294.88 samples/sec Loss 6.7875 LearningRate 0.1054 Epoch: 10 Global Step: 55790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:54:12,883-Speed 18475.66 samples/sec Loss 6.8285 LearningRate 0.1053 Epoch: 10 Global Step: 55800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:54:17,313-Speed 18503.30 samples/sec Loss 6.8372 LearningRate 0.1053 Epoch: 10 Global Step: 55810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:54:21,723-Speed 18584.09 samples/sec Loss 6.8935 LearningRate 0.1052 Epoch: 10 Global Step: 55820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:54:26,185-Speed 18365.60 samples/sec Loss 6.8634 LearningRate 0.1052 Epoch: 10 Global Step: 55830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 
04:54:30,563-Speed 18713.58 samples/sec Loss 6.8509 LearningRate 0.1051 Epoch: 10 Global Step: 55840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 04:54:35,027-Speed 18360.54 samples/sec Loss 6.9288 LearningRate 0.1051 Epoch: 10 Global Step: 55850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:54:39,415-Speed 18676.36 samples/sec Loss 6.9081 LearningRate 0.1051 Epoch: 10 Global Step: 55860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:54:43,847-Speed 18490.75 samples/sec Loss 6.8306 LearningRate 0.1050 Epoch: 10 Global Step: 55870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:54:48,331-Speed 18271.40 samples/sec Loss 6.8982 LearningRate 0.1050 Epoch: 10 Global Step: 55880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:54:52,838-Speed 18180.37 samples/sec Loss 6.8846 LearningRate 0.1049 Epoch: 10 Global Step: 55890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:54:57,268-Speed 18498.95 samples/sec Loss 6.8977 LearningRate 0.1049 Epoch: 10 Global Step: 55900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:55:01,716-Speed 18427.25 samples/sec Loss 6.8687 LearningRate 0.1048 Epoch: 10 Global Step: 55910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:55:06,163-Speed 18429.03 samples/sec Loss 6.8669 LearningRate 0.1048 Epoch: 10 Global Step: 55920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:55:10,580-Speed 18554.90 samples/sec Loss 6.8729 LearningRate 0.1047 Epoch: 10 Global Step: 55930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:55:15,002-Speed 18529.47 samples/sec Loss 6.8202 LearningRate 0.1047 Epoch: 10 Global Step: 55940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:55:19,470-Speed 18340.47 samples/sec Loss 6.8721 LearningRate 0.1047 Epoch: 10 Global Step: 55950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:23,904-Speed 18480.55 samples/sec 
Loss 6.8900 LearningRate 0.1046 Epoch: 10 Global Step: 55960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:28,329-Speed 18520.02 samples/sec Loss 6.8758 LearningRate 0.1046 Epoch: 10 Global Step: 55970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:32,769-Speed 18453.64 samples/sec Loss 6.8511 LearningRate 0.1045 Epoch: 10 Global Step: 55980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:37,154-Speed 18687.09 samples/sec Loss 6.8859 LearningRate 0.1045 Epoch: 10 Global Step: 55990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:41,585-Speed 18492.74 samples/sec Loss 6.8616 LearningRate 0.1044 Epoch: 10 Global Step: 56000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:46,011-Speed 18515.40 samples/sec Loss 6.8727 LearningRate 0.1044 Epoch: 10 Global Step: 56010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:50,400-Speed 18670.00 samples/sec Loss 6.8559 LearningRate 0.1044 Epoch: 10 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:54,815-Speed 18562.46 samples/sec Loss 6.8580 LearningRate 0.1043 Epoch: 10 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:55:59,215-Speed 18623.32 samples/sec Loss 6.8505 LearningRate 0.1043 Epoch: 10 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:03,634-Speed 18542.13 samples/sec Loss 6.8153 LearningRate 0.1042 Epoch: 10 Global Step: 56050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:56:08,037-Speed 18613.91 samples/sec Loss 6.8391 LearningRate 0.1042 Epoch: 10 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:17,894-Speed 8311.78 samples/sec Loss 6.8106 LearningRate 0.1041 Epoch: 10 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:22,304-Speed 18580.52 samples/sec Loss 6.8453 LearningRate 0.1041 Epoch: 10 
Global Step: 56080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:26,738-Speed 18479.80 samples/sec Loss 6.8535 LearningRate 0.1040 Epoch: 10 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:31,137-Speed 18624.91 samples/sec Loss 6.8645 LearningRate 0.1040 Epoch: 10 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:35,534-Speed 18639.82 samples/sec Loss 6.8168 LearningRate 0.1040 Epoch: 10 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:39,952-Speed 18545.96 samples/sec Loss 6.8431 LearningRate 0.1039 Epoch: 10 Global Step: 56120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:44,349-Speed 18642.12 samples/sec Loss 6.8710 LearningRate 0.1039 Epoch: 10 Global Step: 56130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:48,800-Speed 18413.27 samples/sec Loss 6.8394 LearningRate 0.1038 Epoch: 10 Global Step: 56140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:53,225-Speed 18522.56 samples/sec Loss 6.8379 LearningRate 0.1038 Epoch: 10 Global Step: 56150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:56:57,642-Speed 18548.97 samples/sec Loss 6.8918 LearningRate 0.1037 Epoch: 10 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 04:57:02,122-Speed 18347.09 samples/sec Loss 6.8958 LearningRate 0.1037 Epoch: 10 Global Step: 56170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:57:06,518-Speed 18645.10 samples/sec Loss 6.8199 LearningRate 0.1037 Epoch: 10 Global Step: 56180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:57:10,984-Speed 18345.85 samples/sec Loss 6.8309 LearningRate 0.1036 Epoch: 10 Global Step: 56190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:57:15,440-Speed 18389.78 samples/sec Loss 6.8381 LearningRate 0.1036 Epoch: 10 Global Step: 56200 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-14 04:57:19,920-Speed 18294.25 samples/sec Loss 6.8396 LearningRate 0.1035 Epoch: 10 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:24,353-Speed 18481.57 samples/sec Loss 6.8159 LearningRate 0.1035 Epoch: 10 Global Step: 56220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:28,931-Speed 17898.38 samples/sec Loss 6.8030 LearningRate 0.1034 Epoch: 10 Global Step: 56230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:33,386-Speed 18392.65 samples/sec Loss 6.8173 LearningRate 0.1034 Epoch: 10 Global Step: 56240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:37,819-Speed 18492.46 samples/sec Loss 6.8507 LearningRate 0.1033 Epoch: 10 Global Step: 56250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:42,233-Speed 18566.82 samples/sec Loss 6.8524 LearningRate 0.1033 Epoch: 10 Global Step: 56260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:46,691-Speed 18387.92 samples/sec Loss 6.8347 LearningRate 0.1033 Epoch: 10 Global Step: 56270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:51,116-Speed 18519.73 samples/sec Loss 6.8740 LearningRate 0.1032 Epoch: 10 Global Step: 56280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:55,543-Speed 18510.24 samples/sec Loss 6.8481 LearningRate 0.1032 Epoch: 10 Global Step: 56290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:57:59,966-Speed 18524.79 samples/sec Loss 6.8683 LearningRate 0.1031 Epoch: 10 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:58:04,416-Speed 18416.48 samples/sec Loss 6.8509 LearningRate 0.1031 Epoch: 10 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:58:08,853-Speed 18465.97 samples/sec Loss 6.8385 LearningRate 0.1030 Epoch: 10 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 
04:58:13,328-Speed 18311.57 samples/sec Loss 6.8404 LearningRate 0.1030 Epoch: 10 Global Step: 56330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:17,745-Speed 18552.96 samples/sec Loss 6.8205 LearningRate 0.1030 Epoch: 10 Global Step: 56340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:22,166-Speed 18532.31 samples/sec Loss 6.8338 LearningRate 0.1029 Epoch: 10 Global Step: 56350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:26,599-Speed 18486.79 samples/sec Loss 6.8560 LearningRate 0.1029 Epoch: 10 Global Step: 56360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:31,057-Speed 18384.29 samples/sec Loss 6.8470 LearningRate 0.1028 Epoch: 10 Global Step: 56370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:35,456-Speed 18628.80 samples/sec Loss 6.8151 LearningRate 0.1028 Epoch: 10 Global Step: 56380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:39,851-Speed 18654.09 samples/sec Loss 6.8342 LearningRate 0.1027 Epoch: 10 Global Step: 56390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:44,258-Speed 18593.12 samples/sec Loss 6.8493 LearningRate 0.1027 Epoch: 10 Global Step: 56400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:48,652-Speed 18648.13 samples/sec Loss 6.8675 LearningRate 0.1026 Epoch: 10 Global Step: 56410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:53,059-Speed 18593.76 samples/sec Loss 6.8813 LearningRate 0.1026 Epoch: 10 Global Step: 56420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:58:57,464-Speed 18599.51 samples/sec Loss 6.8472 LearningRate 0.1026 Epoch: 10 Global Step: 56430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:01,949-Speed 18272.83 samples/sec Loss 6.7797 LearningRate 0.1025 Epoch: 10 Global Step: 56440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:06,348-Speed 18624.06 samples/sec 
Loss 6.7402 LearningRate 0.1025 Epoch: 10 Global Step: 56450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:10,875-Speed 18104.18 samples/sec Loss 6.8253 LearningRate 0.1024 Epoch: 10 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:15,307-Speed 18490.95 samples/sec Loss 6.8152 LearningRate 0.1024 Epoch: 10 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:19,761-Speed 18397.40 samples/sec Loss 6.8204 LearningRate 0.1023 Epoch: 10 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:24,221-Speed 18370.55 samples/sec Loss 6.8244 LearningRate 0.1023 Epoch: 10 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:28,681-Speed 18373.88 samples/sec Loss 6.8222 LearningRate 0.1023 Epoch: 10 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:33,166-Speed 18267.93 samples/sec Loss 6.8213 LearningRate 0.1022 Epoch: 10 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 04:59:37,626-Speed 18373.81 samples/sec Loss 6.7828 LearningRate 0.1022 Epoch: 10 Global Step: 56520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:59:42,047-Speed 18538.37 samples/sec Loss 6.8173 LearningRate 0.1021 Epoch: 10 Global Step: 56530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:59:46,493-Speed 18433.47 samples/sec Loss 6.8517 LearningRate 0.1021 Epoch: 10 Global Step: 56540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:59:50,985-Speed 18238.92 samples/sec Loss 6.8525 LearningRate 0.1020 Epoch: 10 Global Step: 56550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:59:55,397-Speed 18571.96 samples/sec Loss 6.8510 LearningRate 0.1020 Epoch: 10 Global Step: 56560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 04:59:59,875-Speed 18298.64 samples/sec Loss 6.8258 LearningRate 0.1020 Epoch: 10 
Global Step: 56570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:00:04,349-Speed 18314.36 samples/sec Loss 6.8310 LearningRate 0.1019 Epoch: 10 Global Step: 56580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:00:08,787-Speed 18462.44 samples/sec Loss 6.7905 LearningRate 0.1019 Epoch: 10 Global Step: 56590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:00:13,240-Speed 18402.49 samples/sec Loss 6.8251 LearningRate 0.1018 Epoch: 10 Global Step: 56600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:00:17,704-Speed 18355.80 samples/sec Loss 6.8394 LearningRate 0.1018 Epoch: 10 Global Step: 56610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:00:22,122-Speed 18547.06 samples/sec Loss 6.8004 LearningRate 0.1017 Epoch: 10 Global Step: 56620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:00:26,553-Speed 18492.77 samples/sec Loss 6.8236 LearningRate 0.1017 Epoch: 10 Global Step: 56630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:00:30,959-Speed 18597.35 samples/sec Loss 6.8110 LearningRate 0.1017 Epoch: 10 Global Step: 56640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:00:35,389-Speed 18493.97 samples/sec Loss 6.7962 LearningRate 0.1016 Epoch: 10 Global Step: 56650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:00:39,819-Speed 18498.77 samples/sec Loss 6.7851 LearningRate 0.1016 Epoch: 10 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:00:44,211-Speed 18660.13 samples/sec Loss 6.7704 LearningRate 0.1015 Epoch: 10 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:00:48,627-Speed 18558.21 samples/sec Loss 6.8374 LearningRate 0.1015 Epoch: 10 Global Step: 56680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:00:53,111-Speed 18274.09 samples/sec Loss 6.8232 LearningRate 0.1014 Epoch: 10 Global Step: 56690 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-14 05:00:58,541-Speed 15091.50 samples/sec Loss 6.7898 LearningRate 0.1014 Epoch: 10 Global Step: 56700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:01:02,942-Speed 18619.43 samples/sec Loss 6.7880 LearningRate 0.1014 Epoch: 10 Global Step: 56710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:01:07,380-Speed 18465.45 samples/sec Loss 6.7765 LearningRate 0.1013 Epoch: 10 Global Step: 56720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:01:11,767-Speed 18682.09 samples/sec Loss 6.7960 LearningRate 0.1013 Epoch: 10 Global Step: 56730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:01:16,205-Speed 18461.63 samples/sec Loss 6.8050 LearningRate 0.1012 Epoch: 10 Global Step: 56740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:01:20,630-Speed 18518.96 samples/sec Loss 6.8265 LearningRate 0.1012 Epoch: 10 Global Step: 56750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:01:25,049-Speed 18543.77 samples/sec Loss 6.8179 LearningRate 0.1011 Epoch: 10 Global Step: 56760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:01:29,485-Speed 18471.89 samples/sec Loss 6.7808 LearningRate 0.1011 Epoch: 10 Global Step: 56770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:01:33,879-Speed 18648.57 samples/sec Loss 6.7888 LearningRate 0.1010 Epoch: 10 Global Step: 56780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:01:38,335-Speed 18395.02 samples/sec Loss 6.8003 LearningRate 0.1010 Epoch: 10 Global Step: 56790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:01:42,755-Speed 18534.65 samples/sec Loss 6.8256 LearningRate 0.1010 Epoch: 10 Global Step: 56800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:01:47,164-Speed 18588.08 samples/sec Loss 6.8190 LearningRate 0.1009 Epoch: 10 Global Step: 56810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 
05:01:51,591-Speed 18510.14 samples/sec Loss 6.8228 LearningRate 0.1009 Epoch: 10 Global Step: 56820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:01:56,030-Speed 18457.71 samples/sec Loss 6.8254 LearningRate 0.1008 Epoch: 10 Global Step: 56830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:02:00,479-Speed 18420.50 samples/sec Loss 6.8539 LearningRate 0.1008 Epoch: 10 Global Step: 56840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:04,896-Speed 18549.85 samples/sec Loss 6.8109 LearningRate 0.1007 Epoch: 10 Global Step: 56850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:09,320-Speed 18524.40 samples/sec Loss 6.8121 LearningRate 0.1007 Epoch: 10 Global Step: 56860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:13,715-Speed 18645.30 samples/sec Loss 6.8290 LearningRate 0.1007 Epoch: 10 Global Step: 56870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:18,181-Speed 18349.63 samples/sec Loss 6.7548 LearningRate 0.1006 Epoch: 10 Global Step: 56880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:22,565-Speed 18690.43 samples/sec Loss 6.8141 LearningRate 0.1006 Epoch: 10 Global Step: 56890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:26,967-Speed 18614.34 samples/sec Loss 6.8427 LearningRate 0.1005 Epoch: 10 Global Step: 56900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:31,365-Speed 18630.01 samples/sec Loss 6.8167 LearningRate 0.1005 Epoch: 10 Global Step: 56910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:35,780-Speed 18560.81 samples/sec Loss 6.8131 LearningRate 0.1004 Epoch: 10 Global Step: 56920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:40,253-Speed 18320.38 samples/sec Loss 6.7798 LearningRate 0.1004 Epoch: 10 Global Step: 56930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:02:44,676-Speed 18522.51 samples/sec 
Loss 6.7789 LearningRate 0.1004 Epoch: 10 Global Step: 56940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:02:49,101-Speed 18520.33 samples/sec Loss 6.7927 LearningRate 0.1003 Epoch: 10 Global Step: 56950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:02:53,540-Speed 18459.35 samples/sec Loss 6.8043 LearningRate 0.1003 Epoch: 10 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:02:57,948-Speed 18588.71 samples/sec Loss 6.8034 LearningRate 0.1002 Epoch: 10 Global Step: 56970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:02,335-Speed 18681.56 samples/sec Loss 6.7975 LearningRate 0.1002 Epoch: 10 Global Step: 56980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:06,722-Speed 18679.35 samples/sec Loss 6.8370 LearningRate 0.1001 Epoch: 10 Global Step: 56990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:11,168-Speed 18436.95 samples/sec Loss 6.8144 LearningRate 0.1001 Epoch: 10 Global Step: 57000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:15,600-Speed 18488.01 samples/sec Loss 6.8128 LearningRate 0.1001 Epoch: 10 Global Step: 57010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:20,044-Speed 18439.22 samples/sec Loss 6.7799 LearningRate 0.1000 Epoch: 10 Global Step: 57020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:24,425-Speed 18700.30 samples/sec Loss 6.8116 LearningRate 0.1000 Epoch: 10 Global Step: 57030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:43,390-Speed 4319.81 samples/sec Loss 6.7855 LearningRate 0.0999 Epoch: 11 Global Step: 57040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:47,800-Speed 18585.73 samples/sec Loss 6.7415 LearningRate 0.0999 Epoch: 11 Global Step: 57050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:52,215-Speed 18564.05 samples/sec Loss 6.8072 LearningRate 0.0998 Epoch: 11 
Global Step: 57060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:03:56,629-Speed 18565.48 samples/sec Loss 6.8184 LearningRate 0.0998 Epoch: 11 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:04:01,098-Speed 18331.92 samples/sec Loss 6.7603 LearningRate 0.0998 Epoch: 11 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:04:05,520-Speed 18529.05 samples/sec Loss 6.7727 LearningRate 0.0997 Epoch: 11 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:04:09,990-Speed 18333.46 samples/sec Loss 6.7989 LearningRate 0.0997 Epoch: 11 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:04:14,389-Speed 18627.58 samples/sec Loss 6.8450 LearningRate 0.0996 Epoch: 11 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:04:18,794-Speed 18604.42 samples/sec Loss 6.7483 LearningRate 0.0996 Epoch: 11 Global Step: 57120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:04:23,231-Speed 18464.86 samples/sec Loss 6.7590 LearningRate 0.0995 Epoch: 11 Global Step: 57130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:04:27,676-Speed 18435.79 samples/sec Loss 6.8002 LearningRate 0.0995 Epoch: 11 Global Step: 57140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:04:32,161-Speed 18270.67 samples/sec Loss 6.7718 LearningRate 0.0995 Epoch: 11 Global Step: 57150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:04:36,602-Speed 18447.46 samples/sec Loss 6.7468 LearningRate 0.0994 Epoch: 11 Global Step: 57160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:04:41,053-Speed 18412.68 samples/sec Loss 6.7263 LearningRate 0.0994 Epoch: 11 Global Step: 57170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:04:45,536-Speed 18276.30 samples/sec Loss 6.7865 LearningRate 0.0993 Epoch: 11 Global Step: 57180 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-01-14 05:04:50,001-Speed 18353.44 samples/sec Loss 6.7907 LearningRate 0.0993 Epoch: 11 Global Step: 57190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:04:54,434-Speed 18480.79 samples/sec Loss 6.7729 LearningRate 0.0992 Epoch: 11 Global Step: 57200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:04:58,866-Speed 18488.64 samples/sec Loss 6.7069 LearningRate 0.0992 Epoch: 11 Global Step: 57210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:05:03,298-Speed 18490.57 samples/sec Loss 6.7345 LearningRate 0.0992 Epoch: 11 Global Step: 57220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:05:07,727-Speed 18496.89 samples/sec Loss 6.7413 LearningRate 0.0991 Epoch: 11 Global Step: 57230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:05:12,143-Speed 18556.35 samples/sec Loss 6.7850 LearningRate 0.0991 Epoch: 11 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:05:16,529-Speed 18681.52 samples/sec Loss 6.7577 LearningRate 0.0990 Epoch: 11 Global Step: 57250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:05:20,928-Speed 18626.46 samples/sec Loss 6.8066 LearningRate 0.0990 Epoch: 11 Global Step: 57260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:05:25,348-Speed 18537.16 samples/sec Loss 6.8069 LearningRate 0.0989 Epoch: 11 Global Step: 57270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:05:35,771-Speed 7860.57 samples/sec Loss 6.7850 LearningRate 0.0989 Epoch: 11 Global Step: 57280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:05:40,176-Speed 18599.56 samples/sec Loss 6.8097 LearningRate 0.0989 Epoch: 11 Global Step: 57290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:05:44,608-Speed 18487.14 samples/sec Loss 6.8003 LearningRate 0.0988 Epoch: 11 Global Step: 57300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 
05:05:49,003-Speed 18641.35 samples/sec Loss 6.7573 LearningRate 0.0988 Epoch: 11 Global Step: 57310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:05:53,420-Speed 18552.43 samples/sec Loss 6.7667 LearningRate 0.0987 Epoch: 11 Global Step: 57320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:05:57,820-Speed 18621.60 samples/sec Loss 6.7944 LearningRate 0.0987 Epoch: 11 Global Step: 57330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:06:02,267-Speed 18426.30 samples/sec Loss 6.7612 LearningRate 0.0986 Epoch: 11 Global Step: 57340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:06:06,691-Speed 18523.80 samples/sec Loss 6.7655 LearningRate 0.0986 Epoch: 11 Global Step: 57350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:06:11,126-Speed 18477.32 samples/sec Loss 6.7431 LearningRate 0.0986 Epoch: 11 Global Step: 57360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:06:15,600-Speed 18318.13 samples/sec Loss 6.8032 LearningRate 0.0985 Epoch: 11 Global Step: 57370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:06:20,059-Speed 18378.64 samples/sec Loss 6.7729 LearningRate 0.0985 Epoch: 11 Global Step: 57380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:06:24,555-Speed 18224.19 samples/sec Loss 6.7820 LearningRate 0.0984 Epoch: 11 Global Step: 57390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:06:28,965-Speed 18578.41 samples/sec Loss 6.7496 LearningRate 0.0984 Epoch: 11 Global Step: 57400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:06:33,396-Speed 18490.48 samples/sec Loss 6.7706 LearningRate 0.0984 Epoch: 11 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:06:37,954-Speed 17980.24 samples/sec Loss 6.7953 LearningRate 0.0983 Epoch: 11 Global Step: 57420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:06:42,380-Speed 18516.65 samples/sec 
Loss 6.7374 LearningRate 0.0983 Epoch: 11 Global Step: 57430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:06:46,799-Speed 18550.78 samples/sec Loss 6.7892 LearningRate 0.0982 Epoch: 11 Global Step: 57440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:06:51,235-Speed 18471.39 samples/sec Loss 6.7676 LearningRate 0.0982 Epoch: 11 Global Step: 57450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:06:55,619-Speed 18694.52 samples/sec Loss 6.7495 LearningRate 0.0981 Epoch: 11 Global Step: 57460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:07:00,046-Speed 18510.77 samples/sec Loss 6.7636 LearningRate 0.0981 Epoch: 11 Global Step: 57470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:07:04,480-Speed 18482.55 samples/sec Loss 6.7601 LearningRate 0.0981 Epoch: 11 Global Step: 57480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:07:08,881-Speed 18620.35 samples/sec Loss 6.7508 LearningRate 0.0980 Epoch: 11 Global Step: 57490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:07:13,293-Speed 18571.03 samples/sec Loss 6.7423 LearningRate 0.0980 Epoch: 11 Global Step: 57500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:07:17,752-Speed 18376.92 samples/sec Loss 6.7699 LearningRate 0.0979 Epoch: 11 Global Step: 57510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:07:22,205-Speed 18402.23 samples/sec Loss 6.7961 LearningRate 0.0979 Epoch: 11 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:07:26,661-Speed 18388.82 samples/sec Loss 6.7813 LearningRate 0.0978 Epoch: 11 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:07:31,054-Speed 18653.64 samples/sec Loss 6.7640 LearningRate 0.0978 Epoch: 11 Global Step: 57540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:07:35,482-Speed 18505.06 samples/sec Loss 6.7576 LearningRate 0.0978 Epoch: 11 
Global Step: 57550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:07:39,946-Speed 18356.97 samples/sec Loss 6.7580 LearningRate 0.0977 Epoch: 11 Global Step: 57560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:07:44,374-Speed 18507.47 samples/sec Loss 6.7342 LearningRate 0.0977 Epoch: 11 Global Step: 57570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:07:48,787-Speed 18569.07 samples/sec Loss 6.7212 LearningRate 0.0976 Epoch: 11 Global Step: 57580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:07:53,215-Speed 18502.48 samples/sec Loss 6.7781 LearningRate 0.0976 Epoch: 11 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:07:57,660-Speed 18434.19 samples/sec Loss 6.7640 LearningRate 0.0975 Epoch: 11 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:02,071-Speed 18576.05 samples/sec Loss 6.7140 LearningRate 0.0975 Epoch: 11 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:06,478-Speed 18595.99 samples/sec Loss 6.7493 LearningRate 0.0975 Epoch: 11 Global Step: 57620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 05:08:10,922-Speed 18437.21 samples/sec Loss 6.7583 LearningRate 0.0974 Epoch: 11 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:15,386-Speed 18354.95 samples/sec Loss 6.7298 LearningRate 0.0974 Epoch: 11 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:19,830-Speed 18442.83 samples/sec Loss 6.7402 LearningRate 0.0973 Epoch: 11 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:24,277-Speed 18425.05 samples/sec Loss 6.7559 LearningRate 0.0973 Epoch: 11 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:28,727-Speed 18411.88 samples/sec Loss 6.7264 LearningRate 0.0972 Epoch: 11 Global Step: 57670 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-01-14 05:08:33,157-Speed 18501.88 samples/sec Loss 6.7144 LearningRate 0.0972 Epoch: 11 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:37,608-Speed 18412.43 samples/sec Loss 6.7263 LearningRate 0.0972 Epoch: 11 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:42,040-Speed 18490.41 samples/sec Loss 6.7297 LearningRate 0.0971 Epoch: 11 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:46,494-Speed 18397.08 samples/sec Loss 6.7218 LearningRate 0.0971 Epoch: 11 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:50,915-Speed 18536.12 samples/sec Loss 6.7637 LearningRate 0.0970 Epoch: 11 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:55,323-Speed 18586.87 samples/sec Loss 6.7925 LearningRate 0.0970 Epoch: 11 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:08:59,755-Speed 18489.06 samples/sec Loss 6.7504 LearningRate 0.0970 Epoch: 11 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:09:04,181-Speed 18514.87 samples/sec Loss 6.7287 LearningRate 0.0969 Epoch: 11 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:09:08,655-Speed 18316.00 samples/sec Loss 6.7629 LearningRate 0.0969 Epoch: 11 Global Step: 57760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:13,077-Speed 18530.03 samples/sec Loss 6.7562 LearningRate 0.0968 Epoch: 11 Global Step: 57770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:17,524-Speed 18426.35 samples/sec Loss 6.7138 LearningRate 0.0968 Epoch: 11 Global Step: 57780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:21,948-Speed 18522.96 samples/sec Loss 6.7253 LearningRate 0.0967 Epoch: 11 Global Step: 57790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 
05:09:26,431-Speed 18277.08 samples/sec Loss 6.7407 LearningRate 0.0967 Epoch: 11 Global Step: 57800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:30,890-Speed 18378.06 samples/sec Loss 6.7022 LearningRate 0.0967 Epoch: 11 Global Step: 57810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:35,337-Speed 18426.42 samples/sec Loss 6.7236 LearningRate 0.0966 Epoch: 11 Global Step: 57820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:39,780-Speed 18441.30 samples/sec Loss 6.7231 LearningRate 0.0966 Epoch: 11 Global Step: 57830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:44,234-Speed 18395.64 samples/sec Loss 6.7697 LearningRate 0.0965 Epoch: 11 Global Step: 57840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:48,669-Speed 18476.31 samples/sec Loss 6.7153 LearningRate 0.0965 Epoch: 11 Global Step: 57850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:09:53,097-Speed 18507.78 samples/sec Loss 6.7565 LearningRate 0.0964 Epoch: 11 Global Step: 57860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:09:57,499-Speed 18613.23 samples/sec Loss 6.7125 LearningRate 0.0964 Epoch: 11 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:10:01,963-Speed 18357.40 samples/sec Loss 6.7282 LearningRate 0.0964 Epoch: 11 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:10:06,405-Speed 18447.43 samples/sec Loss 6.7098 LearningRate 0.0963 Epoch: 11 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:10:10,855-Speed 18418.61 samples/sec Loss 6.7235 LearningRate 0.0963 Epoch: 11 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:10:15,301-Speed 18427.02 samples/sec Loss 6.7394 LearningRate 0.0962 Epoch: 11 Global Step: 57910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:19,698-Speed 18640.81 samples/sec 
Loss 6.7076 LearningRate 0.0962 Epoch: 11 Global Step: 57920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:24,128-Speed 18496.40 samples/sec Loss 6.7438 LearningRate 0.0962 Epoch: 11 Global Step: 57930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:28,546-Speed 18550.30 samples/sec Loss 6.7339 LearningRate 0.0961 Epoch: 11 Global Step: 57940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:32,980-Speed 18487.08 samples/sec Loss 6.7371 LearningRate 0.0961 Epoch: 11 Global Step: 57950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:37,370-Speed 18664.52 samples/sec Loss 6.7600 LearningRate 0.0960 Epoch: 11 Global Step: 57960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:41,787-Speed 18556.89 samples/sec Loss 6.7262 LearningRate 0.0960 Epoch: 11 Global Step: 57970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:46,270-Speed 18283.77 samples/sec Loss 6.7446 LearningRate 0.0959 Epoch: 11 Global Step: 57980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:50,736-Speed 18347.41 samples/sec Loss 6.6956 LearningRate 0.0959 Epoch: 11 Global Step: 57990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:55,183-Speed 18424.82 samples/sec Loss 6.6986 LearningRate 0.0959 Epoch: 11 Global Step: 58000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:10:59,623-Speed 18456.96 samples/sec Loss 6.6807 LearningRate 0.0958 Epoch: 11 Global Step: 58010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:04,038-Speed 18559.62 samples/sec Loss 6.7701 LearningRate 0.0958 Epoch: 11 Global Step: 58020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:08,456-Speed 18549.45 samples/sec Loss 6.6904 LearningRate 0.0957 Epoch: 11 Global Step: 58030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:12,889-Speed 18484.83 samples/sec Loss 6.7488 LearningRate 0.0957 Epoch: 11 
Global Step: 58040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:17,316-Speed 18507.43 samples/sec Loss 6.7599 LearningRate 0.0957 Epoch: 11 Global Step: 58050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:21,752-Speed 18475.37 samples/sec Loss 6.7070 LearningRate 0.0956 Epoch: 11 Global Step: 58060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:26,228-Speed 18305.54 samples/sec Loss 6.7143 LearningRate 0.0956 Epoch: 11 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:30,692-Speed 18355.99 samples/sec Loss 6.7685 LearningRate 0.0955 Epoch: 11 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:35,139-Speed 18430.75 samples/sec Loss 6.7046 LearningRate 0.0955 Epoch: 11 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:39,533-Speed 18654.62 samples/sec Loss 6.6856 LearningRate 0.0954 Epoch: 11 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:11:43,941-Speed 18591.98 samples/sec Loss 6.6940 LearningRate 0.0954 Epoch: 11 Global Step: 58110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 05:11:48,370-Speed 18501.51 samples/sec Loss 6.7443 LearningRate 0.0954 Epoch: 11 Global Step: 58120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 05:11:52,830-Speed 18379.21 samples/sec Loss 6.6962 LearningRate 0.0953 Epoch: 11 Global Step: 58130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 05:11:57,337-Speed 18182.19 samples/sec Loss 6.7041 LearningRate 0.0953 Epoch: 11 Global Step: 58140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:12:01,761-Speed 18519.31 samples/sec Loss 6.7135 LearningRate 0.0952 Epoch: 11 Global Step: 58150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:12:06,160-Speed 18628.54 samples/sec Loss 6.7695 LearningRate 0.0952 Epoch: 11 Global Step: 58160 Fp16 Grad Scale: 
32768 Required: 6 hours Training: 2022-01-14 05:12:10,601-Speed 18452.41 samples/sec Loss 6.6955 LearningRate 0.0951 Epoch: 11 Global Step: 58170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:14,989-Speed 18670.30 samples/sec Loss 6.7445 LearningRate 0.0951 Epoch: 11 Global Step: 58180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:19,437-Speed 18423.70 samples/sec Loss 6.7817 LearningRate 0.0951 Epoch: 11 Global Step: 58190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:23,853-Speed 18553.71 samples/sec Loss 6.7104 LearningRate 0.0950 Epoch: 11 Global Step: 58200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:28,287-Speed 18478.33 samples/sec Loss 6.7087 LearningRate 0.0950 Epoch: 11 Global Step: 58210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:32,750-Speed 18363.43 samples/sec Loss 6.6901 LearningRate 0.0949 Epoch: 11 Global Step: 58220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:37,236-Speed 18267.14 samples/sec Loss 6.6747 LearningRate 0.0949 Epoch: 11 Global Step: 58230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:41,687-Speed 18410.37 samples/sec Loss 6.6726 LearningRate 0.0949 Epoch: 11 Global Step: 58240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:46,160-Speed 18320.53 samples/sec Loss 6.6896 LearningRate 0.0948 Epoch: 11 Global Step: 58250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:12:50,607-Speed 18426.22 samples/sec Loss 6.6916 LearningRate 0.0948 Epoch: 11 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:12:55,032-Speed 18515.75 samples/sec Loss 6.6929 LearningRate 0.0947 Epoch: 11 Global Step: 58270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:12:59,498-Speed 18349.84 samples/sec Loss 6.7543 LearningRate 0.0947 Epoch: 11 Global Step: 58280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-14 05:13:03,891-Speed 18652.39 samples/sec Loss 6.7116 LearningRate 0.0946 Epoch: 11 Global Step: 58290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:13:08,294-Speed 18610.29 samples/sec Loss 6.7338 LearningRate 0.0946 Epoch: 11 Global Step: 58300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:13:12,718-Speed 18520.72 samples/sec Loss 6.6922 LearningRate 0.0946 Epoch: 11 Global Step: 58310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:13:17,162-Speed 18439.83 samples/sec Loss 6.6558 LearningRate 0.0945 Epoch: 11 Global Step: 58320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:13:21,575-Speed 18570.12 samples/sec Loss 6.7260 LearningRate 0.0945 Epoch: 11 Global Step: 58330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:13:26,082-Speed 18180.33 samples/sec Loss 6.7064 LearningRate 0.0944 Epoch: 11 Global Step: 58340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:13:30,507-Speed 18520.61 samples/sec Loss 6.7100 LearningRate 0.0944 Epoch: 11 Global Step: 58350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:13:34,890-Speed 18697.49 samples/sec Loss 6.7131 LearningRate 0.0944 Epoch: 11 Global Step: 58360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:13:39,294-Speed 18607.27 samples/sec Loss 6.6993 LearningRate 0.0943 Epoch: 11 Global Step: 58370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:13:43,712-Speed 18546.57 samples/sec Loss 6.6896 LearningRate 0.0943 Epoch: 11 Global Step: 58380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:13:48,099-Speed 18685.98 samples/sec Loss 6.6541 LearningRate 0.0942 Epoch: 11 Global Step: 58390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:13:52,528-Speed 18497.33 samples/sec Loss 6.6689 LearningRate 0.0942 Epoch: 11 Global Step: 58400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:13:56,919-Speed 18663.18 
samples/sec Loss 6.7121 LearningRate 0.0941 Epoch: 11 Global Step: 58410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:01,380-Speed 18372.77 samples/sec Loss 6.7366 LearningRate 0.0941 Epoch: 11 Global Step: 58420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:05,797-Speed 18548.76 samples/sec Loss 6.6707 LearningRate 0.0941 Epoch: 11 Global Step: 58430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:14:10,217-Speed 18543.48 samples/sec Loss 6.7226 LearningRate 0.0940 Epoch: 11 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:14:14,643-Speed 18513.40 samples/sec Loss 6.7329 LearningRate 0.0940 Epoch: 11 Global Step: 58450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:19,063-Speed 18538.68 samples/sec Loss 6.7136 LearningRate 0.0939 Epoch: 11 Global Step: 58460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:23,493-Speed 18495.33 samples/sec Loss 6.7175 LearningRate 0.0939 Epoch: 11 Global Step: 58470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:27,926-Speed 18488.83 samples/sec Loss 6.6866 LearningRate 0.0939 Epoch: 11 Global Step: 58480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:32,336-Speed 18585.23 samples/sec Loss 6.6580 LearningRate 0.0938 Epoch: 11 Global Step: 58490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:36,733-Speed 18637.31 samples/sec Loss 6.6718 LearningRate 0.0938 Epoch: 11 Global Step: 58500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:41,199-Speed 18349.28 samples/sec Loss 6.6802 LearningRate 0.0937 Epoch: 11 Global Step: 58510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:14:45,632-Speed 18480.63 samples/sec Loss 6.6831 LearningRate 0.0937 Epoch: 11 Global Step: 58520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:14:50,075-Speed 18444.37 samples/sec Loss 6.7466 LearningRate 0.0936 
Epoch: 11 Global Step: 58530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:14:54,484-Speed 18589.23 samples/sec Loss 6.6636 LearningRate 0.0936 Epoch: 11 Global Step: 58540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:14:58,962-Speed 18298.45 samples/sec Loss 6.6682 LearningRate 0.0936 Epoch: 11 Global Step: 58550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:15:03,439-Speed 18302.46 samples/sec Loss 6.7286 LearningRate 0.0935 Epoch: 11 Global Step: 58560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:15:07,868-Speed 18503.08 samples/sec Loss 6.7169 LearningRate 0.0935 Epoch: 11 Global Step: 58570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:15:12,328-Speed 18372.65 samples/sec Loss 6.6625 LearningRate 0.0934 Epoch: 11 Global Step: 58580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:15:16,773-Speed 18433.03 samples/sec Loss 6.7027 LearningRate 0.0934 Epoch: 11 Global Step: 58590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:15:21,177-Speed 18608.91 samples/sec Loss 6.7024 LearningRate 0.0934 Epoch: 11 Global Step: 58600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:15:25,637-Speed 18378.66 samples/sec Loss 6.7160 LearningRate 0.0933 Epoch: 11 Global Step: 58610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:15:30,051-Speed 18567.35 samples/sec Loss 6.7191 LearningRate 0.0933 Epoch: 11 Global Step: 58620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:15:34,530-Speed 18295.66 samples/sec Loss 6.7140 LearningRate 0.0932 Epoch: 11 Global Step: 58630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:15:38,943-Speed 18569.53 samples/sec Loss 6.7039 LearningRate 0.0932 Epoch: 11 Global Step: 58640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:15:43,332-Speed 18671.73 samples/sec Loss 6.7041 LearningRate 0.0932 Epoch: 11 Global Step: 58650 Fp16 Grad 
Scale: 32768 Required: 6 hours Training: 2022-01-14 05:15:47,726-Speed 18648.53 samples/sec Loss 6.7027 LearningRate 0.0931 Epoch: 11 Global Step: 58660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:15:52,205-Speed 18295.22 samples/sec Loss 6.6986 LearningRate 0.0931 Epoch: 11 Global Step: 58670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:15:56,601-Speed 18639.54 samples/sec Loss 6.6531 LearningRate 0.0930 Epoch: 11 Global Step: 58680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:01,002-Speed 18623.37 samples/sec Loss 6.6632 LearningRate 0.0930 Epoch: 11 Global Step: 58690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:05,446-Speed 18443.18 samples/sec Loss 6.7106 LearningRate 0.0929 Epoch: 11 Global Step: 58700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:09,834-Speed 18678.18 samples/sec Loss 6.6485 LearningRate 0.0929 Epoch: 11 Global Step: 58710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:14,247-Speed 18568.86 samples/sec Loss 6.6893 LearningRate 0.0929 Epoch: 11 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:16:18,713-Speed 18349.47 samples/sec Loss 6.6698 LearningRate 0.0928 Epoch: 11 Global Step: 58730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:23,121-Speed 18589.93 samples/sec Loss 6.6471 LearningRate 0.0928 Epoch: 11 Global Step: 58740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:27,536-Speed 18561.05 samples/sec Loss 6.6410 LearningRate 0.0927 Epoch: 11 Global Step: 58750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:31,952-Speed 18558.86 samples/sec Loss 6.6704 LearningRate 0.0927 Epoch: 11 Global Step: 58760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:36,379-Speed 18509.00 samples/sec Loss 6.6420 LearningRate 0.0927 Epoch: 11 Global Step: 58770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-01-14 05:16:40,861-Speed 18290.07 samples/sec Loss 6.6395 LearningRate 0.0926 Epoch: 11 Global Step: 58780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:45,319-Speed 18382.34 samples/sec Loss 6.6738 LearningRate 0.0926 Epoch: 11 Global Step: 58790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:49,771-Speed 18405.00 samples/sec Loss 6.6588 LearningRate 0.0925 Epoch: 11 Global Step: 58800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:54,209-Speed 18460.51 samples/sec Loss 6.6890 LearningRate 0.0925 Epoch: 11 Global Step: 58810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:16:58,614-Speed 18599.52 samples/sec Loss 6.7116 LearningRate 0.0924 Epoch: 11 Global Step: 58820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:17:03,042-Speed 18506.89 samples/sec Loss 6.6832 LearningRate 0.0924 Epoch: 11 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:17:07,462-Speed 18539.59 samples/sec Loss 6.6478 LearningRate 0.0924 Epoch: 11 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:17:11,888-Speed 18515.76 samples/sec Loss 6.6581 LearningRate 0.0923 Epoch: 11 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:17:16,337-Speed 18416.79 samples/sec Loss 6.6866 LearningRate 0.0923 Epoch: 11 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:17:20,733-Speed 18643.36 samples/sec Loss 6.6868 LearningRate 0.0922 Epoch: 11 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:17:25,143-Speed 18579.28 samples/sec Loss 6.6032 LearningRate 0.0922 Epoch: 11 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:17:29,573-Speed 18493.54 samples/sec Loss 6.6833 LearningRate 0.0922 Epoch: 11 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:17:33,972-Speed 18627.83 
samples/sec Loss 6.6891 LearningRate 0.0921 Epoch: 11 Global Step: 58900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:17:38,396-Speed 18521.22 samples/sec Loss 6.6977 LearningRate 0.0921 Epoch: 11 Global Step: 58910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:17:42,817-Speed 18535.33 samples/sec Loss 6.6656 LearningRate 0.0920 Epoch: 11 Global Step: 58920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:17:47,231-Speed 18559.34 samples/sec Loss 6.6603 LearningRate 0.0920 Epoch: 11 Global Step: 58930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:17:51,614-Speed 18698.78 samples/sec Loss 6.6593 LearningRate 0.0920 Epoch: 11 Global Step: 58940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:17:56,017-Speed 18606.61 samples/sec Loss 6.6992 LearningRate 0.0919 Epoch: 11 Global Step: 58950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:00,400-Speed 18693.58 samples/sec Loss 6.6957 LearningRate 0.0919 Epoch: 11 Global Step: 58960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:04,798-Speed 18632.29 samples/sec Loss 6.6734 LearningRate 0.0918 Epoch: 11 Global Step: 58970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:09,186-Speed 18672.03 samples/sec Loss 6.6551 LearningRate 0.0918 Epoch: 11 Global Step: 58980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:13,631-Speed 18437.20 samples/sec Loss 6.6578 LearningRate 0.0917 Epoch: 11 Global Step: 58990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:18,043-Speed 18573.77 samples/sec Loss 6.6585 LearningRate 0.0917 Epoch: 11 Global Step: 59000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:22,441-Speed 18630.55 samples/sec Loss 6.6608 LearningRate 0.0917 Epoch: 11 Global Step: 59010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:26,874-Speed 18483.10 samples/sec Loss 6.6351 LearningRate 0.0916 
Epoch: 11 Global Step: 59020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:31,309-Speed 18474.62 samples/sec Loss 6.6469 LearningRate 0.0916 Epoch: 11 Global Step: 59030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-01-14 05:18:35,779-Speed 18334.46 samples/sec Loss 6.6555 LearningRate 0.0915 Epoch: 11 Global Step: 59040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:18:40,190-Speed 18578.39 samples/sec Loss 6.6438 LearningRate 0.0915 Epoch: 11 Global Step: 59050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:18:44,603-Speed 18568.35 samples/sec Loss 6.6786 LearningRate 0.0915 Epoch: 11 Global Step: 59060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:18:49,023-Speed 18538.49 samples/sec Loss 6.6644 LearningRate 0.0914 Epoch: 11 Global Step: 59070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:18:53,418-Speed 18647.45 samples/sec Loss 6.6976 LearningRate 0.0914 Epoch: 11 Global Step: 59080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:18:57,820-Speed 18614.67 samples/sec Loss 6.6241 LearningRate 0.0913 Epoch: 11 Global Step: 59090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:19:02,224-Speed 18608.20 samples/sec Loss 6.6362 LearningRate 0.0913 Epoch: 11 Global Step: 59100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:19:06,640-Speed 18555.41 samples/sec Loss 6.6677 LearningRate 0.0913 Epoch: 11 Global Step: 59110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:19:11,059-Speed 18542.02 samples/sec Loss 6.6580 LearningRate 0.0912 Epoch: 11 Global Step: 59120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:19:15,462-Speed 18611.67 samples/sec Loss 6.6442 LearningRate 0.0912 Epoch: 11 Global Step: 59130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:19:19,871-Speed 18585.21 samples/sec Loss 6.6091 LearningRate 0.0911 Epoch: 11 Global Step: 59140 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:24,293-Speed 18530.10 samples/sec Loss 6.6104 LearningRate 0.0911 Epoch: 11 Global Step: 59150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:28,685-Speed 18665.07 samples/sec Loss 6.6483 LearningRate 0.0911 Epoch: 11 Global Step: 59160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:33,122-Speed 18470.69 samples/sec Loss 6.6299 LearningRate 0.0910 Epoch: 11 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:37,509-Speed 18678.06 samples/sec Loss 6.6572 LearningRate 0.0910 Epoch: 11 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:41,900-Speed 18663.32 samples/sec Loss 6.6182 LearningRate 0.0909 Epoch: 11 Global Step: 59190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:46,292-Speed 18656.42 samples/sec Loss 6.6899 LearningRate 0.0909 Epoch: 11 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:50,697-Speed 18601.87 samples/sec Loss 6.6306 LearningRate 0.0908 Epoch: 11 Global Step: 59210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:55,130-Speed 18484.81 samples/sec Loss 6.6651 LearningRate 0.0908 Epoch: 11 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:19:59,527-Speed 18638.30 samples/sec Loss 6.6620 LearningRate 0.0908 Epoch: 11 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:03,950-Speed 18527.72 samples/sec Loss 6.6539 LearningRate 0.0907 Epoch: 11 Global Step: 59240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-14 05:20:08,403-Speed 18402.69 samples/sec Loss 6.6315 LearningRate 0.0907 Epoch: 11 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:12,810-Speed 18589.25 samples/sec Loss 6.6395 LearningRate 0.0906 Epoch: 11 Global Step: 59260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-14 05:20:17,244-Speed 18481.67 samples/sec Loss 6.6309 LearningRate 0.0906 Epoch: 11 Global Step: 59270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:21,650-Speed 18596.85 samples/sec Loss 6.6594 LearningRate 0.0906 Epoch: 11 Global Step: 59280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:26,059-Speed 18585.14 samples/sec Loss 6.6524 LearningRate 0.0905 Epoch: 11 Global Step: 59290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:30,481-Speed 18530.21 samples/sec Loss 6.6261 LearningRate 0.0905 Epoch: 11 Global Step: 59300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:34,904-Speed 18529.90 samples/sec Loss 6.5964 LearningRate 0.0904 Epoch: 11 Global Step: 59310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:39,360-Speed 18387.06 samples/sec Loss 6.6450 LearningRate 0.0904 Epoch: 11 Global Step: 59320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:43,810-Speed 18413.75 samples/sec Loss 6.6270 LearningRate 0.0904 Epoch: 11 Global Step: 59330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:48,242-Speed 18487.48 samples/sec Loss 6.6031 LearningRate 0.0903 Epoch: 11 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:20:52,652-Speed 18584.91 samples/sec Loss 6.5997 LearningRate 0.0903 Epoch: 11 Global Step: 59350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:20:57,065-Speed 18567.47 samples/sec Loss 6.6266 LearningRate 0.0902 Epoch: 11 Global Step: 59360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:01,476-Speed 18575.19 samples/sec Loss 6.6586 LearningRate 0.0902 Epoch: 11 Global Step: 59370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:05,887-Speed 18580.65 samples/sec Loss 6.6101 LearningRate 0.0902 Epoch: 11 Global Step: 59380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:10,367-Speed 18291.71 
samples/sec Loss 6.6165 LearningRate 0.0901 Epoch: 11 Global Step: 59390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:14,767-Speed 18623.07 samples/sec Loss 6.6758 LearningRate 0.0901 Epoch: 11 Global Step: 59400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:19,198-Speed 18494.80 samples/sec Loss 6.6278 LearningRate 0.0900 Epoch: 11 Global Step: 59410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:23,592-Speed 18649.02 samples/sec Loss 6.6140 LearningRate 0.0900 Epoch: 11 Global Step: 59420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:27,983-Speed 18661.14 samples/sec Loss 6.6155 LearningRate 0.0900 Epoch: 11 Global Step: 59430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:32,382-Speed 18627.70 samples/sec Loss 6.6525 LearningRate 0.0899 Epoch: 11 Global Step: 59440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-14 05:21:36,828-Speed 18431.50 samples/sec Loss 6.6715 LearningRate 0.0899 Epoch: 11 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:21:41,259-Speed 18492.77 samples/sec Loss 6.6163 LearningRate 0.0898 Epoch: 11 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:21:45,719-Speed 18376.67 samples/sec Loss 6.6681 LearningRate 0.0898 Epoch: 11 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:21:50,157-Speed 18467.98 samples/sec Loss 6.6597 LearningRate 0.0897 Epoch: 11 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:21:54,561-Speed 18607.79 samples/sec Loss 6.6412 LearningRate 0.0897 Epoch: 11 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-14 05:21:58,984-Speed 18526.95 samples/sec Loss 6.6719 LearningRate 0.0897 Epoch: 11 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:22:03,417-Speed 18484.99 samples/sec Loss 6.6061 LearningRate 0.0896 
Epoch: 11 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:22:07,883-Speed 18346.62 samples/sec Loss 6.6468 LearningRate 0.0896 Epoch: 11 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:22:12,369-Speed 18269.75 samples/sec Loss 6.6258 LearningRate 0.0895 Epoch: 11 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:22:16,782-Speed 18566.74 samples/sec Loss 6.6205 LearningRate 0.0895 Epoch: 11 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:22:21,217-Speed 18481.70 samples/sec Loss 6.6147 LearningRate 0.0895 Epoch: 11 Global Step: 59550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:22:25,645-Speed 18503.82 samples/sec Loss 6.5905 LearningRate 0.0894 Epoch: 11 Global Step: 59560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:22:30,112-Speed 18344.18 samples/sec Loss 6.6362 LearningRate 0.0894 Epoch: 11 Global Step: 59570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:22:34,541-Speed 18504.60 samples/sec Loss 6.6457 LearningRate 0.0893 Epoch: 11 Global Step: 59580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:22:38,955-Speed 18563.84 samples/sec Loss 6.6411 LearningRate 0.0893 Epoch: 11 Global Step: 59590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:22:43,375-Speed 18538.16 samples/sec Loss 6.6386 LearningRate 0.0893 Epoch: 11 Global Step: 59600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:22:47,830-Speed 18392.70 samples/sec Loss 6.6576 LearningRate 0.0892 Epoch: 11 Global Step: 59610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:22:52,270-Speed 18459.46 samples/sec Loss 6.6003 LearningRate 0.0892 Epoch: 11 Global Step: 59620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:22:56,739-Speed 18334.13 samples/sec Loss 6.6058 LearningRate 0.0891 Epoch: 11 Global Step: 59630 Fp16 Grad 
Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:01,164-Speed 18516.25 samples/sec Loss 6.6492 LearningRate 0.0891 Epoch: 11 Global Step: 59640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:05,581-Speed 18558.67 samples/sec Loss 6.6250 LearningRate 0.0891 Epoch: 11 Global Step: 59650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:23:10,002-Speed 18536.96 samples/sec Loss 6.6525 LearningRate 0.0890 Epoch: 11 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:14,474-Speed 18323.14 samples/sec Loss 6.6133 LearningRate 0.0890 Epoch: 11 Global Step: 59670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:18,885-Speed 18575.31 samples/sec Loss 6.5985 LearningRate 0.0889 Epoch: 11 Global Step: 59680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:23,349-Speed 18355.67 samples/sec Loss 6.5992 LearningRate 0.0889 Epoch: 11 Global Step: 59690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:27,791-Speed 18445.02 samples/sec Loss 6.6087 LearningRate 0.0889 Epoch: 11 Global Step: 59700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:32,203-Speed 18573.52 samples/sec Loss 6.6118 LearningRate 0.0888 Epoch: 11 Global Step: 59710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:36,662-Speed 18376.05 samples/sec Loss 6.6206 LearningRate 0.0888 Epoch: 11 Global Step: 59720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:41,097-Speed 18480.91 samples/sec Loss 6.6048 LearningRate 0.0887 Epoch: 11 Global Step: 59730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:45,517-Speed 18537.89 samples/sec Loss 6.6206 LearningRate 0.0887 Epoch: 11 Global Step: 59740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:23:49,952-Speed 18475.91 samples/sec Loss 6.6340 LearningRate 0.0887 Epoch: 11 Global Step: 59750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-01-14 05:23:54,387-Speed 18479.06 samples/sec Loss 6.6263 LearningRate 0.0886 Epoch: 11 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:23:58,890-Speed 18194.58 samples/sec Loss 6.6154 LearningRate 0.0886 Epoch: 11 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:24:03,373-Speed 18281.92 samples/sec Loss 6.5765 LearningRate 0.0885 Epoch: 11 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:24:07,833-Speed 18371.40 samples/sec Loss 6.6354 LearningRate 0.0885 Epoch: 11 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:24:12,254-Speed 18534.01 samples/sec Loss 6.6178 LearningRate 0.0885 Epoch: 11 Global Step: 59800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:16,676-Speed 18532.64 samples/sec Loss 6.5779 LearningRate 0.0884 Epoch: 11 Global Step: 59810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:21,100-Speed 18523.38 samples/sec Loss 6.6410 LearningRate 0.0884 Epoch: 11 Global Step: 59820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:25,547-Speed 18424.37 samples/sec Loss 6.6169 LearningRate 0.0883 Epoch: 11 Global Step: 59830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:30,003-Speed 18389.80 samples/sec Loss 6.6203 LearningRate 0.0883 Epoch: 11 Global Step: 59840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:34,464-Speed 18370.98 samples/sec Loss 6.5979 LearningRate 0.0883 Epoch: 11 Global Step: 59850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:38,867-Speed 18610.25 samples/sec Loss 6.5628 LearningRate 0.0882 Epoch: 11 Global Step: 59860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:43,347-Speed 18291.61 samples/sec Loss 6.6187 LearningRate 0.0882 Epoch: 11 Global Step: 59870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:47,778-Speed 18494.55 
samples/sec Loss 6.6314 LearningRate 0.0881 Epoch: 11 Global Step: 59880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:52,199-Speed 18535.07 samples/sec Loss 6.6172 LearningRate 0.0881 Epoch: 11 Global Step: 59890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:24:56,605-Speed 18599.19 samples/sec Loss 6.6281 LearningRate 0.0881 Epoch: 11 Global Step: 59900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:25:01,025-Speed 18543.06 samples/sec Loss 6.6426 LearningRate 0.0880 Epoch: 11 Global Step: 59910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:25:05,445-Speed 18542.67 samples/sec Loss 6.6230 LearningRate 0.0880 Epoch: 11 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:25:09,826-Speed 18703.85 samples/sec Loss 6.6405 LearningRate 0.0879 Epoch: 11 Global Step: 59930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:14,291-Speed 18351.91 samples/sec Loss 6.6034 LearningRate 0.0879 Epoch: 11 Global Step: 59940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:18,715-Speed 18521.82 samples/sec Loss 6.6502 LearningRate 0.0879 Epoch: 11 Global Step: 59950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:23,160-Speed 18434.87 samples/sec Loss 6.6173 LearningRate 0.0878 Epoch: 11 Global Step: 59960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:27,632-Speed 18327.04 samples/sec Loss 6.5857 LearningRate 0.0878 Epoch: 11 Global Step: 59970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:32,054-Speed 18525.06 samples/sec Loss 6.6325 LearningRate 0.0877 Epoch: 11 Global Step: 59980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:36,515-Speed 18374.86 samples/sec Loss 6.6120 LearningRate 0.0877 Epoch: 11 Global Step: 59990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:40,927-Speed 18574.27 samples/sec Loss 6.5983 LearningRate 0.0876 
Epoch: 11 Global Step: 60000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:45,390-Speed 18357.23 samples/sec Loss 6.6006 LearningRate 0.0876 Epoch: 11 Global Step: 60010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:49,839-Speed 18426.06 samples/sec Loss 6.5919 LearningRate 0.0876 Epoch: 11 Global Step: 60020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:25:54,306-Speed 18347.92 samples/sec Loss 6.5806 LearningRate 0.0875 Epoch: 11 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:25:58,709-Speed 18611.20 samples/sec Loss 6.5972 LearningRate 0.0875 Epoch: 11 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:26:03,176-Speed 18342.01 samples/sec Loss 6.5989 LearningRate 0.0874 Epoch: 11 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:26:07,623-Speed 18427.79 samples/sec Loss 6.5923 LearningRate 0.0874 Epoch: 11 Global Step: 60060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:26:12,165-Speed 18042.71 samples/sec Loss 6.5935 LearningRate 0.0874 Epoch: 11 Global Step: 60070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:16,602-Speed 18468.22 samples/sec Loss 6.5493 LearningRate 0.0873 Epoch: 11 Global Step: 60080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:21,008-Speed 18607.76 samples/sec Loss 6.5889 LearningRate 0.0873 Epoch: 11 Global Step: 60090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:25,434-Speed 18516.51 samples/sec Loss 6.5973 LearningRate 0.0872 Epoch: 11 Global Step: 60100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:29,894-Speed 18372.77 samples/sec Loss 6.6043 LearningRate 0.0872 Epoch: 11 Global Step: 60110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:34,278-Speed 18694.89 samples/sec Loss 6.5843 LearningRate 0.0872 Epoch: 11 Global Step: 60120 Fp16 Grad 
Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:38,655-Speed 18717.95 samples/sec Loss 6.5980 LearningRate 0.0871 Epoch: 11 Global Step: 60130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:43,066-Speed 18579.97 samples/sec Loss 6.6122 LearningRate 0.0871 Epoch: 11 Global Step: 60140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:47,454-Speed 18672.57 samples/sec Loss 6.5727 LearningRate 0.0870 Epoch: 11 Global Step: 60150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:51,881-Speed 18510.53 samples/sec Loss 6.5922 LearningRate 0.0870 Epoch: 11 Global Step: 60160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:26:56,287-Speed 18599.20 samples/sec Loss 6.5627 LearningRate 0.0870 Epoch: 11 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:27:00,680-Speed 18653.00 samples/sec Loss 6.5323 LearningRate 0.0869 Epoch: 11 Global Step: 60180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:05,114-Speed 18478.67 samples/sec Loss 6.5789 LearningRate 0.0869 Epoch: 11 Global Step: 60190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:09,516-Speed 18614.47 samples/sec Loss 6.6053 LearningRate 0.0868 Epoch: 11 Global Step: 60200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:13,938-Speed 18532.28 samples/sec Loss 6.5585 LearningRate 0.0868 Epoch: 11 Global Step: 60210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:18,328-Speed 18666.05 samples/sec Loss 6.5783 LearningRate 0.0868 Epoch: 11 Global Step: 60220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:22,817-Speed 18256.44 samples/sec Loss 6.6049 LearningRate 0.0867 Epoch: 11 Global Step: 60230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:27,217-Speed 18620.19 samples/sec Loss 6.5514 LearningRate 0.0867 Epoch: 11 Global Step: 60240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-01-14 05:27:31,612-Speed 18643.62 samples/sec Loss 6.6159 LearningRate 0.0866 Epoch: 11 Global Step: 60250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:36,002-Speed 18674.79 samples/sec Loss 6.5566 LearningRate 0.0866 Epoch: 11 Global Step: 60260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:40,412-Speed 18584.31 samples/sec Loss 6.6079 LearningRate 0.0866 Epoch: 11 Global Step: 60270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:27:44,829-Speed 18550.41 samples/sec Loss 6.5596 LearningRate 0.0865 Epoch: 11 Global Step: 60280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:27:49,232-Speed 18611.48 samples/sec Loss 6.5745 LearningRate 0.0865 Epoch: 11 Global Step: 60290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:27:53,706-Speed 18314.55 samples/sec Loss 6.5658 LearningRate 0.0864 Epoch: 11 Global Step: 60300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:27:58,151-Speed 18433.41 samples/sec Loss 6.5770 LearningRate 0.0864 Epoch: 11 Global Step: 60310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:02,629-Speed 18300.99 samples/sec Loss 6.5427 LearningRate 0.0864 Epoch: 11 Global Step: 60320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:07,030-Speed 18619.34 samples/sec Loss 6.5878 LearningRate 0.0863 Epoch: 11 Global Step: 60330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:11,494-Speed 18359.79 samples/sec Loss 6.5403 LearningRate 0.0863 Epoch: 11 Global Step: 60340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:15,907-Speed 18572.16 samples/sec Loss 6.5409 LearningRate 0.0863 Epoch: 11 Global Step: 60350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:20,341-Speed 18483.57 samples/sec Loss 6.5627 LearningRate 0.0862 Epoch: 11 Global Step: 60360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:24,761-Speed 18537.25 
samples/sec Loss 6.5693 LearningRate 0.0862 Epoch: 11 Global Step: 60370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:29,169-Speed 18590.17 samples/sec Loss 6.5547 LearningRate 0.0861 Epoch: 11 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:28:33,629-Speed 18371.27 samples/sec Loss 6.5788 LearningRate 0.0861 Epoch: 11 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:28:38,048-Speed 18545.48 samples/sec Loss 6.5815 LearningRate 0.0861 Epoch: 11 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:28:42,520-Speed 18322.96 samples/sec Loss 6.5272 LearningRate 0.0860 Epoch: 11 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:28:46,929-Speed 18584.12 samples/sec Loss 6.5880 LearningRate 0.0860 Epoch: 11 Global Step: 60420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:51,371-Speed 18449.93 samples/sec Loss 6.5479 LearningRate 0.0859 Epoch: 11 Global Step: 60430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:28:55,847-Speed 18310.17 samples/sec Loss 6.5559 LearningRate 0.0859 Epoch: 11 Global Step: 60440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:29:00,312-Speed 18349.68 samples/sec Loss 6.5596 LearningRate 0.0859 Epoch: 11 Global Step: 60450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:29:04,724-Speed 18573.25 samples/sec Loss 6.5869 LearningRate 0.0858 Epoch: 11 Global Step: 60460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:29:09,155-Speed 18495.54 samples/sec Loss 6.5946 LearningRate 0.0858 Epoch: 11 Global Step: 60470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:29:13,551-Speed 18643.44 samples/sec Loss 6.5698 LearningRate 0.0857 Epoch: 11 Global Step: 60480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:29:18,024-Speed 18317.88 samples/sec Loss 6.5616 LearningRate 0.0857 
Epoch: 11 Global Step: 60490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:29:22,430-Speed 18598.82 samples/sec Loss 6.5767 LearningRate 0.0857 Epoch: 11 Global Step: 60500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:29:26,841-Speed 18574.50 samples/sec Loss 6.5657 LearningRate 0.0856 Epoch: 11 Global Step: 60510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:29:31,272-Speed 18492.17 samples/sec Loss 6.5619 LearningRate 0.0856 Epoch: 11 Global Step: 60520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:29:35,710-Speed 18465.81 samples/sec Loss 6.5468 LearningRate 0.0855 Epoch: 11 Global Step: 60530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:29:40,128-Speed 18545.24 samples/sec Loss 6.5416 LearningRate 0.0855 Epoch: 11 Global Step: 60540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:29:44,507-Speed 18713.66 samples/sec Loss 6.5927 LearningRate 0.0855 Epoch: 11 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:29:48,906-Speed 18625.77 samples/sec Loss 6.5390 LearningRate 0.0854 Epoch: 11 Global Step: 60560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:29:53,348-Speed 18451.50 samples/sec Loss 6.5432 LearningRate 0.0854 Epoch: 11 Global Step: 60570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:29:57,780-Speed 18487.79 samples/sec Loss 6.5669 LearningRate 0.0853 Epoch: 11 Global Step: 60580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:30:02,264-Speed 18273.94 samples/sec Loss 6.5739 LearningRate 0.0853 Epoch: 11 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:30:06,734-Speed 18332.52 samples/sec Loss 6.5678 LearningRate 0.0853 Epoch: 11 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:30:11,170-Speed 18476.34 samples/sec Loss 6.5712 LearningRate 0.0852 Epoch: 11 Global Step: 60610 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-01-14 05:30:15,637-Speed 18344.10 samples/sec Loss 6.5662 LearningRate 0.0852 Epoch: 11 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:30:20,114-Speed 18300.13 samples/sec Loss 6.5405 LearningRate 0.0851 Epoch: 11 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:30:24,497-Speed 18699.40 samples/sec Loss 6.5685 LearningRate 0.0851 Epoch: 11 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:30:28,905-Speed 18585.05 samples/sec Loss 6.5579 LearningRate 0.0851 Epoch: 11 Global Step: 60650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:30:33,322-Speed 18552.55 samples/sec Loss 6.5609 LearningRate 0.0850 Epoch: 11 Global Step: 60660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:30:37,785-Speed 18362.59 samples/sec Loss 6.5719 LearningRate 0.0850 Epoch: 11 Global Step: 60670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:30:42,187-Speed 18613.64 samples/sec Loss 6.5644 LearningRate 0.0849 Epoch: 11 Global Step: 60680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:30:46,575-Speed 18676.40 samples/sec Loss 6.5382 LearningRate 0.0849 Epoch: 11 Global Step: 60690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:30:51,035-Speed 18376.44 samples/sec Loss 6.5612 LearningRate 0.0849 Epoch: 11 Global Step: 60700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:30:55,448-Speed 18570.23 samples/sec Loss 6.5474 LearningRate 0.0848 Epoch: 11 Global Step: 60710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:30:59,865-Speed 18549.33 samples/sec Loss 6.5317 LearningRate 0.0848 Epoch: 11 Global Step: 60720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:31:04,270-Speed 18604.24 samples/sec Loss 6.5401 LearningRate 0.0847 Epoch: 11 Global Step: 60730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-01-14 05:31:08,687-Speed 18554.35 samples/sec Loss 6.5246 LearningRate 0.0847 Epoch: 11 Global Step: 60740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:31:13,060-Speed 18738.52 samples/sec Loss 6.5695 LearningRate 0.0847 Epoch: 11 Global Step: 60750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:31:17,486-Speed 18510.90 samples/sec Loss 6.5345 LearningRate 0.0846 Epoch: 11 Global Step: 60760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:31:21,921-Speed 18476.78 samples/sec Loss 6.5501 LearningRate 0.0846 Epoch: 11 Global Step: 60770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:31:26,365-Speed 18443.83 samples/sec Loss 6.5577 LearningRate 0.0845 Epoch: 11 Global Step: 60780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:31:30,742-Speed 18722.42 samples/sec Loss 6.5477 LearningRate 0.0845 Epoch: 11 Global Step: 60790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:31:35,188-Speed 18430.01 samples/sec Loss 6.5510 LearningRate 0.0845 Epoch: 11 Global Step: 60800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:31:39,661-Speed 18318.97 samples/sec Loss 6.5939 LearningRate 0.0844 Epoch: 11 Global Step: 60810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:31:44,152-Speed 18244.69 samples/sec Loss 6.5393 LearningRate 0.0844 Epoch: 11 Global Step: 60820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:31:48,558-Speed 18596.55 samples/sec Loss 6.5261 LearningRate 0.0844 Epoch: 11 Global Step: 60830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:31:53,068-Speed 18169.13 samples/sec Loss 6.5030 LearningRate 0.0843 Epoch: 11 Global Step: 60840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:31:57,544-Speed 18309.28 samples/sec Loss 6.5217 LearningRate 0.0843 Epoch: 11 Global Step: 60850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:01,970-Speed 18513.99 
samples/sec Loss 6.5142 LearningRate 0.0842 Epoch: 11 Global Step: 60860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:06,394-Speed 18523.91 samples/sec Loss 6.5436 LearningRate 0.0842 Epoch: 11 Global Step: 60870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:10,805-Speed 18574.43 samples/sec Loss 6.5663 LearningRate 0.0842 Epoch: 11 Global Step: 60880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:15,238-Speed 18488.17 samples/sec Loss 6.5422 LearningRate 0.0841 Epoch: 11 Global Step: 60890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:19,704-Speed 18346.71 samples/sec Loss 6.5513 LearningRate 0.0841 Epoch: 11 Global Step: 60900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:24,175-Speed 18329.43 samples/sec Loss 6.5331 LearningRate 0.0840 Epoch: 11 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:32:28,587-Speed 18574.39 samples/sec Loss 6.5434 LearningRate 0.0840 Epoch: 11 Global Step: 60920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:32,971-Speed 18688.35 samples/sec Loss 6.5496 LearningRate 0.0840 Epoch: 11 Global Step: 60930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:37,366-Speed 18646.67 samples/sec Loss 6.5111 LearningRate 0.0839 Epoch: 11 Global Step: 60940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:41,764-Speed 18631.15 samples/sec Loss 6.5581 LearningRate 0.0839 Epoch: 11 Global Step: 60950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:46,174-Speed 18580.77 samples/sec Loss 6.5172 LearningRate 0.0838 Epoch: 11 Global Step: 60960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:50,599-Speed 18516.84 samples/sec Loss 6.5360 LearningRate 0.0838 Epoch: 11 Global Step: 60970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:55,003-Speed 18605.86 samples/sec Loss 6.5303 LearningRate 0.0838 
Epoch: 11 Global Step: 60980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:32:59,415-Speed 18577.49 samples/sec Loss 6.5384 LearningRate 0.0837 Epoch: 11 Global Step: 60990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:03,888-Speed 18319.32 samples/sec Loss 6.5391 LearningRate 0.0837 Epoch: 11 Global Step: 61000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:08,341-Speed 18401.00 samples/sec Loss 6.5091 LearningRate 0.0836 Epoch: 11 Global Step: 61010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:12,798-Speed 18385.19 samples/sec Loss 6.5680 LearningRate 0.0836 Epoch: 11 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:33:17,222-Speed 18522.27 samples/sec Loss 6.5181 LearningRate 0.0836 Epoch: 11 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:33:21,667-Speed 18433.21 samples/sec Loss 6.5925 LearningRate 0.0835 Epoch: 11 Global Step: 61040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:26,092-Speed 18521.21 samples/sec Loss 6.5264 LearningRate 0.0835 Epoch: 11 Global Step: 61050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:30,479-Speed 18675.10 samples/sec Loss 6.5421 LearningRate 0.0834 Epoch: 11 Global Step: 61060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:34,888-Speed 18589.58 samples/sec Loss 6.5129 LearningRate 0.0834 Epoch: 11 Global Step: 61070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:39,278-Speed 18666.18 samples/sec Loss 6.5236 LearningRate 0.0834 Epoch: 11 Global Step: 61080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:43,678-Speed 18628.00 samples/sec Loss 6.5367 LearningRate 0.0833 Epoch: 11 Global Step: 61090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:48,091-Speed 18572.56 samples/sec Loss 6.5323 LearningRate 0.0833 Epoch: 11 Global Step: 61100 Fp16 Grad 
Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:52,535-Speed 18436.48 samples/sec Loss 6.5469 LearningRate 0.0833 Epoch: 11 Global Step: 61110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:33:56,997-Speed 18367.65 samples/sec Loss 6.4986 LearningRate 0.0832 Epoch: 11 Global Step: 61120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:01,416-Speed 18544.05 samples/sec Loss 6.5122 LearningRate 0.0832 Epoch: 11 Global Step: 61130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:05,838-Speed 18535.23 samples/sec Loss 6.5222 LearningRate 0.0831 Epoch: 11 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:34:10,300-Speed 18362.29 samples/sec Loss 6.4848 LearningRate 0.0831 Epoch: 11 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:34:14,694-Speed 18645.83 samples/sec Loss 6.5265 LearningRate 0.0831 Epoch: 11 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:34:19,080-Speed 18686.36 samples/sec Loss 6.5236 LearningRate 0.0830 Epoch: 11 Global Step: 61170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:23,486-Speed 18594.50 samples/sec Loss 6.4959 LearningRate 0.0830 Epoch: 11 Global Step: 61180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:27,917-Speed 18498.70 samples/sec Loss 6.5018 LearningRate 0.0829 Epoch: 11 Global Step: 61190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:32,324-Speed 18595.36 samples/sec Loss 6.4904 LearningRate 0.0829 Epoch: 11 Global Step: 61200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:36,755-Speed 18493.51 samples/sec Loss 6.5182 LearningRate 0.0829 Epoch: 11 Global Step: 61210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:41,223-Speed 18335.12 samples/sec Loss 6.5210 LearningRate 0.0828 Epoch: 11 Global Step: 61220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-01-14 05:34:45,621-Speed 18633.13 samples/sec Loss 6.5182 LearningRate 0.0828 Epoch: 11 Global Step: 61230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:50,047-Speed 18509.43 samples/sec Loss 6.4934 LearningRate 0.0827 Epoch: 11 Global Step: 61240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:54,516-Speed 18338.02 samples/sec Loss 6.5082 LearningRate 0.0827 Epoch: 11 Global Step: 61250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:34:58,982-Speed 18344.80 samples/sec Loss 6.5210 LearningRate 0.0827 Epoch: 11 Global Step: 61260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:35:03,406-Speed 18519.99 samples/sec Loss 6.5452 LearningRate 0.0826 Epoch: 11 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:07,823-Speed 18551.76 samples/sec Loss 6.5159 LearningRate 0.0826 Epoch: 11 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:12,275-Speed 18403.04 samples/sec Loss 6.5102 LearningRate 0.0825 Epoch: 11 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:16,712-Speed 18469.10 samples/sec Loss 6.5049 LearningRate 0.0825 Epoch: 11 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:21,135-Speed 18526.86 samples/sec Loss 6.5376 LearningRate 0.0825 Epoch: 11 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:25,567-Speed 18485.18 samples/sec Loss 6.5163 LearningRate 0.0824 Epoch: 11 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:29,956-Speed 18670.78 samples/sec Loss 6.5570 LearningRate 0.0824 Epoch: 11 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:34,345-Speed 18668.98 samples/sec Loss 6.5491 LearningRate 0.0824 Epoch: 11 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:38,769-Speed 18523.35 
samples/sec Loss 6.4715 LearningRate 0.0823 Epoch: 11 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:43,184-Speed 18559.49 samples/sec Loss 6.5324 LearningRate 0.0823 Epoch: 11 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:47,630-Speed 18430.49 samples/sec Loss 6.5187 LearningRate 0.0822 Epoch: 11 Global Step: 61370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 05:35:52,093-Speed 18363.26 samples/sec Loss 6.4921 LearningRate 0.0822 Epoch: 11 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:35:56,521-Speed 18503.49 samples/sec Loss 6.5118 LearningRate 0.0822 Epoch: 11 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:36:00,945-Speed 18522.69 samples/sec Loss 6.5187 LearningRate 0.0821 Epoch: 11 Global Step: 61400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:36:05,351-Speed 18599.25 samples/sec Loss 6.5530 LearningRate 0.0821 Epoch: 11 Global Step: 61410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:09,766-Speed 18561.86 samples/sec Loss 6.4976 LearningRate 0.0820 Epoch: 11 Global Step: 61420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:14,189-Speed 18525.83 samples/sec Loss 6.4984 LearningRate 0.0820 Epoch: 11 Global Step: 61430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:18,593-Speed 18607.26 samples/sec Loss 6.4807 LearningRate 0.0820 Epoch: 11 Global Step: 61440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:22,997-Speed 18602.31 samples/sec Loss 6.5131 LearningRate 0.0819 Epoch: 11 Global Step: 61450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:27,428-Speed 18494.92 samples/sec Loss 6.5191 LearningRate 0.0819 Epoch: 11 Global Step: 61460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:31,835-Speed 18595.02 samples/sec Loss 6.4632 LearningRate 
0.0818 Epoch: 11 Global Step: 61470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:36,285-Speed 18413.84 samples/sec Loss 6.5024 LearningRate 0.0818 Epoch: 11 Global Step: 61480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:40,708-Speed 18526.56 samples/sec Loss 6.4683 LearningRate 0.0818 Epoch: 11 Global Step: 61490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:45,202-Speed 18232.33 samples/sec Loss 6.5326 LearningRate 0.0817 Epoch: 11 Global Step: 61500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:36:49,661-Speed 18375.40 samples/sec Loss 6.5306 LearningRate 0.0817 Epoch: 11 Global Step: 61510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:36:54,074-Speed 18569.38 samples/sec Loss 6.5123 LearningRate 0.0817 Epoch: 11 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:37:04,137-Speed 8141.94 samples/sec Loss 6.4949 LearningRate 0.0816 Epoch: 11 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:37:08,598-Speed 18373.03 samples/sec Loss 6.4842 LearningRate 0.0816 Epoch: 11 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:37:13,010-Speed 18571.10 samples/sec Loss 6.4483 LearningRate 0.0815 Epoch: 11 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:37:17,428-Speed 18543.53 samples/sec Loss 6.5307 LearningRate 0.0815 Epoch: 11 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:37:21,850-Speed 18532.27 samples/sec Loss 6.4659 LearningRate 0.0815 Epoch: 11 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:37:26,297-Speed 18428.51 samples/sec Loss 6.5286 LearningRate 0.0814 Epoch: 11 Global Step: 61580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:37:30,752-Speed 18395.39 samples/sec Loss 6.4598 LearningRate 0.0814 Epoch: 11 Global Step: 61590 Fp16 
Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:37:35,211-Speed 18376.30 samples/sec Loss 6.4936 LearningRate 0.0813 Epoch: 11 Global Step: 61600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:37:39,616-Speed 18603.01 samples/sec Loss 6.5139 LearningRate 0.0813 Epoch: 11 Global Step: 61610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:37:44,072-Speed 18390.50 samples/sec Loss 6.5123 LearningRate 0.0813 Epoch: 11 Global Step: 61620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:37:48,487-Speed 18558.62 samples/sec Loss 6.5100 LearningRate 0.0812 Epoch: 11 Global Step: 61630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:37:52,964-Speed 18302.06 samples/sec Loss 6.4833 LearningRate 0.0812 Epoch: 11 Global Step: 61640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:37:57,439-Speed 18313.98 samples/sec Loss 6.5135 LearningRate 0.0812 Epoch: 11 Global Step: 61650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:01,843-Speed 18604.94 samples/sec Loss 6.5218 LearningRate 0.0811 Epoch: 11 Global Step: 61660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:06,277-Speed 18482.57 samples/sec Loss 6.4784 LearningRate 0.0811 Epoch: 11 Global Step: 61670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:10,721-Speed 18439.50 samples/sec Loss 6.5197 LearningRate 0.0810 Epoch: 11 Global Step: 61680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:15,169-Speed 18423.29 samples/sec Loss 6.4773 LearningRate 0.0810 Epoch: 11 Global Step: 61690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:19,594-Speed 18519.73 samples/sec Loss 6.5052 LearningRate 0.0810 Epoch: 11 Global Step: 61700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:24,006-Speed 18571.37 samples/sec Loss 6.4812 LearningRate 0.0809 Epoch: 11 Global Step: 61710 Fp16 Grad Scale: 16384 Required: 5 hours 
Training: 2022-01-14 05:38:28,453-Speed 18427.17 samples/sec Loss 6.5181 LearningRate 0.0809 Epoch: 11 Global Step: 61720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:32,929-Speed 18305.18 samples/sec Loss 6.4794 LearningRate 0.0808 Epoch: 11 Global Step: 61730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:37,381-Speed 18406.95 samples/sec Loss 6.5340 LearningRate 0.0808 Epoch: 11 Global Step: 61740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:38:41,785-Speed 18606.00 samples/sec Loss 6.4530 LearningRate 0.0808 Epoch: 11 Global Step: 61750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:38:46,195-Speed 18579.33 samples/sec Loss 6.4637 LearningRate 0.0807 Epoch: 11 Global Step: 61760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:38:50,690-Speed 18231.83 samples/sec Loss 6.4831 LearningRate 0.0807 Epoch: 11 Global Step: 61770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:38:55,110-Speed 18541.57 samples/sec Loss 6.4761 LearningRate 0.0807 Epoch: 11 Global Step: 61780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:38:59,541-Speed 18493.01 samples/sec Loss 6.4529 LearningRate 0.0806 Epoch: 11 Global Step: 61790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:04,017-Speed 18308.47 samples/sec Loss 6.4765 LearningRate 0.0806 Epoch: 11 Global Step: 61800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:08,426-Speed 18587.91 samples/sec Loss 6.5165 LearningRate 0.0805 Epoch: 11 Global Step: 61810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:12,847-Speed 18531.11 samples/sec Loss 6.4556 LearningRate 0.0805 Epoch: 11 Global Step: 61820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:17,262-Speed 18558.53 samples/sec Loss 6.4697 LearningRate 0.0805 Epoch: 11 Global Step: 61830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:21,681-Speed 
18547.83 samples/sec Loss 6.4642 LearningRate 0.0804 Epoch: 11 Global Step: 61840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:26,119-Speed 18458.55 samples/sec Loss 6.4564 LearningRate 0.0804 Epoch: 11 Global Step: 61850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:39:30,558-Speed 18464.22 samples/sec Loss 6.4720 LearningRate 0.0803 Epoch: 11 Global Step: 61860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:39:35,119-Speed 17964.64 samples/sec Loss 6.4540 LearningRate 0.0803 Epoch: 11 Global Step: 61870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:39:39,535-Speed 18556.99 samples/sec Loss 6.5105 LearningRate 0.0803 Epoch: 11 Global Step: 61880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:39:43,944-Speed 18590.30 samples/sec Loss 6.4247 LearningRate 0.0802 Epoch: 11 Global Step: 61890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:48,367-Speed 18526.56 samples/sec Loss 6.4531 LearningRate 0.0802 Epoch: 11 Global Step: 61900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:52,791-Speed 18522.89 samples/sec Loss 6.4985 LearningRate 0.0802 Epoch: 11 Global Step: 61910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:39:57,209-Speed 18549.59 samples/sec Loss 6.4840 LearningRate 0.0801 Epoch: 11 Global Step: 61920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:40:01,616-Speed 18591.70 samples/sec Loss 6.4531 LearningRate 0.0801 Epoch: 11 Global Step: 61930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:40:06,042-Speed 18517.07 samples/sec Loss 6.4487 LearningRate 0.0800 Epoch: 11 Global Step: 61940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:40:10,453-Speed 18572.45 samples/sec Loss 6.5046 LearningRate 0.0800 Epoch: 11 Global Step: 61950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:40:14,904-Speed 18411.37 samples/sec Loss 6.4312 
LearningRate 0.0800 Epoch: 11 Global Step: 61960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:40:19,332-Speed 18505.28 samples/sec Loss 6.4421 LearningRate 0.0799 Epoch: 11 Global Step: 61970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:40:23,749-Speed 18552.62 samples/sec Loss 6.4757 LearningRate 0.0799 Epoch: 11 Global Step: 61980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:40:28,180-Speed 18491.56 samples/sec Loss 6.4977 LearningRate 0.0798 Epoch: 11 Global Step: 61990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:40:32,588-Speed 18590.63 samples/sec Loss 6.4698 LearningRate 0.0798 Epoch: 11 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:40:37,017-Speed 18501.09 samples/sec Loss 6.4479 LearningRate 0.0798 Epoch: 11 Global Step: 62010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:40:41,402-Speed 18686.70 samples/sec Loss 6.4985 LearningRate 0.0797 Epoch: 11 Global Step: 62020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:40:45,800-Speed 18631.81 samples/sec Loss 6.4926 LearningRate 0.0797 Epoch: 11 Global Step: 62030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:40:50,268-Speed 18341.96 samples/sec Loss 6.4999 LearningRate 0.0797 Epoch: 11 Global Step: 62040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:40:54,661-Speed 18651.87 samples/sec Loss 6.4807 LearningRate 0.0796 Epoch: 11 Global Step: 62050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:40:59,055-Speed 18650.06 samples/sec Loss 6.4726 LearningRate 0.0796 Epoch: 11 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:41:03,456-Speed 18615.45 samples/sec Loss 6.4591 LearningRate 0.0795 Epoch: 11 Global Step: 62070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:07,834-Speed 18717.22 samples/sec Loss 6.4695 LearningRate 0.0795 Epoch: 11 Global Step: 
62080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:12,223-Speed 18669.86 samples/sec Loss 6.4379 LearningRate 0.0795 Epoch: 11 Global Step: 62090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:16,652-Speed 18501.14 samples/sec Loss 6.5182 LearningRate 0.0794 Epoch: 11 Global Step: 62100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:21,076-Speed 18521.16 samples/sec Loss 6.4296 LearningRate 0.0794 Epoch: 11 Global Step: 62110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:25,557-Speed 18287.75 samples/sec Loss 6.4887 LearningRate 0.0793 Epoch: 11 Global Step: 62120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:29,970-Speed 18567.95 samples/sec Loss 6.4314 LearningRate 0.0793 Epoch: 11 Global Step: 62130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:34,439-Speed 18338.88 samples/sec Loss 6.4230 LearningRate 0.0793 Epoch: 11 Global Step: 62140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:38,879-Speed 18454.48 samples/sec Loss 6.4831 LearningRate 0.0792 Epoch: 11 Global Step: 62150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:43,288-Speed 18588.75 samples/sec Loss 6.4238 LearningRate 0.0792 Epoch: 11 Global Step: 62160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:41:47,708-Speed 18538.09 samples/sec Loss 6.5000 LearningRate 0.0792 Epoch: 11 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:41:52,196-Speed 18258.81 samples/sec Loss 6.4975 LearningRate 0.0791 Epoch: 11 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:41:56,629-Speed 18493.11 samples/sec Loss 6.4530 LearningRate 0.0791 Epoch: 11 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:42:01,053-Speed 18527.05 samples/sec Loss 6.4573 LearningRate 0.0790 Epoch: 11 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-01-14 05:42:05,458-Speed 18600.97 samples/sec Loss 6.5477 LearningRate 0.0790 Epoch: 11 Global Step: 62210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:24,508-Speed 4300.64 samples/sec Loss 6.4928 LearningRate 0.0790 Epoch: 12 Global Step: 62220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:28,967-Speed 18380.78 samples/sec Loss 6.4808 LearningRate 0.0789 Epoch: 12 Global Step: 62230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:33,390-Speed 18529.03 samples/sec Loss 6.4504 LearningRate 0.0789 Epoch: 12 Global Step: 62240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:37,787-Speed 18638.38 samples/sec Loss 6.4281 LearningRate 0.0789 Epoch: 12 Global Step: 62250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:42,222-Speed 18476.98 samples/sec Loss 6.4675 LearningRate 0.0788 Epoch: 12 Global Step: 62260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:46,690-Speed 18337.54 samples/sec Loss 6.4513 LearningRate 0.0788 Epoch: 12 Global Step: 62270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:51,104-Speed 18565.30 samples/sec Loss 6.4226 LearningRate 0.0787 Epoch: 12 Global Step: 62280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:55,478-Speed 18732.97 samples/sec Loss 6.4433 LearningRate 0.0787 Epoch: 12 Global Step: 62290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:42:59,886-Speed 18588.04 samples/sec Loss 6.4269 LearningRate 0.0787 Epoch: 12 Global Step: 62300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:04,274-Speed 18675.59 samples/sec Loss 6.4798 LearningRate 0.0786 Epoch: 12 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:43:08,703-Speed 18502.73 samples/sec Loss 6.4245 LearningRate 0.0786 Epoch: 12 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 
05:43:13,115-Speed 18570.30 samples/sec Loss 6.4620 LearningRate 0.0785 Epoch: 12 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:43:17,586-Speed 18327.55 samples/sec Loss 6.4475 LearningRate 0.0785 Epoch: 12 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:43:22,047-Speed 18369.78 samples/sec Loss 6.3918 LearningRate 0.0785 Epoch: 12 Global Step: 62350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:26,515-Speed 18339.31 samples/sec Loss 6.4462 LearningRate 0.0784 Epoch: 12 Global Step: 62360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:30,989-Speed 18316.07 samples/sec Loss 6.4293 LearningRate 0.0784 Epoch: 12 Global Step: 62370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:35,447-Speed 18378.67 samples/sec Loss 6.4498 LearningRate 0.0784 Epoch: 12 Global Step: 62380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:39,907-Speed 18375.49 samples/sec Loss 6.4506 LearningRate 0.0783 Epoch: 12 Global Step: 62390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:44,296-Speed 18676.01 samples/sec Loss 6.4143 LearningRate 0.0783 Epoch: 12 Global Step: 62400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:48,737-Speed 18453.08 samples/sec Loss 6.4447 LearningRate 0.0782 Epoch: 12 Global Step: 62410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:53,197-Speed 18374.22 samples/sec Loss 6.4462 LearningRate 0.0782 Epoch: 12 Global Step: 62420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:43:57,606-Speed 18585.35 samples/sec Loss 6.4103 LearningRate 0.0782 Epoch: 12 Global Step: 62430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:01,995-Speed 18669.94 samples/sec Loss 6.4670 LearningRate 0.0781 Epoch: 12 Global Step: 62440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:06,466-Speed 18327.99 samples/sec 
Loss 6.4127 LearningRate 0.0781 Epoch: 12 Global Step: 62450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:44:10,933-Speed 18348.09 samples/sec Loss 6.4431 LearningRate 0.0781 Epoch: 12 Global Step: 62460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:44:15,378-Speed 18431.81 samples/sec Loss 6.4349 LearningRate 0.0780 Epoch: 12 Global Step: 62470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:44:19,791-Speed 18568.50 samples/sec Loss 6.4231 LearningRate 0.0780 Epoch: 12 Global Step: 62480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:44:24,199-Speed 18590.10 samples/sec Loss 6.4591 LearningRate 0.0779 Epoch: 12 Global Step: 62490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:28,634-Speed 18475.89 samples/sec Loss 6.3942 LearningRate 0.0779 Epoch: 12 Global Step: 62500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:33,051-Speed 18551.74 samples/sec Loss 6.4055 LearningRate 0.0779 Epoch: 12 Global Step: 62510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:37,479-Speed 18509.45 samples/sec Loss 6.3804 LearningRate 0.0778 Epoch: 12 Global Step: 62520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:41,883-Speed 18604.81 samples/sec Loss 6.4326 LearningRate 0.0778 Epoch: 12 Global Step: 62530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:46,361-Speed 18295.09 samples/sec Loss 6.4437 LearningRate 0.0778 Epoch: 12 Global Step: 62540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:50,780-Speed 18543.36 samples/sec Loss 6.4852 LearningRate 0.0777 Epoch: 12 Global Step: 62550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:55,173-Speed 18654.14 samples/sec Loss 6.4547 LearningRate 0.0777 Epoch: 12 Global Step: 62560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:44:59,613-Speed 18457.37 samples/sec Loss 6.4049 LearningRate 0.0776 Epoch: 12 
Global Step: 62570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:04,129-Speed 18144.91 samples/sec Loss 6.4273 LearningRate 0.0776 Epoch: 12 Global Step: 62580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:08,586-Speed 18392.31 samples/sec Loss 6.4489 LearningRate 0.0776 Epoch: 12 Global Step: 62590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:13,038-Speed 18407.28 samples/sec Loss 6.4703 LearningRate 0.0775 Epoch: 12 Global Step: 62600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:17,496-Speed 18382.23 samples/sec Loss 6.4309 LearningRate 0.0775 Epoch: 12 Global Step: 62610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:22,001-Speed 18195.61 samples/sec Loss 6.4339 LearningRate 0.0775 Epoch: 12 Global Step: 62620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:26,544-Speed 18038.52 samples/sec Loss 6.4613 LearningRate 0.0774 Epoch: 12 Global Step: 62630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:30,967-Speed 18523.29 samples/sec Loss 6.4363 LearningRate 0.0774 Epoch: 12 Global Step: 62640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:35,370-Speed 18611.57 samples/sec Loss 6.4236 LearningRate 0.0773 Epoch: 12 Global Step: 62650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:39,772-Speed 18616.53 samples/sec Loss 6.4092 LearningRate 0.0773 Epoch: 12 Global Step: 62660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:44,194-Speed 18537.14 samples/sec Loss 6.4301 LearningRate 0.0773 Epoch: 12 Global Step: 62670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:48,605-Speed 18581.56 samples/sec Loss 6.4337 LearningRate 0.0772 Epoch: 12 Global Step: 62680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:45:53,025-Speed 18536.90 samples/sec Loss 6.4381 LearningRate 0.0772 Epoch: 12 Global Step: 62690 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-14 05:45:57,431-Speed 18601.94 samples/sec Loss 6.4318 LearningRate 0.0771 Epoch: 12 Global Step: 62700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:46:01,837-Speed 18600.56 samples/sec Loss 6.4421 LearningRate 0.0771 Epoch: 12 Global Step: 62710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:46:06,243-Speed 18594.90 samples/sec Loss 6.4304 LearningRate 0.0771 Epoch: 12 Global Step: 62720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:46:10,693-Speed 18414.62 samples/sec Loss 6.4660 LearningRate 0.0770 Epoch: 12 Global Step: 62730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:46:15,120-Speed 18509.97 samples/sec Loss 6.3929 LearningRate 0.0770 Epoch: 12 Global Step: 62740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:46:19,584-Speed 18362.67 samples/sec Loss 6.3723 LearningRate 0.0770 Epoch: 12 Global Step: 62750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:46:23,976-Speed 18659.37 samples/sec Loss 6.4315 LearningRate 0.0769 Epoch: 12 Global Step: 62760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:46:28,419-Speed 18445.44 samples/sec Loss 6.4186 LearningRate 0.0769 Epoch: 12 Global Step: 62770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:46:32,890-Speed 18339.09 samples/sec Loss 6.4353 LearningRate 0.0768 Epoch: 12 Global Step: 62780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:46:37,376-Speed 18271.33 samples/sec Loss 6.4180 LearningRate 0.0768 Epoch: 12 Global Step: 62790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:46:41,802-Speed 18518.06 samples/sec Loss 6.4161 LearningRate 0.0768 Epoch: 12 Global Step: 62800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:46:46,235-Speed 18488.58 samples/sec Loss 6.4238 LearningRate 0.0767 Epoch: 12 Global Step: 62810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 
05:46:50,655-Speed 18535.28 samples/sec Loss 6.4351 LearningRate 0.0767 Epoch: 12 Global Step: 62820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:46:55,134-Speed 18297.11 samples/sec Loss 6.3929 LearningRate 0.0767 Epoch: 12 Global Step: 62830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:46:59,617-Speed 18278.01 samples/sec Loss 6.4342 LearningRate 0.0766 Epoch: 12 Global Step: 62840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:47:04,037-Speed 18539.74 samples/sec Loss 6.4059 LearningRate 0.0766 Epoch: 12 Global Step: 62850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:47:08,520-Speed 18277.56 samples/sec Loss 6.4250 LearningRate 0.0765 Epoch: 12 Global Step: 62860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:47:12,948-Speed 18504.67 samples/sec Loss 6.3944 LearningRate 0.0765 Epoch: 12 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:17,351-Speed 18610.19 samples/sec Loss 6.4189 LearningRate 0.0765 Epoch: 12 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:21,801-Speed 18413.29 samples/sec Loss 6.4126 LearningRate 0.0764 Epoch: 12 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:30,800-Speed 9103.64 samples/sec Loss 6.3967 LearningRate 0.0764 Epoch: 12 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:35,270-Speed 18332.08 samples/sec Loss 6.4267 LearningRate 0.0764 Epoch: 12 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:39,723-Speed 18403.67 samples/sec Loss 6.4327 LearningRate 0.0763 Epoch: 12 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:44,143-Speed 18539.75 samples/sec Loss 6.4136 LearningRate 0.0763 Epoch: 12 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:48,538-Speed 18645.54 samples/sec Loss 
6.4372 LearningRate 0.0762 Epoch: 12 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:52,930-Speed 18653.08 samples/sec Loss 6.4260 LearningRate 0.0762 Epoch: 12 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:47:57,365-Speed 18476.49 samples/sec Loss 6.3966 LearningRate 0.0762 Epoch: 12 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:01,780-Speed 18558.59 samples/sec Loss 6.3646 LearningRate 0.0761 Epoch: 12 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 05:48:06,202-Speed 18529.56 samples/sec Loss 6.4647 LearningRate 0.0761 Epoch: 12 Global Step: 62980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:10,630-Speed 18518.70 samples/sec Loss 6.4149 LearningRate 0.0761 Epoch: 12 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:15,067-Speed 18466.21 samples/sec Loss 6.3915 LearningRate 0.0760 Epoch: 12 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:19,556-Speed 18254.93 samples/sec Loss 6.3958 LearningRate 0.0760 Epoch: 12 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:24,060-Speed 18190.34 samples/sec Loss 6.3746 LearningRate 0.0759 Epoch: 12 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:28,536-Speed 18310.13 samples/sec Loss 6.3798 LearningRate 0.0759 Epoch: 12 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:32,976-Speed 18454.31 samples/sec Loss 6.3926 LearningRate 0.0759 Epoch: 12 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:37,386-Speed 18584.91 samples/sec Loss 6.3976 LearningRate 0.0758 Epoch: 12 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:41,794-Speed 18588.29 samples/sec Loss 6.4491 LearningRate 0.0758 Epoch: 12 
Global Step: 63060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:46,220-Speed 18514.62 samples/sec Loss 6.4103 LearningRate 0.0758 Epoch: 12 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:50,671-Speed 18409.85 samples/sec Loss 6.3785 LearningRate 0.0757 Epoch: 12 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:55,154-Speed 18277.42 samples/sec Loss 6.3709 LearningRate 0.0757 Epoch: 12 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:48:59,599-Speed 18438.62 samples/sec Loss 6.3857 LearningRate 0.0757 Epoch: 12 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:49:04,064-Speed 18349.49 samples/sec Loss 6.3923 LearningRate 0.0756 Epoch: 12 Global Step: 63110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:49:08,483-Speed 18543.66 samples/sec Loss 6.4293 LearningRate 0.0756 Epoch: 12 Global Step: 63120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:49:12,989-Speed 18186.22 samples/sec Loss 6.3615 LearningRate 0.0755 Epoch: 12 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:49:17,428-Speed 18463.18 samples/sec Loss 6.4200 LearningRate 0.0755 Epoch: 12 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:49:21,884-Speed 18386.43 samples/sec Loss 6.3898 LearningRate 0.0755 Epoch: 12 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:49:26,387-Speed 18196.79 samples/sec Loss 6.3922 LearningRate 0.0754 Epoch: 12 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:49:30,845-Speed 18384.86 samples/sec Loss 6.3751 LearningRate 0.0754 Epoch: 12 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:49:35,335-Speed 18248.75 samples/sec Loss 6.3818 LearningRate 0.0754 Epoch: 12 Global Step: 63180 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-14 05:49:39,800-Speed 18351.06 samples/sec Loss 6.3997 LearningRate 0.0753 Epoch: 12 Global Step: 63190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:49:44,267-Speed 18344.17 samples/sec Loss 6.3887 LearningRate 0.0753 Epoch: 12 Global Step: 63200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:49:48,664-Speed 18637.35 samples/sec Loss 6.3870 LearningRate 0.0752 Epoch: 12 Global Step: 63210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:49:53,089-Speed 18517.45 samples/sec Loss 6.4067 LearningRate 0.0752 Epoch: 12 Global Step: 63220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:49:57,551-Speed 18364.62 samples/sec Loss 6.3926 LearningRate 0.0752 Epoch: 12 Global Step: 63230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:01,988-Speed 18471.94 samples/sec Loss 6.4294 LearningRate 0.0751 Epoch: 12 Global Step: 63240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:06,388-Speed 18620.65 samples/sec Loss 6.3623 LearningRate 0.0751 Epoch: 12 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:50:10,849-Speed 18367.29 samples/sec Loss 6.3685 LearningRate 0.0751 Epoch: 12 Global Step: 63260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:15,311-Speed 18364.18 samples/sec Loss 6.3576 LearningRate 0.0750 Epoch: 12 Global Step: 63270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:19,751-Speed 18457.46 samples/sec Loss 6.3950 LearningRate 0.0750 Epoch: 12 Global Step: 63280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:24,216-Speed 18350.98 samples/sec Loss 6.3677 LearningRate 0.0749 Epoch: 12 Global Step: 63290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:28,692-Speed 18305.31 samples/sec Loss 6.4085 LearningRate 0.0749 Epoch: 12 Global Step: 63300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 
05:50:33,108-Speed 18558.85 samples/sec Loss 6.3566 LearningRate 0.0749 Epoch: 12 Global Step: 63310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:37,560-Speed 18404.76 samples/sec Loss 6.4049 LearningRate 0.0748 Epoch: 12 Global Step: 63320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:41,995-Speed 18476.77 samples/sec Loss 6.3711 LearningRate 0.0748 Epoch: 12 Global Step: 63330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:46,434-Speed 18461.85 samples/sec Loss 6.4403 LearningRate 0.0748 Epoch: 12 Global Step: 63340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:50,869-Speed 18476.57 samples/sec Loss 6.4215 LearningRate 0.0747 Epoch: 12 Global Step: 63350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:50:55,389-Speed 18127.42 samples/sec Loss 6.4037 LearningRate 0.0747 Epoch: 12 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:50:59,862-Speed 18321.87 samples/sec Loss 6.3976 LearningRate 0.0746 Epoch: 12 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:51:04,295-Speed 18481.79 samples/sec Loss 6.4096 LearningRate 0.0746 Epoch: 12 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:51:08,736-Speed 18452.67 samples/sec Loss 6.3815 LearningRate 0.0746 Epoch: 12 Global Step: 63390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:13,181-Speed 18434.93 samples/sec Loss 6.3479 LearningRate 0.0745 Epoch: 12 Global Step: 63400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:17,608-Speed 18510.26 samples/sec Loss 6.3261 LearningRate 0.0745 Epoch: 12 Global Step: 63410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:22,067-Speed 18376.85 samples/sec Loss 6.3840 LearningRate 0.0745 Epoch: 12 Global Step: 63420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:26,558-Speed 18242.33 samples/sec 
Loss 6.3739 LearningRate 0.0744 Epoch: 12 Global Step: 63430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:31,090-Speed 18082.80 samples/sec Loss 6.4103 LearningRate 0.0744 Epoch: 12 Global Step: 63440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:35,588-Speed 18216.87 samples/sec Loss 6.4068 LearningRate 0.0744 Epoch: 12 Global Step: 63450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:40,037-Speed 18420.75 samples/sec Loss 6.3685 LearningRate 0.0743 Epoch: 12 Global Step: 63460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:44,478-Speed 18446.81 samples/sec Loss 6.3670 LearningRate 0.0743 Epoch: 12 Global Step: 63470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:48,937-Speed 18381.02 samples/sec Loss 6.3640 LearningRate 0.0742 Epoch: 12 Global Step: 63480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:51:53,362-Speed 18515.91 samples/sec Loss 6.3423 LearningRate 0.0742 Epoch: 12 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:51:57,788-Speed 18514.03 samples/sec Loss 6.3773 LearningRate 0.0742 Epoch: 12 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:52:02,258-Speed 18331.34 samples/sec Loss 6.4284 LearningRate 0.0741 Epoch: 12 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:52:06,744-Speed 18264.76 samples/sec Loss 6.4089 LearningRate 0.0741 Epoch: 12 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:52:11,221-Speed 18302.24 samples/sec Loss 6.3840 LearningRate 0.0741 Epoch: 12 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:52:15,673-Speed 18409.56 samples/sec Loss 6.3812 LearningRate 0.0740 Epoch: 12 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:52:20,117-Speed 18446.00 samples/sec Loss 6.3577 LearningRate 0.0740 Epoch: 12 
Global Step: 63550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:52:24,592-Speed 18306.90 samples/sec Loss 6.3490 LearningRate 0.0739 Epoch: 12 Global Step: 63560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:52:29,018-Speed 18511.72 samples/sec Loss 6.3802 LearningRate 0.0739 Epoch: 12 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:52:33,467-Speed 18422.03 samples/sec Loss 6.3422 LearningRate 0.0739 Epoch: 12 Global Step: 63580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:52:37,941-Speed 18313.95 samples/sec Loss 6.3667 LearningRate 0.0738 Epoch: 12 Global Step: 63590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:52:42,437-Speed 18223.29 samples/sec Loss 6.4071 LearningRate 0.0738 Epoch: 12 Global Step: 63600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:52:46,874-Speed 18473.57 samples/sec Loss 6.3948 LearningRate 0.0738 Epoch: 12 Global Step: 63610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:52:51,301-Speed 18505.57 samples/sec Loss 6.3463 LearningRate 0.0737 Epoch: 12 Global Step: 63620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:52:55,722-Speed 18533.41 samples/sec Loss 6.3533 LearningRate 0.0737 Epoch: 12 Global Step: 63630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:53:00,204-Speed 18286.81 samples/sec Loss 6.3663 LearningRate 0.0737 Epoch: 12 Global Step: 63640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:53:04,653-Speed 18423.16 samples/sec Loss 6.3350 LearningRate 0.0736 Epoch: 12 Global Step: 63650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:53:09,098-Speed 18433.24 samples/sec Loss 6.3625 LearningRate 0.0736 Epoch: 12 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:13,574-Speed 18307.83 samples/sec Loss 6.3296 LearningRate 0.0735 Epoch: 12 Global Step: 63670 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-14 05:53:18,053-Speed 18293.93 samples/sec Loss 6.3969 LearningRate 0.0735 Epoch: 12 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:22,556-Speed 18197.34 samples/sec Loss 6.4115 LearningRate 0.0735 Epoch: 12 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:26,986-Speed 18494.69 samples/sec Loss 6.3423 LearningRate 0.0734 Epoch: 12 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:31,465-Speed 18291.98 samples/sec Loss 6.3997 LearningRate 0.0734 Epoch: 12 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:35,923-Speed 18381.87 samples/sec Loss 6.3687 LearningRate 0.0734 Epoch: 12 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:40,386-Speed 18362.07 samples/sec Loss 6.3764 LearningRate 0.0733 Epoch: 12 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:44,828-Speed 18447.07 samples/sec Loss 6.3764 LearningRate 0.0733 Epoch: 12 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:49,293-Speed 18355.10 samples/sec Loss 6.3445 LearningRate 0.0732 Epoch: 12 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:53:53,726-Speed 18487.09 samples/sec Loss 6.4221 LearningRate 0.0732 Epoch: 12 Global Step: 63760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 05:53:58,143-Speed 18553.10 samples/sec Loss 6.3847 LearningRate 0.0732 Epoch: 12 Global Step: 63770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 05:54:02,564-Speed 18535.04 samples/sec Loss 6.3684 LearningRate 0.0731 Epoch: 12 Global Step: 63780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:06,986-Speed 18530.98 samples/sec Loss 6.2987 LearningRate 0.0731 Epoch: 12 Global Step: 63790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 
05:54:11,431-Speed 18440.09 samples/sec Loss 6.3810 LearningRate 0.0731 Epoch: 12 Global Step: 63800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:15,884-Speed 18403.54 samples/sec Loss 6.3717 LearningRate 0.0730 Epoch: 12 Global Step: 63810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:20,376-Speed 18240.45 samples/sec Loss 6.3206 LearningRate 0.0730 Epoch: 12 Global Step: 63820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:24,847-Speed 18329.87 samples/sec Loss 6.3582 LearningRate 0.0730 Epoch: 12 Global Step: 63830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:29,295-Speed 18424.10 samples/sec Loss 6.4133 LearningRate 0.0729 Epoch: 12 Global Step: 63840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:33,758-Speed 18360.48 samples/sec Loss 6.3508 LearningRate 0.0729 Epoch: 12 Global Step: 63850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:38,199-Speed 18458.85 samples/sec Loss 6.3864 LearningRate 0.0728 Epoch: 12 Global Step: 63860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:42,651-Speed 18405.88 samples/sec Loss 6.3500 LearningRate 0.0728 Epoch: 12 Global Step: 63870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-01-14 05:54:47,085-Speed 18482.22 samples/sec Loss 6.3725 LearningRate 0.0728 Epoch: 12 Global Step: 63880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:54:51,497-Speed 18575.92 samples/sec Loss 6.3159 LearningRate 0.0727 Epoch: 12 Global Step: 63890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:54:55,881-Speed 18693.84 samples/sec Loss 6.3320 LearningRate 0.0727 Epoch: 12 Global Step: 63900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:00,306-Speed 18520.81 samples/sec Loss 6.3307 LearningRate 0.0727 Epoch: 12 Global Step: 63910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:04,767-Speed 18367.63 samples/sec 
Loss 6.3512 LearningRate 0.0726 Epoch: 12 Global Step: 63920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:09,177-Speed 18583.67 samples/sec Loss 6.3583 LearningRate 0.0726 Epoch: 12 Global Step: 63930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:13,603-Speed 18512.56 samples/sec Loss 6.3136 LearningRate 0.0726 Epoch: 12 Global Step: 63940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:18,015-Speed 18572.53 samples/sec Loss 6.3467 LearningRate 0.0725 Epoch: 12 Global Step: 63950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:22,447-Speed 18488.15 samples/sec Loss 6.3073 LearningRate 0.0725 Epoch: 12 Global Step: 63960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:26,856-Speed 18588.47 samples/sec Loss 6.3522 LearningRate 0.0724 Epoch: 12 Global Step: 63970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:31,288-Speed 18487.58 samples/sec Loss 6.3363 LearningRate 0.0724 Epoch: 12 Global Step: 63980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:55:35,679-Speed 18662.03 samples/sec Loss 6.3481 LearningRate 0.0724 Epoch: 12 Global Step: 63990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:40,157-Speed 18297.34 samples/sec Loss 6.3628 LearningRate 0.0723 Epoch: 12 Global Step: 64000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:44,607-Speed 18413.20 samples/sec Loss 6.3596 LearningRate 0.0723 Epoch: 12 Global Step: 64010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:49,105-Speed 18218.35 samples/sec Loss 6.3558 LearningRate 0.0723 Epoch: 12 Global Step: 64020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:53,544-Speed 18461.81 samples/sec Loss 6.3289 LearningRate 0.0722 Epoch: 12 Global Step: 64030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:55:58,029-Speed 18268.90 samples/sec Loss 6.3233 LearningRate 0.0722 Epoch: 12 
Global Step: 64040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:56:02,474-Speed 18435.14 samples/sec Loss 6.3207 LearningRate 0.0721 Epoch: 12 Global Step: 64050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:56:06,938-Speed 18353.47 samples/sec Loss 6.2959 LearningRate 0.0721 Epoch: 12 Global Step: 64060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:56:11,380-Speed 18452.79 samples/sec Loss 6.3567 LearningRate 0.0721 Epoch: 12 Global Step: 64070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:56:15,782-Speed 18613.26 samples/sec Loss 6.3290 LearningRate 0.0720 Epoch: 12 Global Step: 64080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:56:20,195-Speed 18570.71 samples/sec Loss 6.3234 LearningRate 0.0720 Epoch: 12 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:56:24,754-Speed 17975.55 samples/sec Loss 6.3175 LearningRate 0.0720 Epoch: 12 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:56:29,225-Speed 18327.41 samples/sec Loss 6.3927 LearningRate 0.0719 Epoch: 12 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:56:33,633-Speed 18588.10 samples/sec Loss 6.3516 LearningRate 0.0719 Epoch: 12 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:56:38,077-Speed 18442.22 samples/sec Loss 6.3355 LearningRate 0.0719 Epoch: 12 Global Step: 64130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:56:47,954-Speed 8294.75 samples/sec Loss 6.3467 LearningRate 0.0718 Epoch: 12 Global Step: 64140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:56:52,343-Speed 18674.64 samples/sec Loss 6.3530 LearningRate 0.0718 Epoch: 12 Global Step: 64150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:56:56,777-Speed 18478.51 samples/sec Loss 6.3778 LearningRate 0.0717 Epoch: 12 Global Step: 64160 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-14 05:57:01,241-Speed 18356.78 samples/sec Loss 6.3285 LearningRate 0.0717 Epoch: 12 Global Step: 64170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:57:05,622-Speed 18700.50 samples/sec Loss 6.3328 LearningRate 0.0717 Epoch: 12 Global Step: 64180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:57:10,015-Speed 18655.79 samples/sec Loss 6.3481 LearningRate 0.0716 Epoch: 12 Global Step: 64190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:57:14,422-Speed 18592.11 samples/sec Loss 6.2672 LearningRate 0.0716 Epoch: 12 Global Step: 64200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:57:18,854-Speed 18490.07 samples/sec Loss 6.3441 LearningRate 0.0716 Epoch: 12 Global Step: 64210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:57:23,333-Speed 18295.26 samples/sec Loss 6.3197 LearningRate 0.0715 Epoch: 12 Global Step: 64220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:57:27,740-Speed 18592.62 samples/sec Loss 6.3246 LearningRate 0.0715 Epoch: 12 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:57:32,181-Speed 18450.14 samples/sec Loss 6.3240 LearningRate 0.0715 Epoch: 12 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:57:36,616-Speed 18477.77 samples/sec Loss 6.3162 LearningRate 0.0714 Epoch: 12 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:57:41,100-Speed 18271.85 samples/sec Loss 6.3756 LearningRate 0.0714 Epoch: 12 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:57:45,526-Speed 18511.58 samples/sec Loss 6.2915 LearningRate 0.0714 Epoch: 12 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:57:49,958-Speed 18492.66 samples/sec Loss 6.3436 LearningRate 0.0713 Epoch: 12 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 
05:57:54,369-Speed 18572.77 samples/sec Loss 6.3216 LearningRate 0.0713 Epoch: 12 Global Step: 64290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:57:58,757-Speed 18675.18 samples/sec Loss 6.2896 LearningRate 0.0712 Epoch: 12 Global Step: 64300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:03,189-Speed 18490.16 samples/sec Loss 6.2986 LearningRate 0.0712 Epoch: 12 Global Step: 64310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:07,636-Speed 18424.06 samples/sec Loss 6.2772 LearningRate 0.0712 Epoch: 12 Global Step: 64320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:12,058-Speed 18537.63 samples/sec Loss 6.3022 LearningRate 0.0711 Epoch: 12 Global Step: 64330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:16,457-Speed 18628.70 samples/sec Loss 6.3311 LearningRate 0.0711 Epoch: 12 Global Step: 64340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:20,874-Speed 18551.26 samples/sec Loss 6.3027 LearningRate 0.0711 Epoch: 12 Global Step: 64350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:25,293-Speed 18542.21 samples/sec Loss 6.3278 LearningRate 0.0710 Epoch: 12 Global Step: 64360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:29,706-Speed 18567.45 samples/sec Loss 6.3077 LearningRate 0.0710 Epoch: 12 Global Step: 64370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:34,113-Speed 18596.38 samples/sec Loss 6.2889 LearningRate 0.0710 Epoch: 12 Global Step: 64380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:58:38,544-Speed 18493.44 samples/sec Loss 6.3023 LearningRate 0.0709 Epoch: 12 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:58:43,013-Speed 18335.20 samples/sec Loss 6.3328 LearningRate 0.0709 Epoch: 12 Global Step: 64400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:58:47,534-Speed 18126.39 samples/sec 
Loss 6.3370 LearningRate 0.0708 Epoch: 12 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:58:52,007-Speed 18321.79 samples/sec Loss 6.2702 LearningRate 0.0708 Epoch: 12 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:58:56,454-Speed 18421.72 samples/sec Loss 6.3020 LearningRate 0.0708 Epoch: 12 Global Step: 64430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:00,932-Speed 18301.46 samples/sec Loss 6.3141 LearningRate 0.0707 Epoch: 12 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:05,375-Speed 18439.35 samples/sec Loss 6.3012 LearningRate 0.0707 Epoch: 12 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:09,818-Speed 18447.37 samples/sec Loss 6.3376 LearningRate 0.0707 Epoch: 12 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:14,341-Speed 18116.39 samples/sec Loss 6.3263 LearningRate 0.0706 Epoch: 12 Global Step: 64470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:18,777-Speed 18469.73 samples/sec Loss 6.3016 LearningRate 0.0706 Epoch: 12 Global Step: 64480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:23,202-Speed 18519.87 samples/sec Loss 6.3490 LearningRate 0.0706 Epoch: 12 Global Step: 64490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 05:59:27,652-Speed 18412.08 samples/sec Loss 6.2970 LearningRate 0.0705 Epoch: 12 Global Step: 64500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:32,083-Speed 18491.18 samples/sec Loss 6.2941 LearningRate 0.0705 Epoch: 12 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:36,520-Speed 18468.23 samples/sec Loss 6.3294 LearningRate 0.0704 Epoch: 12 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:40,932-Speed 18574.17 samples/sec Loss 6.2835 LearningRate 0.0704 Epoch: 12 
Global Step: 64530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:45,439-Speed 18179.85 samples/sec Loss 6.2833 LearningRate 0.0704 Epoch: 12 Global Step: 64540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 05:59:49,932-Speed 18238.47 samples/sec Loss 6.3334 LearningRate 0.0703 Epoch: 12 Global Step: 64550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:59:54,408-Speed 18306.24 samples/sec Loss 6.3356 LearningRate 0.0703 Epoch: 12 Global Step: 64560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 05:59:58,842-Speed 18481.74 samples/sec Loss 6.3243 LearningRate 0.0703 Epoch: 12 Global Step: 64570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:03,271-Speed 18498.44 samples/sec Loss 6.2575 LearningRate 0.0702 Epoch: 12 Global Step: 64580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:07,764-Speed 18243.67 samples/sec Loss 6.3347 LearningRate 0.0702 Epoch: 12 Global Step: 64590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:12,168-Speed 18611.62 samples/sec Loss 6.3321 LearningRate 0.0702 Epoch: 12 Global Step: 64600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:16,599-Speed 18494.05 samples/sec Loss 6.2805 LearningRate 0.0701 Epoch: 12 Global Step: 64610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:21,071-Speed 18325.73 samples/sec Loss 6.3493 LearningRate 0.0701 Epoch: 12 Global Step: 64620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:25,536-Speed 18354.38 samples/sec Loss 6.2686 LearningRate 0.0701 Epoch: 12 Global Step: 64630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:29,977-Speed 18451.36 samples/sec Loss 6.2947 LearningRate 0.0700 Epoch: 12 Global Step: 64640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:34,493-Speed 18143.27 samples/sec Loss 6.3655 LearningRate 0.0700 Epoch: 12 Global Step: 64650 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-14 06:00:38,928-Speed 18477.45 samples/sec Loss 6.2999 LearningRate 0.0699 Epoch: 12 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:00:43,379-Speed 18410.50 samples/sec Loss 6.2732 LearningRate 0.0699 Epoch: 12 Global Step: 64670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:47,805-Speed 18514.35 samples/sec Loss 6.2521 LearningRate 0.0699 Epoch: 12 Global Step: 64680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:52,305-Speed 18212.81 samples/sec Loss 6.2981 LearningRate 0.0698 Epoch: 12 Global Step: 64690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:00:56,707-Speed 18615.53 samples/sec Loss 6.2919 LearningRate 0.0698 Epoch: 12 Global Step: 64700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:01:01,154-Speed 18426.44 samples/sec Loss 6.3032 LearningRate 0.0698 Epoch: 12 Global Step: 64710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:01:05,597-Speed 18443.74 samples/sec Loss 6.2904 LearningRate 0.0697 Epoch: 12 Global Step: 64720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:01:10,060-Speed 18380.31 samples/sec Loss 6.2920 LearningRate 0.0697 Epoch: 12 Global Step: 64730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:01:14,507-Speed 18424.19 samples/sec Loss 6.3193 LearningRate 0.0697 Epoch: 12 Global Step: 64740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:01:18,955-Speed 18425.52 samples/sec Loss 6.2972 LearningRate 0.0696 Epoch: 12 Global Step: 64750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:01:23,395-Speed 18457.37 samples/sec Loss 6.3446 LearningRate 0.0696 Epoch: 12 Global Step: 64760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:01:27,831-Speed 18473.01 samples/sec Loss 6.3146 LearningRate 0.0696 Epoch: 12 Global Step: 64770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 
06:01:32,275-Speed 18436.87 samples/sec Loss 6.3156 LearningRate 0.0695 Epoch: 12 Global Step: 64780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:01:36,687-Speed 18572.13 samples/sec Loss 6.3103 LearningRate 0.0695 Epoch: 12 Global Step: 64790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:01:41,119-Speed 18486.87 samples/sec Loss 6.3408 LearningRate 0.0694 Epoch: 12 Global Step: 64800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:01:45,561-Speed 18446.84 samples/sec Loss 6.2920 LearningRate 0.0694 Epoch: 12 Global Step: 64810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:01:50,027-Speed 18347.92 samples/sec Loss 6.2959 LearningRate 0.0694 Epoch: 12 Global Step: 64820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:01:54,467-Speed 18454.90 samples/sec Loss 6.3017 LearningRate 0.0693 Epoch: 12 Global Step: 64830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:01:58,910-Speed 18441.42 samples/sec Loss 6.2833 LearningRate 0.0693 Epoch: 12 Global Step: 64840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:02:03,317-Speed 18596.95 samples/sec Loss 6.2662 LearningRate 0.0693 Epoch: 12 Global Step: 64850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:02:07,747-Speed 18493.17 samples/sec Loss 6.2603 LearningRate 0.0692 Epoch: 12 Global Step: 64860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:02:12,232-Speed 18271.45 samples/sec Loss 6.2772 LearningRate 0.0692 Epoch: 12 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 06:02:16,747-Speed 18149.65 samples/sec Loss 6.2695 LearningRate 0.0692 Epoch: 12 Global Step: 64880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:02:21,208-Speed 18370.47 samples/sec Loss 6.2899 LearningRate 0.0691 Epoch: 12 Global Step: 64890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:02:25,628-Speed 18541.11 samples/sec 
Loss 6.3109 LearningRate 0.0691 Epoch: 12 Global Step: 64900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:02:30,066-Speed 18471.18 samples/sec Loss 6.2804 LearningRate 0.0691 Epoch: 12 Global Step: 64910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:02:34,497-Speed 18495.40 samples/sec Loss 6.3123 LearningRate 0.0690 Epoch: 12 Global Step: 64920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:02:38,946-Speed 18424.23 samples/sec Loss 6.3067 LearningRate 0.0690 Epoch: 12 Global Step: 64930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:02:43,353-Speed 18597.38 samples/sec Loss 6.2873 LearningRate 0.0689 Epoch: 12 Global Step: 64940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:02:47,784-Speed 18492.22 samples/sec Loss 6.3030 LearningRate 0.0689 Epoch: 12 Global Step: 64950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:02:52,248-Speed 18352.81 samples/sec Loss 6.2693 LearningRate 0.0689 Epoch: 12 Global Step: 64960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:02:56,682-Speed 18482.87 samples/sec Loss 6.3049 LearningRate 0.0688 Epoch: 12 Global Step: 64970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:03:01,162-Speed 18290.37 samples/sec Loss 6.2604 LearningRate 0.0688 Epoch: 12 Global Step: 64980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:03:05,606-Speed 18437.49 samples/sec Loss 6.2478 LearningRate 0.0688 Epoch: 12 Global Step: 64990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:03:10,064-Speed 18387.05 samples/sec Loss 6.3242 LearningRate 0.0687 Epoch: 12 Global Step: 65000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:03:14,527-Speed 18359.89 samples/sec Loss 6.3068 LearningRate 0.0687 Epoch: 12 Global Step: 65010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:03:19,005-Speed 18299.16 samples/sec Loss 6.2708 LearningRate 0.0687 Epoch: 12 
Global Step: 65020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:03:23,489-Speed 18273.59 samples/sec Loss 6.2747 LearningRate 0.0686 Epoch: 12 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:03:27,927-Speed 18460.51 samples/sec Loss 6.2612 LearningRate 0.0686 Epoch: 12 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:03:32,356-Speed 18500.74 samples/sec Loss 6.2920 LearningRate 0.0686 Epoch: 12 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:03:36,836-Speed 18292.90 samples/sec Loss 6.2428 LearningRate 0.0685 Epoch: 12 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:03:41,288-Speed 18406.44 samples/sec Loss 6.2379 LearningRate 0.0685 Epoch: 12 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:03:45,753-Speed 18351.04 samples/sec Loss 6.3007 LearningRate 0.0684 Epoch: 12 Global Step: 65080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:03:50,241-Speed 18258.76 samples/sec Loss 6.2680 LearningRate 0.0684 Epoch: 12 Global Step: 65090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:03:54,638-Speed 18633.45 samples/sec Loss 6.2900 LearningRate 0.0684 Epoch: 12 Global Step: 65100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:03:59,053-Speed 18559.59 samples/sec Loss 6.2783 LearningRate 0.0683 Epoch: 12 Global Step: 65110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:04:03,467-Speed 18563.11 samples/sec Loss 6.3058 LearningRate 0.0683 Epoch: 12 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:04:07,897-Speed 18495.97 samples/sec Loss 6.2996 LearningRate 0.0683 Epoch: 12 Global Step: 65130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:04:12,325-Speed 18503.17 samples/sec Loss 6.2673 LearningRate 0.0682 Epoch: 12 Global Step: 65140 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-14 06:04:16,765-Speed 18458.12 samples/sec Loss 6.3141 LearningRate 0.0682 Epoch: 12 Global Step: 65150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:04:21,223-Speed 18380.55 samples/sec Loss 6.2764 LearningRate 0.0682 Epoch: 12 Global Step: 65160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:04:25,672-Speed 18413.01 samples/sec Loss 6.2580 LearningRate 0.0681 Epoch: 12 Global Step: 65170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:04:30,085-Speed 18571.59 samples/sec Loss 6.3205 LearningRate 0.0681 Epoch: 12 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:04:34,530-Speed 18432.69 samples/sec Loss 6.2999 LearningRate 0.0681 Epoch: 12 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:04:38,947-Speed 18556.82 samples/sec Loss 6.2330 LearningRate 0.0680 Epoch: 12 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:04:43,386-Speed 18464.72 samples/sec Loss 6.2566 LearningRate 0.0680 Epoch: 12 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:04:47,822-Speed 18476.41 samples/sec Loss 6.2841 LearningRate 0.0680 Epoch: 12 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:04:52,296-Speed 18317.92 samples/sec Loss 6.2716 LearningRate 0.0679 Epoch: 12 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:04:56,769-Speed 18320.11 samples/sec Loss 6.3011 LearningRate 0.0679 Epoch: 12 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:01,218-Speed 18416.94 samples/sec Loss 6.2536 LearningRate 0.0678 Epoch: 12 Global Step: 65250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:05,674-Speed 18388.32 samples/sec Loss 6.2694 LearningRate 0.0678 Epoch: 12 Global Step: 65260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 
06:05:10,107-Speed 18487.04 samples/sec Loss 6.3037 LearningRate 0.0678 Epoch: 12 Global Step: 65270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:14,557-Speed 18412.02 samples/sec Loss 6.2539 LearningRate 0.0677 Epoch: 12 Global Step: 65280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:18,997-Speed 18458.04 samples/sec Loss 6.2672 LearningRate 0.0677 Epoch: 12 Global Step: 65290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:23,401-Speed 18605.31 samples/sec Loss 6.2147 LearningRate 0.0677 Epoch: 12 Global Step: 65300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:27,832-Speed 18495.39 samples/sec Loss 6.2344 LearningRate 0.0676 Epoch: 12 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:32,244-Speed 18572.37 samples/sec Loss 6.2428 LearningRate 0.0676 Epoch: 12 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:36,704-Speed 18372.42 samples/sec Loss 6.2909 LearningRate 0.0676 Epoch: 12 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:41,121-Speed 18554.17 samples/sec Loss 6.2970 LearningRate 0.0675 Epoch: 12 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:05:45,600-Speed 18292.94 samples/sec Loss 6.2721 LearningRate 0.0675 Epoch: 12 Global Step: 65350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:05:50,011-Speed 18580.60 samples/sec Loss 6.3072 LearningRate 0.0675 Epoch: 12 Global Step: 65360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:05:54,471-Speed 18369.96 samples/sec Loss 6.2531 LearningRate 0.0674 Epoch: 12 Global Step: 65370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:05:58,923-Speed 18406.10 samples/sec Loss 6.2857 LearningRate 0.0674 Epoch: 12 Global Step: 65380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:06:03,378-Speed 18396.70 samples/sec 
Loss 6.2914 LearningRate 0.0674 Epoch: 12 Global Step: 65390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:06:07,843-Speed 18352.95 samples/sec Loss 6.2489 LearningRate 0.0673 Epoch: 12 Global Step: 65400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:06:12,255-Speed 18573.35 samples/sec Loss 6.2462 LearningRate 0.0673 Epoch: 12 Global Step: 65410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:06:16,647-Speed 18656.51 samples/sec Loss 6.2369 LearningRate 0.0672 Epoch: 12 Global Step: 65420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:06:21,035-Speed 18679.62 samples/sec Loss 6.2666 LearningRate 0.0672 Epoch: 12 Global Step: 65430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:06:25,475-Speed 18461.47 samples/sec Loss 6.2586 LearningRate 0.0672 Epoch: 12 Global Step: 65440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:06:30,016-Speed 18043.75 samples/sec Loss 6.2442 LearningRate 0.0671 Epoch: 12 Global Step: 65450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:06:34,507-Speed 18244.96 samples/sec Loss 6.2758 LearningRate 0.0671 Epoch: 12 Global Step: 65460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:06:38,939-Speed 18488.65 samples/sec Loss 6.2649 LearningRate 0.0671 Epoch: 12 Global Step: 65470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:06:43,387-Speed 18425.09 samples/sec Loss 6.3107 LearningRate 0.0670 Epoch: 12 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:06:47,803-Speed 18554.22 samples/sec Loss 6.2347 LearningRate 0.0670 Epoch: 12 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:06:52,239-Speed 18477.19 samples/sec Loss 6.1828 LearningRate 0.0670 Epoch: 12 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:06:56,643-Speed 18605.43 samples/sec Loss 6.2492 LearningRate 0.0669 Epoch: 12 
Global Step: 65510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:07:01,067-Speed 18517.20 samples/sec Loss 6.2255 LearningRate 0.0669 Epoch: 12 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:07:05,472-Speed 18606.15 samples/sec Loss 6.2709 LearningRate 0.0669 Epoch: 12 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:07:09,939-Speed 18346.76 samples/sec Loss 6.2311 LearningRate 0.0668 Epoch: 12 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:07:14,338-Speed 18634.08 samples/sec Loss 6.2169 LearningRate 0.0668 Epoch: 12 Global Step: 65550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-14 06:07:18,760-Speed 18531.92 samples/sec Loss 6.2459 LearningRate 0.0668 Epoch: 12 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:07:23,278-Speed 18136.64 samples/sec Loss 6.2879 LearningRate 0.0667 Epoch: 12 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:07:27,735-Speed 18384.98 samples/sec Loss 6.1988 LearningRate 0.0667 Epoch: 12 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:07:32,225-Speed 18252.96 samples/sec Loss 6.2157 LearningRate 0.0667 Epoch: 12 Global Step: 65590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:07:36,683-Speed 18380.32 samples/sec Loss 6.2614 LearningRate 0.0666 Epoch: 12 Global Step: 65600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:07:41,103-Speed 18540.42 samples/sec Loss 6.2228 LearningRate 0.0666 Epoch: 12 Global Step: 65610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:07:45,507-Speed 18607.38 samples/sec Loss 6.2151 LearningRate 0.0665 Epoch: 12 Global Step: 65620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:07:49,967-Speed 18373.68 samples/sec Loss 6.2909 LearningRate 0.0665 Epoch: 12 Global Step: 65630 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-14 06:07:54,406-Speed 18459.72 samples/sec Loss 6.2617 LearningRate 0.0665 Epoch: 12 Global Step: 65640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:07:58,910-Speed 18194.28 samples/sec Loss 6.2442 LearningRate 0.0664 Epoch: 12 Global Step: 65650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:08:03,345-Speed 18474.68 samples/sec Loss 6.2791 LearningRate 0.0664 Epoch: 12 Global Step: 65660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:08:07,805-Speed 18372.73 samples/sec Loss 6.2389 LearningRate 0.0664 Epoch: 12 Global Step: 65670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:08:12,239-Speed 18479.63 samples/sec Loss 6.2636 LearningRate 0.0663 Epoch: 12 Global Step: 65680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:08:16,684-Speed 18434.14 samples/sec Loss 6.2446 LearningRate 0.0663 Epoch: 12 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:08:21,112-Speed 18506.69 samples/sec Loss 6.2683 LearningRate 0.0663 Epoch: 12 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:08:25,541-Speed 18501.51 samples/sec Loss 6.2491 LearningRate 0.0662 Epoch: 12 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:08:29,954-Speed 18570.54 samples/sec Loss 6.2087 LearningRate 0.0662 Epoch: 12 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:08:34,369-Speed 18559.97 samples/sec Loss 6.2326 LearningRate 0.0662 Epoch: 12 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:08:38,785-Speed 18558.75 samples/sec Loss 6.1990 LearningRate 0.0661 Epoch: 12 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:08:43,180-Speed 18644.19 samples/sec Loss 6.2041 LearningRate 0.0661 Epoch: 12 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 
06:08:47,624-Speed 18436.68 samples/sec Loss 6.2290 LearningRate 0.0661 Epoch: 12 Global Step: 65760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:08:52,046-Speed 18531.78 samples/sec Loss 6.2320 LearningRate 0.0660 Epoch: 12 Global Step: 65770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:08:56,474-Speed 18506.11 samples/sec Loss 6.2314 LearningRate 0.0660 Epoch: 12 Global Step: 65780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:00,893-Speed 18539.98 samples/sec Loss 6.2243 LearningRate 0.0660 Epoch: 12 Global Step: 65790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:05,313-Speed 18538.06 samples/sec Loss 6.2663 LearningRate 0.0659 Epoch: 12 Global Step: 65800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:09,757-Speed 18439.61 samples/sec Loss 6.2794 LearningRate 0.0659 Epoch: 12 Global Step: 65810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:14,169-Speed 18573.74 samples/sec Loss 6.2426 LearningRate 0.0658 Epoch: 12 Global Step: 65820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:18,613-Speed 18436.14 samples/sec Loss 6.2434 LearningRate 0.0658 Epoch: 12 Global Step: 65830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:23,039-Speed 18512.47 samples/sec Loss 6.1987 LearningRate 0.0658 Epoch: 12 Global Step: 65840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:27,485-Speed 18431.94 samples/sec Loss 6.2375 LearningRate 0.0657 Epoch: 12 Global Step: 65850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:31,886-Speed 18617.91 samples/sec Loss 6.2228 LearningRate 0.0657 Epoch: 12 Global Step: 65860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:36,290-Speed 18607.15 samples/sec Loss 6.2417 LearningRate 0.0657 Epoch: 12 Global Step: 65870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:40,710-Speed 18536.80 samples/sec 
Loss 6.1952 LearningRate 0.0656 Epoch: 12 Global Step: 65880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:45,128-Speed 18543.40 samples/sec Loss 6.2676 LearningRate 0.0656 Epoch: 12 Global Step: 65890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:49,522-Speed 18649.87 samples/sec Loss 6.1992 LearningRate 0.0656 Epoch: 12 Global Step: 65900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:53,936-Speed 18562.85 samples/sec Loss 6.2219 LearningRate 0.0655 Epoch: 12 Global Step: 65910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:09:58,357-Speed 18535.31 samples/sec Loss 6.2243 LearningRate 0.0655 Epoch: 12 Global Step: 65920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:10:02,820-Speed 18362.55 samples/sec Loss 6.2201 LearningRate 0.0655 Epoch: 12 Global Step: 65930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:10:07,205-Speed 18685.82 samples/sec Loss 6.2301 LearningRate 0.0654 Epoch: 12 Global Step: 65940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:10:11,625-Speed 18544.18 samples/sec Loss 6.2020 LearningRate 0.0654 Epoch: 12 Global Step: 65950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:10:16,007-Speed 18699.64 samples/sec Loss 6.2578 LearningRate 0.0654 Epoch: 12 Global Step: 65960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:20,463-Speed 18397.25 samples/sec Loss 6.2093 LearningRate 0.0653 Epoch: 12 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:24,865-Speed 18618.67 samples/sec Loss 6.1801 LearningRate 0.0653 Epoch: 12 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:29,274-Speed 18585.71 samples/sec Loss 6.2200 LearningRate 0.0653 Epoch: 12 Global Step: 65990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:33,672-Speed 18630.59 samples/sec Loss 6.2419 LearningRate 0.0652 Epoch: 12 
Global Step: 66000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:38,084-Speed 18571.73 samples/sec Loss 6.2028 LearningRate 0.0652 Epoch: 12 Global Step: 66010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:42,502-Speed 18552.67 samples/sec Loss 6.1874 LearningRate 0.0652 Epoch: 12 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:46,938-Speed 18470.25 samples/sec Loss 6.2054 LearningRate 0.0651 Epoch: 12 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:51,425-Speed 18264.69 samples/sec Loss 6.1907 LearningRate 0.0651 Epoch: 12 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:10:55,852-Speed 18508.53 samples/sec Loss 6.2308 LearningRate 0.0651 Epoch: 12 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:11:00,300-Speed 18421.10 samples/sec Loss 6.1948 LearningRate 0.0650 Epoch: 12 Global Step: 66060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:11:04,713-Speed 18571.23 samples/sec Loss 6.2188 LearningRate 0.0650 Epoch: 12 Global Step: 66070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:11:09,147-Speed 18477.34 samples/sec Loss 6.2339 LearningRate 0.0649 Epoch: 12 Global Step: 66080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:11:13,605-Speed 18387.27 samples/sec Loss 6.2464 LearningRate 0.0649 Epoch: 12 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:11:18,044-Speed 18462.46 samples/sec Loss 6.1970 LearningRate 0.0649 Epoch: 12 Global Step: 66100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:11:22,478-Speed 18481.71 samples/sec Loss 6.2138 LearningRate 0.0648 Epoch: 12 Global Step: 66110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:11:26,878-Speed 18622.51 samples/sec Loss 6.2332 LearningRate 0.0648 Epoch: 12 Global Step: 66120 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-14 06:11:31,306-Speed 18506.17 samples/sec Loss 6.2065 LearningRate 0.0648 Epoch: 12 Global Step: 66130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:11:35,846-Speed 18047.71 samples/sec Loss 6.2027 LearningRate 0.0647 Epoch: 12 Global Step: 66140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:11:40,245-Speed 18633.60 samples/sec Loss 6.2011 LearningRate 0.0647 Epoch: 12 Global Step: 66150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:11:44,646-Speed 18624.49 samples/sec Loss 6.1979 LearningRate 0.0647 Epoch: 12 Global Step: 66160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:11:49,069-Speed 18527.48 samples/sec Loss 6.2233 LearningRate 0.0646 Epoch: 12 Global Step: 66170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:11:53,470-Speed 18617.70 samples/sec Loss 6.2010 LearningRate 0.0646 Epoch: 12 Global Step: 66180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:11:57,888-Speed 18547.15 samples/sec Loss 6.1826 LearningRate 0.0646 Epoch: 12 Global Step: 66190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:02,304-Speed 18557.44 samples/sec Loss 6.2348 LearningRate 0.0645 Epoch: 12 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:12:06,669-Speed 18770.10 samples/sec Loss 6.2412 LearningRate 0.0645 Epoch: 12 Global Step: 66210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:11,137-Speed 18339.86 samples/sec Loss 6.2367 LearningRate 0.0645 Epoch: 12 Global Step: 66220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:15,577-Speed 18456.10 samples/sec Loss 6.2184 LearningRate 0.0644 Epoch: 12 Global Step: 66230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:20,003-Speed 18514.65 samples/sec Loss 6.2115 LearningRate 0.0644 Epoch: 12 Global Step: 66240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 
06:12:24,413-Speed 18583.56 samples/sec Loss 6.2134 LearningRate 0.0644 Epoch: 12 Global Step: 66250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:28,839-Speed 18513.94 samples/sec Loss 6.1855 LearningRate 0.0643 Epoch: 12 Global Step: 66260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:33,277-Speed 18462.31 samples/sec Loss 6.2252 LearningRate 0.0643 Epoch: 12 Global Step: 66270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:37,722-Speed 18432.70 samples/sec Loss 6.1949 LearningRate 0.0643 Epoch: 12 Global Step: 66280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:42,118-Speed 18642.95 samples/sec Loss 6.2065 LearningRate 0.0642 Epoch: 12 Global Step: 66290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:46,535-Speed 18550.63 samples/sec Loss 6.2126 LearningRate 0.0642 Epoch: 12 Global Step: 66300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:12:50,958-Speed 18523.73 samples/sec Loss 6.2237 LearningRate 0.0642 Epoch: 12 Global Step: 66310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:12:55,396-Speed 18464.62 samples/sec Loss 6.1740 LearningRate 0.0641 Epoch: 12 Global Step: 66320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:12:59,808-Speed 18568.33 samples/sec Loss 6.2392 LearningRate 0.0641 Epoch: 12 Global Step: 66330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:04,257-Speed 18432.39 samples/sec Loss 6.1929 LearningRate 0.0641 Epoch: 12 Global Step: 66340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:08,689-Speed 18491.08 samples/sec Loss 6.2103 LearningRate 0.0640 Epoch: 12 Global Step: 66350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:13,184-Speed 18232.49 samples/sec Loss 6.2385 LearningRate 0.0640 Epoch: 12 Global Step: 66360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:17,614-Speed 18495.34 samples/sec 
Loss 6.2120 LearningRate 0.0639 Epoch: 12 Global Step: 66370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:22,045-Speed 18492.96 samples/sec Loss 6.1923 LearningRate 0.0639 Epoch: 12 Global Step: 66380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:26,444-Speed 18624.84 samples/sec Loss 6.1735 LearningRate 0.0639 Epoch: 12 Global Step: 66390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:30,864-Speed 18535.36 samples/sec Loss 6.1467 LearningRate 0.0638 Epoch: 12 Global Step: 66400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:35,284-Speed 18540.69 samples/sec Loss 6.1774 LearningRate 0.0638 Epoch: 12 Global Step: 66410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:13:39,667-Speed 18697.36 samples/sec Loss 6.1619 LearningRate 0.0638 Epoch: 12 Global Step: 66420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:13:44,104-Speed 18465.36 samples/sec Loss 6.2160 LearningRate 0.0637 Epoch: 12 Global Step: 66430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:13:48,562-Speed 18383.33 samples/sec Loss 6.2251 LearningRate 0.0637 Epoch: 12 Global Step: 66440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:13:52,988-Speed 18512.04 samples/sec Loss 6.1842 LearningRate 0.0637 Epoch: 12 Global Step: 66450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:13:57,407-Speed 18545.41 samples/sec Loss 6.1572 LearningRate 0.0636 Epoch: 12 Global Step: 66460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:14:01,860-Speed 18406.54 samples/sec Loss 6.2031 LearningRate 0.0636 Epoch: 12 Global Step: 66470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:14:06,328-Speed 18344.70 samples/sec Loss 6.2068 LearningRate 0.0636 Epoch: 12 Global Step: 66480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:14:10,777-Speed 18424.94 samples/sec Loss 6.2119 LearningRate 0.0635 Epoch: 12 
Global Step: 66490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:14:15,193-Speed 18563.05 samples/sec Loss 6.1800 LearningRate 0.0635 Epoch: 12 Global Step: 66500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:14:19,616-Speed 18527.50 samples/sec Loss 6.2160 LearningRate 0.0635 Epoch: 12 Global Step: 66510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:14:24,099-Speed 18277.71 samples/sec Loss 6.1656 LearningRate 0.0634 Epoch: 12 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:14:28,551-Speed 18405.33 samples/sec Loss 6.1967 LearningRate 0.0634 Epoch: 12 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:14:32,962-Speed 18578.98 samples/sec Loss 6.1714 LearningRate 0.0634 Epoch: 12 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:14:37,413-Speed 18408.13 samples/sec Loss 6.2361 LearningRate 0.0633 Epoch: 12 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:14:41,857-Speed 18443.83 samples/sec Loss 6.1890 LearningRate 0.0633 Epoch: 12 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:14:46,254-Speed 18635.89 samples/sec Loss 6.1725 LearningRate 0.0633 Epoch: 12 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:14:50,686-Speed 18487.56 samples/sec Loss 6.1668 LearningRate 0.0632 Epoch: 12 Global Step: 66580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:14:55,124-Speed 18460.42 samples/sec Loss 6.1846 LearningRate 0.0632 Epoch: 12 Global Step: 66590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:14:59,557-Speed 18488.22 samples/sec Loss 6.1944 LearningRate 0.0632 Epoch: 12 Global Step: 66600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:15:03,976-Speed 18542.61 samples/sec Loss 6.1852 LearningRate 0.0631 Epoch: 12 Global Step: 66610 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-14 06:15:08,403-Speed 18507.53 samples/sec Loss 6.1782 LearningRate 0.0631 Epoch: 12 Global Step: 66620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:15:12,826-Speed 18532.07 samples/sec Loss 6.1833 LearningRate 0.0631 Epoch: 12 Global Step: 66630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:15:17,316-Speed 18248.61 samples/sec Loss 6.1913 LearningRate 0.0630 Epoch: 12 Global Step: 66640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:15:21,787-Speed 18328.10 samples/sec Loss 6.1780 LearningRate 0.0630 Epoch: 12 Global Step: 66650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:15:26,231-Speed 18439.34 samples/sec Loss 6.1902 LearningRate 0.0630 Epoch: 12 Global Step: 66660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:15:30,655-Speed 18526.16 samples/sec Loss 6.1674 LearningRate 0.0629 Epoch: 12 Global Step: 66670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:15:35,103-Speed 18420.18 samples/sec Loss 6.1736 LearningRate 0.0629 Epoch: 12 Global Step: 66680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:15:39,584-Speed 18286.12 samples/sec Loss 6.1410 LearningRate 0.0629 Epoch: 12 Global Step: 66690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:15:44,016-Speed 18496.65 samples/sec Loss 6.1923 LearningRate 0.0628 Epoch: 12 Global Step: 66700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:15:48,431-Speed 18562.48 samples/sec Loss 6.1636 LearningRate 0.0628 Epoch: 12 Global Step: 66710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:15:52,842-Speed 18575.78 samples/sec Loss 6.1597 LearningRate 0.0628 Epoch: 12 Global Step: 66720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:15:57,313-Speed 18326.64 samples/sec Loss 6.1638 LearningRate 0.0627 Epoch: 12 Global Step: 66730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 
06:16:01,738-Speed 18518.05 samples/sec Loss 6.2174 LearningRate 0.0627 Epoch: 12 Global Step: 66740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:16:06,173-Speed 18477.43 samples/sec Loss 6.1383 LearningRate 0.0627 Epoch: 12 Global Step: 66750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:16:10,576-Speed 18619.27 samples/sec Loss 6.1489 LearningRate 0.0626 Epoch: 12 Global Step: 66760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:16:14,993-Speed 18550.51 samples/sec Loss 6.1664 LearningRate 0.0626 Epoch: 12 Global Step: 66770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:19,436-Speed 18449.34 samples/sec Loss 6.1522 LearningRate 0.0626 Epoch: 12 Global Step: 66780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:23,936-Speed 18206.64 samples/sec Loss 6.2132 LearningRate 0.0625 Epoch: 12 Global Step: 66790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:28,369-Speed 18484.80 samples/sec Loss 6.1463 LearningRate 0.0625 Epoch: 12 Global Step: 66800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:32,833-Speed 18358.06 samples/sec Loss 6.1487 LearningRate 0.0624 Epoch: 12 Global Step: 66810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:37,277-Speed 18437.33 samples/sec Loss 6.1512 LearningRate 0.0624 Epoch: 12 Global Step: 66820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:41,696-Speed 18540.08 samples/sec Loss 6.2131 LearningRate 0.0624 Epoch: 12 Global Step: 66830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:46,109-Speed 18569.18 samples/sec Loss 6.1690 LearningRate 0.0623 Epoch: 12 Global Step: 66840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:50,517-Speed 18588.37 samples/sec Loss 6.1992 LearningRate 0.0623 Epoch: 12 Global Step: 66850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:54,952-Speed 18477.76 samples/sec 
Loss 6.1717 LearningRate 0.0623 Epoch: 12 Global Step: 66860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:16:59,396-Speed 18438.34 samples/sec Loss 6.1829 LearningRate 0.0622 Epoch: 12 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:03,841-Speed 18432.98 samples/sec Loss 6.1958 LearningRate 0.0622 Epoch: 12 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:08,335-Speed 18230.30 samples/sec Loss 6.1808 LearningRate 0.0622 Epoch: 12 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:12,794-Speed 18378.49 samples/sec Loss 6.1867 LearningRate 0.0621 Epoch: 12 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:17,228-Speed 18481.72 samples/sec Loss 6.1465 LearningRate 0.0621 Epoch: 12 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:21,659-Speed 18492.04 samples/sec Loss 6.1466 LearningRate 0.0621 Epoch: 12 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:26,131-Speed 18325.79 samples/sec Loss 6.1049 LearningRate 0.0620 Epoch: 12 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:30,618-Speed 18260.12 samples/sec Loss 6.1596 LearningRate 0.0620 Epoch: 12 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:35,094-Speed 18307.25 samples/sec Loss 6.1766 LearningRate 0.0620 Epoch: 12 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:39,560-Speed 18346.29 samples/sec Loss 6.1480 LearningRate 0.0619 Epoch: 12 Global Step: 66960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:43,980-Speed 18540.67 samples/sec Loss 6.1581 LearningRate 0.0619 Epoch: 12 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:48,430-Speed 18412.95 samples/sec Loss 6.1584 LearningRate 0.0619 Epoch: 12 
Global Step: 66980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:52,902-Speed 18325.25 samples/sec Loss 6.1230 LearningRate 0.0618 Epoch: 12 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:17:57,390-Speed 18257.04 samples/sec Loss 6.1369 LearningRate 0.0618 Epoch: 12 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:18:01,846-Speed 18395.82 samples/sec Loss 6.1732 LearningRate 0.0618 Epoch: 12 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:18:06,304-Speed 18386.69 samples/sec Loss 6.1803 LearningRate 0.0617 Epoch: 12 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:18:10,771-Speed 18350.11 samples/sec Loss 6.1045 LearningRate 0.0617 Epoch: 12 Global Step: 67030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:18:15,228-Speed 18383.37 samples/sec Loss 6.1252 LearningRate 0.0617 Epoch: 12 Global Step: 67040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:19,688-Speed 18374.01 samples/sec Loss 6.1554 LearningRate 0.0616 Epoch: 12 Global Step: 67050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:24,122-Speed 18484.97 samples/sec Loss 6.1564 LearningRate 0.0616 Epoch: 12 Global Step: 67060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:28,555-Speed 18487.99 samples/sec Loss 6.1336 LearningRate 0.0616 Epoch: 12 Global Step: 67070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:33,013-Speed 18384.13 samples/sec Loss 6.1608 LearningRate 0.0615 Epoch: 12 Global Step: 67080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:37,476-Speed 18364.52 samples/sec Loss 6.2061 LearningRate 0.0615 Epoch: 12 Global Step: 67090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:41,948-Speed 18325.15 samples/sec Loss 6.1297 LearningRate 0.0615 Epoch: 12 Global Step: 67100 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-01-14 06:18:46,398-Speed 18409.86 samples/sec Loss 6.1786 LearningRate 0.0614 Epoch: 12 Global Step: 67110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:50,853-Speed 18398.72 samples/sec Loss 6.2003 LearningRate 0.0614 Epoch: 12 Global Step: 67120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:55,268-Speed 18560.29 samples/sec Loss 6.1164 LearningRate 0.0614 Epoch: 12 Global Step: 67130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:18:59,814-Speed 18022.89 samples/sec Loss 6.1520 LearningRate 0.0613 Epoch: 12 Global Step: 67140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:19:04,282-Speed 18347.85 samples/sec Loss 6.1439 LearningRate 0.0613 Epoch: 12 Global Step: 67150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:19:08,782-Speed 18209.12 samples/sec Loss 6.0937 LearningRate 0.0613 Epoch: 12 Global Step: 67160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:19:13,265-Speed 18279.44 samples/sec Loss 6.1483 LearningRate 0.0612 Epoch: 12 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:19:17,718-Speed 18402.63 samples/sec Loss 6.1678 LearningRate 0.0612 Epoch: 12 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:19:22,190-Speed 18324.58 samples/sec Loss 6.1290 LearningRate 0.0612 Epoch: 12 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:19:26,596-Speed 18599.90 samples/sec Loss 6.1640 LearningRate 0.0611 Epoch: 12 Global Step: 67200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:19:31,028-Speed 18487.51 samples/sec Loss 6.1657 LearningRate 0.0611 Epoch: 12 Global Step: 67210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:19:35,482-Speed 18395.76 samples/sec Loss 6.1729 LearningRate 0.0611 Epoch: 12 Global Step: 67220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 
06:19:39,948-Speed 18352.39 samples/sec Loss 6.1233 LearningRate 0.0610 Epoch: 12 Global Step: 67230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:19:44,408-Speed 18374.54 samples/sec Loss 6.1556 LearningRate 0.0610 Epoch: 12 Global Step: 67240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:19:48,880-Speed 18320.41 samples/sec Loss 6.1268 LearningRate 0.0610 Epoch: 12 Global Step: 67250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:19:53,339-Speed 18376.41 samples/sec Loss 6.1348 LearningRate 0.0609 Epoch: 12 Global Step: 67260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:19:57,749-Speed 18585.50 samples/sec Loss 6.2046 LearningRate 0.0609 Epoch: 12 Global Step: 67270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:02,161-Speed 18571.62 samples/sec Loss 6.1711 LearningRate 0.0609 Epoch: 12 Global Step: 67280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:06,576-Speed 18561.28 samples/sec Loss 6.1353 LearningRate 0.0608 Epoch: 12 Global Step: 67290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:10,997-Speed 18537.66 samples/sec Loss 6.1248 LearningRate 0.0608 Epoch: 12 Global Step: 67300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:15,394-Speed 18641.19 samples/sec Loss 6.1233 LearningRate 0.0608 Epoch: 12 Global Step: 67310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:19,800-Speed 18596.30 samples/sec Loss 6.1748 LearningRate 0.0607 Epoch: 12 Global Step: 67320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:24,222-Speed 18528.88 samples/sec Loss 6.1430 LearningRate 0.0607 Epoch: 12 Global Step: 67330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:28,611-Speed 18674.73 samples/sec Loss 6.1533 LearningRate 0.0607 Epoch: 12 Global Step: 67340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:33,023-Speed 18577.66 samples/sec 
Loss 6.1513 LearningRate 0.0606 Epoch: 12 Global Step: 67350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:37,434-Speed 18578.03 samples/sec Loss 6.1890 LearningRate 0.0606 Epoch: 12 Global Step: 67360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:41,837-Speed 18609.15 samples/sec Loss 6.1369 LearningRate 0.0606 Epoch: 12 Global Step: 67370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:46,265-Speed 18503.79 samples/sec Loss 6.1861 LearningRate 0.0605 Epoch: 12 Global Step: 67380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:50,692-Speed 18513.63 samples/sec Loss 6.1651 LearningRate 0.0605 Epoch: 12 Global Step: 67390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:20:55,133-Speed 18449.87 samples/sec Loss 6.1189 LearningRate 0.0605 Epoch: 12 Global Step: 67400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:21:13,725-Speed 4406.44 samples/sec Loss 6.0889 LearningRate 0.0604 Epoch: 13 Global Step: 67410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:21:18,147-Speed 18534.89 samples/sec Loss 6.1577 LearningRate 0.0604 Epoch: 13 Global Step: 67420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:21:22,653-Speed 18182.44 samples/sec Loss 6.1363 LearningRate 0.0604 Epoch: 13 Global Step: 67430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:21:27,083-Speed 18501.45 samples/sec Loss 6.1177 LearningRate 0.0603 Epoch: 13 Global Step: 67440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-14 06:21:31,488-Speed 18602.53 samples/sec Loss 6.0812 LearningRate 0.0603 Epoch: 13 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:21:35,875-Speed 18678.09 samples/sec Loss 6.1403 LearningRate 0.0603 Epoch: 13 Global Step: 67460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:21:40,287-Speed 18568.05 samples/sec Loss 6.1213 LearningRate 0.0602 Epoch: 13 
Global Step: 67470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:21:44,657-Speed 18751.83 samples/sec Loss 6.1442 LearningRate 0.0602 Epoch: 13 Global Step: 67480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:21:49,089-Speed 18489.56 samples/sec Loss 6.1609 LearningRate 0.0602 Epoch: 13 Global Step: 67490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:21:53,530-Speed 18453.53 samples/sec Loss 6.0800 LearningRate 0.0601 Epoch: 13 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:21:57,979-Speed 18416.51 samples/sec Loss 6.1388 LearningRate 0.0601 Epoch: 13 Global Step: 67510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:22:02,478-Speed 18214.49 samples/sec Loss 6.1107 LearningRate 0.0601 Epoch: 13 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:22:06,951-Speed 18329.28 samples/sec Loss 6.1260 LearningRate 0.0600 Epoch: 13 Global Step: 67530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-14 06:22:11,353-Speed 18620.49 samples/sec Loss 6.0708 LearningRate 0.0600 Epoch: 13 Global Step: 67540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:15,739-Speed 18685.23 samples/sec Loss 6.1108 LearningRate 0.0600 Epoch: 13 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:22:20,145-Speed 18596.00 samples/sec Loss 6.1050 LearningRate 0.0599 Epoch: 13 Global Step: 67560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:24,558-Speed 18568.54 samples/sec Loss 6.1164 LearningRate 0.0599 Epoch: 13 Global Step: 67570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:28,976-Speed 18547.58 samples/sec Loss 6.0978 LearningRate 0.0599 Epoch: 13 Global Step: 67580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:33,405-Speed 18503.78 samples/sec Loss 6.1022 LearningRate 0.0598 Epoch: 13 Global Step: 67590 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 06:22:37,875-Speed 18336.19 samples/sec Loss 6.0891 LearningRate 0.0598 Epoch: 13 Global Step: 67600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:42,325-Speed 18412.36 samples/sec Loss 6.1174 LearningRate 0.0598 Epoch: 13 Global Step: 67610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:46,756-Speed 18494.70 samples/sec Loss 6.1547 LearningRate 0.0597 Epoch: 13 Global Step: 67620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:51,151-Speed 18645.96 samples/sec Loss 6.1375 LearningRate 0.0597 Epoch: 13 Global Step: 67630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:55,574-Speed 18533.83 samples/sec Loss 6.1280 LearningRate 0.0597 Epoch: 13 Global Step: 67640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:22:59,969-Speed 18646.97 samples/sec Loss 6.1256 LearningRate 0.0596 Epoch: 13 Global Step: 67650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:04,365-Speed 18640.61 samples/sec Loss 6.1195 LearningRate 0.0596 Epoch: 13 Global Step: 67660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:23:08,795-Speed 18495.13 samples/sec Loss 6.1470 LearningRate 0.0596 Epoch: 13 Global Step: 67670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:23:13,204-Speed 18587.39 samples/sec Loss 6.1490 LearningRate 0.0595 Epoch: 13 Global Step: 67680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:17,609-Speed 18602.43 samples/sec Loss 6.1347 LearningRate 0.0595 Epoch: 13 Global Step: 67690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:22,037-Speed 18505.16 samples/sec Loss 6.1101 LearningRate 0.0595 Epoch: 13 Global Step: 67700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:26,455-Speed 18550.72 samples/sec Loss 6.1118 LearningRate 0.0594 Epoch: 13 Global Step: 67710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
06:23:30,916-Speed 18366.38 samples/sec Loss 6.1021 LearningRate 0.0594 Epoch: 13 Global Step: 67720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:35,335-Speed 18546.61 samples/sec Loss 6.1079 LearningRate 0.0594 Epoch: 13 Global Step: 67730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:39,770-Speed 18472.99 samples/sec Loss 6.1164 LearningRate 0.0593 Epoch: 13 Global Step: 67740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:44,167-Speed 18633.51 samples/sec Loss 6.1432 LearningRate 0.0593 Epoch: 13 Global Step: 67750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:48,582-Speed 18561.76 samples/sec Loss 6.1048 LearningRate 0.0593 Epoch: 13 Global Step: 67760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:52,978-Speed 18637.73 samples/sec Loss 6.1214 LearningRate 0.0592 Epoch: 13 Global Step: 67770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:23:57,403-Speed 18521.42 samples/sec Loss 6.1196 LearningRate 0.0592 Epoch: 13 Global Step: 67780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:24:01,925-Speed 18123.01 samples/sec Loss 6.0873 LearningRate 0.0592 Epoch: 13 Global Step: 67790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:24:06,339-Speed 18563.43 samples/sec Loss 6.1203 LearningRate 0.0591 Epoch: 13 Global Step: 67800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:10,769-Speed 18503.92 samples/sec Loss 6.0917 LearningRate 0.0591 Epoch: 13 Global Step: 67810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:15,220-Speed 18410.58 samples/sec Loss 6.1118 LearningRate 0.0591 Epoch: 13 Global Step: 67820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:19,711-Speed 18245.81 samples/sec Loss 6.1332 LearningRate 0.0590 Epoch: 13 Global Step: 67830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:24,198-Speed 18268.35 samples/sec 
Loss 6.0818 LearningRate 0.0590 Epoch: 13 Global Step: 67840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:28,607-Speed 18586.12 samples/sec Loss 6.1000 LearningRate 0.0590 Epoch: 13 Global Step: 67850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:33,008-Speed 18621.93 samples/sec Loss 6.1233 LearningRate 0.0589 Epoch: 13 Global Step: 67860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:37,456-Speed 18429.03 samples/sec Loss 6.1105 LearningRate 0.0589 Epoch: 13 Global Step: 67870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:41,871-Speed 18564.65 samples/sec Loss 6.0909 LearningRate 0.0589 Epoch: 13 Global Step: 67880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:46,347-Speed 18305.00 samples/sec Loss 6.0788 LearningRate 0.0588 Epoch: 13 Global Step: 67890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:24:50,835-Speed 18258.60 samples/sec Loss 6.1164 LearningRate 0.0588 Epoch: 13 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:24:55,270-Speed 18472.70 samples/sec Loss 6.1299 LearningRate 0.0588 Epoch: 13 Global Step: 67910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:24:59,690-Speed 18543.45 samples/sec Loss 6.0937 LearningRate 0.0587 Epoch: 13 Global Step: 67920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:25:04,134-Speed 18438.22 samples/sec Loss 6.1103 LearningRate 0.0587 Epoch: 13 Global Step: 67930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:25:08,560-Speed 18514.88 samples/sec Loss 6.0832 LearningRate 0.0587 Epoch: 13 Global Step: 67940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:13,017-Speed 18385.59 samples/sec Loss 6.1620 LearningRate 0.0586 Epoch: 13 Global Step: 67950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:17,489-Speed 18321.99 samples/sec Loss 6.1078 LearningRate 0.0586 Epoch: 13 
Global Step: 67960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:21,936-Speed 18429.65 samples/sec Loss 6.0891 LearningRate 0.0586 Epoch: 13 Global Step: 67970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:26,348-Speed 18573.50 samples/sec Loss 6.1039 LearningRate 0.0585 Epoch: 13 Global Step: 67980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:30,845-Speed 18221.80 samples/sec Loss 6.1318 LearningRate 0.0585 Epoch: 13 Global Step: 67990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:35,292-Speed 18428.78 samples/sec Loss 6.0915 LearningRate 0.0585 Epoch: 13 Global Step: 68000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:39,737-Speed 18435.11 samples/sec Loss 6.0756 LearningRate 0.0585 Epoch: 13 Global Step: 68010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:44,186-Speed 18417.99 samples/sec Loss 6.0954 LearningRate 0.0584 Epoch: 13 Global Step: 68020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:48,626-Speed 18455.55 samples/sec Loss 6.1087 LearningRate 0.0584 Epoch: 13 Global Step: 68030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:25:53,066-Speed 18458.95 samples/sec Loss 6.0901 LearningRate 0.0584 Epoch: 13 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:25:57,520-Speed 18398.45 samples/sec Loss 6.0874 LearningRate 0.0583 Epoch: 13 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:01,966-Speed 18429.87 samples/sec Loss 6.0973 LearningRate 0.0583 Epoch: 13 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:06,445-Speed 18293.57 samples/sec Loss 6.0680 LearningRate 0.0583 Epoch: 13 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:10,873-Speed 18514.97 samples/sec Loss 6.0529 LearningRate 0.0582 Epoch: 13 Global Step: 68080 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 06:26:15,275-Speed 18618.78 samples/sec Loss 6.0667 LearningRate 0.0582 Epoch: 13 Global Step: 68090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:19,718-Speed 18443.10 samples/sec Loss 6.0853 LearningRate 0.0582 Epoch: 13 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:24,189-Speed 18329.32 samples/sec Loss 6.1131 LearningRate 0.0581 Epoch: 13 Global Step: 68110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:32,806-Speed 9509.07 samples/sec Loss 6.0814 LearningRate 0.0581 Epoch: 13 Global Step: 68120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:37,229-Speed 18526.33 samples/sec Loss 6.0365 LearningRate 0.0581 Epoch: 13 Global Step: 68130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:41,668-Speed 18458.63 samples/sec Loss 6.0917 LearningRate 0.0580 Epoch: 13 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 06:26:46,062-Speed 18656.48 samples/sec Loss 6.1068 LearningRate 0.0580 Epoch: 13 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 06:26:50,452-Speed 18660.84 samples/sec Loss 6.0603 LearningRate 0.0580 Epoch: 13 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:54,889-Speed 18473.32 samples/sec Loss 6.0948 LearningRate 0.0579 Epoch: 13 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:26:59,331-Speed 18444.40 samples/sec Loss 6.0697 LearningRate 0.0579 Epoch: 13 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:27:03,798-Speed 18351.02 samples/sec Loss 6.0745 LearningRate 0.0579 Epoch: 13 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:27:08,212-Speed 18568.59 samples/sec Loss 6.0745 LearningRate 0.0578 Epoch: 13 Global Step: 68200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
06:27:12,694-Speed 18284.94 samples/sec Loss 6.0608 LearningRate 0.0578 Epoch: 13 Global Step: 68210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:17,165-Speed 18327.94 samples/sec Loss 6.0630 LearningRate 0.0578 Epoch: 13 Global Step: 68220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:21,595-Speed 18501.22 samples/sec Loss 6.1255 LearningRate 0.0577 Epoch: 13 Global Step: 68230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:26,056-Speed 18370.60 samples/sec Loss 6.0964 LearningRate 0.0577 Epoch: 13 Global Step: 68240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:30,479-Speed 18524.29 samples/sec Loss 6.1304 LearningRate 0.0577 Epoch: 13 Global Step: 68250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:34,929-Speed 18415.22 samples/sec Loss 6.0895 LearningRate 0.0576 Epoch: 13 Global Step: 68260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:39,387-Speed 18379.09 samples/sec Loss 6.1054 LearningRate 0.0576 Epoch: 13 Global Step: 68270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:43,850-Speed 18364.14 samples/sec Loss 6.0800 LearningRate 0.0576 Epoch: 13 Global Step: 68280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:48,343-Speed 18235.54 samples/sec Loss 6.0548 LearningRate 0.0575 Epoch: 13 Global Step: 68290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:27:52,795-Speed 18404.37 samples/sec Loss 6.0923 LearningRate 0.0575 Epoch: 13 Global Step: 68300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:27:57,271-Speed 18310.46 samples/sec Loss 6.0584 LearningRate 0.0575 Epoch: 13 Global Step: 68310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:28:01,671-Speed 18621.13 samples/sec Loss 6.1122 LearningRate 0.0574 Epoch: 13 Global Step: 68320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:28:06,067-Speed 18641.88 samples/sec 
Loss 6.0982 LearningRate 0.0574 Epoch: 13 Global Step: 68330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:28:10,461-Speed 18647.94 samples/sec Loss 6.0464 LearningRate 0.0574 Epoch: 13 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:28:14,874-Speed 18565.45 samples/sec Loss 6.0820 LearningRate 0.0573 Epoch: 13 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:28:19,327-Speed 18401.83 samples/sec Loss 6.0285 LearningRate 0.0573 Epoch: 13 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:28:23,832-Speed 18189.17 samples/sec Loss 6.0668 LearningRate 0.0573 Epoch: 13 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:28:28,224-Speed 18654.04 samples/sec Loss 6.0963 LearningRate 0.0572 Epoch: 13 Global Step: 68380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:28:32,656-Speed 18489.12 samples/sec Loss 6.0806 LearningRate 0.0572 Epoch: 13 Global Step: 68390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:28:37,112-Speed 18389.65 samples/sec Loss 6.0485 LearningRate 0.0572 Epoch: 13 Global Step: 68400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:28:41,527-Speed 18560.81 samples/sec Loss 6.0700 LearningRate 0.0571 Epoch: 13 Global Step: 68410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:28:45,950-Speed 18523.99 samples/sec Loss 6.0612 LearningRate 0.0571 Epoch: 13 Global Step: 68420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:28:50,404-Speed 18398.82 samples/sec Loss 6.0799 LearningRate 0.0571 Epoch: 13 Global Step: 68430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:28:54,844-Speed 18460.74 samples/sec Loss 6.0522 LearningRate 0.0571 Epoch: 13 Global Step: 68440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:28:59,241-Speed 18644.46 samples/sec Loss 6.0640 LearningRate 0.0570 Epoch: 13 
Global Step: 68450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:29:03,648-Speed 18591.50 samples/sec Loss 6.0949 LearningRate 0.0570 Epoch: 13 Global Step: 68460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:29:08,091-Speed 18444.73 samples/sec Loss 6.0425 LearningRate 0.0570 Epoch: 13 Global Step: 68470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:29:12,516-Speed 18522.67 samples/sec Loss 6.0663 LearningRate 0.0569 Epoch: 13 Global Step: 68480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:16,960-Speed 18440.65 samples/sec Loss 6.0772 LearningRate 0.0569 Epoch: 13 Global Step: 68490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:21,434-Speed 18312.24 samples/sec Loss 6.0682 LearningRate 0.0569 Epoch: 13 Global Step: 68500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:25,831-Speed 18638.60 samples/sec Loss 6.0307 LearningRate 0.0568 Epoch: 13 Global Step: 68510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:30,237-Speed 18595.46 samples/sec Loss 6.0885 LearningRate 0.0568 Epoch: 13 Global Step: 68520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:34,659-Speed 18534.92 samples/sec Loss 6.0381 LearningRate 0.0568 Epoch: 13 Global Step: 68530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:39,069-Speed 18579.69 samples/sec Loss 6.0989 LearningRate 0.0567 Epoch: 13 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:43,548-Speed 18297.10 samples/sec Loss 6.1017 LearningRate 0.0567 Epoch: 13 Global Step: 68550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:47,967-Speed 18544.46 samples/sec Loss 6.0828 LearningRate 0.0567 Epoch: 13 Global Step: 68560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:29:52,428-Speed 18366.34 samples/sec Loss 6.0751 LearningRate 0.0566 Epoch: 13 Global Step: 68570 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 06:29:56,840-Speed 18574.61 samples/sec Loss 6.0752 LearningRate 0.0566 Epoch: 13 Global Step: 68580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:01,249-Speed 18587.70 samples/sec Loss 6.0328 LearningRate 0.0566 Epoch: 13 Global Step: 68590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:05,720-Speed 18327.48 samples/sec Loss 6.0499 LearningRate 0.0565 Epoch: 13 Global Step: 68600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:10,172-Speed 18405.87 samples/sec Loss 6.0759 LearningRate 0.0565 Epoch: 13 Global Step: 68610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:14,600-Speed 18505.25 samples/sec Loss 6.0854 LearningRate 0.0565 Epoch: 13 Global Step: 68620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:19,019-Speed 18545.04 samples/sec Loss 6.0644 LearningRate 0.0564 Epoch: 13 Global Step: 68630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:23,471-Speed 18405.29 samples/sec Loss 6.0609 LearningRate 0.0564 Epoch: 13 Global Step: 68640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:27,888-Speed 18553.48 samples/sec Loss 6.0329 LearningRate 0.0564 Epoch: 13 Global Step: 68650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:32,279-Speed 18660.10 samples/sec Loss 6.0236 LearningRate 0.0563 Epoch: 13 Global Step: 68660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:36,678-Speed 18627.45 samples/sec Loss 6.0592 LearningRate 0.0563 Epoch: 13 Global Step: 68670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:30:41,070-Speed 18665.51 samples/sec Loss 6.0355 LearningRate 0.0563 Epoch: 13 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:30:45,466-Speed 18644.55 samples/sec Loss 6.0277 LearningRate 0.0562 Epoch: 13 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
06:30:49,878-Speed 18576.81 samples/sec Loss 6.0744 LearningRate 0.0562 Epoch: 13 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:30:54,295-Speed 18551.85 samples/sec Loss 6.0607 LearningRate 0.0562 Epoch: 13 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:30:58,711-Speed 18555.78 samples/sec Loss 6.0757 LearningRate 0.0561 Epoch: 13 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:03,164-Speed 18403.36 samples/sec Loss 6.0315 LearningRate 0.0561 Epoch: 13 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:07,546-Speed 18701.44 samples/sec Loss 6.0395 LearningRate 0.0561 Epoch: 13 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:11,941-Speed 18646.48 samples/sec Loss 6.0411 LearningRate 0.0561 Epoch: 13 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:16,360-Speed 18542.76 samples/sec Loss 6.0132 LearningRate 0.0560 Epoch: 13 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:20,785-Speed 18518.45 samples/sec Loss 6.0764 LearningRate 0.0560 Epoch: 13 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:25,220-Speed 18477.10 samples/sec Loss 6.0548 LearningRate 0.0560 Epoch: 13 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 06:31:29,649-Speed 18499.68 samples/sec Loss 6.0617 LearningRate 0.0559 Epoch: 13 Global Step: 68790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:34,051-Speed 18615.37 samples/sec Loss 6.0637 LearningRate 0.0559 Epoch: 13 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:38,450-Speed 18636.50 samples/sec Loss 6.0535 LearningRate 0.0559 Epoch: 13 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:42,877-Speed 18514.64 samples/sec 
Loss 6.0658 LearningRate 0.0558 Epoch: 13 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:47,305-Speed 18502.29 samples/sec Loss 6.0653 LearningRate 0.0558 Epoch: 13 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:31:51,716-Speed 18575.09 samples/sec Loss 6.0624 LearningRate 0.0558 Epoch: 13 Global Step: 68840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:31:56,154-Speed 18469.19 samples/sec Loss 6.0489 LearningRate 0.0557 Epoch: 13 Global Step: 68850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:00,565-Speed 18574.28 samples/sec Loss 6.0291 LearningRate 0.0557 Epoch: 13 Global Step: 68860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:04,997-Speed 18490.42 samples/sec Loss 6.0417 LearningRate 0.0557 Epoch: 13 Global Step: 68870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:09,398-Speed 18618.04 samples/sec Loss 6.0587 LearningRate 0.0556 Epoch: 13 Global Step: 68880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:13,833-Speed 18478.94 samples/sec Loss 6.0470 LearningRate 0.0556 Epoch: 13 Global Step: 68890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:18,237-Speed 18606.79 samples/sec Loss 6.0730 LearningRate 0.0556 Epoch: 13 Global Step: 68900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:22,668-Speed 18493.93 samples/sec Loss 6.0637 LearningRate 0.0555 Epoch: 13 Global Step: 68910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:27,112-Speed 18444.85 samples/sec Loss 6.0654 LearningRate 0.0555 Epoch: 13 Global Step: 68920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:31,532-Speed 18536.69 samples/sec Loss 6.0650 LearningRate 0.0555 Epoch: 13 Global Step: 68930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:32:35,933-Speed 18618.12 samples/sec Loss 6.0155 LearningRate 0.0554 Epoch: 13 
Global Step: 68940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:32:40,360-Speed 18512.05 samples/sec Loss 6.0212 LearningRate 0.0554 Epoch: 13 Global Step: 68950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:32:44,770-Speed 18585.74 samples/sec Loss 6.0877 LearningRate 0.0554 Epoch: 13 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:32:49,165-Speed 18641.61 samples/sec Loss 6.0502 LearningRate 0.0553 Epoch: 13 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:32:53,616-Speed 18413.84 samples/sec Loss 6.0301 LearningRate 0.0553 Epoch: 13 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:32:58,141-Speed 18106.70 samples/sec Loss 6.0560 LearningRate 0.0553 Epoch: 13 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:33:02,568-Speed 18510.38 samples/sec Loss 6.0864 LearningRate 0.0553 Epoch: 13 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:33:07,003-Speed 18475.53 samples/sec Loss 6.0004 LearningRate 0.0552 Epoch: 13 Global Step: 69010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:11,468-Speed 18357.06 samples/sec Loss 6.0788 LearningRate 0.0552 Epoch: 13 Global Step: 69020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:15,943-Speed 18309.92 samples/sec Loss 6.0518 LearningRate 0.0552 Epoch: 13 Global Step: 69030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:20,378-Speed 18475.80 samples/sec Loss 6.0194 LearningRate 0.0551 Epoch: 13 Global Step: 69040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:24,856-Speed 18296.89 samples/sec Loss 6.0370 LearningRate 0.0551 Epoch: 13 Global Step: 69050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:29,276-Speed 18542.00 samples/sec Loss 6.0556 LearningRate 0.0551 Epoch: 13 Global Step: 69060 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 06:33:33,670-Speed 18646.81 samples/sec Loss 6.0473 LearningRate 0.0550 Epoch: 13 Global Step: 69070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:38,115-Speed 18435.36 samples/sec Loss 6.0886 LearningRate 0.0550 Epoch: 13 Global Step: 69080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:42,543-Speed 18508.15 samples/sec Loss 6.0193 LearningRate 0.0550 Epoch: 13 Global Step: 69090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:46,963-Speed 18536.18 samples/sec Loss 6.0357 LearningRate 0.0549 Epoch: 13 Global Step: 69100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:51,377-Speed 18563.11 samples/sec Loss 6.0244 LearningRate 0.0549 Epoch: 13 Global Step: 69110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:33:55,794-Speed 18552.99 samples/sec Loss 6.0162 LearningRate 0.0549 Epoch: 13 Global Step: 69120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:00,248-Speed 18396.43 samples/sec Loss 6.0197 LearningRate 0.0548 Epoch: 13 Global Step: 69130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:04,654-Speed 18598.82 samples/sec Loss 6.0023 LearningRate 0.0548 Epoch: 13 Global Step: 69140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:09,069-Speed 18560.79 samples/sec Loss 6.0235 LearningRate 0.0548 Epoch: 13 Global Step: 69150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:13,550-Speed 18292.14 samples/sec Loss 6.0406 LearningRate 0.0547 Epoch: 13 Global Step: 69160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:17,965-Speed 18555.23 samples/sec Loss 5.9908 LearningRate 0.0547 Epoch: 13 Global Step: 69170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:22,374-Speed 18585.11 samples/sec Loss 6.0273 LearningRate 0.0547 Epoch: 13 Global Step: 69180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
06:34:26,807-Speed 18485.51 samples/sec Loss 6.0332 LearningRate 0.0546 Epoch: 13 Global Step: 69190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:31,235-Speed 18503.85 samples/sec Loss 6.0083 LearningRate 0.0546 Epoch: 13 Global Step: 69200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:35,673-Speed 18467.00 samples/sec Loss 6.0162 LearningRate 0.0546 Epoch: 13 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:34:40,081-Speed 18584.61 samples/sec Loss 6.0081 LearningRate 0.0546 Epoch: 13 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:34:44,489-Speed 18590.71 samples/sec Loss 6.0065 LearningRate 0.0545 Epoch: 13 Global Step: 69230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:48,873-Speed 18690.75 samples/sec Loss 6.0063 LearningRate 0.0545 Epoch: 13 Global Step: 69240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:53,297-Speed 18523.36 samples/sec Loss 6.0141 LearningRate 0.0545 Epoch: 13 Global Step: 69250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:34:57,699-Speed 18615.63 samples/sec Loss 5.9883 LearningRate 0.0544 Epoch: 13 Global Step: 69260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:35:02,134-Speed 18476.19 samples/sec Loss 6.0277 LearningRate 0.0544 Epoch: 13 Global Step: 69270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:35:06,547-Speed 18566.94 samples/sec Loss 5.9893 LearningRate 0.0544 Epoch: 13 Global Step: 69280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:35:11,002-Speed 18394.98 samples/sec Loss 5.9994 LearningRate 0.0543 Epoch: 13 Global Step: 69290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:35:15,486-Speed 18274.24 samples/sec Loss 5.9900 LearningRate 0.0543 Epoch: 13 Global Step: 69300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:35:19,944-Speed 18383.69 samples/sec 
Loss 6.0710 LearningRate 0.0543 Epoch: 13 Global Step: 69310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:35:24,370-Speed 18515.32 samples/sec Loss 5.9965 LearningRate 0.0542 Epoch: 13 Global Step: 69320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:35:28,862-Speed 18240.69 samples/sec Loss 6.0237 LearningRate 0.0542 Epoch: 13 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:35:33,315-Speed 18403.11 samples/sec Loss 6.0487 LearningRate 0.0542 Epoch: 13 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:35:37,762-Speed 18433.52 samples/sec Loss 5.9877 LearningRate 0.0541 Epoch: 13 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:35:42,257-Speed 18233.93 samples/sec Loss 6.0137 LearningRate 0.0541 Epoch: 13 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:35:46,675-Speed 18547.89 samples/sec Loss 6.0247 LearningRate 0.0541 Epoch: 13 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:35:51,098-Speed 18524.75 samples/sec Loss 6.0228 LearningRate 0.0540 Epoch: 13 Global Step: 69380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:35:55,520-Speed 18537.49 samples/sec Loss 6.0076 LearningRate 0.0540 Epoch: 13 Global Step: 69390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:35:59,938-Speed 18551.34 samples/sec Loss 6.0060 LearningRate 0.0540 Epoch: 13 Global Step: 69400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:04,413-Speed 18312.52 samples/sec Loss 6.0310 LearningRate 0.0540 Epoch: 13 Global Step: 69410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:08,893-Speed 18290.90 samples/sec Loss 6.0087 LearningRate 0.0539 Epoch: 13 Global Step: 69420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:13,355-Speed 18364.14 samples/sec Loss 6.0015 LearningRate 0.0539 Epoch: 13 
Global Step: 69430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:17,787-Speed 18487.85 samples/sec Loss 5.9966 LearningRate 0.0539 Epoch: 13 Global Step: 69440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:22,203-Speed 18555.56 samples/sec Loss 5.9849 LearningRate 0.0538 Epoch: 13 Global Step: 69450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:26,644-Speed 18447.30 samples/sec Loss 6.0336 LearningRate 0.0538 Epoch: 13 Global Step: 69460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:31,114-Speed 18332.18 samples/sec Loss 6.0159 LearningRate 0.0538 Epoch: 13 Global Step: 69470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:35,539-Speed 18517.86 samples/sec Loss 5.9626 LearningRate 0.0537 Epoch: 13 Global Step: 69480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:39,970-Speed 18494.13 samples/sec Loss 6.0190 LearningRate 0.0537 Epoch: 13 Global Step: 69490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:44,392-Speed 18528.50 samples/sec Loss 6.0400 LearningRate 0.0537 Epoch: 13 Global Step: 69500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:48,829-Speed 18467.23 samples/sec Loss 6.0016 LearningRate 0.0536 Epoch: 13 Global Step: 69510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:53,257-Speed 18503.57 samples/sec Loss 5.9914 LearningRate 0.0536 Epoch: 13 Global Step: 69520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:36:57,704-Speed 18429.63 samples/sec Loss 5.9731 LearningRate 0.0536 Epoch: 13 Global Step: 69530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:37:02,147-Speed 18445.23 samples/sec Loss 5.9830 LearningRate 0.0535 Epoch: 13 Global Step: 69540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:37:06,630-Speed 18278.77 samples/sec Loss 5.9995 LearningRate 0.0535 Epoch: 13 Global Step: 69550 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 06:37:11,026-Speed 18642.06 samples/sec Loss 5.9722 LearningRate 0.0535 Epoch: 13 Global Step: 69560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:37:15,468-Speed 18442.43 samples/sec Loss 5.9584 LearningRate 0.0535 Epoch: 13 Global Step: 69570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:37:19,893-Speed 18516.76 samples/sec Loss 5.9719 LearningRate 0.0534 Epoch: 13 Global Step: 69580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:37:24,311-Speed 18547.62 samples/sec Loss 5.9774 LearningRate 0.0534 Epoch: 13 Global Step: 69590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:37:28,749-Speed 18462.95 samples/sec Loss 6.0162 LearningRate 0.0534 Epoch: 13 Global Step: 69600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:37:33,137-Speed 18676.43 samples/sec Loss 6.0142 LearningRate 0.0533 Epoch: 13 Global Step: 69610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:37:37,633-Speed 18228.80 samples/sec Loss 5.9994 LearningRate 0.0533 Epoch: 13 Global Step: 69620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:37:42,038-Speed 18605.41 samples/sec Loss 5.9736 LearningRate 0.0533 Epoch: 13 Global Step: 69630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:37:46,472-Speed 18476.50 samples/sec Loss 5.9974 LearningRate 0.0532 Epoch: 13 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:37:50,896-Speed 18519.77 samples/sec Loss 6.0314 LearningRate 0.0532 Epoch: 13 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:37:55,332-Speed 18475.66 samples/sec Loss 5.9925 LearningRate 0.0532 Epoch: 13 Global Step: 69660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:37:59,784-Speed 18407.06 samples/sec Loss 5.9704 LearningRate 0.0531 Epoch: 13 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
06:38:04,232-Speed 18424.86 samples/sec Loss 5.9832 LearningRate 0.0531 Epoch: 13 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:38:08,703-Speed 18325.85 samples/sec Loss 6.0056 LearningRate 0.0531 Epoch: 13 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:38:13,120-Speed 18552.89 samples/sec Loss 5.9818 LearningRate 0.0530 Epoch: 13 Global Step: 69700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 06:38:17,517-Speed 18632.31 samples/sec Loss 5.9769 LearningRate 0.0530 Epoch: 13 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:38:21,959-Speed 18450.96 samples/sec Loss 5.9899 LearningRate 0.0530 Epoch: 13 Global Step: 69720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:38:26,344-Speed 18698.26 samples/sec Loss 6.0331 LearningRate 0.0529 Epoch: 13 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:38:30,786-Speed 18450.94 samples/sec Loss 5.9898 LearningRate 0.0529 Epoch: 13 Global Step: 69740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:38:35,276-Speed 18250.97 samples/sec Loss 5.9907 LearningRate 0.0529 Epoch: 13 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:38:39,704-Speed 18503.52 samples/sec Loss 5.9912 LearningRate 0.0529 Epoch: 13 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:38:44,123-Speed 18547.05 samples/sec Loss 6.0278 LearningRate 0.0528 Epoch: 13 Global Step: 69770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:38:48,553-Speed 18495.85 samples/sec Loss 6.0096 LearningRate 0.0528 Epoch: 13 Global Step: 69780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:38:52,930-Speed 18722.59 samples/sec Loss 5.9624 LearningRate 0.0528 Epoch: 13 Global Step: 69790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:38:57,335-Speed 18605.44 samples/sec 
Loss 5.9521 LearningRate 0.0527 Epoch: 13 Global Step: 69800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:01,758-Speed 18526.89 samples/sec Loss 5.9471 LearningRate 0.0527 Epoch: 13 Global Step: 69810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:06,177-Speed 18547.96 samples/sec Loss 5.9811 LearningRate 0.0527 Epoch: 13 Global Step: 69820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:10,608-Speed 18495.35 samples/sec Loss 5.9943 LearningRate 0.0526 Epoch: 13 Global Step: 69830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:15,014-Speed 18597.22 samples/sec Loss 6.0211 LearningRate 0.0526 Epoch: 13 Global Step: 69840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:19,428-Speed 18565.32 samples/sec Loss 6.0230 LearningRate 0.0526 Epoch: 13 Global Step: 69850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:23,868-Speed 18452.91 samples/sec Loss 5.9937 LearningRate 0.0525 Epoch: 13 Global Step: 69860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:28,305-Speed 18469.07 samples/sec Loss 5.9720 LearningRate 0.0525 Epoch: 13 Global Step: 69870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:39:32,722-Speed 18553.69 samples/sec Loss 6.0140 LearningRate 0.0525 Epoch: 13 Global Step: 69880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:37,160-Speed 18464.90 samples/sec Loss 5.9712 LearningRate 0.0525 Epoch: 13 Global Step: 69890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:41,567-Speed 18596.10 samples/sec Loss 5.9713 LearningRate 0.0524 Epoch: 13 Global Step: 69900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:45,961-Speed 18647.00 samples/sec Loss 6.0184 LearningRate 0.0524 Epoch: 13 Global Step: 69910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:50,385-Speed 18520.20 samples/sec Loss 5.9979 LearningRate 0.0524 Epoch: 13 
Global Step: 69920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:54,783-Speed 18633.73 samples/sec Loss 5.9699 LearningRate 0.0523 Epoch: 13 Global Step: 69930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:39:59,200-Speed 18551.38 samples/sec Loss 5.9774 LearningRate 0.0523 Epoch: 13 Global Step: 69940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:40:03,691-Speed 18251.77 samples/sec Loss 5.9720 LearningRate 0.0523 Epoch: 13 Global Step: 69950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:40:08,140-Speed 18423.44 samples/sec Loss 5.9501 LearningRate 0.0522 Epoch: 13 Global Step: 69960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:40:12,568-Speed 18502.43 samples/sec Loss 5.9982 LearningRate 0.0522 Epoch: 13 Global Step: 69970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:40:17,088-Speed 18131.24 samples/sec Loss 6.0025 LearningRate 0.0522 Epoch: 13 Global Step: 69980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:40:21,521-Speed 18479.28 samples/sec Loss 5.9963 LearningRate 0.0521 Epoch: 13 Global Step: 69990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:40:25,984-Speed 18363.16 samples/sec Loss 5.9920 LearningRate 0.0521 Epoch: 13 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:40:30,396-Speed 18581.86 samples/sec Loss 5.9776 LearningRate 0.0521 Epoch: 13 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:40:34,825-Speed 18501.08 samples/sec Loss 5.9532 LearningRate 0.0520 Epoch: 13 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:40:39,288-Speed 18357.55 samples/sec Loss 5.9979 LearningRate 0.0520 Epoch: 13 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:40:43,728-Speed 18459.90 samples/sec Loss 5.9499 LearningRate 0.0520 Epoch: 13 Global Step: 70040 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 06:40:48,202-Speed 18317.19 samples/sec Loss 5.9518 LearningRate 0.0520 Epoch: 13 Global Step: 70050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:40:52,632-Speed 18498.07 samples/sec Loss 5.9582 LearningRate 0.0519 Epoch: 13 Global Step: 70060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:40:57,040-Speed 18591.25 samples/sec Loss 5.9631 LearningRate 0.0519 Epoch: 13 Global Step: 70070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:41:01,539-Speed 18210.48 samples/sec Loss 5.9898 LearningRate 0.0519 Epoch: 13 Global Step: 70080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:41:05,969-Speed 18500.67 samples/sec Loss 5.9883 LearningRate 0.0518 Epoch: 13 Global Step: 70090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:41:10,464-Speed 18231.67 samples/sec Loss 5.9790 LearningRate 0.0518 Epoch: 13 Global Step: 70100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:41:14,940-Speed 18310.60 samples/sec Loss 5.9379 LearningRate 0.0518 Epoch: 13 Global Step: 70110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:41:19,386-Speed 18431.26 samples/sec Loss 5.9616 LearningRate 0.0517 Epoch: 13 Global Step: 70120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:41:23,831-Speed 18430.79 samples/sec Loss 5.9381 LearningRate 0.0517 Epoch: 13 Global Step: 70130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:41:28,276-Speed 18437.69 samples/sec Loss 5.9819 LearningRate 0.0517 Epoch: 13 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:41:32,765-Speed 18254.29 samples/sec Loss 6.0026 LearningRate 0.0516 Epoch: 13 Global Step: 70150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:41:37,258-Speed 18236.97 samples/sec Loss 6.0357 LearningRate 0.0516 Epoch: 13 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
06:41:41,787-Speed 18096.58 samples/sec Loss 5.9679 LearningRate 0.0516 Epoch: 13 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:41:51,987-Speed 8032.10 samples/sec Loss 5.9946 LearningRate 0.0516 Epoch: 13 Global Step: 70180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:41:56,434-Speed 18428.67 samples/sec Loss 5.9348 LearningRate 0.0515 Epoch: 13 Global Step: 70190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:00,865-Speed 18490.81 samples/sec Loss 5.9464 LearningRate 0.0515 Epoch: 13 Global Step: 70200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:05,267-Speed 18615.85 samples/sec Loss 5.9672 LearningRate 0.0515 Epoch: 13 Global Step: 70210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:09,702-Speed 18477.71 samples/sec Loss 5.9287 LearningRate 0.0514 Epoch: 13 Global Step: 70220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:14,107-Speed 18601.18 samples/sec Loss 5.9373 LearningRate 0.0514 Epoch: 13 Global Step: 70230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:18,545-Speed 18461.29 samples/sec Loss 5.9515 LearningRate 0.0514 Epoch: 13 Global Step: 70240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:22,963-Speed 18547.79 samples/sec Loss 6.0222 LearningRate 0.0513 Epoch: 13 Global Step: 70250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:27,353-Speed 18668.34 samples/sec Loss 5.9470 LearningRate 0.0513 Epoch: 13 Global Step: 70260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:31,796-Speed 18441.64 samples/sec Loss 5.9732 LearningRate 0.0513 Epoch: 13 Global Step: 70270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:36,232-Speed 18472.87 samples/sec Loss 5.9993 LearningRate 0.0512 Epoch: 13 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:42:40,712-Speed 18289.44 samples/sec Loss 
5.9647 LearningRate 0.0512 Epoch: 13 Global Step: 70290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:45,158-Speed 18432.32 samples/sec Loss 5.9656 LearningRate 0.0512 Epoch: 13 Global Step: 70300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:49,657-Speed 18215.25 samples/sec Loss 5.9637 LearningRate 0.0512 Epoch: 13 Global Step: 70310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:54,109-Speed 18405.88 samples/sec Loss 5.9416 LearningRate 0.0511 Epoch: 13 Global Step: 70320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:42:58,539-Speed 18503.78 samples/sec Loss 5.9646 LearningRate 0.0511 Epoch: 13 Global Step: 70330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:02,994-Speed 18399.28 samples/sec Loss 6.0219 LearningRate 0.0511 Epoch: 13 Global Step: 70340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:07,489-Speed 18230.74 samples/sec Loss 5.9083 LearningRate 0.0510 Epoch: 13 Global Step: 70350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:11,954-Speed 18360.21 samples/sec Loss 5.9409 LearningRate 0.0510 Epoch: 13 Global Step: 70360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:16,435-Speed 18285.62 samples/sec Loss 5.9251 LearningRate 0.0510 Epoch: 13 Global Step: 70370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:20,899-Speed 18355.33 samples/sec Loss 5.9348 LearningRate 0.0509 Epoch: 13 Global Step: 70380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:25,397-Speed 18218.12 samples/sec Loss 5.9708 LearningRate 0.0509 Epoch: 13 Global Step: 70390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:43:29,867-Speed 18330.38 samples/sec Loss 5.9346 LearningRate 0.0509 Epoch: 13 Global Step: 70400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:34,286-Speed 18545.76 samples/sec Loss 5.9953 LearningRate 0.0508 Epoch: 13 Global 
Step: 70410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:38,721-Speed 18475.44 samples/sec Loss 5.9507 LearningRate 0.0508 Epoch: 13 Global Step: 70420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:43,175-Speed 18400.28 samples/sec Loss 5.9433 LearningRate 0.0508 Epoch: 13 Global Step: 70430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:47,592-Speed 18550.22 samples/sec Loss 5.9638 LearningRate 0.0508 Epoch: 13 Global Step: 70440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:52,032-Speed 18455.02 samples/sec Loss 5.9037 LearningRate 0.0507 Epoch: 13 Global Step: 70450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:43:56,441-Speed 18588.35 samples/sec Loss 5.9981 LearningRate 0.0507 Epoch: 13 Global Step: 70460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:44:00,875-Speed 18479.20 samples/sec Loss 5.9433 LearningRate 0.0507 Epoch: 13 Global Step: 70470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:05,352-Speed 18302.77 samples/sec Loss 5.9921 LearningRate 0.0506 Epoch: 13 Global Step: 70480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:09,799-Speed 18423.01 samples/sec Loss 5.9774 LearningRate 0.0506 Epoch: 13 Global Step: 70490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:14,235-Speed 18476.29 samples/sec Loss 5.9675 LearningRate 0.0506 Epoch: 13 Global Step: 70500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:18,679-Speed 18436.04 samples/sec Loss 5.9268 LearningRate 0.0505 Epoch: 13 Global Step: 70510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:23,135-Speed 18388.78 samples/sec Loss 5.9185 LearningRate 0.0505 Epoch: 13 Global Step: 70520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:27,561-Speed 18513.73 samples/sec Loss 5.9468 LearningRate 0.0505 Epoch: 13 Global Step: 70530 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-01-14 06:44:32,053-Speed 18243.40 samples/sec Loss 5.9679 LearningRate 0.0505 Epoch: 13 Global Step: 70540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:36,590-Speed 18062.14 samples/sec Loss 5.9852 LearningRate 0.0504 Epoch: 13 Global Step: 70550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:41,012-Speed 18527.50 samples/sec Loss 5.9451 LearningRate 0.0504 Epoch: 13 Global Step: 70560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:44:45,461-Speed 18419.16 samples/sec Loss 5.9576 LearningRate 0.0504 Epoch: 13 Global Step: 70570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:44:49,918-Speed 18386.98 samples/sec Loss 5.9518 LearningRate 0.0503 Epoch: 13 Global Step: 70580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:44:54,353-Speed 18477.07 samples/sec Loss 5.9355 LearningRate 0.0503 Epoch: 13 Global Step: 70590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:44:58,817-Speed 18356.54 samples/sec Loss 5.9672 LearningRate 0.0503 Epoch: 13 Global Step: 70600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:03,237-Speed 18542.52 samples/sec Loss 5.9608 LearningRate 0.0502 Epoch: 13 Global Step: 70610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:07,665-Speed 18502.77 samples/sec Loss 5.8958 LearningRate 0.0502 Epoch: 13 Global Step: 70620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:12,094-Speed 18501.34 samples/sec Loss 5.9456 LearningRate 0.0502 Epoch: 13 Global Step: 70630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:16,558-Speed 18358.18 samples/sec Loss 5.8697 LearningRate 0.0501 Epoch: 13 Global Step: 70640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:20,978-Speed 18534.77 samples/sec Loss 5.9404 LearningRate 0.0501 Epoch: 13 Global Step: 70650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
06:45:25,407-Speed 18503.36 samples/sec Loss 5.9364 LearningRate 0.0501 Epoch: 13 Global Step: 70660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:29,857-Speed 18414.32 samples/sec Loss 5.9035 LearningRate 0.0501 Epoch: 13 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:45:34,294-Speed 18470.10 samples/sec Loss 5.9829 LearningRate 0.0500 Epoch: 13 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:45:38,716-Speed 18530.62 samples/sec Loss 5.9426 LearningRate 0.0500 Epoch: 13 Global Step: 70690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:43,183-Speed 18343.85 samples/sec Loss 5.9552 LearningRate 0.0500 Epoch: 13 Global Step: 70700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:47,683-Speed 18207.67 samples/sec Loss 5.9012 LearningRate 0.0499 Epoch: 13 Global Step: 70710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:52,100-Speed 18552.46 samples/sec Loss 5.8924 LearningRate 0.0499 Epoch: 13 Global Step: 70720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:45:56,532-Speed 18491.03 samples/sec Loss 5.9839 LearningRate 0.0499 Epoch: 13 Global Step: 70730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:46:00,938-Speed 18598.31 samples/sec Loss 5.9205 LearningRate 0.0498 Epoch: 13 Global Step: 70740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:46:05,342-Speed 18607.36 samples/sec Loss 5.9452 LearningRate 0.0498 Epoch: 13 Global Step: 70750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:46:09,771-Speed 18501.96 samples/sec Loss 5.9001 LearningRate 0.0498 Epoch: 13 Global Step: 70760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:46:14,203-Speed 18495.52 samples/sec Loss 5.9107 LearningRate 0.0498 Epoch: 13 Global Step: 70770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:46:18,662-Speed 18376.57 samples/sec 
Loss 5.9085 LearningRate 0.0497 Epoch: 13 Global Step: 70780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:46:23,095-Speed 18486.93 samples/sec Loss 5.9341 LearningRate 0.0497 Epoch: 13 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:46:27,525-Speed 18499.32 samples/sec Loss 5.9232 LearningRate 0.0497 Epoch: 13 Global Step: 70800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:46:31,945-Speed 18539.14 samples/sec Loss 5.8980 LearningRate 0.0496 Epoch: 13 Global Step: 70810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:46:36,385-Speed 18462.01 samples/sec Loss 5.9300 LearningRate 0.0496 Epoch: 13 Global Step: 70820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:46:40,854-Speed 18334.82 samples/sec Loss 5.9257 LearningRate 0.0496 Epoch: 13 Global Step: 70830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:46:45,278-Speed 18520.40 samples/sec Loss 5.9036 LearningRate 0.0495 Epoch: 13 Global Step: 70840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:46:49,690-Speed 18576.84 samples/sec Loss 5.9178 LearningRate 0.0495 Epoch: 13 Global Step: 70850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:46:54,075-Speed 18689.31 samples/sec Loss 5.8793 LearningRate 0.0495 Epoch: 13 Global Step: 70860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:46:58,493-Speed 18548.77 samples/sec Loss 5.9599 LearningRate 0.0495 Epoch: 13 Global Step: 70870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:02,896-Speed 18613.47 samples/sec Loss 5.9727 LearningRate 0.0494 Epoch: 13 Global Step: 70880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:07,336-Speed 18454.07 samples/sec Loss 5.9322 LearningRate 0.0494 Epoch: 13 Global Step: 70890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:11,771-Speed 18478.08 samples/sec Loss 5.9080 LearningRate 0.0494 Epoch: 13 
Global Step: 70900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:16,193-Speed 18535.08 samples/sec Loss 5.9022 LearningRate 0.0493 Epoch: 13 Global Step: 70910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:20,617-Speed 18523.53 samples/sec Loss 5.9282 LearningRate 0.0493 Epoch: 13 Global Step: 70920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:25,020-Speed 18613.48 samples/sec Loss 5.9311 LearningRate 0.0493 Epoch: 13 Global Step: 70930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:29,413-Speed 18654.33 samples/sec Loss 5.9389 LearningRate 0.0492 Epoch: 13 Global Step: 70940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:33,818-Speed 18601.83 samples/sec Loss 5.9204 LearningRate 0.0492 Epoch: 13 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:47:38,213-Speed 18644.57 samples/sec Loss 5.9150 LearningRate 0.0492 Epoch: 13 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:47:42,619-Speed 18598.93 samples/sec Loss 5.9246 LearningRate 0.0492 Epoch: 13 Global Step: 70970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:47:47,096-Speed 18301.80 samples/sec Loss 5.9443 LearningRate 0.0491 Epoch: 13 Global Step: 70980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:47:51,521-Speed 18516.84 samples/sec Loss 5.8994 LearningRate 0.0491 Epoch: 13 Global Step: 70990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:47:56,009-Speed 18256.84 samples/sec Loss 5.9491 LearningRate 0.0491 Epoch: 13 Global Step: 71000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:00,422-Speed 18567.96 samples/sec Loss 5.9072 LearningRate 0.0490 Epoch: 13 Global Step: 71010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:04,835-Speed 18569.98 samples/sec Loss 5.9058 LearningRate 0.0490 Epoch: 13 Global Step: 71020 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 06:48:09,219-Speed 18690.07 samples/sec Loss 5.9131 LearningRate 0.0490 Epoch: 13 Global Step: 71030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:13,658-Speed 18464.18 samples/sec Loss 5.9198 LearningRate 0.0489 Epoch: 13 Global Step: 71040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:18,139-Speed 18285.65 samples/sec Loss 5.9067 LearningRate 0.0489 Epoch: 13 Global Step: 71050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:22,545-Speed 18601.59 samples/sec Loss 5.9012 LearningRate 0.0489 Epoch: 13 Global Step: 71060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:26,951-Speed 18599.30 samples/sec Loss 5.8938 LearningRate 0.0489 Epoch: 13 Global Step: 71070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:31,336-Speed 18683.91 samples/sec Loss 5.9365 LearningRate 0.0488 Epoch: 13 Global Step: 71080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:35,789-Speed 18405.98 samples/sec Loss 5.9297 LearningRate 0.0488 Epoch: 13 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:48:40,221-Speed 18487.46 samples/sec Loss 5.9150 LearningRate 0.0488 Epoch: 13 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:48:44,613-Speed 18658.99 samples/sec Loss 5.9388 LearningRate 0.0487 Epoch: 13 Global Step: 71110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:49,026-Speed 18584.66 samples/sec Loss 5.9226 LearningRate 0.0487 Epoch: 13 Global Step: 71120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:53,456-Speed 18498.29 samples/sec Loss 5.8828 LearningRate 0.0487 Epoch: 13 Global Step: 71130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:48:57,891-Speed 18478.46 samples/sec Loss 5.9395 LearningRate 0.0486 Epoch: 13 Global Step: 71140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
06:49:02,321-Speed 18495.05 samples/sec Loss 5.9235 LearningRate 0.0486 Epoch: 13 Global Step: 71150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:49:06,736-Speed 18562.98 samples/sec Loss 5.9005 LearningRate 0.0486 Epoch: 13 Global Step: 71160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:49:11,143-Speed 18593.80 samples/sec Loss 5.8701 LearningRate 0.0486 Epoch: 13 Global Step: 71170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:49:15,538-Speed 18645.18 samples/sec Loss 5.8814 LearningRate 0.0485 Epoch: 13 Global Step: 71180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:49:19,962-Speed 18519.91 samples/sec Loss 5.8543 LearningRate 0.0485 Epoch: 13 Global Step: 71190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:49:24,382-Speed 18537.18 samples/sec Loss 5.9129 LearningRate 0.0485 Epoch: 13 Global Step: 71200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:49:28,846-Speed 18361.72 samples/sec Loss 5.8874 LearningRate 0.0484 Epoch: 13 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:49:33,264-Speed 18546.58 samples/sec Loss 5.8892 LearningRate 0.0484 Epoch: 13 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:49:37,684-Speed 18537.67 samples/sec Loss 5.9052 LearningRate 0.0484 Epoch: 13 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:49:42,139-Speed 18396.03 samples/sec Loss 5.9252 LearningRate 0.0483 Epoch: 13 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:49:46,545-Speed 18598.34 samples/sec Loss 5.8991 LearningRate 0.0483 Epoch: 13 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:49:50,967-Speed 18535.18 samples/sec Loss 5.9222 LearningRate 0.0483 Epoch: 13 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:49:55,395-Speed 18511.25 samples/sec 
Loss 5.9286 LearningRate 0.0483 Epoch: 13 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:49:59,849-Speed 18400.94 samples/sec Loss 5.8899 LearningRate 0.0482 Epoch: 13 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:04,297-Speed 18421.61 samples/sec Loss 5.8952 LearningRate 0.0482 Epoch: 13 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:08,730-Speed 18485.61 samples/sec Loss 5.8574 LearningRate 0.0482 Epoch: 13 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:13,201-Speed 18332.81 samples/sec Loss 5.8703 LearningRate 0.0481 Epoch: 13 Global Step: 71310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 06:50:17,663-Speed 18364.00 samples/sec Loss 5.8939 LearningRate 0.0481 Epoch: 13 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:22,092-Speed 18505.42 samples/sec Loss 5.9033 LearningRate 0.0481 Epoch: 13 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:26,504-Speed 18577.18 samples/sec Loss 5.9053 LearningRate 0.0480 Epoch: 13 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:30,930-Speed 18512.94 samples/sec Loss 5.9046 LearningRate 0.0480 Epoch: 13 Global Step: 71350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:35,334-Speed 18605.36 samples/sec Loss 5.9095 LearningRate 0.0480 Epoch: 13 Global Step: 71360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:39,795-Speed 18370.52 samples/sec Loss 5.8735 LearningRate 0.0480 Epoch: 13 Global Step: 71370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:50:44,218-Speed 18526.30 samples/sec Loss 5.8908 LearningRate 0.0479 Epoch: 13 Global Step: 71380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:50:48,634-Speed 18558.62 samples/sec Loss 5.9287 LearningRate 0.0479 Epoch: 13 
Global Step: 71390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:50:53,091-Speed 18384.47 samples/sec Loss 5.9129 LearningRate 0.0479 Epoch: 13 Global Step: 71400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:50:57,519-Speed 18506.64 samples/sec Loss 5.9168 LearningRate 0.0478 Epoch: 13 Global Step: 71410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:01,933-Speed 18563.78 samples/sec Loss 5.9212 LearningRate 0.0478 Epoch: 13 Global Step: 71420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:06,346-Speed 18569.23 samples/sec Loss 5.9017 LearningRate 0.0478 Epoch: 13 Global Step: 71430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:10,798-Speed 18404.81 samples/sec Loss 5.9067 LearningRate 0.0478 Epoch: 13 Global Step: 71440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:15,256-Speed 18380.97 samples/sec Loss 5.8794 LearningRate 0.0477 Epoch: 13 Global Step: 71450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:19,684-Speed 18505.11 samples/sec Loss 5.8954 LearningRate 0.0477 Epoch: 13 Global Step: 71460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:24,161-Speed 18301.09 samples/sec Loss 5.9001 LearningRate 0.0477 Epoch: 13 Global Step: 71470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:28,635-Speed 18318.68 samples/sec Loss 5.8953 LearningRate 0.0476 Epoch: 13 Global Step: 71480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:51:33,110-Speed 18309.71 samples/sec Loss 5.8700 LearningRate 0.0476 Epoch: 13 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:51:37,509-Speed 18628.91 samples/sec Loss 5.8966 LearningRate 0.0476 Epoch: 13 Global Step: 71500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:44,336-Speed 12002.33 samples/sec Loss 5.8859 LearningRate 0.0475 Epoch: 13 Global Step: 71510 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 06:51:48,721-Speed 18688.73 samples/sec Loss 5.8824 LearningRate 0.0475 Epoch: 13 Global Step: 71520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:53,127-Speed 18599.06 samples/sec Loss 5.8833 LearningRate 0.0475 Epoch: 13 Global Step: 71530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:51:57,534-Speed 18593.83 samples/sec Loss 5.9163 LearningRate 0.0475 Epoch: 13 Global Step: 71540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:52:01,974-Speed 18457.30 samples/sec Loss 5.8856 LearningRate 0.0474 Epoch: 13 Global Step: 71550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:52:06,378-Speed 18607.89 samples/sec Loss 5.9148 LearningRate 0.0474 Epoch: 13 Global Step: 71560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:52:10,797-Speed 18541.95 samples/sec Loss 5.8911 LearningRate 0.0474 Epoch: 13 Global Step: 71570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:52:15,188-Speed 18665.51 samples/sec Loss 5.9026 LearningRate 0.0473 Epoch: 13 Global Step: 71580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:52:19,604-Speed 18553.67 samples/sec Loss 5.8937 LearningRate 0.0473 Epoch: 13 Global Step: 71590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:52:24,037-Speed 18484.79 samples/sec Loss 5.8882 LearningRate 0.0473 Epoch: 13 Global Step: 71600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:52:28,456-Speed 18543.68 samples/sec Loss 5.8790 LearningRate 0.0472 Epoch: 13 Global Step: 71610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:52:32,881-Speed 18518.01 samples/sec Loss 5.8777 LearningRate 0.0472 Epoch: 13 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:52:37,306-Speed 18518.06 samples/sec Loss 5.8765 LearningRate 0.0472 Epoch: 13 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
06:52:41,791-Speed 18269.00 samples/sec Loss 5.8822 LearningRate 0.0472 Epoch: 13 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:52:46,179-Speed 18676.89 samples/sec Loss 5.8897 LearningRate 0.0471 Epoch: 13 Global Step: 71650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:52:50,624-Speed 18433.24 samples/sec Loss 5.8641 LearningRate 0.0471 Epoch: 13 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:52:55,024-Speed 18620.79 samples/sec Loss 5.8505 LearningRate 0.0471 Epoch: 13 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:52:59,438-Speed 18564.65 samples/sec Loss 5.8717 LearningRate 0.0470 Epoch: 13 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:53:03,921-Speed 18286.45 samples/sec Loss 5.8861 LearningRate 0.0470 Epoch: 13 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:53:08,344-Speed 18530.57 samples/sec Loss 5.9003 LearningRate 0.0470 Epoch: 13 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:53:12,804-Speed 18371.83 samples/sec Loss 5.8793 LearningRate 0.0470 Epoch: 13 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:53:17,242-Speed 18468.68 samples/sec Loss 5.9120 LearningRate 0.0469 Epoch: 13 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:53:21,689-Speed 18429.72 samples/sec Loss 5.8706 LearningRate 0.0469 Epoch: 13 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:53:26,163-Speed 18317.42 samples/sec Loss 5.9125 LearningRate 0.0469 Epoch: 13 Global Step: 71740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:53:30,564-Speed 18618.34 samples/sec Loss 5.9100 LearningRate 0.0468 Epoch: 13 Global Step: 71750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:53:34,961-Speed 18639.36 samples/sec 
Loss 5.8635 LearningRate 0.0468 Epoch: 13 Global Step: 71760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:53:39,438-Speed 18302.76 samples/sec Loss 5.8539 LearningRate 0.0468 Epoch: 13 Global Step: 71770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:53:43,853-Speed 18569.30 samples/sec Loss 5.8836 LearningRate 0.0467 Epoch: 13 Global Step: 71780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:53:48,256-Speed 18610.26 samples/sec Loss 5.8514 LearningRate 0.0467 Epoch: 13 Global Step: 71790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:53:52,680-Speed 18522.05 samples/sec Loss 5.8760 LearningRate 0.0467 Epoch: 13 Global Step: 71800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:53:57,087-Speed 18596.51 samples/sec Loss 5.8737 LearningRate 0.0467 Epoch: 13 Global Step: 71810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:54:01,503-Speed 18556.10 samples/sec Loss 5.9082 LearningRate 0.0466 Epoch: 13 Global Step: 71820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:54:06,024-Speed 18123.43 samples/sec Loss 5.8470 LearningRate 0.0466 Epoch: 13 Global Step: 71830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:54:10,436-Speed 18574.66 samples/sec Loss 5.8497 LearningRate 0.0466 Epoch: 13 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:54:14,844-Speed 18594.09 samples/sec Loss 5.8952 LearningRate 0.0465 Epoch: 13 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:54:19,295-Speed 18406.66 samples/sec Loss 5.8667 LearningRate 0.0465 Epoch: 13 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:54:23,723-Speed 18505.76 samples/sec Loss 5.8730 LearningRate 0.0465 Epoch: 13 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:54:28,186-Speed 18360.69 samples/sec Loss 5.8752 LearningRate 0.0465 Epoch: 13 
Global Step: 71880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:54:32,610-Speed 18519.91 samples/sec Loss 5.8903 LearningRate 0.0464 Epoch: 13 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:54:37,013-Speed 18612.95 samples/sec Loss 5.8434 LearningRate 0.0464 Epoch: 13 Global Step: 71900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:54:41,446-Speed 18486.01 samples/sec Loss 5.8448 LearningRate 0.0464 Epoch: 13 Global Step: 71910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:54:45,867-Speed 18532.42 samples/sec Loss 5.8559 LearningRate 0.0463 Epoch: 13 Global Step: 71920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:54:50,287-Speed 18539.63 samples/sec Loss 5.8879 LearningRate 0.0463 Epoch: 13 Global Step: 71930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:54:54,717-Speed 18496.90 samples/sec Loss 5.8910 LearningRate 0.0463 Epoch: 13 Global Step: 71940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:54:59,177-Speed 18371.60 samples/sec Loss 5.8212 LearningRate 0.0463 Epoch: 13 Global Step: 71950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:03,595-Speed 18548.51 samples/sec Loss 5.8590 LearningRate 0.0462 Epoch: 13 Global Step: 71960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:08,005-Speed 18581.56 samples/sec Loss 5.8420 LearningRate 0.0462 Epoch: 13 Global Step: 71970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:12,385-Speed 18708.67 samples/sec Loss 5.8876 LearningRate 0.0462 Epoch: 13 Global Step: 71980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:16,767-Speed 18697.77 samples/sec Loss 5.8763 LearningRate 0.0461 Epoch: 13 Global Step: 71990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:21,173-Speed 18598.49 samples/sec Loss 5.8495 LearningRate 0.0461 Epoch: 13 Global Step: 72000 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 06:55:25,599-Speed 18513.63 samples/sec Loss 5.8678 LearningRate 0.0461 Epoch: 13 Global Step: 72010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:30,083-Speed 18275.82 samples/sec Loss 5.8587 LearningRate 0.0460 Epoch: 13 Global Step: 72020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:34,480-Speed 18639.26 samples/sec Loss 5.8837 LearningRate 0.0460 Epoch: 13 Global Step: 72030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:38,919-Speed 18461.04 samples/sec Loss 5.8644 LearningRate 0.0460 Epoch: 13 Global Step: 72040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:43,371-Speed 18404.41 samples/sec Loss 5.8910 LearningRate 0.0460 Epoch: 13 Global Step: 72050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:47,795-Speed 18522.83 samples/sec Loss 5.8618 LearningRate 0.0459 Epoch: 13 Global Step: 72060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:55:52,220-Speed 18518.82 samples/sec Loss 5.8713 LearningRate 0.0459 Epoch: 13 Global Step: 72070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:55:56,626-Speed 18598.90 samples/sec Loss 5.8653 LearningRate 0.0459 Epoch: 13 Global Step: 72080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:56:01,035-Speed 18583.59 samples/sec Loss 5.8496 LearningRate 0.0458 Epoch: 13 Global Step: 72090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:56:05,467-Speed 18492.31 samples/sec Loss 5.8749 LearningRate 0.0458 Epoch: 13 Global Step: 72100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:56:09,862-Speed 18643.25 samples/sec Loss 5.8229 LearningRate 0.0458 Epoch: 13 Global Step: 72110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:56:14,344-Speed 18290.39 samples/sec Loss 5.8010 LearningRate 0.0458 Epoch: 13 Global Step: 72120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 
06:56:18,748-Speed 18611.01 samples/sec Loss 5.8689 LearningRate 0.0457 Epoch: 13 Global Step: 72130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:56:23,263-Speed 18145.87 samples/sec Loss 5.8542 LearningRate 0.0457 Epoch: 13 Global Step: 72140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:56:27,744-Speed 18292.17 samples/sec Loss 5.8576 LearningRate 0.0457 Epoch: 13 Global Step: 72150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:56:32,173-Speed 18504.54 samples/sec Loss 5.8287 LearningRate 0.0456 Epoch: 13 Global Step: 72160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 06:56:36,553-Speed 18712.33 samples/sec Loss 5.8084 LearningRate 0.0456 Epoch: 13 Global Step: 72170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:56:40,953-Speed 18625.07 samples/sec Loss 5.8297 LearningRate 0.0456 Epoch: 13 Global Step: 72180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:56:45,355-Speed 18614.60 samples/sec Loss 5.8651 LearningRate 0.0456 Epoch: 13 Global Step: 72190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:56:49,777-Speed 18534.21 samples/sec Loss 5.8313 LearningRate 0.0455 Epoch: 13 Global Step: 72200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:56:54,245-Speed 18337.40 samples/sec Loss 5.8414 LearningRate 0.0455 Epoch: 13 Global Step: 72210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:56:58,715-Speed 18332.17 samples/sec Loss 5.8385 LearningRate 0.0455 Epoch: 13 Global Step: 72220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:03,143-Speed 18505.14 samples/sec Loss 5.8814 LearningRate 0.0454 Epoch: 13 Global Step: 72230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:07,559-Speed 18556.46 samples/sec Loss 5.8644 LearningRate 0.0454 Epoch: 13 Global Step: 72240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:11,976-Speed 18552.69 samples/sec 
Loss 5.7892 LearningRate 0.0454 Epoch: 13 Global Step: 72250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:16,408-Speed 18489.81 samples/sec Loss 5.8201 LearningRate 0.0454 Epoch: 13 Global Step: 72260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:20,836-Speed 18503.14 samples/sec Loss 5.8241 LearningRate 0.0453 Epoch: 13 Global Step: 72270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:25,227-Speed 18664.09 samples/sec Loss 5.8192 LearningRate 0.0453 Epoch: 13 Global Step: 72280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:29,618-Speed 18664.02 samples/sec Loss 5.8699 LearningRate 0.0453 Epoch: 13 Global Step: 72290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:34,032-Speed 18565.51 samples/sec Loss 5.8532 LearningRate 0.0452 Epoch: 13 Global Step: 72300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:38,420-Speed 18673.05 samples/sec Loss 5.8491 LearningRate 0.0452 Epoch: 13 Global Step: 72310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:42,836-Speed 18554.75 samples/sec Loss 5.8410 LearningRate 0.0452 Epoch: 13 Global Step: 72320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:47,274-Speed 18466.67 samples/sec Loss 5.8757 LearningRate 0.0452 Epoch: 13 Global Step: 72330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:51,772-Speed 18216.67 samples/sec Loss 5.8542 LearningRate 0.0451 Epoch: 13 Global Step: 72340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:57:56,245-Speed 18320.24 samples/sec Loss 5.8116 LearningRate 0.0451 Epoch: 13 Global Step: 72350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:00,713-Speed 18335.86 samples/sec Loss 5.8225 LearningRate 0.0451 Epoch: 13 Global Step: 72360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:05,168-Speed 18396.58 samples/sec Loss 5.8239 LearningRate 0.0450 Epoch: 13 
Global Step: 72370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:58:09,589-Speed 18533.58 samples/sec Loss 5.8034 LearningRate 0.0450 Epoch: 13 Global Step: 72380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:58:14,053-Speed 18357.76 samples/sec Loss 5.9006 LearningRate 0.0450 Epoch: 13 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:58:18,524-Speed 18325.03 samples/sec Loss 5.8701 LearningRate 0.0449 Epoch: 13 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:58:22,989-Speed 18349.52 samples/sec Loss 5.8447 LearningRate 0.0449 Epoch: 13 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:58:27,435-Speed 18433.04 samples/sec Loss 5.8337 LearningRate 0.0449 Epoch: 13 Global Step: 72420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:31,849-Speed 18564.38 samples/sec Loss 5.8236 LearningRate 0.0449 Epoch: 13 Global Step: 72430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:36,261-Speed 18572.21 samples/sec Loss 5.8527 LearningRate 0.0448 Epoch: 13 Global Step: 72440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:40,713-Speed 18404.58 samples/sec Loss 5.8426 LearningRate 0.0448 Epoch: 13 Global Step: 72450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:45,183-Speed 18335.39 samples/sec Loss 5.8626 LearningRate 0.0448 Epoch: 13 Global Step: 72460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:49,614-Speed 18491.79 samples/sec Loss 5.8477 LearningRate 0.0447 Epoch: 13 Global Step: 72470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:54,040-Speed 18512.12 samples/sec Loss 5.8453 LearningRate 0.0447 Epoch: 13 Global Step: 72480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:58:58,469-Speed 18500.56 samples/sec Loss 5.8096 LearningRate 0.0447 Epoch: 13 Global Step: 72490 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 06:59:02,939-Speed 18335.91 samples/sec Loss 5.8383 LearningRate 0.0447 Epoch: 13 Global Step: 72500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:59:07,432-Speed 18237.59 samples/sec Loss 5.8113 LearningRate 0.0446 Epoch: 13 Global Step: 72510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:59:11,974-Speed 18044.82 samples/sec Loss 5.8252 LearningRate 0.0446 Epoch: 13 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:59:16,443-Speed 18337.86 samples/sec Loss 5.7997 LearningRate 0.0446 Epoch: 13 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:59:20,888-Speed 18435.54 samples/sec Loss 5.8408 LearningRate 0.0445 Epoch: 13 Global Step: 72540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:59:25,312-Speed 18521.58 samples/sec Loss 5.8838 LearningRate 0.0445 Epoch: 13 Global Step: 72550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:59:29,709-Speed 18637.70 samples/sec Loss 5.8411 LearningRate 0.0445 Epoch: 13 Global Step: 72560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:59:34,153-Speed 18448.68 samples/sec Loss 5.8128 LearningRate 0.0445 Epoch: 13 Global Step: 72570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 06:59:38,579-Speed 18515.16 samples/sec Loss 5.8372 LearningRate 0.0444 Epoch: 13 Global Step: 72580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 06:59:57,157-Speed 4409.68 samples/sec Loss 5.8423 LearningRate 0.0444 Epoch: 14 Global Step: 72590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:00:01,556-Speed 18628.56 samples/sec Loss 5.8539 LearningRate 0.0444 Epoch: 14 Global Step: 72600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:00:06,003-Speed 18426.33 samples/sec Loss 5.8340 LearningRate 0.0443 Epoch: 14 Global Step: 72610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
07:00:10,418-Speed 18561.81 samples/sec Loss 5.8392 LearningRate 0.0443 Epoch: 14 Global Step: 72620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:00:14,830-Speed 18572.54 samples/sec Loss 5.7989 LearningRate 0.0443 Epoch: 14 Global Step: 72630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:00:19,226-Speed 18642.56 samples/sec Loss 5.7771 LearningRate 0.0443 Epoch: 14 Global Step: 72640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:00:23,657-Speed 18490.50 samples/sec Loss 5.8444 LearningRate 0.0442 Epoch: 14 Global Step: 72650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:00:28,096-Speed 18464.06 samples/sec Loss 5.8147 LearningRate 0.0442 Epoch: 14 Global Step: 72660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:00:32,537-Speed 18450.15 samples/sec Loss 5.7954 LearningRate 0.0442 Epoch: 14 Global Step: 72670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:00:36,971-Speed 18483.82 samples/sec Loss 5.8264 LearningRate 0.0441 Epoch: 14 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:00:41,397-Speed 18514.68 samples/sec Loss 5.8285 LearningRate 0.0441 Epoch: 14 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:00:45,801-Speed 18607.69 samples/sec Loss 5.7562 LearningRate 0.0441 Epoch: 14 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:00:50,252-Speed 18409.00 samples/sec Loss 5.8079 LearningRate 0.0441 Epoch: 14 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:00:54,766-Speed 18154.41 samples/sec Loss 5.8004 LearningRate 0.0440 Epoch: 14 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:00:59,189-Speed 18524.86 samples/sec Loss 5.7946 LearningRate 0.0440 Epoch: 14 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:01:03,674-Speed 18270.63 samples/sec 
Loss 5.7721 LearningRate 0.0440 Epoch: 14 Global Step: 72740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:08,085-Speed 18578.39 samples/sec Loss 5.7597 LearningRate 0.0439 Epoch: 14 Global Step: 72750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:12,546-Speed 18371.53 samples/sec Loss 5.8096 LearningRate 0.0439 Epoch: 14 Global Step: 72760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:16,989-Speed 18443.32 samples/sec Loss 5.8169 LearningRate 0.0439 Epoch: 14 Global Step: 72770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:21,447-Speed 18388.87 samples/sec Loss 5.8348 LearningRate 0.0439 Epoch: 14 Global Step: 72780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:25,885-Speed 18461.96 samples/sec Loss 5.7883 LearningRate 0.0438 Epoch: 14 Global Step: 72790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:30,384-Speed 18214.59 samples/sec Loss 5.8330 LearningRate 0.0438 Epoch: 14 Global Step: 72800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:34,858-Speed 18318.51 samples/sec Loss 5.8128 LearningRate 0.0438 Epoch: 14 Global Step: 72810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:39,861-Speed 16373.81 samples/sec Loss 5.7991 LearningRate 0.0437 Epoch: 14 Global Step: 72820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:44,290-Speed 18504.53 samples/sec Loss 5.8192 LearningRate 0.0437 Epoch: 14 Global Step: 72830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:01:48,699-Speed 18582.52 samples/sec Loss 5.8006 LearningRate 0.0437 Epoch: 14 Global Step: 72840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:01:53,167-Speed 18339.50 samples/sec Loss 5.8596 LearningRate 0.0437 Epoch: 14 Global Step: 72850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:01:57,599-Speed 18490.94 samples/sec Loss 5.8107 LearningRate 0.0436 Epoch: 14 
Global Step: 72860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:02:02,050-Speed 18408.70 samples/sec Loss 5.8140 LearningRate 0.0436 Epoch: 14 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:02:06,520-Speed 18330.76 samples/sec Loss 5.7796 LearningRate 0.0436 Epoch: 14 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:02:10,978-Speed 18381.36 samples/sec Loss 5.8284 LearningRate 0.0436 Epoch: 14 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:02:15,439-Speed 18370.53 samples/sec Loss 5.8039 LearningRate 0.0435 Epoch: 14 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:02:19,893-Speed 18393.32 samples/sec Loss 5.8148 LearningRate 0.0435 Epoch: 14 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:02:24,309-Speed 18556.77 samples/sec Loss 5.8050 LearningRate 0.0435 Epoch: 14 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:02:28,732-Speed 18530.82 samples/sec Loss 5.8101 LearningRate 0.0434 Epoch: 14 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:02:33,158-Speed 18509.88 samples/sec Loss 5.7926 LearningRate 0.0434 Epoch: 14 Global Step: 72940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 07:02:37,552-Speed 18652.52 samples/sec Loss 5.8290 LearningRate 0.0434 Epoch: 14 Global Step: 72950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:02:42,008-Speed 18391.60 samples/sec Loss 5.7826 LearningRate 0.0434 Epoch: 14 Global Step: 72960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:02:46,409-Speed 18618.59 samples/sec Loss 5.7747 LearningRate 0.0433 Epoch: 14 Global Step: 72970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:02:50,824-Speed 18562.38 samples/sec Loss 5.8370 LearningRate 0.0433 Epoch: 14 Global Step: 72980 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 07:02:55,224-Speed 18621.06 samples/sec Loss 5.7930 LearningRate 0.0433 Epoch: 14 Global Step: 72990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:02:59,651-Speed 18508.69 samples/sec Loss 5.7772 LearningRate 0.0432 Epoch: 14 Global Step: 73000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:03:04,042-Speed 18663.41 samples/sec Loss 5.8059 LearningRate 0.0432 Epoch: 14 Global Step: 73010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:03:08,464-Speed 18533.40 samples/sec Loss 5.8392 LearningRate 0.0432 Epoch: 14 Global Step: 73020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:03:12,881-Speed 18549.32 samples/sec Loss 5.8044 LearningRate 0.0432 Epoch: 14 Global Step: 73030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:03:17,351-Speed 18333.30 samples/sec Loss 5.8083 LearningRate 0.0431 Epoch: 14 Global Step: 73040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:03:21,773-Speed 18527.24 samples/sec Loss 5.8068 LearningRate 0.0431 Epoch: 14 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:03:26,247-Speed 18315.45 samples/sec Loss 5.8027 LearningRate 0.0431 Epoch: 14 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:03:30,773-Speed 18105.56 samples/sec Loss 5.7909 LearningRate 0.0430 Epoch: 14 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:03:35,268-Speed 18233.09 samples/sec Loss 5.8184 LearningRate 0.0430 Epoch: 14 Global Step: 73080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:03:39,747-Speed 18294.20 samples/sec Loss 5.8253 LearningRate 0.0430 Epoch: 14 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:03:44,253-Speed 18185.04 samples/sec Loss 5.7624 LearningRate 0.0430 Epoch: 14 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
07:03:48,631-Speed 18717.46 samples/sec Loss 5.7559 LearningRate 0.0429 Epoch: 14 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:03:53,123-Speed 18237.30 samples/sec Loss 5.7959 LearningRate 0.0429 Epoch: 14 Global Step: 73120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:03:57,592-Speed 18337.66 samples/sec Loss 5.8096 LearningRate 0.0429 Epoch: 14 Global Step: 73130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:02,082-Speed 18248.82 samples/sec Loss 5.7906 LearningRate 0.0428 Epoch: 14 Global Step: 73140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:06,475-Speed 18656.77 samples/sec Loss 5.7869 LearningRate 0.0428 Epoch: 14 Global Step: 73150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:10,872-Speed 18633.10 samples/sec Loss 5.7808 LearningRate 0.0428 Epoch: 14 Global Step: 73160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:15,303-Speed 18493.83 samples/sec Loss 5.8016 LearningRate 0.0428 Epoch: 14 Global Step: 73170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:19,756-Speed 18402.51 samples/sec Loss 5.7891 LearningRate 0.0427 Epoch: 14 Global Step: 73180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:24,165-Speed 18586.70 samples/sec Loss 5.8042 LearningRate 0.0427 Epoch: 14 Global Step: 73190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:28,555-Speed 18668.90 samples/sec Loss 5.7666 LearningRate 0.0427 Epoch: 14 Global Step: 73200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:32,963-Speed 18589.22 samples/sec Loss 5.7838 LearningRate 0.0427 Epoch: 14 Global Step: 73210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:04:37,359-Speed 18639.49 samples/sec Loss 5.7490 LearningRate 0.0426 Epoch: 14 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:04:41,817-Speed 18377.31 samples/sec 
Loss 5.7331 LearningRate 0.0426 Epoch: 14 Global Step: 73230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:04:46,213-Speed 18643.78 samples/sec Loss 5.7713 LearningRate 0.0426 Epoch: 14 Global Step: 73240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:04:50,623-Speed 18583.14 samples/sec Loss 5.7784 LearningRate 0.0425 Epoch: 14 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:04:55,024-Speed 18617.22 samples/sec Loss 5.8037 LearningRate 0.0425 Epoch: 14 Global Step: 73260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:04:59,430-Speed 18596.09 samples/sec Loss 5.7704 LearningRate 0.0425 Epoch: 14 Global Step: 73270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:05:03,828-Speed 18639.29 samples/sec Loss 5.8078 LearningRate 0.0425 Epoch: 14 Global Step: 73280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:08,213-Speed 18689.38 samples/sec Loss 5.7746 LearningRate 0.0424 Epoch: 14 Global Step: 73290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:12,652-Speed 18460.45 samples/sec Loss 5.7781 LearningRate 0.0424 Epoch: 14 Global Step: 73300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:17,086-Speed 18486.10 samples/sec Loss 5.7567 LearningRate 0.0424 Epoch: 14 Global Step: 73310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:21,501-Speed 18563.59 samples/sec Loss 5.7991 LearningRate 0.0423 Epoch: 14 Global Step: 73320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:25,937-Speed 18469.61 samples/sec Loss 5.7644 LearningRate 0.0423 Epoch: 14 Global Step: 73330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:30,388-Speed 18411.39 samples/sec Loss 5.8076 LearningRate 0.0423 Epoch: 14 Global Step: 73340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:34,798-Speed 18583.46 samples/sec Loss 5.7862 LearningRate 0.0423 Epoch: 14 
Global Step: 73350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:39,217-Speed 18539.83 samples/sec Loss 5.7885 LearningRate 0.0422 Epoch: 14 Global Step: 73360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:43,627-Speed 18581.18 samples/sec Loss 5.7634 LearningRate 0.0422 Epoch: 14 Global Step: 73370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:05:48,059-Speed 18488.92 samples/sec Loss 5.7888 LearningRate 0.0422 Epoch: 14 Global Step: 73380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:05:52,467-Speed 18590.17 samples/sec Loss 5.7931 LearningRate 0.0421 Epoch: 14 Global Step: 73390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:05:56,921-Speed 18397.96 samples/sec Loss 5.8192 LearningRate 0.0421 Epoch: 14 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:06:06,222-Speed 8809.09 samples/sec Loss 5.7813 LearningRate 0.0421 Epoch: 14 Global Step: 73410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:10,644-Speed 18533.07 samples/sec Loss 5.7647 LearningRate 0.0421 Epoch: 14 Global Step: 73420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:15,053-Speed 18583.64 samples/sec Loss 5.7848 LearningRate 0.0420 Epoch: 14 Global Step: 73430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:19,467-Speed 18563.45 samples/sec Loss 5.7538 LearningRate 0.0420 Epoch: 14 Global Step: 73440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:23,916-Speed 18418.04 samples/sec Loss 5.7648 LearningRate 0.0420 Epoch: 14 Global Step: 73450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:28,343-Speed 18511.70 samples/sec Loss 5.7456 LearningRate 0.0420 Epoch: 14 Global Step: 73460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:32,762-Speed 18545.96 samples/sec Loss 5.7758 LearningRate 0.0419 Epoch: 14 Global Step: 73470 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 07:06:37,175-Speed 18573.43 samples/sec Loss 5.7485 LearningRate 0.0419 Epoch: 14 Global Step: 73480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:41,586-Speed 18582.68 samples/sec Loss 5.7527 LearningRate 0.0419 Epoch: 14 Global Step: 73490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:46,026-Speed 18458.06 samples/sec Loss 5.7702 LearningRate 0.0418 Epoch: 14 Global Step: 73500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:06:50,424-Speed 18636.13 samples/sec Loss 5.7940 LearningRate 0.0418 Epoch: 14 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:06:54,833-Speed 18589.28 samples/sec Loss 5.7898 LearningRate 0.0418 Epoch: 14 Global Step: 73520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:06:59,267-Speed 18479.90 samples/sec Loss 5.7504 LearningRate 0.0418 Epoch: 14 Global Step: 73530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:03,652-Speed 18685.82 samples/sec Loss 5.7511 LearningRate 0.0417 Epoch: 14 Global Step: 73540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:08,049-Speed 18640.89 samples/sec Loss 5.7590 LearningRate 0.0417 Epoch: 14 Global Step: 73550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:12,507-Speed 18378.25 samples/sec Loss 5.7598 LearningRate 0.0417 Epoch: 14 Global Step: 73560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:16,957-Speed 18416.55 samples/sec Loss 5.7726 LearningRate 0.0416 Epoch: 14 Global Step: 73570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:21,377-Speed 18534.99 samples/sec Loss 5.7830 LearningRate 0.0416 Epoch: 14 Global Step: 73580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:25,806-Speed 18503.35 samples/sec Loss 5.7670 LearningRate 0.0416 Epoch: 14 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
07:07:30,246-Speed 18457.50 samples/sec Loss 5.7181 LearningRate 0.0416 Epoch: 14 Global Step: 73600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:34,649-Speed 18611.57 samples/sec Loss 5.8131 LearningRate 0.0415 Epoch: 14 Global Step: 73610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:39,064-Speed 18561.85 samples/sec Loss 5.7813 LearningRate 0.0415 Epoch: 14 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:43,485-Speed 18530.69 samples/sec Loss 5.7462 LearningRate 0.0415 Epoch: 14 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:47,901-Speed 18558.62 samples/sec Loss 5.7016 LearningRate 0.0415 Epoch: 14 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:52,319-Speed 18547.84 samples/sec Loss 5.7587 LearningRate 0.0414 Epoch: 14 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:07:56,712-Speed 18651.72 samples/sec Loss 5.7510 LearningRate 0.0414 Epoch: 14 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:08:01,136-Speed 18525.20 samples/sec Loss 5.7351 LearningRate 0.0414 Epoch: 14 Global Step: 73670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:05,560-Speed 18523.78 samples/sec Loss 5.7719 LearningRate 0.0413 Epoch: 14 Global Step: 73680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:10,019-Speed 18377.04 samples/sec Loss 5.7669 LearningRate 0.0413 Epoch: 14 Global Step: 73690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:14,478-Speed 18378.92 samples/sec Loss 5.7245 LearningRate 0.0413 Epoch: 14 Global Step: 73700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:18,940-Speed 18365.27 samples/sec Loss 5.7709 LearningRate 0.0413 Epoch: 14 Global Step: 73710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:23,400-Speed 18374.88 samples/sec 
Loss 5.7543 LearningRate 0.0412 Epoch: 14 Global Step: 73720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:27,835-Speed 18474.47 samples/sec Loss 5.7619 LearningRate 0.0412 Epoch: 14 Global Step: 73730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:32,254-Speed 18541.94 samples/sec Loss 5.7560 LearningRate 0.0412 Epoch: 14 Global Step: 73740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:36,697-Speed 18446.93 samples/sec Loss 5.7461 LearningRate 0.0412 Epoch: 14 Global Step: 73750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:41,196-Speed 18213.05 samples/sec Loss 5.7768 LearningRate 0.0411 Epoch: 14 Global Step: 73760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:08:45,611-Speed 18560.04 samples/sec Loss 5.7795 LearningRate 0.0411 Epoch: 14 Global Step: 73770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:08:50,074-Speed 18362.08 samples/sec Loss 5.7563 LearningRate 0.0411 Epoch: 14 Global Step: 73780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:08:54,509-Speed 18471.80 samples/sec Loss 5.7442 LearningRate 0.0410 Epoch: 14 Global Step: 73790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:08:58,949-Speed 18457.48 samples/sec Loss 5.7443 LearningRate 0.0410 Epoch: 14 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:09:03,356-Speed 18595.89 samples/sec Loss 5.7469 LearningRate 0.0410 Epoch: 14 Global Step: 73810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:09:07,762-Speed 18600.32 samples/sec Loss 5.7263 LearningRate 0.0410 Epoch: 14 Global Step: 73820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:09:12,159-Speed 18634.51 samples/sec Loss 5.7640 LearningRate 0.0409 Epoch: 14 Global Step: 73830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:09:16,597-Speed 18467.21 samples/sec Loss 5.7691 LearningRate 0.0409 Epoch: 14 
Global Step: 73840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:09:21,032-Speed 18476.39 samples/sec Loss 5.7323 LearningRate 0.0409 Epoch: 14 Global Step: 73850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:09:25,447-Speed 18561.88 samples/sec Loss 5.7440 LearningRate 0.0409 Epoch: 14 Global Step: 73860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:09:29,842-Speed 18645.91 samples/sec Loss 5.7345 LearningRate 0.0408 Epoch: 14 Global Step: 73870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:09:34,260-Speed 18546.35 samples/sec Loss 5.7257 LearningRate 0.0408 Epoch: 14 Global Step: 73880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:09:38,645-Speed 18687.49 samples/sec Loss 5.7728 LearningRate 0.0408 Epoch: 14 Global Step: 73890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:09:43,040-Speed 18641.43 samples/sec Loss 5.7279 LearningRate 0.0407 Epoch: 14 Global Step: 73900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:09:47,506-Speed 18353.62 samples/sec Loss 5.7551 LearningRate 0.0407 Epoch: 14 Global Step: 73910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:09:51,951-Speed 18432.81 samples/sec Loss 5.7498 LearningRate 0.0407 Epoch: 14 Global Step: 73920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:09:56,350-Speed 18626.24 samples/sec Loss 5.7664 LearningRate 0.0407 Epoch: 14 Global Step: 73930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:10:00,764-Speed 18566.24 samples/sec Loss 5.7164 LearningRate 0.0406 Epoch: 14 Global Step: 73940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:10:05,173-Speed 18584.56 samples/sec Loss 5.7426 LearningRate 0.0406 Epoch: 14 Global Step: 73950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-01-14 07:10:09,569-Speed 18641.81 samples/sec Loss 5.7133 LearningRate 0.0406 Epoch: 14 Global Step: 73960 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 07:10:13,969-Speed 18621.61 samples/sec Loss 5.7367 LearningRate 0.0405 Epoch: 14 Global Step: 73970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:18,422-Speed 18400.90 samples/sec Loss 5.7464 LearningRate 0.0405 Epoch: 14 Global Step: 73980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:22,849-Speed 18512.88 samples/sec Loss 5.7389 LearningRate 0.0405 Epoch: 14 Global Step: 73990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:27,308-Speed 18381.60 samples/sec Loss 5.7739 LearningRate 0.0405 Epoch: 14 Global Step: 74000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:31,735-Speed 18512.50 samples/sec Loss 5.7392 LearningRate 0.0404 Epoch: 14 Global Step: 74010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:36,215-Speed 18296.69 samples/sec Loss 5.7710 LearningRate 0.0404 Epoch: 14 Global Step: 74020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:40,646-Speed 18495.79 samples/sec Loss 5.7140 LearningRate 0.0404 Epoch: 14 Global Step: 74030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:45,052-Speed 18603.89 samples/sec Loss 5.7363 LearningRate 0.0404 Epoch: 14 Global Step: 74040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:49,470-Speed 18550.46 samples/sec Loss 5.7659 LearningRate 0.0403 Epoch: 14 Global Step: 74050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:10:53,867-Speed 18631.89 samples/sec Loss 5.7343 LearningRate 0.0403 Epoch: 14 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:10:58,247-Speed 18710.39 samples/sec Loss 5.7071 LearningRate 0.0403 Epoch: 14 Global Step: 74070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:02,649-Speed 18617.04 samples/sec Loss 5.7312 LearningRate 0.0403 Epoch: 14 Global Step: 74080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
07:11:07,039-Speed 18664.76 samples/sec Loss 5.7556 LearningRate 0.0402 Epoch: 14 Global Step: 74090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:11,447-Speed 18589.16 samples/sec Loss 5.7385 LearningRate 0.0402 Epoch: 14 Global Step: 74100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:15,870-Speed 18527.90 samples/sec Loss 5.7662 LearningRate 0.0402 Epoch: 14 Global Step: 74110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:20,269-Speed 18625.77 samples/sec Loss 5.7537 LearningRate 0.0401 Epoch: 14 Global Step: 74120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:24,661-Speed 18653.78 samples/sec Loss 5.7690 LearningRate 0.0401 Epoch: 14 Global Step: 74130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:29,046-Speed 18692.19 samples/sec Loss 5.7911 LearningRate 0.0401 Epoch: 14 Global Step: 74140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:33,477-Speed 18490.45 samples/sec Loss 5.7572 LearningRate 0.0401 Epoch: 14 Global Step: 74150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:37,932-Speed 18394.16 samples/sec Loss 5.7509 LearningRate 0.0400 Epoch: 14 Global Step: 74160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:42,466-Speed 18071.72 samples/sec Loss 5.7784 LearningRate 0.0400 Epoch: 14 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:11:47,050-Speed 17877.92 samples/sec Loss 5.7568 LearningRate 0.0400 Epoch: 14 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:11:51,480-Speed 18496.71 samples/sec Loss 5.7487 LearningRate 0.0400 Epoch: 14 Global Step: 74190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:11:55,876-Speed 18639.23 samples/sec Loss 5.7291 LearningRate 0.0399 Epoch: 14 Global Step: 74200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:00,300-Speed 18525.28 samples/sec 
Loss 5.7119 LearningRate 0.0399 Epoch: 14 Global Step: 74210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:04,745-Speed 18433.31 samples/sec Loss 5.7025 LearningRate 0.0399 Epoch: 14 Global Step: 74220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:09,156-Speed 18577.07 samples/sec Loss 5.7344 LearningRate 0.0398 Epoch: 14 Global Step: 74230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:13,592-Speed 18472.09 samples/sec Loss 5.7337 LearningRate 0.0398 Epoch: 14 Global Step: 74240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:17,988-Speed 18639.43 samples/sec Loss 5.7212 LearningRate 0.0398 Epoch: 14 Global Step: 74250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:22,401-Speed 18565.98 samples/sec Loss 5.7445 LearningRate 0.0398 Epoch: 14 Global Step: 74260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:26,859-Speed 18379.28 samples/sec Loss 5.7000 LearningRate 0.0397 Epoch: 14 Global Step: 74270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:31,224-Speed 18780.44 samples/sec Loss 5.7154 LearningRate 0.0397 Epoch: 14 Global Step: 74280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:35,635-Speed 18579.03 samples/sec Loss 5.7072 LearningRate 0.0397 Epoch: 14 Global Step: 74290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:12:40,118-Speed 18275.13 samples/sec Loss 5.7037 LearningRate 0.0397 Epoch: 14 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:12:44,524-Speed 18598.93 samples/sec Loss 5.7169 LearningRate 0.0396 Epoch: 14 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:12:48,961-Speed 18467.68 samples/sec Loss 5.7104 LearningRate 0.0396 Epoch: 14 Global Step: 74320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:53,377-Speed 18554.27 samples/sec Loss 5.7359 LearningRate 0.0396 Epoch: 14 
Global Step: 74330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:12:57,788-Speed 18578.27 samples/sec Loss 5.7179 LearningRate 0.0395 Epoch: 14 Global Step: 74340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:13:02,213-Speed 18515.96 samples/sec Loss 5.7092 LearningRate 0.0395 Epoch: 14 Global Step: 74350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:13:06,647-Speed 18482.03 samples/sec Loss 5.7216 LearningRate 0.0395 Epoch: 14 Global Step: 74360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:13:11,084-Speed 18468.87 samples/sec Loss 5.7277 LearningRate 0.0395 Epoch: 14 Global Step: 74370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:13:15,476-Speed 18660.40 samples/sec Loss 5.7425 LearningRate 0.0394 Epoch: 14 Global Step: 74380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:13:19,895-Speed 18541.66 samples/sec Loss 5.7404 LearningRate 0.0394 Epoch: 14 Global Step: 74390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:13:24,284-Speed 18668.57 samples/sec Loss 5.7072 LearningRate 0.0394 Epoch: 14 Global Step: 74400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:13:28,693-Speed 18592.11 samples/sec Loss 5.7099 LearningRate 0.0394 Epoch: 14 Global Step: 74410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:13:33,120-Speed 18512.71 samples/sec Loss 5.7188 LearningRate 0.0393 Epoch: 14 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:13:37,554-Speed 18480.95 samples/sec Loss 5.7520 LearningRate 0.0393 Epoch: 14 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:13:41,958-Speed 18608.58 samples/sec Loss 5.6946 LearningRate 0.0393 Epoch: 14 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:13:46,405-Speed 18426.52 samples/sec Loss 5.7256 LearningRate 0.0393 Epoch: 14 Global Step: 74450 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 07:13:50,862-Speed 18382.61 samples/sec Loss 5.7050 LearningRate 0.0392 Epoch: 14 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:13:55,311-Speed 18419.48 samples/sec Loss 5.6990 LearningRate 0.0392 Epoch: 14 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:13:59,762-Speed 18408.21 samples/sec Loss 5.7074 LearningRate 0.0392 Epoch: 14 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:14:04,273-Speed 18166.66 samples/sec Loss 5.6997 LearningRate 0.0391 Epoch: 14 Global Step: 74490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:14:08,693-Speed 18540.00 samples/sec Loss 5.7154 LearningRate 0.0391 Epoch: 14 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:14:13,116-Speed 18525.83 samples/sec Loss 5.7062 LearningRate 0.0391 Epoch: 14 Global Step: 74510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:14:17,570-Speed 18401.88 samples/sec Loss 5.6692 LearningRate 0.0391 Epoch: 14 Global Step: 74520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 07:14:21,986-Speed 18554.90 samples/sec Loss 5.6560 LearningRate 0.0390 Epoch: 14 Global Step: 74530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 07:14:26,395-Speed 18585.39 samples/sec Loss 5.7068 LearningRate 0.0390 Epoch: 14 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:14:30,788-Speed 18658.16 samples/sec Loss 5.7158 LearningRate 0.0390 Epoch: 14 Global Step: 74550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:14:35,184-Speed 18639.39 samples/sec Loss 5.7606 LearningRate 0.0390 Epoch: 14 Global Step: 74560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:14:39,615-Speed 18491.05 samples/sec Loss 5.6683 LearningRate 0.0389 Epoch: 14 Global Step: 74570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
07:14:44,089-Speed 18314.67 samples/sec Loss 5.6754 LearningRate 0.0389 Epoch: 14 Global Step: 74580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:14:48,524-Speed 18479.22 samples/sec Loss 5.7128 LearningRate 0.0389 Epoch: 14 Global Step: 74590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:14:52,978-Speed 18394.02 samples/sec Loss 5.7189 LearningRate 0.0388 Epoch: 14 Global Step: 74600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:14:57,409-Speed 18492.73 samples/sec Loss 5.6704 LearningRate 0.0388 Epoch: 14 Global Step: 74610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:15:01,824-Speed 18562.73 samples/sec Loss 5.7179 LearningRate 0.0388 Epoch: 14 Global Step: 74620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:15:06,247-Speed 18527.75 samples/sec Loss 5.7089 LearningRate 0.0388 Epoch: 14 Global Step: 74630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:15:10,681-Speed 18479.30 samples/sec Loss 5.7269 LearningRate 0.0387 Epoch: 14 Global Step: 74640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:15:15,150-Speed 18341.20 samples/sec Loss 5.7174 LearningRate 0.0387 Epoch: 14 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:19,588-Speed 18462.53 samples/sec Loss 5.7043 LearningRate 0.0387 Epoch: 14 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:24,013-Speed 18516.81 samples/sec Loss 5.7097 LearningRate 0.0387 Epoch: 14 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:28,454-Speed 18450.17 samples/sec Loss 5.6825 LearningRate 0.0386 Epoch: 14 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:32,881-Speed 18507.10 samples/sec Loss 5.6984 LearningRate 0.0386 Epoch: 14 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:37,320-Speed 18461.20 samples/sec 
Loss 5.6713 LearningRate 0.0386 Epoch: 14 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:41,803-Speed 18278.70 samples/sec Loss 5.6870 LearningRate 0.0386 Epoch: 14 Global Step: 74710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:46,307-Speed 18194.29 samples/sec Loss 5.6907 LearningRate 0.0385 Epoch: 14 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:50,730-Speed 18525.75 samples/sec Loss 5.6848 LearningRate 0.0385 Epoch: 14 Global Step: 74730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:55,142-Speed 18568.53 samples/sec Loss 5.6715 LearningRate 0.0385 Epoch: 14 Global Step: 74740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:15:59,544-Speed 18616.92 samples/sec Loss 5.6672 LearningRate 0.0384 Epoch: 14 Global Step: 74750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:03,966-Speed 18532.05 samples/sec Loss 5.7046 LearningRate 0.0384 Epoch: 14 Global Step: 74760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:08,395-Speed 18501.17 samples/sec Loss 5.6676 LearningRate 0.0384 Epoch: 14 Global Step: 74770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:12,828-Speed 18485.03 samples/sec Loss 5.6971 LearningRate 0.0384 Epoch: 14 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:17,232-Speed 18603.80 samples/sec Loss 5.6657 LearningRate 0.0383 Epoch: 14 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:21,675-Speed 18441.28 samples/sec Loss 5.6914 LearningRate 0.0383 Epoch: 14 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:26,062-Speed 18682.42 samples/sec Loss 5.7124 LearningRate 0.0383 Epoch: 14 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:30,434-Speed 18750.87 samples/sec Loss 5.7153 LearningRate 0.0383 Epoch: 14 
Global Step: 74820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:34,896-Speed 18366.52 samples/sec Loss 5.7054 LearningRate 0.0382 Epoch: 14 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:39,328-Speed 18489.33 samples/sec Loss 5.7010 LearningRate 0.0382 Epoch: 14 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:43,718-Speed 18664.78 samples/sec Loss 5.7118 LearningRate 0.0382 Epoch: 14 Global Step: 74850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 07:16:48,151-Speed 18486.28 samples/sec Loss 5.6452 LearningRate 0.0382 Epoch: 14 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:52,541-Speed 18666.47 samples/sec Loss 5.6940 LearningRate 0.0381 Epoch: 14 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:16:56,957-Speed 18554.90 samples/sec Loss 5.6886 LearningRate 0.0381 Epoch: 14 Global Step: 74880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:01,385-Speed 18506.29 samples/sec Loss 5.7260 LearningRate 0.0381 Epoch: 14 Global Step: 74890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:05,795-Speed 18580.90 samples/sec Loss 5.6657 LearningRate 0.0381 Epoch: 14 Global Step: 74900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:10,203-Speed 18587.57 samples/sec Loss 5.6450 LearningRate 0.0380 Epoch: 14 Global Step: 74910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:14,611-Speed 18588.70 samples/sec Loss 5.7525 LearningRate 0.0380 Epoch: 14 Global Step: 74920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:19,051-Speed 18452.10 samples/sec Loss 5.7106 LearningRate 0.0380 Epoch: 14 Global Step: 74930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:23,483-Speed 18487.31 samples/sec Loss 5.7086 LearningRate 0.0379 Epoch: 14 Global Step: 74940 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-14 07:17:27,964-Speed 18287.79 samples/sec Loss 5.6729 LearningRate 0.0379 Epoch: 14 Global Step: 74950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:32,398-Speed 18480.69 samples/sec Loss 5.6729 LearningRate 0.0379 Epoch: 14 Global Step: 74960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:36,796-Speed 18630.77 samples/sec Loss 5.7137 LearningRate 0.0379 Epoch: 14 Global Step: 74970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:17:41,233-Speed 18465.82 samples/sec Loss 5.6904 LearningRate 0.0378 Epoch: 14 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:17:45,657-Speed 18521.70 samples/sec Loss 5.6649 LearningRate 0.0378 Epoch: 14 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:17:50,076-Speed 18540.66 samples/sec Loss 5.6690 LearningRate 0.0378 Epoch: 14 Global Step: 75000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:17:54,479-Speed 18612.55 samples/sec Loss 5.6734 LearningRate 0.0378 Epoch: 14 Global Step: 75010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:17:58,966-Speed 18262.41 samples/sec Loss 5.7083 LearningRate 0.0377 Epoch: 14 Global Step: 75020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:18:03,374-Speed 18588.00 samples/sec Loss 5.7009 LearningRate 0.0377 Epoch: 14 Global Step: 75030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:18:07,799-Speed 18520.28 samples/sec Loss 5.6852 LearningRate 0.0377 Epoch: 14 Global Step: 75040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:18:12,247-Speed 18420.71 samples/sec Loss 5.6781 LearningRate 0.0377 Epoch: 14 Global Step: 75050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:18:16,695-Speed 18423.11 samples/sec Loss 5.6942 LearningRate 0.0376 Epoch: 14 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 
07:18:21,109-Speed 18564.15 samples/sec Loss 5.6615 LearningRate 0.0376 Epoch: 14 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:18:25,526-Speed 18552.00 samples/sec Loss 5.7006 LearningRate 0.0376 Epoch: 14 Global Step: 75080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 07:18:29,947-Speed 18533.77 samples/sec Loss 5.6969 LearningRate 0.0376 Epoch: 14 Global Step: 75090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 07:18:34,380-Speed 18484.74 samples/sec Loss 5.7074 LearningRate 0.0375 Epoch: 14 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:18:38,814-Speed 18480.83 samples/sec Loss 5.6595 LearningRate 0.0375 Epoch: 14 Global Step: 75110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:18:43,210-Speed 18640.31 samples/sec Loss 5.6790 LearningRate 0.0375 Epoch: 14 Global Step: 75120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:18:47,615-Speed 18605.34 samples/sec Loss 5.6850 LearningRate 0.0374 Epoch: 14 Global Step: 75130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:18:52,043-Speed 18502.40 samples/sec Loss 5.6600 LearningRate 0.0374 Epoch: 14 Global Step: 75140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:18:56,457-Speed 18566.62 samples/sec Loss 5.7124 LearningRate 0.0374 Epoch: 14 Global Step: 75150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:19:00,886-Speed 18506.91 samples/sec Loss 5.6962 LearningRate 0.0374 Epoch: 14 Global Step: 75160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:19:05,302-Speed 18556.80 samples/sec Loss 5.6756 LearningRate 0.0373 Epoch: 14 Global Step: 75170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:19:09,719-Speed 18552.24 samples/sec Loss 5.6914 LearningRate 0.0373 Epoch: 14 Global Step: 75180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:19:14,131-Speed 18571.21 samples/sec 
Loss 5.7018 LearningRate 0.0373 Epoch: 14 Global Step: 75190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:19:18,545-Speed 18569.12 samples/sec Loss 5.6862 LearningRate 0.0373 Epoch: 14 Global Step: 75200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:19:23,018-Speed 18323.32 samples/sec Loss 5.6746 LearningRate 0.0372 Epoch: 14 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:19:27,447-Speed 18499.83 samples/sec Loss 5.6438 LearningRate 0.0372 Epoch: 14 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:19:31,911-Speed 18361.04 samples/sec Loss 5.6875 LearningRate 0.0372 Epoch: 14 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:19:36,342-Speed 18491.83 samples/sec Loss 5.6921 LearningRate 0.0372 Epoch: 14 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:19:40,791-Speed 18413.91 samples/sec Loss 5.6394 LearningRate 0.0371 Epoch: 14 Global Step: 75250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:19:45,279-Speed 18260.00 samples/sec Loss 5.6496 LearningRate 0.0371 Epoch: 14 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:19:49,774-Speed 18230.20 samples/sec Loss 5.6716 LearningRate 0.0371 Epoch: 14 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:19:54,216-Speed 18445.22 samples/sec Loss 5.6734 LearningRate 0.0371 Epoch: 14 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:19:58,630-Speed 18569.18 samples/sec Loss 5.6492 LearningRate 0.0370 Epoch: 14 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:03,077-Speed 18425.09 samples/sec Loss 5.6538 LearningRate 0.0370 Epoch: 14 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:07,508-Speed 18494.17 samples/sec Loss 5.6733 LearningRate 0.0370 Epoch: 14 
Global Step: 75310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 07:20:11,962-Speed 18393.77 samples/sec Loss 5.6827 LearningRate 0.0369 Epoch: 14 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:16,366-Speed 18610.00 samples/sec Loss 5.6643 LearningRate 0.0369 Epoch: 14 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:20,787-Speed 18532.19 samples/sec Loss 5.6794 LearningRate 0.0369 Epoch: 14 Global Step: 75340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:25,258-Speed 18324.76 samples/sec Loss 5.6568 LearningRate 0.0369 Epoch: 14 Global Step: 75350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:29,687-Speed 18504.83 samples/sec Loss 5.6202 LearningRate 0.0368 Epoch: 14 Global Step: 75360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:34,145-Speed 18379.68 samples/sec Loss 5.7106 LearningRate 0.0368 Epoch: 14 Global Step: 75370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:38,611-Speed 18349.83 samples/sec Loss 5.6967 LearningRate 0.0368 Epoch: 14 Global Step: 75380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:43,098-Speed 18260.06 samples/sec Loss 5.6514 LearningRate 0.0368 Epoch: 14 Global Step: 75390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:47,495-Speed 18639.06 samples/sec Loss 5.7016 LearningRate 0.0367 Epoch: 14 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:51,947-Speed 18403.81 samples/sec Loss 5.6662 LearningRate 0.0367 Epoch: 14 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:20:56,388-Speed 18450.26 samples/sec Loss 5.6624 LearningRate 0.0367 Epoch: 14 Global Step: 75420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-14 07:21:00,794-Speed 18597.82 samples/sec Loss 5.6702 LearningRate 0.0367 Epoch: 14 Global Step: 75430 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-14 07:21:05,236-Speed 18449.61 samples/sec Loss 5.6767 LearningRate 0.0366 Epoch: 14 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:09,695-Speed 18385.83 samples/sec Loss 5.6442 LearningRate 0.0366 Epoch: 14 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:14,137-Speed 18447.03 samples/sec Loss 5.6649 LearningRate 0.0366 Epoch: 14 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:18,599-Speed 18366.37 samples/sec Loss 5.6616 LearningRate 0.0366 Epoch: 14 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:23,061-Speed 18368.45 samples/sec Loss 5.6677 LearningRate 0.0365 Epoch: 14 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:27,451-Speed 18669.67 samples/sec Loss 5.6582 LearningRate 0.0365 Epoch: 14 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:31,833-Speed 18695.59 samples/sec Loss 5.6544 LearningRate 0.0365 Epoch: 14 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:36,249-Speed 18556.26 samples/sec Loss 5.6370 LearningRate 0.0365 Epoch: 14 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:40,693-Speed 18440.50 samples/sec Loss 5.6606 LearningRate 0.0364 Epoch: 14 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-14 07:21:45,085-Speed 18654.88 samples/sec Loss 5.6491 LearningRate 0.0364 Epoch: 14 Global Step: 75530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:21:49,498-Speed 18566.08 samples/sec Loss 5.6818 LearningRate 0.0364 Epoch: 14 Global Step: 75540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:21:53,914-Speed 18559.29 samples/sec Loss 5.6656 LearningRate 0.0364 Epoch: 14 Global Step: 75550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 
07:21:58,374-Speed 18372.27 samples/sec Loss 5.6584 LearningRate 0.0363 Epoch: 14 Global Step: 75560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-14 07:22:02,771-Speed 18638.15 samples/sec Loss 5.6434 LearningRate 0.0363 Epoch: 14 Global Step: 75570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:07,221-Speed 18414.69 samples/sec Loss 5.6535 LearningRate 0.0363 Epoch: 14 Global Step: 75580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:11,640-Speed 18542.03 samples/sec Loss 5.6904 LearningRate 0.0362 Epoch: 14 Global Step: 75590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:16,054-Speed 18564.62 samples/sec Loss 5.6448 LearningRate 0.0362 Epoch: 14 Global Step: 75600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:20,478-Speed 18524.42 samples/sec Loss 5.6258 LearningRate 0.0362 Epoch: 14 Global Step: 75610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:24,919-Speed 18450.10 samples/sec Loss 5.6360 LearningRate 0.0362 Epoch: 14 Global Step: 75620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:29,355-Speed 18473.92 samples/sec Loss 5.6307 LearningRate 0.0361 Epoch: 14 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:22:33,800-Speed 18432.32 samples/sec Loss 5.6615 LearningRate 0.0361 Epoch: 14 Global Step: 75640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:38,227-Speed 18513.79 samples/sec Loss 5.6166 LearningRate 0.0361 Epoch: 14 Global Step: 75650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:42,664-Speed 18464.93 samples/sec Loss 5.6569 LearningRate 0.0361 Epoch: 14 Global Step: 75660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:47,077-Speed 18568.44 samples/sec Loss 5.6375 LearningRate 0.0360 Epoch: 14 Global Step: 75670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:51,545-Speed 18337.79 samples/sec 
Loss 5.6533 LearningRate 0.0360 Epoch: 14 Global Step: 75680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:22:55,968-Speed 18528.21 samples/sec Loss 5.6431 LearningRate 0.0360 Epoch: 14 Global Step: 75690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:00,368-Speed 18628.84 samples/sec Loss 5.6162 LearningRate 0.0360 Epoch: 14 Global Step: 75700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:04,813-Speed 18440.35 samples/sec Loss 5.6753 LearningRate 0.0359 Epoch: 14 Global Step: 75710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:09,285-Speed 18324.14 samples/sec Loss 5.6915 LearningRate 0.0359 Epoch: 14 Global Step: 75720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:13,701-Speed 18551.92 samples/sec Loss 5.6331 LearningRate 0.0359 Epoch: 14 Global Step: 75730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:18,113-Speed 18581.93 samples/sec Loss 5.6099 LearningRate 0.0359 Epoch: 14 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:23:22,578-Speed 18350.36 samples/sec Loss 5.6337 LearningRate 0.0358 Epoch: 14 Global Step: 75750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:27,092-Speed 18152.59 samples/sec Loss 5.6459 LearningRate 0.0358 Epoch: 14 Global Step: 75760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:31,553-Speed 18373.00 samples/sec Loss 5.6355 LearningRate 0.0358 Epoch: 14 Global Step: 75770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:35,961-Speed 18587.40 samples/sec Loss 5.6512 LearningRate 0.0358 Epoch: 14 Global Step: 75780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:40,413-Speed 18406.31 samples/sec Loss 5.6028 LearningRate 0.0357 Epoch: 14 Global Step: 75790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:44,869-Speed 18387.45 samples/sec Loss 5.6599 LearningRate 0.0357 Epoch: 14 
Global Step: 75800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:49,331-Speed 18368.61 samples/sec Loss 5.6304 LearningRate 0.0357 Epoch: 14 Global Step: 75810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:53,708-Speed 18717.11 samples/sec Loss 5.6162 LearningRate 0.0357 Epoch: 14 Global Step: 75820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:23:58,105-Speed 18637.36 samples/sec Loss 5.6415 LearningRate 0.0356 Epoch: 14 Global Step: 75830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:02,500-Speed 18645.33 samples/sec Loss 5.6469 LearningRate 0.0356 Epoch: 14 Global Step: 75840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:06,928-Speed 18508.63 samples/sec Loss 5.6124 LearningRate 0.0356 Epoch: 14 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:24:11,352-Speed 18520.97 samples/sec Loss 5.6896 LearningRate 0.0356 Epoch: 14 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:24:15,758-Speed 18599.30 samples/sec Loss 5.6412 LearningRate 0.0355 Epoch: 14 Global Step: 75870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:24:20,192-Speed 18481.08 samples/sec Loss 5.6669 LearningRate 0.0355 Epoch: 14 Global Step: 75880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:24,622-Speed 18494.12 samples/sec Loss 5.6217 LearningRate 0.0355 Epoch: 14 Global Step: 75890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:29,032-Speed 18581.95 samples/sec Loss 5.6350 LearningRate 0.0355 Epoch: 14 Global Step: 75900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:33,503-Speed 18327.86 samples/sec Loss 5.6359 LearningRate 0.0354 Epoch: 14 Global Step: 75910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:37,899-Speed 18640.52 samples/sec Loss 5.6439 LearningRate 0.0354 Epoch: 14 Global Step: 75920 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 07:24:42,326-Speed 18505.72 samples/sec Loss 5.6466 LearningRate 0.0354 Epoch: 14 Global Step: 75930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:46,760-Speed 18483.22 samples/sec Loss 5.6431 LearningRate 0.0354 Epoch: 14 Global Step: 75940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:51,216-Speed 18389.70 samples/sec Loss 5.6231 LearningRate 0.0353 Epoch: 14 Global Step: 75950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:24:55,613-Speed 18631.57 samples/sec Loss 5.6209 LearningRate 0.0353 Epoch: 14 Global Step: 75960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:00,060-Speed 18427.89 samples/sec Loss 5.6289 LearningRate 0.0353 Epoch: 14 Global Step: 75970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:04,474-Speed 18562.31 samples/sec Loss 5.6264 LearningRate 0.0352 Epoch: 14 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:25:08,927-Speed 18403.58 samples/sec Loss 5.6017 LearningRate 0.0352 Epoch: 14 Global Step: 75990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:13,328-Speed 18617.09 samples/sec Loss 5.6128 LearningRate 0.0352 Epoch: 14 Global Step: 76000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:17,803-Speed 18311.94 samples/sec Loss 5.6326 LearningRate 0.0352 Epoch: 14 Global Step: 76010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:22,289-Speed 18266.87 samples/sec Loss 5.6007 LearningRate 0.0351 Epoch: 14 Global Step: 76020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:26,707-Speed 18546.52 samples/sec Loss 5.6098 LearningRate 0.0351 Epoch: 14 Global Step: 76030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:31,169-Speed 18373.48 samples/sec Loss 5.6016 LearningRate 0.0351 Epoch: 14 Global Step: 76040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
07:25:35,622-Speed 18401.59 samples/sec Loss 5.6371 LearningRate 0.0351 Epoch: 14 Global Step: 76050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:40,087-Speed 18353.02 samples/sec Loss 5.5703 LearningRate 0.0350 Epoch: 14 Global Step: 76060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:44,537-Speed 18414.01 samples/sec Loss 5.6072 LearningRate 0.0350 Epoch: 14 Global Step: 76070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:48,972-Speed 18480.26 samples/sec Loss 5.6158 LearningRate 0.0350 Epoch: 14 Global Step: 76080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:25:53,427-Speed 18397.91 samples/sec Loss 5.6349 LearningRate 0.0350 Epoch: 14 Global Step: 76090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:25:57,853-Speed 18514.51 samples/sec Loss 5.6640 LearningRate 0.0349 Epoch: 14 Global Step: 76100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:02,291-Speed 18466.25 samples/sec Loss 5.6604 LearningRate 0.0349 Epoch: 14 Global Step: 76110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:06,706-Speed 18561.18 samples/sec Loss 5.6337 LearningRate 0.0349 Epoch: 14 Global Step: 76120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:11,104-Speed 18630.62 samples/sec Loss 5.6031 LearningRate 0.0349 Epoch: 14 Global Step: 76130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:15,514-Speed 18583.92 samples/sec Loss 5.6313 LearningRate 0.0348 Epoch: 14 Global Step: 76140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:19,918-Speed 18605.49 samples/sec Loss 5.6232 LearningRate 0.0348 Epoch: 14 Global Step: 76150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:24,374-Speed 18385.81 samples/sec Loss 5.6045 LearningRate 0.0348 Epoch: 14 Global Step: 76160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:28,816-Speed 18450.52 samples/sec 
Loss 5.6213 LearningRate 0.0348 Epoch: 14 Global Step: 76170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:33,220-Speed 18605.37 samples/sec Loss 5.6150 LearningRate 0.0347 Epoch: 14 Global Step: 76180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:37,696-Speed 18306.63 samples/sec Loss 5.6280 LearningRate 0.0347 Epoch: 14 Global Step: 76190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:26:42,127-Speed 18499.15 samples/sec Loss 5.6090 LearningRate 0.0347 Epoch: 14 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:26:46,561-Speed 18485.54 samples/sec Loss 5.6053 LearningRate 0.0347 Epoch: 14 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:26:51,086-Speed 18106.96 samples/sec Loss 5.6274 LearningRate 0.0346 Epoch: 14 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:26:55,554-Speed 18341.21 samples/sec Loss 5.6232 LearningRate 0.0346 Epoch: 14 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:26:59,976-Speed 18531.51 samples/sec Loss 5.6157 LearningRate 0.0346 Epoch: 14 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:04,396-Speed 18537.67 samples/sec Loss 5.5637 LearningRate 0.0346 Epoch: 14 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:08,814-Speed 18550.00 samples/sec Loss 5.5858 LearningRate 0.0345 Epoch: 14 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:13,237-Speed 18527.80 samples/sec Loss 5.5867 LearningRate 0.0345 Epoch: 14 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:17,658-Speed 18534.42 samples/sec Loss 5.6664 LearningRate 0.0345 Epoch: 14 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:22,092-Speed 18481.80 samples/sec Loss 5.6199 LearningRate 0.0345 Epoch: 14 
Global Step: 76290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:26,503-Speed 18576.66 samples/sec Loss 5.5922 LearningRate 0.0344 Epoch: 14 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:30,952-Speed 18417.83 samples/sec Loss 5.6158 LearningRate 0.0344 Epoch: 14 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:35,386-Speed 18478.72 samples/sec Loss 5.6487 LearningRate 0.0344 Epoch: 14 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:39,802-Speed 18555.28 samples/sec Loss 5.6023 LearningRate 0.0344 Epoch: 14 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:44,255-Speed 18402.28 samples/sec Loss 5.6330 LearningRate 0.0343 Epoch: 14 Global Step: 76340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:48,682-Speed 18508.40 samples/sec Loss 5.6333 LearningRate 0.0343 Epoch: 14 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:27:53,080-Speed 18636.11 samples/sec Loss 5.5565 LearningRate 0.0343 Epoch: 14 Global Step: 76360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:27:57,605-Speed 18110.58 samples/sec Loss 5.5952 LearningRate 0.0343 Epoch: 14 Global Step: 76370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:28:02,072-Speed 18347.06 samples/sec Loss 5.6081 LearningRate 0.0342 Epoch: 14 Global Step: 76380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:28:06,507-Speed 18479.45 samples/sec Loss 5.5677 LearningRate 0.0342 Epoch: 14 Global Step: 76390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:28:10,957-Speed 18416.17 samples/sec Loss 5.6099 LearningRate 0.0342 Epoch: 14 Global Step: 76400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:15,399-Speed 18444.55 samples/sec Loss 5.5893 LearningRate 0.0342 Epoch: 14 Global Step: 76410 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-01-14 07:28:19,816-Speed 18552.80 samples/sec Loss 5.6328 LearningRate 0.0341 Epoch: 14 Global Step: 76420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:24,227-Speed 18577.89 samples/sec Loss 5.6135 LearningRate 0.0341 Epoch: 14 Global Step: 76430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:28,730-Speed 18197.40 samples/sec Loss 5.6020 LearningRate 0.0341 Epoch: 14 Global Step: 76440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:33,163-Speed 18483.84 samples/sec Loss 5.5756 LearningRate 0.0341 Epoch: 14 Global Step: 76450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:37,589-Speed 18514.77 samples/sec Loss 5.6381 LearningRate 0.0340 Epoch: 14 Global Step: 76460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:42,011-Speed 18530.97 samples/sec Loss 5.5565 LearningRate 0.0340 Epoch: 14 Global Step: 76470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:46,443-Speed 18490.01 samples/sec Loss 5.5969 LearningRate 0.0340 Epoch: 14 Global Step: 76480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:50,834-Speed 18661.75 samples/sec Loss 5.6481 LearningRate 0.0340 Epoch: 14 Global Step: 76490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:28:55,264-Speed 18495.09 samples/sec Loss 5.6069 LearningRate 0.0339 Epoch: 14 Global Step: 76500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:28:59,673-Speed 18586.07 samples/sec Loss 5.6042 LearningRate 0.0339 Epoch: 14 Global Step: 76510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:29:04,110-Speed 18470.16 samples/sec Loss 5.5579 LearningRate 0.0339 Epoch: 14 Global Step: 76520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:29:08,507-Speed 18636.30 samples/sec Loss 5.5771 LearningRate 0.0339 Epoch: 14 Global Step: 76530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
07:29:12,935-Speed 18508.60 samples/sec Loss 5.6506 LearningRate 0.0338 Epoch: 14 Global Step: 76540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:29:17,353-Speed 18550.61 samples/sec Loss 5.5641 LearningRate 0.0338 Epoch: 14 Global Step: 76550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:29:21,759-Speed 18595.54 samples/sec Loss 5.6351 LearningRate 0.0338 Epoch: 14 Global Step: 76560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:29:26,160-Speed 18616.88 samples/sec Loss 5.5752 LearningRate 0.0338 Epoch: 14 Global Step: 76570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:29:30,613-Speed 18401.40 samples/sec Loss 5.5757 LearningRate 0.0337 Epoch: 14 Global Step: 76580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:29:35,039-Speed 18515.09 samples/sec Loss 5.6193 LearningRate 0.0337 Epoch: 14 Global Step: 76590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:29:39,463-Speed 18523.55 samples/sec Loss 5.5839 LearningRate 0.0337 Epoch: 14 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:29:43,892-Speed 18501.57 samples/sec Loss 5.6279 LearningRate 0.0337 Epoch: 14 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:29:48,328-Speed 18473.03 samples/sec Loss 5.6118 LearningRate 0.0336 Epoch: 14 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:29:52,785-Speed 18381.69 samples/sec Loss 5.6229 LearningRate 0.0336 Epoch: 14 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:29:57,242-Speed 18386.61 samples/sec Loss 5.6006 LearningRate 0.0336 Epoch: 14 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:30:01,680-Speed 18461.63 samples/sec Loss 5.5787 LearningRate 0.0336 Epoch: 14 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:30:06,109-Speed 18504.54 samples/sec 
Loss 5.6115 LearningRate 0.0335 Epoch: 14 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:30:10,554-Speed 18436.29 samples/sec Loss 5.5932 LearningRate 0.0335 Epoch: 14 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:30:14,972-Speed 18542.36 samples/sec Loss 5.6080 LearningRate 0.0335 Epoch: 14 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:30:19,386-Speed 18566.82 samples/sec Loss 5.6021 LearningRate 0.0335 Epoch: 14 Global Step: 76690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:23,815-Speed 18499.58 samples/sec Loss 5.5712 LearningRate 0.0334 Epoch: 14 Global Step: 76700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:28,257-Speed 18445.81 samples/sec Loss 5.6098 LearningRate 0.0334 Epoch: 14 Global Step: 76710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:32,696-Speed 18461.98 samples/sec Loss 5.5573 LearningRate 0.0334 Epoch: 14 Global Step: 76720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:37,104-Speed 18590.26 samples/sec Loss 5.5900 LearningRate 0.0334 Epoch: 14 Global Step: 76730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:41,542-Speed 18462.92 samples/sec Loss 5.5969 LearningRate 0.0333 Epoch: 14 Global Step: 76740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:45,976-Speed 18483.10 samples/sec Loss 5.5583 LearningRate 0.0333 Epoch: 14 Global Step: 76750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:50,410-Speed 18480.86 samples/sec Loss 5.5879 LearningRate 0.0333 Epoch: 14 Global Step: 76760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:54,824-Speed 18562.50 samples/sec Loss 5.5485 LearningRate 0.0333 Epoch: 14 Global Step: 76770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:30:59,255-Speed 18496.35 samples/sec Loss 5.5924 LearningRate 0.0332 Epoch: 14 
Global Step: 76780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:31:03,693-Speed 18463.43 samples/sec Loss 5.6068 LearningRate 0.0332 Epoch: 14 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:31:08,089-Speed 18637.94 samples/sec Loss 5.5663 LearningRate 0.0332 Epoch: 14 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:31:12,550-Speed 18368.87 samples/sec Loss 5.5691 LearningRate 0.0332 Epoch: 14 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:31:17,002-Speed 18407.97 samples/sec Loss 5.5701 LearningRate 0.0331 Epoch: 14 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:31:21,390-Speed 18675.12 samples/sec Loss 5.5501 LearningRate 0.0331 Epoch: 14 Global Step: 76830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:31:25,812-Speed 18530.96 samples/sec Loss 5.5709 LearningRate 0.0331 Epoch: 14 Global Step: 76840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:31:30,206-Speed 18649.97 samples/sec Loss 5.6069 LearningRate 0.0331 Epoch: 14 Global Step: 76850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:31:34,619-Speed 18568.81 samples/sec Loss 5.5722 LearningRate 0.0330 Epoch: 14 Global Step: 76860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:31:39,112-Speed 18291.25 samples/sec Loss 5.5754 LearningRate 0.0330 Epoch: 14 Global Step: 76870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:31:43,559-Speed 18428.05 samples/sec Loss 5.5873 LearningRate 0.0330 Epoch: 14 Global Step: 76880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:31:47,989-Speed 18500.68 samples/sec Loss 5.5806 LearningRate 0.0330 Epoch: 14 Global Step: 76890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:31:52,390-Speed 18623.54 samples/sec Loss 5.5958 LearningRate 0.0329 Epoch: 14 Global Step: 76900 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 07:31:56,809-Speed 18543.49 samples/sec Loss 5.5354 LearningRate 0.0329 Epoch: 14 Global Step: 76910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:01,183-Speed 18737.60 samples/sec Loss 5.6089 LearningRate 0.0329 Epoch: 14 Global Step: 76920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:05,593-Speed 18580.01 samples/sec Loss 5.6128 LearningRate 0.0329 Epoch: 14 Global Step: 76930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:32:10,073-Speed 18294.34 samples/sec Loss 5.5715 LearningRate 0.0328 Epoch: 14 Global Step: 76940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:32:14,458-Speed 18685.73 samples/sec Loss 5.5825 LearningRate 0.0328 Epoch: 14 Global Step: 76950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:18,852-Speed 18657.61 samples/sec Loss 5.5466 LearningRate 0.0328 Epoch: 14 Global Step: 76960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:23,277-Speed 18519.37 samples/sec Loss 5.5912 LearningRate 0.0328 Epoch: 14 Global Step: 76970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:27,693-Speed 18557.87 samples/sec Loss 5.5590 LearningRate 0.0327 Epoch: 14 Global Step: 76980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:32,087-Speed 18651.66 samples/sec Loss 5.5978 LearningRate 0.0327 Epoch: 14 Global Step: 76990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:36,488-Speed 18619.31 samples/sec Loss 5.5905 LearningRate 0.0327 Epoch: 14 Global Step: 77000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:40,915-Speed 18510.57 samples/sec Loss 5.5550 LearningRate 0.0327 Epoch: 14 Global Step: 77010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:45,285-Speed 18751.73 samples/sec Loss 5.5957 LearningRate 0.0327 Epoch: 14 Global Step: 77020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
07:32:49,700-Speed 18562.88 samples/sec Loss 5.5874 LearningRate 0.0326 Epoch: 14 Global Step: 77030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:54,075-Speed 18737.66 samples/sec Loss 5.5946 LearningRate 0.0326 Epoch: 14 Global Step: 77040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:32:58,514-Speed 18466.93 samples/sec Loss 5.5780 LearningRate 0.0326 Epoch: 14 Global Step: 77050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:02,964-Speed 18415.19 samples/sec Loss 5.5454 LearningRate 0.0326 Epoch: 14 Global Step: 77060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:07,370-Speed 18598.84 samples/sec Loss 5.5994 LearningRate 0.0325 Epoch: 14 Global Step: 77070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:11,781-Speed 18575.13 samples/sec Loss 5.5903 LearningRate 0.0325 Epoch: 14 Global Step: 77080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:16,198-Speed 18556.51 samples/sec Loss 5.5607 LearningRate 0.0325 Epoch: 14 Global Step: 77090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:20,599-Speed 18618.89 samples/sec Loss 5.5188 LearningRate 0.0325 Epoch: 14 Global Step: 77100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:24,992-Speed 18661.71 samples/sec Loss 5.5818 LearningRate 0.0324 Epoch: 14 Global Step: 77110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:29,454-Speed 18362.64 samples/sec Loss 5.5438 LearningRate 0.0324 Epoch: 14 Global Step: 77120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:33,857-Speed 18609.53 samples/sec Loss 5.5704 LearningRate 0.0324 Epoch: 14 Global Step: 77130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:38,282-Speed 18518.67 samples/sec Loss 5.5617 LearningRate 0.0324 Epoch: 14 Global Step: 77140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:33:42,675-Speed 18656.83 samples/sec 
Loss 5.5866 LearningRate 0.0323 Epoch: 14 Global Step: 77150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:33:47,098-Speed 18525.73 samples/sec Loss 5.5718 LearningRate 0.0323 Epoch: 14 Global Step: 77160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:33:51,505-Speed 18593.98 samples/sec Loss 5.5744 LearningRate 0.0323 Epoch: 14 Global Step: 77170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:33:55,905-Speed 18627.94 samples/sec Loss 5.5510 LearningRate 0.0323 Epoch: 14 Global Step: 77180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:00,305-Speed 18624.61 samples/sec Loss 5.5568 LearningRate 0.0322 Epoch: 14 Global Step: 77190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:04,724-Speed 18539.70 samples/sec Loss 5.5460 LearningRate 0.0322 Epoch: 14 Global Step: 77200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:09,220-Speed 18228.89 samples/sec Loss 5.5392 LearningRate 0.0322 Epoch: 14 Global Step: 77210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:13,672-Speed 18401.93 samples/sec Loss 5.5592 LearningRate 0.0322 Epoch: 14 Global Step: 77220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:18,128-Speed 18393.39 samples/sec Loss 5.5722 LearningRate 0.0321 Epoch: 14 Global Step: 77230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:22,566-Speed 18463.40 samples/sec Loss 5.5494 LearningRate 0.0321 Epoch: 14 Global Step: 77240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:26,971-Speed 18598.04 samples/sec Loss 5.5679 LearningRate 0.0321 Epoch: 14 Global Step: 77250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:34:31,372-Speed 18624.96 samples/sec Loss 5.5864 LearningRate 0.0321 Epoch: 14 Global Step: 77260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:34:35,775-Speed 18608.86 samples/sec Loss 5.5376 LearningRate 0.0320 Epoch: 14 
Global Step: 77270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:40,182-Speed 18591.58 samples/sec Loss 5.5517 LearningRate 0.0320 Epoch: 14 Global Step: 77280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:44,657-Speed 18317.04 samples/sec Loss 5.5499 LearningRate 0.0320 Epoch: 14 Global Step: 77290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:49,118-Speed 18368.60 samples/sec Loss 5.5547 LearningRate 0.0320 Epoch: 14 Global Step: 77300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:53,542-Speed 18523.91 samples/sec Loss 5.5503 LearningRate 0.0319 Epoch: 14 Global Step: 77310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:34:58,012-Speed 18328.43 samples/sec Loss 5.5747 LearningRate 0.0319 Epoch: 14 Global Step: 77320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:35:02,449-Speed 18467.48 samples/sec Loss 5.5276 LearningRate 0.0319 Epoch: 14 Global Step: 77330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:35:06,896-Speed 18429.09 samples/sec Loss 5.6016 LearningRate 0.0319 Epoch: 14 Global Step: 77340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:35:11,311-Speed 18559.69 samples/sec Loss 5.5704 LearningRate 0.0318 Epoch: 14 Global Step: 77350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:35:15,702-Speed 18662.31 samples/sec Loss 5.5190 LearningRate 0.0318 Epoch: 14 Global Step: 77360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:35:20,111-Speed 18583.93 samples/sec Loss 5.5488 LearningRate 0.0318 Epoch: 14 Global Step: 77370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:35:24,535-Speed 18522.10 samples/sec Loss 5.5518 LearningRate 0.0318 Epoch: 14 Global Step: 77380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:35:28,931-Speed 18643.07 samples/sec Loss 5.5223 LearningRate 0.0318 Epoch: 14 Global Step: 77390 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 07:35:33,323-Speed 18656.04 samples/sec Loss 5.5551 LearningRate 0.0317 Epoch: 14 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:35:37,747-Speed 18525.25 samples/sec Loss 5.5765 LearningRate 0.0317 Epoch: 14 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:35:42,197-Speed 18410.10 samples/sec Loss 5.5573 LearningRate 0.0317 Epoch: 14 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:35:46,673-Speed 18309.80 samples/sec Loss 5.5322 LearningRate 0.0317 Epoch: 14 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:35:51,128-Speed 18392.42 samples/sec Loss 5.5147 LearningRate 0.0316 Epoch: 14 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:35:55,583-Speed 18394.67 samples/sec Loss 5.5353 LearningRate 0.0316 Epoch: 14 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:35:59,994-Speed 18575.30 samples/sec Loss 5.5480 LearningRate 0.0316 Epoch: 14 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:36:04,391-Speed 18635.39 samples/sec Loss 5.5310 LearningRate 0.0316 Epoch: 14 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:36:08,796-Speed 18607.46 samples/sec Loss 5.5624 LearningRate 0.0315 Epoch: 14 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:36:13,215-Speed 18545.92 samples/sec Loss 5.5703 LearningRate 0.0315 Epoch: 14 Global Step: 77490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:36:17,637-Speed 18535.80 samples/sec Loss 5.5625 LearningRate 0.0315 Epoch: 14 Global Step: 77500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:36:22,024-Speed 18678.34 samples/sec Loss 5.5149 LearningRate 0.0315 Epoch: 14 Global Step: 77510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
07:36:26,439-Speed 18561.18 samples/sec Loss 5.5405 LearningRate 0.0314 Epoch: 14 Global Step: 77520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:36:30,910-Speed 18330.85 samples/sec Loss 5.5476 LearningRate 0.0314 Epoch: 14 Global Step: 77530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:36:35,341-Speed 18492.19 samples/sec Loss 5.5899 LearningRate 0.0314 Epoch: 14 Global Step: 77540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:36:39,808-Speed 18342.70 samples/sec Loss 5.5569 LearningRate 0.0314 Epoch: 14 Global Step: 77550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:36:44,213-Speed 18605.26 samples/sec Loss 5.5743 LearningRate 0.0313 Epoch: 14 Global Step: 77560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:36:48,594-Speed 18704.03 samples/sec Loss 5.5254 LearningRate 0.0313 Epoch: 14 Global Step: 77570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:36:53,021-Speed 18509.08 samples/sec Loss 5.5387 LearningRate 0.0313 Epoch: 14 Global Step: 77580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:36:57,423-Speed 18614.47 samples/sec Loss 5.4983 LearningRate 0.0313 Epoch: 14 Global Step: 77590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:01,848-Speed 18520.03 samples/sec Loss 5.5726 LearningRate 0.0312 Epoch: 14 Global Step: 77600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:06,271-Speed 18531.24 samples/sec Loss 5.5728 LearningRate 0.0312 Epoch: 14 Global Step: 77610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:10,671-Speed 18626.98 samples/sec Loss 5.5026 LearningRate 0.0312 Epoch: 14 Global Step: 77620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:15,064-Speed 18651.03 samples/sec Loss 5.5540 LearningRate 0.0312 Epoch: 14 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:37:19,489-Speed 18524.61 samples/sec 
Loss 5.5302 LearningRate 0.0312 Epoch: 14 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:37:23,914-Speed 18513.17 samples/sec Loss 5.5320 LearningRate 0.0311 Epoch: 14 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:37:28,338-Speed 18524.90 samples/sec Loss 5.5762 LearningRate 0.0311 Epoch: 14 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:37:32,760-Speed 18533.60 samples/sec Loss 5.5407 LearningRate 0.0311 Epoch: 14 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:37:37,186-Speed 18510.92 samples/sec Loss 5.5525 LearningRate 0.0311 Epoch: 14 Global Step: 77680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:41,616-Speed 18499.56 samples/sec Loss 5.5469 LearningRate 0.0310 Epoch: 14 Global Step: 77690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:46,009-Speed 18651.26 samples/sec Loss 5.5680 LearningRate 0.0310 Epoch: 14 Global Step: 77700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:50,396-Speed 18683.36 samples/sec Loss 5.5381 LearningRate 0.0310 Epoch: 14 Global Step: 77710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:54,836-Speed 18451.62 samples/sec Loss 5.5143 LearningRate 0.0310 Epoch: 14 Global Step: 77720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:37:59,257-Speed 18537.12 samples/sec Loss 5.5258 LearningRate 0.0309 Epoch: 14 Global Step: 77730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:38:03,708-Speed 18410.05 samples/sec Loss 5.5369 LearningRate 0.0309 Epoch: 14 Global Step: 77740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:38:08,153-Speed 18434.40 samples/sec Loss 5.5475 LearningRate 0.0309 Epoch: 14 Global Step: 77750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:38:12,551-Speed 18633.22 samples/sec Loss 5.5463 LearningRate 0.0309 Epoch: 14 
Global Step: 77760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:38:16,973-Speed 18534.14 samples/sec Loss 5.5438 LearningRate 0.0308 Epoch: 14 Global Step: 77770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:38:35,373-Speed 4452.48 samples/sec Loss 5.5257 LearningRate 0.0308 Epoch: 15 Global Step: 77780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:38:39,814-Speed 18449.05 samples/sec Loss 5.5297 LearningRate 0.0308 Epoch: 15 Global Step: 77790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:38:44,247-Speed 18490.79 samples/sec Loss 5.5053 LearningRate 0.0308 Epoch: 15 Global Step: 77800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:38:48,736-Speed 18254.15 samples/sec Loss 5.4988 LearningRate 0.0307 Epoch: 15 Global Step: 77810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:38:53,121-Speed 18693.59 samples/sec Loss 5.5658 LearningRate 0.0307 Epoch: 15 Global Step: 77820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:38:57,571-Speed 18416.97 samples/sec Loss 5.5338 LearningRate 0.0307 Epoch: 15 Global Step: 77830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:39:01,994-Speed 18528.43 samples/sec Loss 5.5433 LearningRate 0.0307 Epoch: 15 Global Step: 77840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:39:06,397-Speed 18609.80 samples/sec Loss 5.4872 LearningRate 0.0307 Epoch: 15 Global Step: 77850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:39:10,820-Speed 18534.59 samples/sec Loss 5.5291 LearningRate 0.0306 Epoch: 15 Global Step: 77860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:15,245-Speed 18521.36 samples/sec Loss 5.4725 LearningRate 0.0306 Epoch: 15 Global Step: 77870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:19,665-Speed 18538.30 samples/sec Loss 5.5597 LearningRate 0.0306 Epoch: 15 Global Step: 77880 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 07:39:24,074-Speed 18587.88 samples/sec Loss 5.4869 LearningRate 0.0306 Epoch: 15 Global Step: 77890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:28,494-Speed 18536.26 samples/sec Loss 5.5123 LearningRate 0.0305 Epoch: 15 Global Step: 77900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:32,895-Speed 18619.54 samples/sec Loss 5.5294 LearningRate 0.0305 Epoch: 15 Global Step: 77910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:37,283-Speed 18677.06 samples/sec Loss 5.5094 LearningRate 0.0305 Epoch: 15 Global Step: 77920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:41,678-Speed 18645.73 samples/sec Loss 5.4974 LearningRate 0.0305 Epoch: 15 Global Step: 77930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:46,101-Speed 18526.64 samples/sec Loss 5.4914 LearningRate 0.0304 Epoch: 15 Global Step: 77940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:50,573-Speed 18323.72 samples/sec Loss 5.5448 LearningRate 0.0304 Epoch: 15 Global Step: 77950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:39:55,019-Speed 18431.18 samples/sec Loss 5.5259 LearningRate 0.0304 Epoch: 15 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:39:59,465-Speed 18431.09 samples/sec Loss 5.4833 LearningRate 0.0304 Epoch: 15 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:40:03,907-Speed 18445.48 samples/sec Loss 5.5146 LearningRate 0.0303 Epoch: 15 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:40:08,345-Speed 18465.31 samples/sec Loss 5.5130 LearningRate 0.0303 Epoch: 15 Global Step: 77990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:40:12,820-Speed 18310.06 samples/sec Loss 5.5112 LearningRate 0.0303 Epoch: 15 Global Step: 78000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 
07:40:17,305-Speed 18273.70 samples/sec Loss 5.5143 LearningRate 0.0303 Epoch: 15 Global Step: 78010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:21,785-Speed 18289.62 samples/sec Loss 5.5285 LearningRate 0.0302 Epoch: 15 Global Step: 78020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:26,206-Speed 18536.39 samples/sec Loss 5.5115 LearningRate 0.0302 Epoch: 15 Global Step: 78030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:30,647-Speed 18449.97 samples/sec Loss 5.5151 LearningRate 0.0302 Epoch: 15 Global Step: 78040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:35,149-Speed 18198.89 samples/sec Loss 5.5385 LearningRate 0.0302 Epoch: 15 Global Step: 78050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:39,593-Speed 18444.92 samples/sec Loss 5.5200 LearningRate 0.0302 Epoch: 15 Global Step: 78060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:44,031-Speed 18465.93 samples/sec Loss 5.4855 LearningRate 0.0301 Epoch: 15 Global Step: 78070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:48,439-Speed 18588.59 samples/sec Loss 5.5174 LearningRate 0.0301 Epoch: 15 Global Step: 78080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:53,370-Speed 16616.52 samples/sec Loss 5.5322 LearningRate 0.0301 Epoch: 15 Global Step: 78090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:40:57,869-Speed 18216.77 samples/sec Loss 5.5217 LearningRate 0.0301 Epoch: 15 Global Step: 78100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:02,341-Speed 18320.27 samples/sec Loss 5.5340 LearningRate 0.0300 Epoch: 15 Global Step: 78110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:06,771-Speed 18499.55 samples/sec Loss 5.5000 LearningRate 0.0300 Epoch: 15 Global Step: 78120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:11,195-Speed 18519.89 samples/sec 
Loss 5.5249 LearningRate 0.0300 Epoch: 15 Global Step: 78130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:15,726-Speed 18087.22 samples/sec Loss 5.5143 LearningRate 0.0300 Epoch: 15 Global Step: 78140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:20,154-Speed 18504.82 samples/sec Loss 5.5299 LearningRate 0.0299 Epoch: 15 Global Step: 78150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:24,555-Speed 18617.31 samples/sec Loss 5.5300 LearningRate 0.0299 Epoch: 15 Global Step: 78160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:28,978-Speed 18529.04 samples/sec Loss 5.5012 LearningRate 0.0299 Epoch: 15 Global Step: 78170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:33,472-Speed 18230.12 samples/sec Loss 5.5380 LearningRate 0.0299 Epoch: 15 Global Step: 78180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:37,914-Speed 18449.26 samples/sec Loss 5.5100 LearningRate 0.0298 Epoch: 15 Global Step: 78190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:41:42,367-Speed 18400.19 samples/sec Loss 5.5301 LearningRate 0.0298 Epoch: 15 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:41:46,851-Speed 18278.19 samples/sec Loss 5.4903 LearningRate 0.0298 Epoch: 15 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:41:51,299-Speed 18419.10 samples/sec Loss 5.4901 LearningRate 0.0298 Epoch: 15 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:41:55,753-Speed 18397.32 samples/sec Loss 5.5397 LearningRate 0.0298 Epoch: 15 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:42:00,212-Speed 18378.34 samples/sec Loss 5.4873 LearningRate 0.0297 Epoch: 15 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:42:04,632-Speed 18538.89 samples/sec Loss 5.5120 LearningRate 0.0297 Epoch: 15 
Global Step: 78250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:42:09,053-Speed 18535.78 samples/sec Loss 5.4762 LearningRate 0.0297 Epoch: 15 Global Step: 78260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:42:13,490-Speed 18466.73 samples/sec Loss 5.5520 LearningRate 0.0297 Epoch: 15 Global Step: 78270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:42:17,893-Speed 18613.11 samples/sec Loss 5.4811 LearningRate 0.0296 Epoch: 15 Global Step: 78280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:22,320-Speed 18508.84 samples/sec Loss 5.5531 LearningRate 0.0296 Epoch: 15 Global Step: 78290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:26,743-Speed 18524.93 samples/sec Loss 5.4666 LearningRate 0.0296 Epoch: 15 Global Step: 78300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:31,182-Speed 18465.38 samples/sec Loss 5.4699 LearningRate 0.0296 Epoch: 15 Global Step: 78310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:35,612-Speed 18500.57 samples/sec Loss 5.5139 LearningRate 0.0295 Epoch: 15 Global Step: 78320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:40,030-Speed 18549.18 samples/sec Loss 5.4745 LearningRate 0.0295 Epoch: 15 Global Step: 78330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:44,462-Speed 18488.69 samples/sec Loss 5.5170 LearningRate 0.0295 Epoch: 15 Global Step: 78340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:48,925-Speed 18362.09 samples/sec Loss 5.5037 LearningRate 0.0295 Epoch: 15 Global Step: 78350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:53,372-Speed 18426.02 samples/sec Loss 5.5121 LearningRate 0.0295 Epoch: 15 Global Step: 78360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:42:57,805-Speed 18485.25 samples/sec Loss 5.4972 LearningRate 0.0294 Epoch: 15 Global Step: 78370 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 07:43:02,226-Speed 18535.03 samples/sec Loss 5.4994 LearningRate 0.0294 Epoch: 15 Global Step: 78380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:43:06,631-Speed 18603.59 samples/sec Loss 5.4833 LearningRate 0.0294 Epoch: 15 Global Step: 78390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:11,066-Speed 18480.75 samples/sec Loss 5.5135 LearningRate 0.0294 Epoch: 15 Global Step: 78400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:15,487-Speed 18535.42 samples/sec Loss 5.4740 LearningRate 0.0293 Epoch: 15 Global Step: 78410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:19,880-Speed 18653.61 samples/sec Loss 5.4810 LearningRate 0.0293 Epoch: 15 Global Step: 78420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:24,280-Speed 18623.51 samples/sec Loss 5.5224 LearningRate 0.0293 Epoch: 15 Global Step: 78430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:28,674-Speed 18651.33 samples/sec Loss 5.4862 LearningRate 0.0293 Epoch: 15 Global Step: 78440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:33,103-Speed 18498.71 samples/sec Loss 5.4808 LearningRate 0.0292 Epoch: 15 Global Step: 78450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:37,532-Speed 18500.86 samples/sec Loss 5.4674 LearningRate 0.0292 Epoch: 15 Global Step: 78460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:41,991-Speed 18379.14 samples/sec Loss 5.4791 LearningRate 0.0292 Epoch: 15 Global Step: 78470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:46,379-Speed 18672.93 samples/sec Loss 5.5047 LearningRate 0.0292 Epoch: 15 Global Step: 78480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:43:50,807-Speed 18506.86 samples/sec Loss 5.4719 LearningRate 0.0292 Epoch: 15 Global Step: 78490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
07:43:55,233-Speed 18516.34 samples/sec Loss 5.4875 LearningRate 0.0291 Epoch: 15 Global Step: 78500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:43:59,713-Speed 18287.76 samples/sec Loss 5.4697 LearningRate 0.0291 Epoch: 15 Global Step: 78510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:44:04,158-Speed 18431.90 samples/sec Loss 5.4656 LearningRate 0.0291 Epoch: 15 Global Step: 78520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:44:08,621-Speed 18362.54 samples/sec Loss 5.4858 LearningRate 0.0291 Epoch: 15 Global Step: 78530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:44:13,138-Speed 18141.68 samples/sec Loss 5.4977 LearningRate 0.0290 Epoch: 15 Global Step: 78540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:44:17,577-Speed 18461.04 samples/sec Loss 5.4707 LearningRate 0.0290 Epoch: 15 Global Step: 78550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:44:21,979-Speed 18612.61 samples/sec Loss 5.4951 LearningRate 0.0290 Epoch: 15 Global Step: 78560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:44:26,408-Speed 18503.30 samples/sec Loss 5.4963 LearningRate 0.0290 Epoch: 15 Global Step: 78570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:44:30,836-Speed 18507.60 samples/sec Loss 5.4896 LearningRate 0.0289 Epoch: 15 Global Step: 78580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:44:35,284-Speed 18423.29 samples/sec Loss 5.4780 LearningRate 0.0289 Epoch: 15 Global Step: 78590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:44:39,690-Speed 18601.50 samples/sec Loss 5.4942 LearningRate 0.0289 Epoch: 15 Global Step: 78600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:44:44,150-Speed 18373.05 samples/sec Loss 5.5047 LearningRate 0.0289 Epoch: 15 Global Step: 78610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:44:48,578-Speed 18504.09 samples/sec 
Loss 5.5053 LearningRate 0.0289 Epoch: 15 Global Step: 78620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:44:56,661-Speed 10137.55 samples/sec Loss 5.4739 LearningRate 0.0288 Epoch: 15 Global Step: 78630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:45:01,068-Speed 18591.34 samples/sec Loss 5.5134 LearningRate 0.0288 Epoch: 15 Global Step: 78640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:45:05,494-Speed 18514.59 samples/sec Loss 5.4674 LearningRate 0.0288 Epoch: 15 Global Step: 78650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:45:09,911-Speed 18552.48 samples/sec Loss 5.4495 LearningRate 0.0288 Epoch: 15 Global Step: 78660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:45:14,337-Speed 18512.88 samples/sec Loss 5.4728 LearningRate 0.0287 Epoch: 15 Global Step: 78670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:45:18,833-Speed 18225.34 samples/sec Loss 5.4656 LearningRate 0.0287 Epoch: 15 Global Step: 78680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:45:23,268-Speed 18478.55 samples/sec Loss 5.4763 LearningRate 0.0287 Epoch: 15 Global Step: 78690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:45:27,739-Speed 18322.92 samples/sec Loss 5.5201 LearningRate 0.0287 Epoch: 15 Global Step: 78700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:45:32,154-Speed 18560.19 samples/sec Loss 5.4983 LearningRate 0.0286 Epoch: 15 Global Step: 78710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:45:36,576-Speed 18531.93 samples/sec Loss 5.4565 LearningRate 0.0286 Epoch: 15 Global Step: 78720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:45:41,020-Speed 18439.93 samples/sec Loss 5.4758 LearningRate 0.0286 Epoch: 15 Global Step: 78730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:45:45,603-Speed 17881.06 samples/sec Loss 5.5161 LearningRate 0.0286 Epoch: 15 
Global Step: 78740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:45:50,026-Speed 18525.08 samples/sec Loss 5.4680 LearningRate 0.0286 Epoch: 15 Global Step: 78750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:45:54,413-Speed 18679.79 samples/sec Loss 5.4518 LearningRate 0.0285 Epoch: 15 Global Step: 78760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:45:58,846-Speed 18484.32 samples/sec Loss 5.5013 LearningRate 0.0285 Epoch: 15 Global Step: 78770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:46:03,293-Speed 18422.65 samples/sec Loss 5.4699 LearningRate 0.0285 Epoch: 15 Global Step: 78780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:46:07,769-Speed 18309.34 samples/sec Loss 5.4555 LearningRate 0.0285 Epoch: 15 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:46:12,256-Speed 18262.10 samples/sec Loss 5.4399 LearningRate 0.0284 Epoch: 15 Global Step: 78800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:46:16,672-Speed 18554.20 samples/sec Loss 5.4486 LearningRate 0.0284 Epoch: 15 Global Step: 78810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:46:21,092-Speed 18539.61 samples/sec Loss 5.4567 LearningRate 0.0284 Epoch: 15 Global Step: 78820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:46:25,583-Speed 18245.51 samples/sec Loss 5.4744 LearningRate 0.0284 Epoch: 15 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:46:29,972-Speed 18671.39 samples/sec Loss 5.4960 LearningRate 0.0283 Epoch: 15 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:46:34,453-Speed 18284.68 samples/sec Loss 5.4369 LearningRate 0.0283 Epoch: 15 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:46:38,867-Speed 18567.16 samples/sec Loss 5.4676 LearningRate 0.0283 Epoch: 15 Global Step: 78860 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 07:46:43,338-Speed 18330.42 samples/sec Loss 5.4780 LearningRate 0.0283 Epoch: 15 Global Step: 78870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:46:47,739-Speed 18626.69 samples/sec Loss 5.4947 LearningRate 0.0283 Epoch: 15 Global Step: 78880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:46:52,221-Speed 18280.53 samples/sec Loss 5.4820 LearningRate 0.0282 Epoch: 15 Global Step: 78890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:46:56,674-Speed 18407.73 samples/sec Loss 5.4764 LearningRate 0.0282 Epoch: 15 Global Step: 78900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:47:01,187-Speed 18159.33 samples/sec Loss 5.5054 LearningRate 0.0282 Epoch: 15 Global Step: 78910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:47:05,617-Speed 18497.43 samples/sec Loss 5.4375 LearningRate 0.0282 Epoch: 15 Global Step: 78920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:47:10,046-Speed 18500.87 samples/sec Loss 5.4916 LearningRate 0.0281 Epoch: 15 Global Step: 78930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:47:14,453-Speed 18595.71 samples/sec Loss 5.4485 LearningRate 0.0281 Epoch: 15 Global Step: 78940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:47:18,856-Speed 18610.71 samples/sec Loss 5.4790 LearningRate 0.0281 Epoch: 15 Global Step: 78950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:47:23,253-Speed 18635.33 samples/sec Loss 5.4396 LearningRate 0.0281 Epoch: 15 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:47:27,676-Speed 18525.86 samples/sec Loss 5.4660 LearningRate 0.0280 Epoch: 15 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:47:32,200-Speed 18110.19 samples/sec Loss 5.4976 LearningRate 0.0280 Epoch: 15 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
07:47:36,662-Speed 18367.11 samples/sec Loss 5.4597 LearningRate 0.0280 Epoch: 15 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:47:41,123-Speed 18366.32 samples/sec Loss 5.4491 LearningRate 0.0280 Epoch: 15 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:47:45,612-Speed 18254.27 samples/sec Loss 5.4763 LearningRate 0.0280 Epoch: 15 Global Step: 79010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:47:50,062-Speed 18414.89 samples/sec Loss 5.5177 LearningRate 0.0279 Epoch: 15 Global Step: 79020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:47:54,466-Speed 18606.78 samples/sec Loss 5.4738 LearningRate 0.0279 Epoch: 15 Global Step: 79030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:47:58,888-Speed 18530.73 samples/sec Loss 5.4439 LearningRate 0.0279 Epoch: 15 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:48:03,299-Speed 18576.18 samples/sec Loss 5.4666 LearningRate 0.0279 Epoch: 15 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:48:07,699-Speed 18624.49 samples/sec Loss 5.4271 LearningRate 0.0278 Epoch: 15 Global Step: 79060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:12,152-Speed 18407.65 samples/sec Loss 5.4473 LearningRate 0.0278 Epoch: 15 Global Step: 79070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:16,590-Speed 18467.32 samples/sec Loss 5.4093 LearningRate 0.0278 Epoch: 15 Global Step: 79080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:21,014-Speed 18525.47 samples/sec Loss 5.4496 LearningRate 0.0278 Epoch: 15 Global Step: 79090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:25,451-Speed 18466.62 samples/sec Loss 5.4753 LearningRate 0.0278 Epoch: 15 Global Step: 79100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:29,911-Speed 18375.61 samples/sec 
Loss 5.4547 LearningRate 0.0277 Epoch: 15 Global Step: 79110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:34,318-Speed 18589.18 samples/sec Loss 5.4946 LearningRate 0.0277 Epoch: 15 Global Step: 79120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:38,735-Speed 18554.50 samples/sec Loss 5.4572 LearningRate 0.0277 Epoch: 15 Global Step: 79130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:43,163-Speed 18506.98 samples/sec Loss 5.4580 LearningRate 0.0277 Epoch: 15 Global Step: 79140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:47,694-Speed 18086.95 samples/sec Loss 5.4264 LearningRate 0.0276 Epoch: 15 Global Step: 79150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:48:52,105-Speed 18578.61 samples/sec Loss 5.4525 LearningRate 0.0276 Epoch: 15 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:48:56,519-Speed 18564.35 samples/sec Loss 5.4522 LearningRate 0.0276 Epoch: 15 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:00,924-Speed 18600.25 samples/sec Loss 5.4581 LearningRate 0.0276 Epoch: 15 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:05,333-Speed 18585.47 samples/sec Loss 5.4394 LearningRate 0.0276 Epoch: 15 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:09,772-Speed 18480.33 samples/sec Loss 5.4828 LearningRate 0.0275 Epoch: 15 Global Step: 79200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:14,189-Speed 18548.86 samples/sec Loss 5.4503 LearningRate 0.0275 Epoch: 15 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:18,594-Speed 18608.53 samples/sec Loss 5.4685 LearningRate 0.0275 Epoch: 15 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:22,992-Speed 18636.18 samples/sec Loss 5.4450 LearningRate 0.0275 Epoch: 15 
Global Step: 79230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:27,442-Speed 18412.31 samples/sec Loss 5.4076 LearningRate 0.0274 Epoch: 15 Global Step: 79240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:31,840-Speed 18632.00 samples/sec Loss 5.4839 LearningRate 0.0274 Epoch: 15 Global Step: 79250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:36,314-Speed 18319.95 samples/sec Loss 5.4711 LearningRate 0.0274 Epoch: 15 Global Step: 79260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 07:49:40,739-Speed 18517.08 samples/sec Loss 5.4552 LearningRate 0.0274 Epoch: 15 Global Step: 79270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 07:49:48,182-Speed 11008.97 samples/sec Loss 5.4606 LearningRate 0.0274 Epoch: 15 Global Step: 79280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:52,629-Speed 18424.51 samples/sec Loss 5.4505 LearningRate 0.0273 Epoch: 15 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:49:57,045-Speed 18556.33 samples/sec Loss 5.4406 LearningRate 0.0273 Epoch: 15 Global Step: 79300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:01,504-Speed 18380.13 samples/sec Loss 5.4477 LearningRate 0.0273 Epoch: 15 Global Step: 79310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:05,915-Speed 18579.72 samples/sec Loss 5.4517 LearningRate 0.0273 Epoch: 15 Global Step: 79320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:10,342-Speed 18507.55 samples/sec Loss 5.4698 LearningRate 0.0272 Epoch: 15 Global Step: 79330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:14,770-Speed 18508.98 samples/sec Loss 5.4519 LearningRate 0.0272 Epoch: 15 Global Step: 79340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:19,193-Speed 18528.46 samples/sec Loss 5.4648 LearningRate 0.0272 Epoch: 15 Global Step: 79350 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 07:50:23,610-Speed 18552.48 samples/sec Loss 5.4551 LearningRate 0.0272 Epoch: 15 Global Step: 79360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:28,024-Speed 18560.93 samples/sec Loss 5.4505 LearningRate 0.0271 Epoch: 15 Global Step: 79370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:32,471-Speed 18432.47 samples/sec Loss 5.4578 LearningRate 0.0271 Epoch: 15 Global Step: 79380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:36,897-Speed 18516.90 samples/sec Loss 5.4704 LearningRate 0.0271 Epoch: 15 Global Step: 79390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:41,324-Speed 18507.16 samples/sec Loss 5.4454 LearningRate 0.0271 Epoch: 15 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:50:45,768-Speed 18438.85 samples/sec Loss 5.4607 LearningRate 0.0271 Epoch: 15 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:50:50,173-Speed 18600.83 samples/sec Loss 5.4320 LearningRate 0.0270 Epoch: 15 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:50:54,576-Speed 18610.07 samples/sec Loss 5.4289 LearningRate 0.0270 Epoch: 15 Global Step: 79430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:50:59,028-Speed 18409.16 samples/sec Loss 5.4921 LearningRate 0.0270 Epoch: 15 Global Step: 79440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:51:03,444-Speed 18552.87 samples/sec Loss 5.4432 LearningRate 0.0270 Epoch: 15 Global Step: 79450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:07,879-Speed 18485.64 samples/sec Loss 5.4200 LearningRate 0.0269 Epoch: 15 Global Step: 79460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:12,317-Speed 18466.99 samples/sec Loss 5.4405 LearningRate 0.0269 Epoch: 15 Global Step: 79470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 
07:51:16,734-Speed 18551.64 samples/sec Loss 5.4111 LearningRate 0.0269 Epoch: 15 Global Step: 79480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:21,175-Speed 18454.42 samples/sec Loss 5.3990 LearningRate 0.0269 Epoch: 15 Global Step: 79490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:25,624-Speed 18422.89 samples/sec Loss 5.4334 LearningRate 0.0269 Epoch: 15 Global Step: 79500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:30,030-Speed 18599.78 samples/sec Loss 5.4297 LearningRate 0.0268 Epoch: 15 Global Step: 79510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:34,448-Speed 18546.93 samples/sec Loss 5.4290 LearningRate 0.0268 Epoch: 15 Global Step: 79520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:38,818-Speed 18753.10 samples/sec Loss 5.4141 LearningRate 0.0268 Epoch: 15 Global Step: 79530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:43,238-Speed 18542.13 samples/sec Loss 5.4605 LearningRate 0.0268 Epoch: 15 Global Step: 79540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:51:47,633-Speed 18645.44 samples/sec Loss 5.4546 LearningRate 0.0267 Epoch: 15 Global Step: 79550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:51:52,037-Speed 18602.45 samples/sec Loss 5.4245 LearningRate 0.0267 Epoch: 15 Global Step: 79560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:51:56,484-Speed 18429.48 samples/sec Loss 5.4352 LearningRate 0.0267 Epoch: 15 Global Step: 79570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:01,017-Speed 18075.96 samples/sec Loss 5.4187 LearningRate 0.0267 Epoch: 15 Global Step: 79580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:05,476-Speed 18377.96 samples/sec Loss 5.4155 LearningRate 0.0267 Epoch: 15 Global Step: 79590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:09,947-Speed 18329.83 samples/sec 
Loss 5.4055 LearningRate 0.0266 Epoch: 15 Global Step: 79600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:14,401-Speed 18398.20 samples/sec Loss 5.4371 LearningRate 0.0266 Epoch: 15 Global Step: 79610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:18,873-Speed 18321.44 samples/sec Loss 5.4346 LearningRate 0.0266 Epoch: 15 Global Step: 79620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:23,383-Speed 18172.72 samples/sec Loss 5.4261 LearningRate 0.0266 Epoch: 15 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:27,825-Speed 18445.81 samples/sec Loss 5.4409 LearningRate 0.0265 Epoch: 15 Global Step: 79640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:32,230-Speed 18597.69 samples/sec Loss 5.4362 LearningRate 0.0265 Epoch: 15 Global Step: 79650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:52:36,654-Speed 18522.46 samples/sec Loss 5.4521 LearningRate 0.0265 Epoch: 15 Global Step: 79660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:52:41,059-Speed 18602.89 samples/sec Loss 5.4183 LearningRate 0.0265 Epoch: 15 Global Step: 79670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:52:45,486-Speed 18512.58 samples/sec Loss 5.4535 LearningRate 0.0265 Epoch: 15 Global Step: 79680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:49,910-Speed 18522.82 samples/sec Loss 5.4180 LearningRate 0.0264 Epoch: 15 Global Step: 79690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:54,326-Speed 18552.61 samples/sec Loss 5.4379 LearningRate 0.0264 Epoch: 15 Global Step: 79700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:52:58,764-Speed 18466.28 samples/sec Loss 5.3995 LearningRate 0.0264 Epoch: 15 Global Step: 79710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:03,196-Speed 18484.72 samples/sec Loss 5.3948 LearningRate 0.0264 Epoch: 15 
Global Step: 79720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:07,613-Speed 18552.70 samples/sec Loss 5.4236 LearningRate 0.0264 Epoch: 15 Global Step: 79730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:12,017-Speed 18607.07 samples/sec Loss 5.4324 LearningRate 0.0263 Epoch: 15 Global Step: 79740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:16,436-Speed 18543.99 samples/sec Loss 5.4388 LearningRate 0.0263 Epoch: 15 Global Step: 79750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:20,913-Speed 18300.49 samples/sec Loss 5.3918 LearningRate 0.0263 Epoch: 15 Global Step: 79760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:25,354-Speed 18454.66 samples/sec Loss 5.4173 LearningRate 0.0263 Epoch: 15 Global Step: 79770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:29,770-Speed 18552.62 samples/sec Loss 5.4267 LearningRate 0.0262 Epoch: 15 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:53:34,205-Speed 18475.37 samples/sec Loss 5.4151 LearningRate 0.0262 Epoch: 15 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:53:38,633-Speed 18504.07 samples/sec Loss 5.4252 LearningRate 0.0262 Epoch: 15 Global Step: 79800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:43,049-Speed 18563.97 samples/sec Loss 5.3882 LearningRate 0.0262 Epoch: 15 Global Step: 79810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:47,441-Speed 18660.39 samples/sec Loss 5.4466 LearningRate 0.0262 Epoch: 15 Global Step: 79820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:51,871-Speed 18495.72 samples/sec Loss 5.4709 LearningRate 0.0261 Epoch: 15 Global Step: 79830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:53:56,277-Speed 18595.84 samples/sec Loss 5.4280 LearningRate 0.0261 Epoch: 15 Global Step: 79840 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 07:54:00,681-Speed 18606.24 samples/sec Loss 5.4058 LearningRate 0.0261 Epoch: 15 Global Step: 79850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:05,105-Speed 18525.80 samples/sec Loss 5.3670 LearningRate 0.0261 Epoch: 15 Global Step: 79860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:09,561-Speed 18392.40 samples/sec Loss 5.4207 LearningRate 0.0260 Epoch: 15 Global Step: 79870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:13,978-Speed 18549.42 samples/sec Loss 5.4006 LearningRate 0.0260 Epoch: 15 Global Step: 79880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:18,398-Speed 18538.99 samples/sec Loss 5.4069 LearningRate 0.0260 Epoch: 15 Global Step: 79890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:22,802-Speed 18608.03 samples/sec Loss 5.3732 LearningRate 0.0260 Epoch: 15 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:54:27,217-Speed 18558.51 samples/sec Loss 5.3881 LearningRate 0.0260 Epoch: 15 Global Step: 79910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:31,613-Speed 18636.45 samples/sec Loss 5.4015 LearningRate 0.0259 Epoch: 15 Global Step: 79920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:36,044-Speed 18494.93 samples/sec Loss 5.4086 LearningRate 0.0259 Epoch: 15 Global Step: 79930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:40,456-Speed 18571.21 samples/sec Loss 5.4123 LearningRate 0.0259 Epoch: 15 Global Step: 79940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:44,957-Speed 18203.76 samples/sec Loss 5.4253 LearningRate 0.0259 Epoch: 15 Global Step: 79950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:49,872-Speed 16672.14 samples/sec Loss 5.4044 LearningRate 0.0258 Epoch: 15 Global Step: 79960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
07:54:54,283-Speed 18580.70 samples/sec Loss 5.4017 LearningRate 0.0258 Epoch: 15 Global Step: 79970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:54:58,696-Speed 18566.35 samples/sec Loss 5.3806 LearningRate 0.0258 Epoch: 15 Global Step: 79980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:03,120-Speed 18518.80 samples/sec Loss 5.3849 LearningRate 0.0258 Epoch: 15 Global Step: 79990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:07,544-Speed 18525.13 samples/sec Loss 5.4029 LearningRate 0.0258 Epoch: 15 Global Step: 80000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:11,937-Speed 18652.45 samples/sec Loss 5.4278 LearningRate 0.0257 Epoch: 15 Global Step: 80010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:55:16,331-Speed 18649.80 samples/sec Loss 5.3778 LearningRate 0.0257 Epoch: 15 Global Step: 80020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:20,761-Speed 18493.36 samples/sec Loss 5.4145 LearningRate 0.0257 Epoch: 15 Global Step: 80030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:25,293-Speed 18082.43 samples/sec Loss 5.4322 LearningRate 0.0257 Epoch: 15 Global Step: 80040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:30,115-Speed 16992.34 samples/sec Loss 5.4249 LearningRate 0.0257 Epoch: 15 Global Step: 80050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:34,540-Speed 18517.56 samples/sec Loss 5.4133 LearningRate 0.0256 Epoch: 15 Global Step: 80060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:39,006-Speed 18349.61 samples/sec Loss 5.4134 LearningRate 0.0256 Epoch: 15 Global Step: 80070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:43,476-Speed 18331.61 samples/sec Loss 5.4093 LearningRate 0.0256 Epoch: 15 Global Step: 80080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:47,883-Speed 18594.02 samples/sec 
Loss 5.4349 LearningRate 0.0256 Epoch: 15 Global Step: 80090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:52,308-Speed 18515.43 samples/sec Loss 5.3879 LearningRate 0.0255 Epoch: 15 Global Step: 80100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:55:56,784-Speed 18307.20 samples/sec Loss 5.4041 LearningRate 0.0255 Epoch: 15 Global Step: 80110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:01,168-Speed 18690.78 samples/sec Loss 5.4211 LearningRate 0.0255 Epoch: 15 Global Step: 80120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:05,607-Speed 18458.66 samples/sec Loss 5.4541 LearningRate 0.0255 Epoch: 15 Global Step: 80130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:10,007-Speed 18622.66 samples/sec Loss 5.4522 LearningRate 0.0255 Epoch: 15 Global Step: 80140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:14,446-Speed 18460.35 samples/sec Loss 5.4035 LearningRate 0.0254 Epoch: 15 Global Step: 80150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:18,851-Speed 18602.06 samples/sec Loss 5.4106 LearningRate 0.0254 Epoch: 15 Global Step: 80160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:23,256-Speed 18602.99 samples/sec Loss 5.4007 LearningRate 0.0254 Epoch: 15 Global Step: 80170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:27,686-Speed 18496.57 samples/sec Loss 5.3974 LearningRate 0.0254 Epoch: 15 Global Step: 80180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:32,089-Speed 18610.40 samples/sec Loss 5.4265 LearningRate 0.0253 Epoch: 15 Global Step: 80190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:36,534-Speed 18435.97 samples/sec Loss 5.3953 LearningRate 0.0253 Epoch: 15 Global Step: 80200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:41,017-Speed 18276.45 samples/sec Loss 5.4197 LearningRate 0.0253 Epoch: 15 
Global Step: 80210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:45,461-Speed 18441.70 samples/sec Loss 5.4189 LearningRate 0.0253 Epoch: 15 Global Step: 80220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:49,902-Speed 18452.69 samples/sec Loss 5.4362 LearningRate 0.0253 Epoch: 15 Global Step: 80230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:54,379-Speed 18301.75 samples/sec Loss 5.3777 LearningRate 0.0252 Epoch: 15 Global Step: 80240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:56:58,757-Speed 18718.32 samples/sec Loss 5.4148 LearningRate 0.0252 Epoch: 15 Global Step: 80250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:03,157-Speed 18621.37 samples/sec Loss 5.4215 LearningRate 0.0252 Epoch: 15 Global Step: 80260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:07,598-Speed 18454.43 samples/sec Loss 5.4020 LearningRate 0.0252 Epoch: 15 Global Step: 80270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:11,998-Speed 18621.36 samples/sec Loss 5.3992 LearningRate 0.0252 Epoch: 15 Global Step: 80280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:16,395-Speed 18639.50 samples/sec Loss 5.3866 LearningRate 0.0251 Epoch: 15 Global Step: 80290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:20,797-Speed 18612.58 samples/sec Loss 5.4185 LearningRate 0.0251 Epoch: 15 Global Step: 80300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:25,259-Speed 18365.30 samples/sec Loss 5.3881 LearningRate 0.0251 Epoch: 15 Global Step: 80310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:29,640-Speed 18704.14 samples/sec Loss 5.3763 LearningRate 0.0251 Epoch: 15 Global Step: 80320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:57:34,049-Speed 18584.00 samples/sec Loss 5.4280 LearningRate 0.0250 Epoch: 15 Global Step: 80330 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 07:57:38,486-Speed 18468.60 samples/sec Loss 5.3833 LearningRate 0.0250 Epoch: 15 Global Step: 80340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:57:42,912-Speed 18513.89 samples/sec Loss 5.3809 LearningRate 0.0250 Epoch: 15 Global Step: 80350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:57:47,326-Speed 18564.29 samples/sec Loss 5.3902 LearningRate 0.0250 Epoch: 15 Global Step: 80360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:51,725-Speed 18626.55 samples/sec Loss 5.4098 LearningRate 0.0250 Epoch: 15 Global Step: 80370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:57:56,143-Speed 18549.84 samples/sec Loss 5.3888 LearningRate 0.0249 Epoch: 15 Global Step: 80380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:58:00,579-Speed 18474.02 samples/sec Loss 5.3990 LearningRate 0.0249 Epoch: 15 Global Step: 80390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:58:04,974-Speed 18642.93 samples/sec Loss 5.3856 LearningRate 0.0249 Epoch: 15 Global Step: 80400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:58:09,379-Speed 18603.48 samples/sec Loss 5.3765 LearningRate 0.0249 Epoch: 15 Global Step: 80410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:58:13,834-Speed 18391.52 samples/sec Loss 5.3945 LearningRate 0.0249 Epoch: 15 Global Step: 80420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:58:18,263-Speed 18502.65 samples/sec Loss 5.3884 LearningRate 0.0248 Epoch: 15 Global Step: 80430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:58:22,704-Speed 18454.08 samples/sec Loss 5.4068 LearningRate 0.0248 Epoch: 15 Global Step: 80440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:58:27,220-Speed 18144.34 samples/sec Loss 5.3880 LearningRate 0.0248 Epoch: 15 Global Step: 80450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
07:58:31,708-Speed 18254.76 samples/sec Loss 5.4196 LearningRate 0.0248 Epoch: 15 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:58:36,147-Speed 18456.97 samples/sec Loss 5.3567 LearningRate 0.0247 Epoch: 15 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:58:40,599-Speed 18407.51 samples/sec Loss 5.3935 LearningRate 0.0247 Epoch: 15 Global Step: 80480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:58:45,029-Speed 18499.76 samples/sec Loss 5.3868 LearningRate 0.0247 Epoch: 15 Global Step: 80490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:58:49,462-Speed 18483.79 samples/sec Loss 5.3754 LearningRate 0.0247 Epoch: 15 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 07:58:53,887-Speed 18516.96 samples/sec Loss 5.3226 LearningRate 0.0247 Epoch: 15 Global Step: 80510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:58:58,340-Speed 18402.12 samples/sec Loss 5.3985 LearningRate 0.0246 Epoch: 15 Global Step: 80520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:59:02,864-Speed 18109.35 samples/sec Loss 5.3783 LearningRate 0.0246 Epoch: 15 Global Step: 80530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:59:07,286-Speed 18533.90 samples/sec Loss 5.4104 LearningRate 0.0246 Epoch: 15 Global Step: 80540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:11,679-Speed 18653.22 samples/sec Loss 5.4016 LearningRate 0.0246 Epoch: 15 Global Step: 80550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:16,115-Speed 18474.70 samples/sec Loss 5.3824 LearningRate 0.0246 Epoch: 15 Global Step: 80560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:20,525-Speed 18579.45 samples/sec Loss 5.3586 LearningRate 0.0245 Epoch: 15 Global Step: 80570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:24,958-Speed 18484.73 samples/sec 
Loss 5.3915 LearningRate 0.0245 Epoch: 15 Global Step: 80580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:29,406-Speed 18423.31 samples/sec Loss 5.3740 LearningRate 0.0245 Epoch: 15 Global Step: 80590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:33,808-Speed 18615.33 samples/sec Loss 5.3879 LearningRate 0.0245 Epoch: 15 Global Step: 80600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:38,249-Speed 18451.68 samples/sec Loss 5.3985 LearningRate 0.0245 Epoch: 15 Global Step: 80610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:42,669-Speed 18537.27 samples/sec Loss 5.3638 LearningRate 0.0244 Epoch: 15 Global Step: 80620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:47,155-Speed 18270.73 samples/sec Loss 5.3718 LearningRate 0.0244 Epoch: 15 Global Step: 80630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 07:59:51,580-Speed 18518.28 samples/sec Loss 5.3862 LearningRate 0.0244 Epoch: 15 Global Step: 80640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 07:59:55,969-Speed 18670.16 samples/sec Loss 5.3487 LearningRate 0.0244 Epoch: 15 Global Step: 80650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:00,454-Speed 18267.84 samples/sec Loss 5.4054 LearningRate 0.0243 Epoch: 15 Global Step: 80660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:04,861-Speed 18595.33 samples/sec Loss 5.4187 LearningRate 0.0243 Epoch: 15 Global Step: 80670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:09,283-Speed 18534.04 samples/sec Loss 5.3811 LearningRate 0.0243 Epoch: 15 Global Step: 80680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:13,754-Speed 18326.98 samples/sec Loss 5.3845 LearningRate 0.0243 Epoch: 15 Global Step: 80690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:18,267-Speed 18158.46 samples/sec Loss 5.3779 LearningRate 0.0243 Epoch: 15 
Global Step: 80700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:22,686-Speed 18542.62 samples/sec Loss 5.3922 LearningRate 0.0242 Epoch: 15 Global Step: 80710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:27,135-Speed 18423.51 samples/sec Loss 5.3948 LearningRate 0.0242 Epoch: 15 Global Step: 80720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:31,538-Speed 18609.60 samples/sec Loss 5.3536 LearningRate 0.0242 Epoch: 15 Global Step: 80730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:00:35,943-Speed 18602.37 samples/sec Loss 5.4035 LearningRate 0.0242 Epoch: 15 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:00:40,338-Speed 18644.73 samples/sec Loss 5.4157 LearningRate 0.0242 Epoch: 15 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:00:44,722-Speed 18689.16 samples/sec Loss 5.3977 LearningRate 0.0241 Epoch: 15 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:00:49,130-Speed 18590.78 samples/sec Loss 5.3839 LearningRate 0.0241 Epoch: 15 Global Step: 80770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:00:53,542-Speed 18573.09 samples/sec Loss 5.3614 LearningRate 0.0241 Epoch: 15 Global Step: 80780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:00:57,943-Speed 18620.64 samples/sec Loss 5.3329 LearningRate 0.0241 Epoch: 15 Global Step: 80790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:01:02,380-Speed 18469.03 samples/sec Loss 5.3552 LearningRate 0.0240 Epoch: 15 Global Step: 80800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:06,791-Speed 18576.08 samples/sec Loss 5.3768 LearningRate 0.0240 Epoch: 15 Global Step: 80810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:11,210-Speed 18543.99 samples/sec Loss 5.3627 LearningRate 0.0240 Epoch: 15 Global Step: 80820 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 08:01:15,671-Speed 18371.25 samples/sec Loss 5.3889 LearningRate 0.0240 Epoch: 15 Global Step: 80830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:20,082-Speed 18574.88 samples/sec Loss 5.3668 LearningRate 0.0240 Epoch: 15 Global Step: 80840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:24,534-Speed 18409.39 samples/sec Loss 5.3866 LearningRate 0.0239 Epoch: 15 Global Step: 80850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:28,952-Speed 18552.86 samples/sec Loss 5.3706 LearningRate 0.0239 Epoch: 15 Global Step: 80860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:33,449-Speed 18220.28 samples/sec Loss 5.3488 LearningRate 0.0239 Epoch: 15 Global Step: 80870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:37,910-Speed 18371.25 samples/sec Loss 5.3634 LearningRate 0.0239 Epoch: 15 Global Step: 80880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:42,335-Speed 18515.20 samples/sec Loss 5.3870 LearningRate 0.0239 Epoch: 15 Global Step: 80890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:01:46,747-Speed 18573.20 samples/sec Loss 5.3683 LearningRate 0.0238 Epoch: 15 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:01:51,224-Speed 18304.42 samples/sec Loss 5.3625 LearningRate 0.0238 Epoch: 15 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:01:55,613-Speed 18668.68 samples/sec Loss 5.3809 LearningRate 0.0238 Epoch: 15 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:02:00,012-Speed 18628.47 samples/sec Loss 5.3570 LearningRate 0.0238 Epoch: 15 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:02:04,437-Speed 18515.31 samples/sec Loss 5.3579 LearningRate 0.0238 Epoch: 15 Global Step: 80940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
08:02:08,889-Speed 18405.94 samples/sec Loss 5.3582 LearningRate 0.0237 Epoch: 15 Global Step: 80950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:13,327-Speed 18465.35 samples/sec Loss 5.3338 LearningRate 0.0237 Epoch: 15 Global Step: 80960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:17,735-Speed 18587.91 samples/sec Loss 5.3823 LearningRate 0.0237 Epoch: 15 Global Step: 80970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:22,121-Speed 18681.16 samples/sec Loss 5.3834 LearningRate 0.0237 Epoch: 15 Global Step: 80980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:26,512-Speed 18663.75 samples/sec Loss 5.3710 LearningRate 0.0237 Epoch: 15 Global Step: 80990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:30,953-Speed 18453.53 samples/sec Loss 5.3143 LearningRate 0.0236 Epoch: 15 Global Step: 81000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:35,366-Speed 18564.37 samples/sec Loss 5.3407 LearningRate 0.0236 Epoch: 15 Global Step: 81010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:39,852-Speed 18267.03 samples/sec Loss 5.3610 LearningRate 0.0236 Epoch: 15 Global Step: 81020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:44,308-Speed 18388.70 samples/sec Loss 5.3297 LearningRate 0.0236 Epoch: 15 Global Step: 81030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:02:48,767-Speed 18381.90 samples/sec Loss 5.3611 LearningRate 0.0235 Epoch: 15 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:02:53,241-Speed 18311.68 samples/sec Loss 5.3534 LearningRate 0.0235 Epoch: 15 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:02:57,646-Speed 18602.94 samples/sec Loss 5.3691 LearningRate 0.0235 Epoch: 15 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:03:02,056-Speed 18578.03 samples/sec 
Loss 5.3308 LearningRate 0.0235 Epoch: 15 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:03:06,453-Speed 18637.45 samples/sec Loss 5.3390 LearningRate 0.0235 Epoch: 15 Global Step: 81080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:10,852-Speed 18631.96 samples/sec Loss 5.3478 LearningRate 0.0234 Epoch: 15 Global Step: 81090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:15,280-Speed 18506.83 samples/sec Loss 5.3552 LearningRate 0.0234 Epoch: 15 Global Step: 81100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:19,669-Speed 18670.75 samples/sec Loss 5.3140 LearningRate 0.0234 Epoch: 15 Global Step: 81110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:24,045-Speed 18724.61 samples/sec Loss 5.3809 LearningRate 0.0234 Epoch: 15 Global Step: 81120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:28,479-Speed 18483.64 samples/sec Loss 5.3277 LearningRate 0.0234 Epoch: 15 Global Step: 81130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:32,890-Speed 18576.62 samples/sec Loss 5.3743 LearningRate 0.0233 Epoch: 15 Global Step: 81140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:37,397-Speed 18177.23 samples/sec Loss 5.3856 LearningRate 0.0233 Epoch: 15 Global Step: 81150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:41,828-Speed 18492.80 samples/sec Loss 5.3442 LearningRate 0.0233 Epoch: 15 Global Step: 81160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:46,219-Speed 18664.20 samples/sec Loss 5.3493 LearningRate 0.0233 Epoch: 15 Global Step: 81170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:50,635-Speed 18551.91 samples/sec Loss 5.3420 LearningRate 0.0233 Epoch: 15 Global Step: 81180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:55,029-Speed 18648.73 samples/sec Loss 5.3562 LearningRate 0.0232 Epoch: 15 
Global Step: 81190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:03:59,479-Speed 18411.97 samples/sec Loss 5.3496 LearningRate 0.0232 Epoch: 15 Global Step: 81200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:03,935-Speed 18391.22 samples/sec Loss 5.3662 LearningRate 0.0232 Epoch: 15 Global Step: 81210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:08,361-Speed 18515.64 samples/sec Loss 5.3545 LearningRate 0.0232 Epoch: 15 Global Step: 81220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:12,795-Speed 18483.24 samples/sec Loss 5.3118 LearningRate 0.0232 Epoch: 15 Global Step: 81230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:17,198-Speed 18614.88 samples/sec Loss 5.3384 LearningRate 0.0231 Epoch: 15 Global Step: 81240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:21,665-Speed 18342.79 samples/sec Loss 5.3450 LearningRate 0.0231 Epoch: 15 Global Step: 81250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:26,080-Speed 18559.90 samples/sec Loss 5.3816 LearningRate 0.0231 Epoch: 15 Global Step: 81260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:30,484-Speed 18609.34 samples/sec Loss 5.3405 LearningRate 0.0231 Epoch: 15 Global Step: 81270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:34,900-Speed 18554.76 samples/sec Loss 5.3512 LearningRate 0.0231 Epoch: 15 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:04:39,334-Speed 18480.86 samples/sec Loss 5.3200 LearningRate 0.0230 Epoch: 15 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:04:43,793-Speed 18377.45 samples/sec Loss 5.3473 LearningRate 0.0230 Epoch: 15 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:04:48,179-Speed 18685.38 samples/sec Loss 5.3472 LearningRate 0.0230 Epoch: 15 Global Step: 81310 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 08:04:52,582-Speed 18608.62 samples/sec Loss 5.3617 LearningRate 0.0230 Epoch: 15 Global Step: 81320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:04:56,991-Speed 18585.51 samples/sec Loss 5.3457 LearningRate 0.0229 Epoch: 15 Global Step: 81330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:01,442-Speed 18408.56 samples/sec Loss 5.3441 LearningRate 0.0229 Epoch: 15 Global Step: 81340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:05,870-Speed 18510.16 samples/sec Loss 5.3420 LearningRate 0.0229 Epoch: 15 Global Step: 81350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:10,266-Speed 18642.01 samples/sec Loss 5.3517 LearningRate 0.0229 Epoch: 15 Global Step: 81360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:14,748-Speed 18281.91 samples/sec Loss 5.3347 LearningRate 0.0229 Epoch: 15 Global Step: 81370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:19,161-Speed 18568.58 samples/sec Loss 5.3345 LearningRate 0.0228 Epoch: 15 Global Step: 81380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:23,616-Speed 18396.52 samples/sec Loss 5.3584 LearningRate 0.0228 Epoch: 15 Global Step: 81390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:28,082-Speed 18348.15 samples/sec Loss 5.3198 LearningRate 0.0228 Epoch: 15 Global Step: 81400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:32,490-Speed 18588.36 samples/sec Loss 5.3467 LearningRate 0.0228 Epoch: 15 Global Step: 81410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:05:36,912-Speed 18527.66 samples/sec Loss 5.4042 LearningRate 0.0228 Epoch: 15 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:05:41,359-Speed 18429.41 samples/sec Loss 5.2760 LearningRate 0.0227 Epoch: 15 Global Step: 81430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
08:05:45,837-Speed 18300.72 samples/sec Loss 5.3525 LearningRate 0.0227 Epoch: 15 Global Step: 81440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:50,245-Speed 18589.80 samples/sec Loss 5.3320 LearningRate 0.0227 Epoch: 15 Global Step: 81450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:54,666-Speed 18533.83 samples/sec Loss 5.3369 LearningRate 0.0227 Epoch: 15 Global Step: 81460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:05:59,146-Speed 18292.65 samples/sec Loss 5.3282 LearningRate 0.0227 Epoch: 15 Global Step: 81470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:06:03,676-Speed 18089.48 samples/sec Loss 5.3202 LearningRate 0.0226 Epoch: 15 Global Step: 81480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:06:08,138-Speed 18367.33 samples/sec Loss 5.3477 LearningRate 0.0226 Epoch: 15 Global Step: 81490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:06:12,601-Speed 18358.30 samples/sec Loss 5.3276 LearningRate 0.0226 Epoch: 15 Global Step: 81500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:06:17,013-Speed 18574.72 samples/sec Loss 5.2924 LearningRate 0.0226 Epoch: 15 Global Step: 81510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:06:21,442-Speed 18499.66 samples/sec Loss 5.3281 LearningRate 0.0226 Epoch: 15 Global Step: 81520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:06:25,843-Speed 18619.98 samples/sec Loss 5.3198 LearningRate 0.0225 Epoch: 15 Global Step: 81530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:06:30,299-Speed 18392.30 samples/sec Loss 5.3429 LearningRate 0.0225 Epoch: 15 Global Step: 81540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:06:34,713-Speed 18563.86 samples/sec Loss 5.2677 LearningRate 0.0225 Epoch: 15 Global Step: 81550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:06:39,152-Speed 18458.49 samples/sec 
Loss 5.3525 LearningRate 0.0225 Epoch: 15 Global Step: 81560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:06:43,586-Speed 18480.71 samples/sec Loss 5.3442 LearningRate 0.0225 Epoch: 15 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:06:48,113-Speed 18103.42 samples/sec Loss 5.3140 LearningRate 0.0224 Epoch: 15 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:06:52,625-Speed 18160.68 samples/sec Loss 5.3311 LearningRate 0.0224 Epoch: 15 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:06:57,017-Speed 18660.01 samples/sec Loss 5.3121 LearningRate 0.0224 Epoch: 15 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:01,441-Speed 18527.47 samples/sec Loss 5.3172 LearningRate 0.0224 Epoch: 15 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:05,920-Speed 18297.34 samples/sec Loss 5.2751 LearningRate 0.0224 Epoch: 15 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:10,378-Speed 18382.69 samples/sec Loss 5.3278 LearningRate 0.0223 Epoch: 15 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:14,840-Speed 18364.66 samples/sec Loss 5.3496 LearningRate 0.0223 Epoch: 15 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:19,276-Speed 18477.46 samples/sec Loss 5.3377 LearningRate 0.0223 Epoch: 15 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:23,703-Speed 18514.47 samples/sec Loss 5.3544 LearningRate 0.0223 Epoch: 15 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:28,148-Speed 18435.14 samples/sec Loss 5.3307 LearningRate 0.0223 Epoch: 15 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:32,641-Speed 18235.60 samples/sec Loss 5.3194 LearningRate 0.0222 Epoch: 15 
Global Step: 81680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:37,104-Speed 18358.80 samples/sec Loss 5.3209 LearningRate 0.0222 Epoch: 15 Global Step: 81690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:41,513-Speed 18588.56 samples/sec Loss 5.3473 LearningRate 0.0222 Epoch: 15 Global Step: 81700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:07:45,903-Speed 18665.38 samples/sec Loss 5.3194 LearningRate 0.0222 Epoch: 15 Global Step: 81710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:07:50,364-Speed 18365.76 samples/sec Loss 5.3077 LearningRate 0.0222 Epoch: 15 Global Step: 81720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:07:54,769-Speed 18603.44 samples/sec Loss 5.3480 LearningRate 0.0221 Epoch: 15 Global Step: 81730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:07:59,150-Speed 18704.22 samples/sec Loss 5.2824 LearningRate 0.0221 Epoch: 15 Global Step: 81740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:03,582-Speed 18489.32 samples/sec Loss 5.3264 LearningRate 0.0221 Epoch: 15 Global Step: 81750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:07,965-Speed 18697.24 samples/sec Loss 5.3559 LearningRate 0.0221 Epoch: 15 Global Step: 81760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:12,401-Speed 18471.81 samples/sec Loss 5.3084 LearningRate 0.0221 Epoch: 15 Global Step: 81770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:16,819-Speed 18544.56 samples/sec Loss 5.3313 LearningRate 0.0220 Epoch: 15 Global Step: 81780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:21,223-Speed 18604.46 samples/sec Loss 5.3563 LearningRate 0.0220 Epoch: 15 Global Step: 81790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:25,628-Speed 18604.55 samples/sec Loss 5.2970 LearningRate 0.0220 Epoch: 15 Global Step: 81800 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 08:08:30,066-Speed 18467.55 samples/sec Loss 5.3351 LearningRate 0.0220 Epoch: 15 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:08:34,560-Speed 18231.35 samples/sec Loss 5.3313 LearningRate 0.0220 Epoch: 15 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:08:38,969-Speed 18584.38 samples/sec Loss 5.3344 LearningRate 0.0219 Epoch: 15 Global Step: 81830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:08:43,356-Speed 18682.21 samples/sec Loss 5.3386 LearningRate 0.0219 Epoch: 15 Global Step: 81840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:47,762-Speed 18607.34 samples/sec Loss 5.3279 LearningRate 0.0219 Epoch: 15 Global Step: 81850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:52,152-Speed 18666.65 samples/sec Loss 5.3247 LearningRate 0.0219 Epoch: 15 Global Step: 81860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:08:56,540-Speed 18675.60 samples/sec Loss 5.2879 LearningRate 0.0219 Epoch: 15 Global Step: 81870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:09:01,018-Speed 18297.17 samples/sec Loss 5.3295 LearningRate 0.0218 Epoch: 15 Global Step: 81880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:09:05,437-Speed 18545.33 samples/sec Loss 5.3171 LearningRate 0.0218 Epoch: 15 Global Step: 81890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:09:09,846-Speed 18595.87 samples/sec Loss 5.2931 LearningRate 0.0218 Epoch: 15 Global Step: 81900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:09:14,317-Speed 18329.25 samples/sec Loss 5.3120 LearningRate 0.0218 Epoch: 15 Global Step: 81910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:09:18,747-Speed 18502.89 samples/sec Loss 5.2941 LearningRate 0.0218 Epoch: 15 Global Step: 81920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
08:09:24,408-Speed 14476.27 samples/sec Loss 5.3130 LearningRate 0.0217 Epoch: 15 Global Step: 81930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:09:28,870-Speed 18364.58 samples/sec Loss 5.3181 LearningRate 0.0217 Epoch: 15 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:09:33,287-Speed 18549.47 samples/sec Loss 5.3086 LearningRate 0.0217 Epoch: 15 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:09:37,704-Speed 18550.79 samples/sec Loss 5.3128 LearningRate 0.0217 Epoch: 15 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:09:42,129-Speed 18519.94 samples/sec Loss 5.3233 LearningRate 0.0217 Epoch: 15 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:09:46,516-Speed 18682.21 samples/sec Loss 5.2872 LearningRate 0.0216 Epoch: 15 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:09:50,943-Speed 18507.72 samples/sec Loss 5.2839 LearningRate 0.0216 Epoch: 15 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:09:55,362-Speed 18543.66 samples/sec Loss 5.3206 LearningRate 0.0216 Epoch: 15 Global Step: 82000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:09:59,789-Speed 18510.53 samples/sec Loss 5.3074 LearningRate 0.0216 Epoch: 15 Global Step: 82010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:10:04,230-Speed 18450.48 samples/sec Loss 5.3032 LearningRate 0.0216 Epoch: 15 Global Step: 82020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:10:08,666-Speed 18472.54 samples/sec Loss 5.2791 LearningRate 0.0215 Epoch: 15 Global Step: 82030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:13,116-Speed 18412.97 samples/sec Loss 5.2977 LearningRate 0.0215 Epoch: 15 Global Step: 82040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:17,586-Speed 18331.64 samples/sec 
Loss 5.3420 LearningRate 0.0215 Epoch: 15 Global Step: 82050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:22,022-Speed 18472.12 samples/sec Loss 5.3314 LearningRate 0.0215 Epoch: 15 Global Step: 82060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:26,464-Speed 18448.34 samples/sec Loss 5.3018 LearningRate 0.0215 Epoch: 15 Global Step: 82070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:30,880-Speed 18555.43 samples/sec Loss 5.3635 LearningRate 0.0214 Epoch: 15 Global Step: 82080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:35,330-Speed 18411.45 samples/sec Loss 5.2941 LearningRate 0.0214 Epoch: 15 Global Step: 82090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:39,747-Speed 18555.08 samples/sec Loss 5.3380 LearningRate 0.0214 Epoch: 15 Global Step: 82100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:44,147-Speed 18621.67 samples/sec Loss 5.3400 LearningRate 0.0214 Epoch: 15 Global Step: 82110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:48,568-Speed 18535.92 samples/sec Loss 5.2721 LearningRate 0.0214 Epoch: 15 Global Step: 82120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-14 08:10:52,966-Speed 18629.57 samples/sec Loss 5.3197 LearningRate 0.0213 Epoch: 15 Global Step: 82130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:10:57,340-Speed 18732.04 samples/sec Loss 5.2779 LearningRate 0.0213 Epoch: 15 Global Step: 82140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:01,756-Speed 18555.22 samples/sec Loss 5.3326 LearningRate 0.0213 Epoch: 15 Global Step: 82150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:06,263-Speed 18180.39 samples/sec Loss 5.2972 LearningRate 0.0213 Epoch: 15 Global Step: 82160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:10,707-Speed 18437.70 samples/sec Loss 5.2985 LearningRate 0.0213 Epoch: 15 
Global Step: 82170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:15,156-Speed 18418.66 samples/sec Loss 5.2909 LearningRate 0.0212 Epoch: 15 Global Step: 82180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:19,559-Speed 18615.97 samples/sec Loss 5.3213 LearningRate 0.0212 Epoch: 15 Global Step: 82190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:23,962-Speed 18617.44 samples/sec Loss 5.3087 LearningRate 0.0212 Epoch: 15 Global Step: 82200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:28,420-Speed 18381.82 samples/sec Loss 5.2952 LearningRate 0.0212 Epoch: 15 Global Step: 82210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:32,848-Speed 18506.81 samples/sec Loss 5.2871 LearningRate 0.0212 Epoch: 15 Global Step: 82220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:11:37,284-Speed 18475.78 samples/sec Loss 5.3365 LearningRate 0.0211 Epoch: 15 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:11:41,740-Speed 18392.05 samples/sec Loss 5.3046 LearningRate 0.0211 Epoch: 15 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:11:46,163-Speed 18525.84 samples/sec Loss 5.2932 LearningRate 0.0211 Epoch: 15 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:11:50,628-Speed 18355.22 samples/sec Loss 5.2929 LearningRate 0.0211 Epoch: 15 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:11:55,048-Speed 18537.12 samples/sec Loss 5.3034 LearningRate 0.0211 Epoch: 15 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:11:59,482-Speed 18481.89 samples/sec Loss 5.3007 LearningRate 0.0210 Epoch: 15 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:12:03,892-Speed 18582.13 samples/sec Loss 5.2989 LearningRate 0.0210 Epoch: 15 Global Step: 82290 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-01-14 08:12:08,306-Speed 18565.95 samples/sec Loss 5.2928 LearningRate 0.0210 Epoch: 15 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:12:12,697-Speed 18661.09 samples/sec Loss 5.3059 LearningRate 0.0210 Epoch: 15 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:12:17,144-Speed 18424.68 samples/sec Loss 5.2932 LearningRate 0.0210 Epoch: 15 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:12:21,584-Speed 18454.65 samples/sec Loss 5.2951 LearningRate 0.0209 Epoch: 15 Global Step: 82330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-14 08:12:26,051-Speed 18345.67 samples/sec Loss 5.2752 LearningRate 0.0209 Epoch: 15 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:12:30,434-Speed 18692.45 samples/sec Loss 5.3341 LearningRate 0.0209 Epoch: 15 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:12:34,824-Speed 18667.69 samples/sec Loss 5.2812 LearningRate 0.0209 Epoch: 15 Global Step: 82360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:12:39,265-Speed 18449.57 samples/sec Loss 5.2856 LearningRate 0.0209 Epoch: 15 Global Step: 82370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:12:43,726-Speed 18367.37 samples/sec Loss 5.2976 LearningRate 0.0208 Epoch: 15 Global Step: 82380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:12:48,190-Speed 18356.82 samples/sec Loss 5.2653 LearningRate 0.0208 Epoch: 15 Global Step: 82390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:12:52,597-Speed 18591.80 samples/sec Loss 5.2866 LearningRate 0.0208 Epoch: 15 Global Step: 82400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:12:57,024-Speed 18513.67 samples/sec Loss 5.2829 LearningRate 0.0208 Epoch: 15 Global Step: 82410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
08:13:01,442-Speed 18547.18 samples/sec Loss 5.2934 LearningRate 0.0208 Epoch: 15 Global Step: 82420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:05,847-Speed 18601.37 samples/sec Loss 5.2896 LearningRate 0.0207 Epoch: 15 Global Step: 82430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:10,247-Speed 18631.94 samples/sec Loss 5.2805 LearningRate 0.0207 Epoch: 15 Global Step: 82440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:14,639-Speed 18661.39 samples/sec Loss 5.3117 LearningRate 0.0207 Epoch: 15 Global Step: 82450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:19,043-Speed 18609.55 samples/sec Loss 5.2989 LearningRate 0.0207 Epoch: 15 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:13:23,490-Speed 18424.65 samples/sec Loss 5.2766 LearningRate 0.0207 Epoch: 15 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:13:27,912-Speed 18538.03 samples/sec Loss 5.2827 LearningRate 0.0206 Epoch: 15 Global Step: 82480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:32,328-Speed 18556.44 samples/sec Loss 5.2844 LearningRate 0.0206 Epoch: 15 Global Step: 82490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:36,709-Speed 18706.43 samples/sec Loss 5.3088 LearningRate 0.0206 Epoch: 15 Global Step: 82500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:41,109-Speed 18620.59 samples/sec Loss 5.3153 LearningRate 0.0206 Epoch: 15 Global Step: 82510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:45,627-Speed 18136.65 samples/sec Loss 5.2623 LearningRate 0.0206 Epoch: 15 Global Step: 82520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:50,113-Speed 18268.41 samples/sec Loss 5.2431 LearningRate 0.0205 Epoch: 15 Global Step: 82530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:54,529-Speed 18551.53 samples/sec 
Loss 5.2509 LearningRate 0.0205 Epoch: 15 Global Step: 82540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:13:58,926-Speed 18637.08 samples/sec Loss 5.3082 LearningRate 0.0205 Epoch: 15 Global Step: 82550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:14:03,331-Speed 18603.28 samples/sec Loss 5.2571 LearningRate 0.0205 Epoch: 15 Global Step: 82560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:14:07,797-Speed 18346.16 samples/sec Loss 5.2833 LearningRate 0.0205 Epoch: 15 Global Step: 82570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:14:12,227-Speed 18498.02 samples/sec Loss 5.2977 LearningRate 0.0205 Epoch: 15 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:16,628-Speed 18618.73 samples/sec Loss 5.2882 LearningRate 0.0204 Epoch: 15 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:21,062-Speed 18481.64 samples/sec Loss 5.2929 LearningRate 0.0204 Epoch: 15 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:25,460-Speed 18632.42 samples/sec Loss 5.2735 LearningRate 0.0204 Epoch: 15 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:29,893-Speed 18485.57 samples/sec Loss 5.2440 LearningRate 0.0204 Epoch: 15 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:34,303-Speed 18581.09 samples/sec Loss 5.2719 LearningRate 0.0204 Epoch: 15 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:38,710-Speed 18592.73 samples/sec Loss 5.2882 LearningRate 0.0203 Epoch: 15 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:43,169-Speed 18374.92 samples/sec Loss 5.3018 LearningRate 0.0203 Epoch: 15 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:47,634-Speed 18353.26 samples/sec Loss 5.2578 LearningRate 0.0203 Epoch: 15 
Global Step: 82660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:52,088-Speed 18395.96 samples/sec Loss 5.2622 LearningRate 0.0203 Epoch: 15 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:14:56,523-Speed 18477.52 samples/sec Loss 5.2391 LearningRate 0.0203 Epoch: 15 Global Step: 82680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:15:00,949-Speed 18513.18 samples/sec Loss 5.2649 LearningRate 0.0202 Epoch: 15 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:15:05,409-Speed 18370.48 samples/sec Loss 5.2795 LearningRate 0.0202 Epoch: 15 Global Step: 82700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:09,802-Speed 18655.85 samples/sec Loss 5.3012 LearningRate 0.0202 Epoch: 15 Global Step: 82710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:14,204-Speed 18612.91 samples/sec Loss 5.2378 LearningRate 0.0202 Epoch: 15 Global Step: 82720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:18,601-Speed 18637.70 samples/sec Loss 5.3087 LearningRate 0.0202 Epoch: 15 Global Step: 82730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:23,004-Speed 18614.03 samples/sec Loss 5.2426 LearningRate 0.0201 Epoch: 15 Global Step: 82740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:27,453-Speed 18416.37 samples/sec Loss 5.3296 LearningRate 0.0201 Epoch: 15 Global Step: 82750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:31,872-Speed 18541.87 samples/sec Loss 5.2803 LearningRate 0.0201 Epoch: 15 Global Step: 82760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:36,306-Speed 18482.31 samples/sec Loss 5.2969 LearningRate 0.0201 Epoch: 15 Global Step: 82770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:40,723-Speed 18553.61 samples/sec Loss 5.2888 LearningRate 0.0201 Epoch: 15 Global Step: 82780 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 08:15:48,293-Speed 10822.75 samples/sec Loss 5.2968 LearningRate 0.0200 Epoch: 15 Global Step: 82790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:15:52,695-Speed 18616.61 samples/sec Loss 5.2865 LearningRate 0.0200 Epoch: 15 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:15:57,163-Speed 18341.13 samples/sec Loss 5.2751 LearningRate 0.0200 Epoch: 15 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:01,588-Speed 18521.58 samples/sec Loss 5.2674 LearningRate 0.0200 Epoch: 15 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:06,020-Speed 18488.72 samples/sec Loss 5.2498 LearningRate 0.0200 Epoch: 15 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:10,461-Speed 18451.65 samples/sec Loss 5.2718 LearningRate 0.0200 Epoch: 15 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:14,884-Speed 18525.23 samples/sec Loss 5.2622 LearningRate 0.0199 Epoch: 15 Global Step: 82850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:19,327-Speed 18444.27 samples/sec Loss 5.2917 LearningRate 0.0199 Epoch: 15 Global Step: 82860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:23,731-Speed 18605.42 samples/sec Loss 5.2491 LearningRate 0.0199 Epoch: 15 Global Step: 82870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:28,130-Speed 18629.51 samples/sec Loss 5.2886 LearningRate 0.0199 Epoch: 15 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:32,541-Speed 18578.05 samples/sec Loss 5.2932 LearningRate 0.0199 Epoch: 15 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:36,907-Speed 18767.38 samples/sec Loss 5.2880 LearningRate 0.0198 Epoch: 15 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 
08:16:41,353-Speed 18438.16 samples/sec Loss 5.2506 LearningRate 0.0198 Epoch: 15 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:45,754-Speed 18621.18 samples/sec Loss 5.2743 LearningRate 0.0198 Epoch: 15 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:50,169-Speed 18564.13 samples/sec Loss 5.2869 LearningRate 0.0198 Epoch: 15 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:54,591-Speed 18529.63 samples/sec Loss 5.2686 LearningRate 0.0198 Epoch: 15 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:16:58,996-Speed 18601.91 samples/sec Loss 5.2940 LearningRate 0.0197 Epoch: 15 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:17:17,353-Speed 4462.89 samples/sec Loss 5.2838 LearningRate 0.0197 Epoch: 16 Global Step: 82960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:21,732-Speed 18714.96 samples/sec Loss 5.2421 LearningRate 0.0197 Epoch: 16 Global Step: 82970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:26,125-Speed 18653.12 samples/sec Loss 5.2626 LearningRate 0.0197 Epoch: 16 Global Step: 82980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:30,528-Speed 18612.31 samples/sec Loss 5.2828 LearningRate 0.0197 Epoch: 16 Global Step: 82990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:34,954-Speed 18511.25 samples/sec Loss 5.2697 LearningRate 0.0196 Epoch: 16 Global Step: 83000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:39,390-Speed 18475.38 samples/sec Loss 5.2701 LearningRate 0.0196 Epoch: 16 Global Step: 83010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:43,830-Speed 18456.05 samples/sec Loss 5.2224 LearningRate 0.0196 Epoch: 16 Global Step: 83020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:48,232-Speed 18615.73 samples/sec Loss 
5.2647 LearningRate 0.0196 Epoch: 16 Global Step: 83030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:52,623-Speed 18661.95 samples/sec Loss 5.2581 LearningRate 0.0196 Epoch: 16 Global Step: 83040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:17:57,034-Speed 18577.27 samples/sec Loss 5.2761 LearningRate 0.0196 Epoch: 16 Global Step: 83050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:01,424-Speed 18666.61 samples/sec Loss 5.2202 LearningRate 0.0195 Epoch: 16 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:18:05,842-Speed 18554.23 samples/sec Loss 5.2265 LearningRate 0.0195 Epoch: 16 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:18:10,237-Speed 18649.84 samples/sec Loss 5.2293 LearningRate 0.0195 Epoch: 16 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:18:14,605-Speed 18765.40 samples/sec Loss 5.2833 LearningRate 0.0195 Epoch: 16 Global Step: 83090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:19,023-Speed 18551.16 samples/sec Loss 5.2481 LearningRate 0.0195 Epoch: 16 Global Step: 83100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:23,377-Speed 18816.47 samples/sec Loss 5.2532 LearningRate 0.0194 Epoch: 16 Global Step: 83110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:27,760-Speed 18699.12 samples/sec Loss 5.2261 LearningRate 0.0194 Epoch: 16 Global Step: 83120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:32,162-Speed 18613.85 samples/sec Loss 5.2149 LearningRate 0.0194 Epoch: 16 Global Step: 83130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:36,573-Speed 18574.04 samples/sec Loss 5.2416 LearningRate 0.0194 Epoch: 16 Global Step: 83140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:40,976-Speed 18612.25 samples/sec Loss 5.2480 LearningRate 0.0194 Epoch: 16 Global 
Step: 83150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:45,378-Speed 18617.81 samples/sec Loss 5.2523 LearningRate 0.0193 Epoch: 16 Global Step: 83160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:49,802-Speed 18520.42 samples/sec Loss 5.2650 LearningRate 0.0193 Epoch: 16 Global Step: 83170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:54,179-Speed 18722.42 samples/sec Loss 5.2411 LearningRate 0.0193 Epoch: 16 Global Step: 83180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:18:58,570-Speed 18662.19 samples/sec Loss 5.2489 LearningRate 0.0193 Epoch: 16 Global Step: 83190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:19:03,016-Speed 18435.24 samples/sec Loss 5.2536 LearningRate 0.0193 Epoch: 16 Global Step: 83200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:07,493-Speed 18309.44 samples/sec Loss 5.2671 LearningRate 0.0192 Epoch: 16 Global Step: 83210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:11,947-Speed 18394.01 samples/sec Loss 5.2715 LearningRate 0.0192 Epoch: 16 Global Step: 83220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:16,403-Speed 18390.34 samples/sec Loss 5.2817 LearningRate 0.0192 Epoch: 16 Global Step: 83230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:20,801-Speed 18632.23 samples/sec Loss 5.2489 LearningRate 0.0192 Epoch: 16 Global Step: 83240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:25,216-Speed 18560.02 samples/sec Loss 5.2340 LearningRate 0.0192 Epoch: 16 Global Step: 83250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:29,665-Speed 18414.98 samples/sec Loss 5.2388 LearningRate 0.0192 Epoch: 16 Global Step: 83260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:34,084-Speed 18544.37 samples/sec Loss 5.2422 LearningRate 0.0191 Epoch: 16 Global Step: 83270 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-01-14 08:19:38,485-Speed 18619.85 samples/sec Loss 5.2444 LearningRate 0.0191 Epoch: 16 Global Step: 83280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:42,919-Speed 18480.04 samples/sec Loss 5.2387 LearningRate 0.0191 Epoch: 16 Global Step: 83290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:47,372-Speed 18402.58 samples/sec Loss 5.2643 LearningRate 0.0191 Epoch: 16 Global Step: 83300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:51,786-Speed 18564.23 samples/sec Loss 5.2715 LearningRate 0.0191 Epoch: 16 Global Step: 83310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:19:56,214-Speed 18505.77 samples/sec Loss 5.2527 LearningRate 0.0190 Epoch: 16 Global Step: 83320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:00,665-Speed 18409.42 samples/sec Loss 5.2712 LearningRate 0.0190 Epoch: 16 Global Step: 83330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:05,141-Speed 18304.94 samples/sec Loss 5.2545 LearningRate 0.0190 Epoch: 16 Global Step: 83340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:09,559-Speed 18546.68 samples/sec Loss 5.2322 LearningRate 0.0190 Epoch: 16 Global Step: 83350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:13,965-Speed 18596.89 samples/sec Loss 5.2573 LearningRate 0.0190 Epoch: 16 Global Step: 83360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:18,344-Speed 18710.86 samples/sec Loss 5.2382 LearningRate 0.0189 Epoch: 16 Global Step: 83370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:22,752-Speed 18588.17 samples/sec Loss 5.2481 LearningRate 0.0189 Epoch: 16 Global Step: 83380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:27,171-Speed 18542.49 samples/sec Loss 5.2359 LearningRate 0.0189 Epoch: 16 Global Step: 83390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 
08:20:31,594-Speed 18525.17 samples/sec Loss 5.2087 LearningRate 0.0189 Epoch: 16 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:20:36,083-Speed 18252.98 samples/sec Loss 5.2708 LearningRate 0.0189 Epoch: 16 Global Step: 83410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:20:40,518-Speed 18477.22 samples/sec Loss 5.2025 LearningRate 0.0189 Epoch: 16 Global Step: 83420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:20:44,962-Speed 18438.82 samples/sec Loss 5.2689 LearningRate 0.0188 Epoch: 16 Global Step: 83430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:20:49,410-Speed 18419.83 samples/sec Loss 5.2699 LearningRate 0.0188 Epoch: 16 Global Step: 83440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:53,833-Speed 18528.67 samples/sec Loss 5.2324 LearningRate 0.0188 Epoch: 16 Global Step: 83450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:20:58,257-Speed 18520.91 samples/sec Loss 5.2511 LearningRate 0.0188 Epoch: 16 Global Step: 83460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:02,671-Speed 18562.30 samples/sec Loss 5.2074 LearningRate 0.0188 Epoch: 16 Global Step: 83470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:07,112-Speed 18452.93 samples/sec Loss 5.2606 LearningRate 0.0187 Epoch: 16 Global Step: 83480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:11,552-Speed 18453.74 samples/sec Loss 5.2211 LearningRate 0.0187 Epoch: 16 Global Step: 83490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:16,045-Speed 18239.77 samples/sec Loss 5.2364 LearningRate 0.0187 Epoch: 16 Global Step: 83500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:20,456-Speed 18574.87 samples/sec Loss 5.2416 LearningRate 0.0187 Epoch: 16 Global Step: 83510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:24,881-Speed 18519.77 samples/sec 
Loss 5.2329 LearningRate 0.0187 Epoch: 16 Global Step: 83520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:29,286-Speed 18598.81 samples/sec Loss 5.2398 LearningRate 0.0187 Epoch: 16 Global Step: 83530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:33,715-Speed 18503.98 samples/sec Loss 5.2272 LearningRate 0.0186 Epoch: 16 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:21:38,112-Speed 18635.07 samples/sec Loss 5.2683 LearningRate 0.0186 Epoch: 16 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-14 08:21:42,545-Speed 18481.49 samples/sec Loss 5.2372 LearningRate 0.0186 Epoch: 16 Global Step: 83560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:46,978-Speed 18488.41 samples/sec Loss 5.2295 LearningRate 0.0186 Epoch: 16 Global Step: 83570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:51,414-Speed 18476.17 samples/sec Loss 5.2283 LearningRate 0.0186 Epoch: 16 Global Step: 83580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:21:55,802-Speed 18679.72 samples/sec Loss 5.2501 LearningRate 0.0185 Epoch: 16 Global Step: 83590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-14 08:22:00,178-Speed 18723.33 samples/sec Loss 5.1906 LearningRate 0.0185 Epoch: 16 Global Step: 83600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:22:04,610-Speed 18487.93 samples/sec Loss 5.2290 LearningRate 0.0185 Epoch: 16 Global Step: 83610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:22:09,063-Speed 18402.50 samples/sec Loss 5.2352 LearningRate 0.0185 Epoch: 16 Global Step: 83620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:22:20,930-Speed 6904.72 samples/sec Loss 5.1868 LearningRate 0.0185 Epoch: 16 Global Step: 83630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:22:25,338-Speed 18593.22 samples/sec Loss 5.2020 LearningRate 0.0184 Epoch: 16 
Global Step: 83640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:22:29,768-Speed 18495.26 samples/sec Loss 5.2466 LearningRate 0.0184 Epoch: 16 Global Step: 83650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:22:34,184-Speed 18557.07 samples/sec Loss 5.2140 LearningRate 0.0184 Epoch: 16 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:22:38,596-Speed 18573.47 samples/sec Loss 5.2333 LearningRate 0.0184 Epoch: 16 Global Step: 83670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:22:43,025-Speed 18501.83 samples/sec Loss 5.2284 LearningRate 0.0184 Epoch: 16 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:22:47,540-Speed 18149.46 samples/sec Loss 5.2069 LearningRate 0.0184 Epoch: 16 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:22:51,991-Speed 18405.07 samples/sec Loss 5.2301 LearningRate 0.0183 Epoch: 16 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:22:56,426-Speed 18478.82 samples/sec Loss 5.2547 LearningRate 0.0183 Epoch: 16 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:23:00,830-Speed 18607.10 samples/sec Loss 5.2536 LearningRate 0.0183 Epoch: 16 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:23:05,292-Speed 18364.21 samples/sec Loss 5.1639 LearningRate 0.0183 Epoch: 16 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:23:09,748-Speed 18391.22 samples/sec Loss 5.2299 LearningRate 0.0183 Epoch: 16 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:23:14,183-Speed 18477.15 samples/sec Loss 5.2326 LearningRate 0.0182 Epoch: 16 Global Step: 83750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:18,669-Speed 18266.37 samples/sec Loss 5.2341 LearningRate 0.0182 Epoch: 16 Global Step: 83760 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 08:23:23,109-Speed 18456.55 samples/sec Loss 5.2158 LearningRate 0.0182 Epoch: 16 Global Step: 83770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:27,574-Speed 18351.75 samples/sec Loss 5.2120 LearningRate 0.0182 Epoch: 16 Global Step: 83780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:32,002-Speed 18507.34 samples/sec Loss 5.2047 LearningRate 0.0182 Epoch: 16 Global Step: 83790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:36,403-Speed 18624.82 samples/sec Loss 5.2600 LearningRate 0.0182 Epoch: 16 Global Step: 83800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:40,836-Speed 18488.87 samples/sec Loss 5.2192 LearningRate 0.0181 Epoch: 16 Global Step: 83810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:45,263-Speed 18521.44 samples/sec Loss 5.2041 LearningRate 0.0181 Epoch: 16 Global Step: 83820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:49,742-Speed 18295.69 samples/sec Loss 5.2326 LearningRate 0.0181 Epoch: 16 Global Step: 83830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:54,165-Speed 18535.92 samples/sec Loss 5.1883 LearningRate 0.0181 Epoch: 16 Global Step: 83840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:23:58,605-Speed 18452.99 samples/sec Loss 5.2142 LearningRate 0.0181 Epoch: 16 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:24:03,041-Speed 18476.47 samples/sec Loss 5.2587 LearningRate 0.0180 Epoch: 16 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:24:07,471-Speed 18495.35 samples/sec Loss 5.2257 LearningRate 0.0180 Epoch: 16 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:24:11,941-Speed 18328.25 samples/sec Loss 5.1982 LearningRate 0.0180 Epoch: 16 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
08:24:16,391-Speed 18415.66 samples/sec Loss 5.1826 LearningRate 0.0180 Epoch: 16 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:24:20,807-Speed 18555.59 samples/sec Loss 5.1977 LearningRate 0.0180 Epoch: 16 Global Step: 83900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:24:25,223-Speed 18560.30 samples/sec Loss 5.2532 LearningRate 0.0180 Epoch: 16 Global Step: 83910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:24:29,631-Speed 18590.31 samples/sec Loss 5.2191 LearningRate 0.0179 Epoch: 16 Global Step: 83920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:24:34,073-Speed 18446.26 samples/sec Loss 5.1732 LearningRate 0.0179 Epoch: 16 Global Step: 83930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:24:38,534-Speed 18368.78 samples/sec Loss 5.2320 LearningRate 0.0179 Epoch: 16 Global Step: 83940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:24:42,983-Speed 18417.14 samples/sec Loss 5.2182 LearningRate 0.0179 Epoch: 16 Global Step: 83950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:24:47,461-Speed 18298.87 samples/sec Loss 5.2088 LearningRate 0.0179 Epoch: 16 Global Step: 83960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:24:51,899-Speed 18465.46 samples/sec Loss 5.1531 LearningRate 0.0178 Epoch: 16 Global Step: 83970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:24:56,336-Speed 18471.51 samples/sec Loss 5.2075 LearningRate 0.0178 Epoch: 16 Global Step: 83980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:00,753-Speed 18548.33 samples/sec Loss 5.2038 LearningRate 0.0178 Epoch: 16 Global Step: 83990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:05,176-Speed 18532.18 samples/sec Loss 5.2295 LearningRate 0.0178 Epoch: 16 Global Step: 84000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:25:09,614-Speed 18458.73 samples/sec 
Loss 5.2030 LearningRate 0.0178 Epoch: 16 Global Step: 84010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:25:14,048-Speed 18483.11 samples/sec Loss 5.2056 LearningRate 0.0178 Epoch: 16 Global Step: 84020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:25:18,493-Speed 18432.67 samples/sec Loss 5.2348 LearningRate 0.0177 Epoch: 16 Global Step: 84030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:25:22,898-Speed 18605.96 samples/sec Loss 5.1944 LearningRate 0.0177 Epoch: 16 Global Step: 84040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:25:27,333-Speed 18475.25 samples/sec Loss 5.2194 LearningRate 0.0177 Epoch: 16 Global Step: 84050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:31,723-Speed 18666.64 samples/sec Loss 5.1937 LearningRate 0.0177 Epoch: 16 Global Step: 84060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:36,148-Speed 18518.95 samples/sec Loss 5.2128 LearningRate 0.0177 Epoch: 16 Global Step: 84070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:40,574-Speed 18511.64 samples/sec Loss 5.2442 LearningRate 0.0176 Epoch: 16 Global Step: 84080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:45,019-Speed 18435.26 samples/sec Loss 5.2146 LearningRate 0.0176 Epoch: 16 Global Step: 84090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:49,461-Speed 18445.01 samples/sec Loss 5.1764 LearningRate 0.0176 Epoch: 16 Global Step: 84100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:53,912-Speed 18413.63 samples/sec Loss 5.1969 LearningRate 0.0176 Epoch: 16 Global Step: 84110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:25:58,349-Speed 18470.07 samples/sec Loss 5.2693 LearningRate 0.0176 Epoch: 16 Global Step: 84120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:26:02,823-Speed 18315.32 samples/sec Loss 5.2378 LearningRate 0.0176 Epoch: 16 
Global Step: 84130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:26:07,236-Speed 18564.86 samples/sec Loss 5.2100 LearningRate 0.0175 Epoch: 16 Global Step: 84140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:11,671-Speed 18478.57 samples/sec Loss 5.2066 LearningRate 0.0175 Epoch: 16 Global Step: 84150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:16,060-Speed 18670.37 samples/sec Loss 5.2278 LearningRate 0.0175 Epoch: 16 Global Step: 84160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:20,505-Speed 18433.79 samples/sec Loss 5.1783 LearningRate 0.0175 Epoch: 16 Global Step: 84170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:24,950-Speed 18440.87 samples/sec Loss 5.1778 LearningRate 0.0175 Epoch: 16 Global Step: 84180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:29,363-Speed 18568.07 samples/sec Loss 5.2051 LearningRate 0.0175 Epoch: 16 Global Step: 84190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:33,792-Speed 18503.34 samples/sec Loss 5.1883 LearningRate 0.0174 Epoch: 16 Global Step: 84200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:38,233-Speed 18448.96 samples/sec Loss 5.2357 LearningRate 0.0174 Epoch: 16 Global Step: 84210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:42,688-Speed 18391.70 samples/sec Loss 5.1729 LearningRate 0.0174 Epoch: 16 Global Step: 84220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:47,097-Speed 18589.63 samples/sec Loss 5.2071 LearningRate 0.0174 Epoch: 16 Global Step: 84230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:26:51,531-Speed 18485.41 samples/sec Loss 5.1925 LearningRate 0.0174 Epoch: 16 Global Step: 84240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:26:55,946-Speed 18564.43 samples/sec Loss 5.1872 LearningRate 0.0173 Epoch: 16 Global Step: 84250 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 08:27:00,438-Speed 18242.18 samples/sec Loss 5.1877 LearningRate 0.0173 Epoch: 16 Global Step: 84260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:27:04,885-Speed 18428.03 samples/sec Loss 5.1619 LearningRate 0.0173 Epoch: 16 Global Step: 84270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:27:09,277-Speed 18656.24 samples/sec Loss 5.2078 LearningRate 0.0173 Epoch: 16 Global Step: 84280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:27:13,682-Speed 18606.18 samples/sec Loss 5.1986 LearningRate 0.0173 Epoch: 16 Global Step: 84290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:27:18,086-Speed 18608.53 samples/sec Loss 5.2213 LearningRate 0.0173 Epoch: 16 Global Step: 84300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:27:22,521-Speed 18477.28 samples/sec Loss 5.1999 LearningRate 0.0172 Epoch: 16 Global Step: 84310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:27:26,950-Speed 18509.23 samples/sec Loss 5.1899 LearningRate 0.0172 Epoch: 16 Global Step: 84320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:27:31,358-Speed 18588.28 samples/sec Loss 5.1984 LearningRate 0.0172 Epoch: 16 Global Step: 84330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:27:35,789-Speed 18492.21 samples/sec Loss 5.2082 LearningRate 0.0172 Epoch: 16 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:27:40,209-Speed 18542.11 samples/sec Loss 5.2138 LearningRate 0.0172 Epoch: 16 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:27:44,624-Speed 18565.29 samples/sec Loss 5.2001 LearningRate 0.0171 Epoch: 16 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:27:49,015-Speed 18658.65 samples/sec Loss 5.1987 LearningRate 0.0171 Epoch: 16 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
08:27:53,427-Speed 18577.23 samples/sec Loss 5.2170 LearningRate 0.0171 Epoch: 16 Global Step: 84380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:27:57,858-Speed 18491.23 samples/sec Loss 5.1943 LearningRate 0.0171 Epoch: 16 Global Step: 84390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:28:02,272-Speed 18565.95 samples/sec Loss 5.2115 LearningRate 0.0171 Epoch: 16 Global Step: 84400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:28:06,682-Speed 18578.55 samples/sec Loss 5.1645 LearningRate 0.0171 Epoch: 16 Global Step: 84410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:11,096-Speed 18563.81 samples/sec Loss 5.2054 LearningRate 0.0170 Epoch: 16 Global Step: 84420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:15,575-Speed 18294.32 samples/sec Loss 5.2326 LearningRate 0.0170 Epoch: 16 Global Step: 84430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:20,022-Speed 18428.58 samples/sec Loss 5.1866 LearningRate 0.0170 Epoch: 16 Global Step: 84440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:24,423-Speed 18619.69 samples/sec Loss 5.2037 LearningRate 0.0170 Epoch: 16 Global Step: 84450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:28,814-Speed 18657.71 samples/sec Loss 5.2240 LearningRate 0.0170 Epoch: 16 Global Step: 84460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:33,256-Speed 18451.58 samples/sec Loss 5.1935 LearningRate 0.0170 Epoch: 16 Global Step: 84470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:37,675-Speed 18540.70 samples/sec Loss 5.2124 LearningRate 0.0169 Epoch: 16 Global Step: 84480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:42,166-Speed 18244.84 samples/sec Loss 5.1734 LearningRate 0.0169 Epoch: 16 Global Step: 84490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:46,643-Speed 18303.35 samples/sec 
Loss 5.1680 LearningRate 0.0169 Epoch: 16 Global Step: 84500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:28:51,065-Speed 18532.48 samples/sec Loss 5.2083 LearningRate 0.0169 Epoch: 16 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:28:55,499-Speed 18479.66 samples/sec Loss 5.1884 LearningRate 0.0169 Epoch: 16 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:28:59,896-Speed 18639.37 samples/sec Loss 5.1966 LearningRate 0.0168 Epoch: 16 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:29:04,306-Speed 18581.43 samples/sec Loss 5.1859 LearningRate 0.0168 Epoch: 16 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:29:08,738-Speed 18488.14 samples/sec Loss 5.1830 LearningRate 0.0168 Epoch: 16 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:29:13,197-Speed 18377.52 samples/sec Loss 5.2267 LearningRate 0.0168 Epoch: 16 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:29:17,657-Speed 18374.20 samples/sec Loss 5.1946 LearningRate 0.0168 Epoch: 16 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:29:22,059-Speed 18611.39 samples/sec Loss 5.2249 LearningRate 0.0168 Epoch: 16 Global Step: 84580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:29:26,457-Speed 18633.01 samples/sec Loss 5.1974 LearningRate 0.0167 Epoch: 16 Global Step: 84590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:29:30,891-Speed 18481.34 samples/sec Loss 5.1933 LearningRate 0.0167 Epoch: 16 Global Step: 84600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:29:35,285-Speed 18644.22 samples/sec Loss 5.2141 LearningRate 0.0167 Epoch: 16 Global Step: 84610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:29:39,768-Speed 18281.05 samples/sec Loss 5.1857 LearningRate 0.0167 Epoch: 16 
Global Step: 84620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:29:44,182-Speed 18564.75 samples/sec Loss 5.1772 LearningRate 0.0167 Epoch: 16 Global Step: 84630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:29:48,601-Speed 18543.31 samples/sec Loss 5.1854 LearningRate 0.0167 Epoch: 16 Global Step: 84640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:29:53,015-Speed 18565.53 samples/sec Loss 5.1470 LearningRate 0.0166 Epoch: 16 Global Step: 84650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:29:57,396-Speed 18702.18 samples/sec Loss 5.2059 LearningRate 0.0166 Epoch: 16 Global Step: 84660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:01,784-Speed 18677.59 samples/sec Loss 5.1460 LearningRate 0.0166 Epoch: 16 Global Step: 84670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:06,198-Speed 18561.97 samples/sec Loss 5.1422 LearningRate 0.0166 Epoch: 16 Global Step: 84680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:10,597-Speed 18629.58 samples/sec Loss 5.1620 LearningRate 0.0166 Epoch: 16 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:30:14,991-Speed 18650.92 samples/sec Loss 5.2171 LearningRate 0.0165 Epoch: 16 Global Step: 84700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:19,437-Speed 18432.41 samples/sec Loss 5.1722 LearningRate 0.0165 Epoch: 16 Global Step: 84710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:23,811-Speed 18738.07 samples/sec Loss 5.1705 LearningRate 0.0165 Epoch: 16 Global Step: 84720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:28,199-Speed 18679.63 samples/sec Loss 5.1220 LearningRate 0.0165 Epoch: 16 Global Step: 84730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:32,576-Speed 18724.83 samples/sec Loss 5.1505 LearningRate 0.0165 Epoch: 16 Global Step: 84740 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 08:30:37,012-Speed 18472.46 samples/sec Loss 5.1854 LearningRate 0.0165 Epoch: 16 Global Step: 84750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:41,457-Speed 18434.45 samples/sec Loss 5.1634 LearningRate 0.0164 Epoch: 16 Global Step: 84760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:45,851-Speed 18658.97 samples/sec Loss 5.1824 LearningRate 0.0164 Epoch: 16 Global Step: 84770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:50,282-Speed 18490.95 samples/sec Loss 5.1944 LearningRate 0.0164 Epoch: 16 Global Step: 84780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:54,723-Speed 18450.56 samples/sec Loss 5.1479 LearningRate 0.0164 Epoch: 16 Global Step: 84790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:30:59,150-Speed 18510.08 samples/sec Loss 5.1717 LearningRate 0.0164 Epoch: 16 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:31:03,576-Speed 18517.14 samples/sec Loss 5.1417 LearningRate 0.0164 Epoch: 16 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:31:07,970-Speed 18649.09 samples/sec Loss 5.1730 LearningRate 0.0163 Epoch: 16 Global Step: 84820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:12,367-Speed 18634.40 samples/sec Loss 5.1959 LearningRate 0.0163 Epoch: 16 Global Step: 84830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:16,752-Speed 18686.84 samples/sec Loss 5.1871 LearningRate 0.0163 Epoch: 16 Global Step: 84840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:21,183-Speed 18491.42 samples/sec Loss 5.1713 LearningRate 0.0163 Epoch: 16 Global Step: 84850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:25,629-Speed 18431.36 samples/sec Loss 5.1861 LearningRate 0.0163 Epoch: 16 Global Step: 84860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
08:31:30,095-Speed 18348.21 samples/sec Loss 5.1690 LearningRate 0.0163 Epoch: 16 Global Step: 84870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:34,485-Speed 18666.76 samples/sec Loss 5.1798 LearningRate 0.0162 Epoch: 16 Global Step: 84880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:38,876-Speed 18659.89 samples/sec Loss 5.1376 LearningRate 0.0162 Epoch: 16 Global Step: 84890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:43,286-Speed 18582.21 samples/sec Loss 5.1939 LearningRate 0.0162 Epoch: 16 Global Step: 84900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:47,771-Speed 18278.54 samples/sec Loss 5.1806 LearningRate 0.0162 Epoch: 16 Global Step: 84910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:31:52,218-Speed 18422.14 samples/sec Loss 5.1716 LearningRate 0.0162 Epoch: 16 Global Step: 84920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:31:56,626-Speed 18601.05 samples/sec Loss 5.1565 LearningRate 0.0162 Epoch: 16 Global Step: 84930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:01,020-Speed 18650.89 samples/sec Loss 5.1773 LearningRate 0.0161 Epoch: 16 Global Step: 84940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:05,402-Speed 18700.76 samples/sec Loss 5.1466 LearningRate 0.0161 Epoch: 16 Global Step: 84950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:09,808-Speed 18594.32 samples/sec Loss 5.1629 LearningRate 0.0161 Epoch: 16 Global Step: 84960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:14,263-Speed 18395.69 samples/sec Loss 5.1591 LearningRate 0.0161 Epoch: 16 Global Step: 84970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:18,681-Speed 18549.57 samples/sec Loss 5.1820 LearningRate 0.0161 Epoch: 16 Global Step: 84980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:23,064-Speed 18691.93 samples/sec 
Loss 5.1514 LearningRate 0.0160 Epoch: 16 Global Step: 84990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:27,458-Speed 18647.94 samples/sec Loss 5.1370 LearningRate 0.0160 Epoch: 16 Global Step: 85000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:31,883-Speed 18518.33 samples/sec Loss 5.1941 LearningRate 0.0160 Epoch: 16 Global Step: 85010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:36,290-Speed 18593.14 samples/sec Loss 5.1905 LearningRate 0.0160 Epoch: 16 Global Step: 85020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:32:45,410-Speed 8982.65 samples/sec Loss 5.1500 LearningRate 0.0160 Epoch: 16 Global Step: 85030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:32:49,785-Speed 18732.30 samples/sec Loss 5.1830 LearningRate 0.0160 Epoch: 16 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:32:54,201-Speed 18555.77 samples/sec Loss 5.1285 LearningRate 0.0159 Epoch: 16 Global Step: 85050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:32:58,595-Speed 18647.07 samples/sec Loss 5.1625 LearningRate 0.0159 Epoch: 16 Global Step: 85060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:33:03,004-Speed 18584.58 samples/sec Loss 5.1778 LearningRate 0.0159 Epoch: 16 Global Step: 85070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:33:07,408-Speed 18605.58 samples/sec Loss 5.1162 LearningRate 0.0159 Epoch: 16 Global Step: 85080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:33:11,850-Speed 18443.62 samples/sec Loss 5.1898 LearningRate 0.0159 Epoch: 16 Global Step: 85090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:33:16,232-Speed 18701.38 samples/sec Loss 5.1403 LearningRate 0.0159 Epoch: 16 Global Step: 85100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:33:20,624-Speed 18655.93 samples/sec Loss 5.1557 LearningRate 0.0158 Epoch: 16 
Global Step: 85110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:33:25,059-Speed 18473.31 samples/sec Loss 5.1342 LearningRate 0.0158 Epoch: 16 Global Step: 85120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:33:29,456-Speed 18635.69 samples/sec Loss 5.1912 LearningRate 0.0158 Epoch: 16 Global Step: 85130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:33:33,907-Speed 18410.67 samples/sec Loss 5.1302 LearningRate 0.0158 Epoch: 16 Global Step: 85140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:33:38,349-Speed 18445.33 samples/sec Loss 5.1463 LearningRate 0.0158 Epoch: 16 Global Step: 85150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:33:42,828-Speed 18291.11 samples/sec Loss 5.1699 LearningRate 0.0158 Epoch: 16 Global Step: 85160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:33:47,293-Speed 18355.79 samples/sec Loss 5.1807 LearningRate 0.0157 Epoch: 16 Global Step: 85170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:33:51,702-Speed 18581.93 samples/sec Loss 5.1730 LearningRate 0.0157 Epoch: 16 Global Step: 85180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:33:56,160-Speed 18383.69 samples/sec Loss 5.1557 LearningRate 0.0157 Epoch: 16 Global Step: 85190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:34:00,578-Speed 18548.98 samples/sec Loss 5.1684 LearningRate 0.0157 Epoch: 16 Global Step: 85200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:34:05,008-Speed 18495.74 samples/sec Loss 5.1642 LearningRate 0.0157 Epoch: 16 Global Step: 85210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:34:09,400-Speed 18660.22 samples/sec Loss 5.1591 LearningRate 0.0157 Epoch: 16 Global Step: 85220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:34:13,761-Speed 18790.92 samples/sec Loss 5.1370 LearningRate 0.0156 Epoch: 16 Global Step: 85230 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 08:34:18,142-Speed 18703.67 samples/sec Loss 5.1177 LearningRate 0.0156 Epoch: 16 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:34:22,571-Speed 18501.89 samples/sec Loss 5.1577 LearningRate 0.0156 Epoch: 16 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:34:26,971-Speed 18623.73 samples/sec Loss 5.1782 LearningRate 0.0156 Epoch: 16 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:34:31,370-Speed 18624.32 samples/sec Loss 5.1464 LearningRate 0.0156 Epoch: 16 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:34:35,765-Speed 18647.65 samples/sec Loss 5.1288 LearningRate 0.0156 Epoch: 16 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:34:40,146-Speed 18706.93 samples/sec Loss 5.1522 LearningRate 0.0155 Epoch: 16 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:34:44,532-Speed 18681.07 samples/sec Loss 5.1690 LearningRate 0.0155 Epoch: 16 Global Step: 85300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:34:48,913-Speed 18704.42 samples/sec Loss 5.1615 LearningRate 0.0155 Epoch: 16 Global Step: 85310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:34:53,289-Speed 18726.52 samples/sec Loss 5.1479 LearningRate 0.0155 Epoch: 16 Global Step: 85320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:34:57,685-Speed 18639.81 samples/sec Loss 5.1301 LearningRate 0.0155 Epoch: 16 Global Step: 85330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:02,084-Speed 18630.77 samples/sec Loss 5.1607 LearningRate 0.0155 Epoch: 16 Global Step: 85340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:06,465-Speed 18701.16 samples/sec Loss 5.1594 LearningRate 0.0154 Epoch: 16 Global Step: 85350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
08:35:10,874-Speed 18583.52 samples/sec Loss 5.1806 LearningRate 0.0154 Epoch: 16 Global Step: 85360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:15,287-Speed 18572.55 samples/sec Loss 5.1460 LearningRate 0.0154 Epoch: 16 Global Step: 85370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:19,723-Speed 18476.31 samples/sec Loss 5.1622 LearningRate 0.0154 Epoch: 16 Global Step: 85380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:24,155-Speed 18489.94 samples/sec Loss 5.1837 LearningRate 0.0154 Epoch: 16 Global Step: 85390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:28,544-Speed 18667.19 samples/sec Loss 5.1561 LearningRate 0.0154 Epoch: 16 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:35:32,953-Speed 18587.63 samples/sec Loss 5.1482 LearningRate 0.0153 Epoch: 16 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:35:37,343-Speed 18663.13 samples/sec Loss 5.1416 LearningRate 0.0153 Epoch: 16 Global Step: 85420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:41,770-Speed 18511.62 samples/sec Loss 5.1480 LearningRate 0.0153 Epoch: 16 Global Step: 85430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:46,170-Speed 18623.16 samples/sec Loss 5.1583 LearningRate 0.0153 Epoch: 16 Global Step: 85440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:50,648-Speed 18305.77 samples/sec Loss 5.1530 LearningRate 0.0153 Epoch: 16 Global Step: 85450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:55,068-Speed 18544.93 samples/sec Loss 5.1408 LearningRate 0.0153 Epoch: 16 Global Step: 85460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:35:59,457-Speed 18668.30 samples/sec Loss 5.1157 LearningRate 0.0152 Epoch: 16 Global Step: 85470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:36:03,899-Speed 18451.54 samples/sec 
Loss 5.1231 LearningRate 0.0152 Epoch: 16 Global Step: 85480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:36:08,289-Speed 18667.92 samples/sec Loss 5.1669 LearningRate 0.0152 Epoch: 16 Global Step: 85490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:36:12,776-Speed 18258.78 samples/sec Loss 5.1292 LearningRate 0.0152 Epoch: 16 Global Step: 85500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:36:17,155-Speed 18715.31 samples/sec Loss 5.1497 LearningRate 0.0152 Epoch: 16 Global Step: 85510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:36:21,522-Speed 18759.24 samples/sec Loss 5.1385 LearningRate 0.0152 Epoch: 16 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:36:25,925-Speed 18611.98 samples/sec Loss 5.1505 LearningRate 0.0151 Epoch: 16 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:36:30,407-Speed 18281.52 samples/sec Loss 5.2003 LearningRate 0.0151 Epoch: 16 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:36:34,816-Speed 18584.52 samples/sec Loss 5.1487 LearningRate 0.0151 Epoch: 16 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:36:39,230-Speed 18564.50 samples/sec Loss 5.1619 LearningRate 0.0151 Epoch: 16 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:36:43,657-Speed 18509.46 samples/sec Loss 5.1467 LearningRate 0.0151 Epoch: 16 Global Step: 85570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:36:48,156-Speed 18211.70 samples/sec Loss 5.1151 LearningRate 0.0151 Epoch: 16 Global Step: 85580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:36:52,559-Speed 18608.91 samples/sec Loss 5.1449 LearningRate 0.0150 Epoch: 16 Global Step: 85590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:36:56,982-Speed 18531.86 samples/sec Loss 5.1544 LearningRate 0.0150 Epoch: 16 
Global Step: 85600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:37:01,390-Speed 18586.18 samples/sec Loss 5.0978 LearningRate 0.0150 Epoch: 16 Global Step: 85610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:37:05,899-Speed 18175.76 samples/sec Loss 5.1368 LearningRate 0.0150 Epoch: 16 Global Step: 85620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:37:10,425-Speed 18104.62 samples/sec Loss 5.1595 LearningRate 0.0150 Epoch: 16 Global Step: 85630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:37:14,818-Speed 18648.88 samples/sec Loss 5.0984 LearningRate 0.0150 Epoch: 16 Global Step: 85640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:37:19,202-Speed 18693.83 samples/sec Loss 5.1099 LearningRate 0.0149 Epoch: 16 Global Step: 85650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:37:23,639-Speed 18466.69 samples/sec Loss 5.1557 LearningRate 0.0149 Epoch: 16 Global Step: 85660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:37:28,111-Speed 18328.22 samples/sec Loss 5.1263 LearningRate 0.0149 Epoch: 16 Global Step: 85670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:37:32,524-Speed 18577.38 samples/sec Loss 5.1670 LearningRate 0.0149 Epoch: 16 Global Step: 85680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:37:36,973-Speed 18420.89 samples/sec Loss 5.1442 LearningRate 0.0149 Epoch: 16 Global Step: 85690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:37:41,384-Speed 18575.11 samples/sec Loss 5.1316 LearningRate 0.0149 Epoch: 16 Global Step: 85700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:37:45,782-Speed 18639.35 samples/sec Loss 5.1530 LearningRate 0.0148 Epoch: 16 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:37:50,212-Speed 18497.96 samples/sec Loss 5.0610 LearningRate 0.0148 Epoch: 16 Global Step: 85720 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 08:37:54,625-Speed 18568.27 samples/sec Loss 5.1336 LearningRate 0.0148 Epoch: 16 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:37:59,024-Speed 18623.51 samples/sec Loss 5.1838 LearningRate 0.0148 Epoch: 16 Global Step: 85740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:03,446-Speed 18532.20 samples/sec Loss 5.1608 LearningRate 0.0148 Epoch: 16 Global Step: 85750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:07,850-Speed 18606.22 samples/sec Loss 5.1584 LearningRate 0.0148 Epoch: 16 Global Step: 85760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:12,293-Speed 18439.55 samples/sec Loss 5.1331 LearningRate 0.0147 Epoch: 16 Global Step: 85770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:16,697-Speed 18606.01 samples/sec Loss 5.1333 LearningRate 0.0147 Epoch: 16 Global Step: 85780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:21,124-Speed 18513.28 samples/sec Loss 5.1175 LearningRate 0.0147 Epoch: 16 Global Step: 85790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:25,530-Speed 18598.28 samples/sec Loss 5.1300 LearningRate 0.0147 Epoch: 16 Global Step: 85800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:29,958-Speed 18505.13 samples/sec Loss 5.1085 LearningRate 0.0147 Epoch: 16 Global Step: 85810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:34,346-Speed 18675.90 samples/sec Loss 5.1466 LearningRate 0.0147 Epoch: 16 Global Step: 85820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:38,725-Speed 18710.54 samples/sec Loss 5.1401 LearningRate 0.0146 Epoch: 16 Global Step: 85830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:43,114-Speed 18667.18 samples/sec Loss 5.1388 LearningRate 0.0146 Epoch: 16 Global Step: 85840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 
08:38:47,549-Speed 18478.80 samples/sec Loss 5.1550 LearningRate 0.0146 Epoch: 16 Global Step: 85850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:38:51,972-Speed 18523.64 samples/sec Loss 5.1351 LearningRate 0.0146 Epoch: 16 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:38:56,390-Speed 18546.33 samples/sec Loss 5.1122 LearningRate 0.0146 Epoch: 16 Global Step: 85870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:00,818-Speed 18503.54 samples/sec Loss 5.1627 LearningRate 0.0146 Epoch: 16 Global Step: 85880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:05,235-Speed 18551.96 samples/sec Loss 5.1207 LearningRate 0.0145 Epoch: 16 Global Step: 85890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:09,633-Speed 18630.92 samples/sec Loss 5.1247 LearningRate 0.0145 Epoch: 16 Global Step: 85900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:14,101-Speed 18337.50 samples/sec Loss 5.1000 LearningRate 0.0145 Epoch: 16 Global Step: 85910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:18,555-Speed 18401.50 samples/sec Loss 5.1460 LearningRate 0.0145 Epoch: 16 Global Step: 85920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:23,003-Speed 18418.76 samples/sec Loss 5.1469 LearningRate 0.0145 Epoch: 16 Global Step: 85930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:27,440-Speed 18472.29 samples/sec Loss 5.1309 LearningRate 0.0145 Epoch: 16 Global Step: 85940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:31,884-Speed 18438.49 samples/sec Loss 5.1245 LearningRate 0.0144 Epoch: 16 Global Step: 85950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:36,324-Speed 18454.09 samples/sec Loss 5.1368 LearningRate 0.0144 Epoch: 16 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:39:40,756-Speed 18487.15 samples/sec 
Loss 5.1330 LearningRate 0.0144 Epoch: 16 Global Step: 85970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:45,170-Speed 18568.09 samples/sec Loss 5.1262 LearningRate 0.0144 Epoch: 16 Global Step: 85980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:49,584-Speed 18568.42 samples/sec Loss 5.1202 LearningRate 0.0144 Epoch: 16 Global Step: 85990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:53,981-Speed 18641.76 samples/sec Loss 5.1691 LearningRate 0.0144 Epoch: 16 Global Step: 86000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:39:58,429-Speed 18417.96 samples/sec Loss 5.1376 LearningRate 0.0143 Epoch: 16 Global Step: 86010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:02,877-Speed 18426.75 samples/sec Loss 5.1086 LearningRate 0.0143 Epoch: 16 Global Step: 86020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:07,276-Speed 18624.39 samples/sec Loss 5.0832 LearningRate 0.0143 Epoch: 16 Global Step: 86030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:11,691-Speed 18560.70 samples/sec Loss 5.1280 LearningRate 0.0143 Epoch: 16 Global Step: 86040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:16,092-Speed 18619.53 samples/sec Loss 5.1437 LearningRate 0.0143 Epoch: 16 Global Step: 86050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:20,519-Speed 18511.53 samples/sec Loss 5.1437 LearningRate 0.0143 Epoch: 16 Global Step: 86060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:24,909-Speed 18672.13 samples/sec Loss 5.1317 LearningRate 0.0142 Epoch: 16 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:40:29,360-Speed 18408.92 samples/sec Loss 5.1301 LearningRate 0.0142 Epoch: 16 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:40:33,767-Speed 18594.16 samples/sec Loss 5.1305 LearningRate 0.0142 Epoch: 16 
Global Step: 86090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:40:38,191-Speed 18523.47 samples/sec Loss 5.0892 LearningRate 0.0142 Epoch: 16 Global Step: 86100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:42,615-Speed 18533.96 samples/sec Loss 5.1180 LearningRate 0.0142 Epoch: 16 Global Step: 86110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:47,036-Speed 18537.56 samples/sec Loss 5.1076 LearningRate 0.0142 Epoch: 16 Global Step: 86120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:51,512-Speed 18308.04 samples/sec Loss 5.1249 LearningRate 0.0141 Epoch: 16 Global Step: 86130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:40:55,957-Speed 18434.42 samples/sec Loss 5.0878 LearningRate 0.0141 Epoch: 16 Global Step: 86140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:41:00,363-Speed 18598.63 samples/sec Loss 5.1094 LearningRate 0.0141 Epoch: 16 Global Step: 86150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:04,741-Speed 18719.35 samples/sec Loss 5.1344 LearningRate 0.0141 Epoch: 16 Global Step: 86160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:09,162-Speed 18532.74 samples/sec Loss 5.1055 LearningRate 0.0141 Epoch: 16 Global Step: 86170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:13,608-Speed 18431.22 samples/sec Loss 5.1224 LearningRate 0.0141 Epoch: 16 Global Step: 86180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:18,017-Speed 18584.61 samples/sec Loss 5.1243 LearningRate 0.0141 Epoch: 16 Global Step: 86190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:22,472-Speed 18390.08 samples/sec Loss 5.0629 LearningRate 0.0140 Epoch: 16 Global Step: 86200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:26,904-Speed 18490.34 samples/sec Loss 5.0994 LearningRate 0.0140 Epoch: 16 Global Step: 86210 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-14 08:41:31,371-Speed 18341.50 samples/sec Loss 5.1004 LearningRate 0.0140 Epoch: 16 Global Step: 86220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:35,853-Speed 18282.88 samples/sec Loss 5.1497 LearningRate 0.0140 Epoch: 16 Global Step: 86230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:40,267-Speed 18562.43 samples/sec Loss 5.1267 LearningRate 0.0140 Epoch: 16 Global Step: 86240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:41:44,639-Speed 18744.27 samples/sec Loss 5.1292 LearningRate 0.0140 Epoch: 16 Global Step: 86250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:41:49,051-Speed 18572.36 samples/sec Loss 5.1204 LearningRate 0.0139 Epoch: 16 Global Step: 86260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:41:53,478-Speed 18510.57 samples/sec Loss 5.1026 LearningRate 0.0139 Epoch: 16 Global Step: 86270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:41:57,897-Speed 18541.32 samples/sec Loss 5.1155 LearningRate 0.0139 Epoch: 16 Global Step: 86280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:02,331-Speed 18479.70 samples/sec Loss 5.1068 LearningRate 0.0139 Epoch: 16 Global Step: 86290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:06,774-Speed 18446.00 samples/sec Loss 5.1128 LearningRate 0.0139 Epoch: 16 Global Step: 86300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:11,141-Speed 18758.63 samples/sec Loss 5.0475 LearningRate 0.0139 Epoch: 16 Global Step: 86310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:15,609-Speed 18338.56 samples/sec Loss 5.0917 LearningRate 0.0138 Epoch: 16 Global Step: 86320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:20,043-Speed 18481.82 samples/sec Loss 5.1085 LearningRate 0.0138 Epoch: 16 Global Step: 86330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
08:42:24,498-Speed 18396.04 samples/sec Loss 5.1126 LearningRate 0.0138 Epoch: 16 Global Step: 86340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:28,913-Speed 18558.26 samples/sec Loss 5.1047 LearningRate 0.0138 Epoch: 16 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:42:33,314-Speed 18623.96 samples/sec Loss 5.1095 LearningRate 0.0138 Epoch: 16 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:42:37,692-Speed 18714.61 samples/sec Loss 5.1111 LearningRate 0.0138 Epoch: 16 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:42:42,115-Speed 18525.06 samples/sec Loss 5.0822 LearningRate 0.0137 Epoch: 16 Global Step: 86380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:46,573-Speed 18382.52 samples/sec Loss 5.1349 LearningRate 0.0137 Epoch: 16 Global Step: 86390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:51,047-Speed 18317.87 samples/sec Loss 5.0736 LearningRate 0.0137 Epoch: 16 Global Step: 86400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:55,438-Speed 18682.43 samples/sec Loss 5.1336 LearningRate 0.0137 Epoch: 16 Global Step: 86410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:42:59,842-Speed 18603.44 samples/sec Loss 5.0919 LearningRate 0.0137 Epoch: 16 Global Step: 86420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:43:04,236-Speed 18649.31 samples/sec Loss 5.0851 LearningRate 0.0137 Epoch: 16 Global Step: 86430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:43:08,658-Speed 18531.67 samples/sec Loss 5.0931 LearningRate 0.0137 Epoch: 16 Global Step: 86440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:43:13,034-Speed 18722.83 samples/sec Loss 5.1116 LearningRate 0.0136 Epoch: 16 Global Step: 86450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:43:17,465-Speed 18493.86 samples/sec 
Loss 5.0757 LearningRate 0.0136 Epoch: 16 Global Step: 86460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:43:21,892-Speed 18508.92 samples/sec Loss 5.1425 LearningRate 0.0136 Epoch: 16 Global Step: 86470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:43:26,301-Speed 18588.87 samples/sec Loss 5.1151 LearningRate 0.0136 Epoch: 16 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:43:30,721-Speed 18541.27 samples/sec Loss 5.1108 LearningRate 0.0136 Epoch: 16 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:43:35,172-Speed 18409.67 samples/sec Loss 5.0962 LearningRate 0.0136 Epoch: 16 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:43:40,432-Speed 15578.73 samples/sec Loss 5.0833 LearningRate 0.0135 Epoch: 16 Global Step: 86510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:43:44,818-Speed 18683.66 samples/sec Loss 5.1355 LearningRate 0.0135 Epoch: 16 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:43:49,236-Speed 18544.78 samples/sec Loss 5.1034 LearningRate 0.0135 Epoch: 16 Global Step: 86530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:43:53,642-Speed 18596.96 samples/sec Loss 5.0977 LearningRate 0.0135 Epoch: 16 Global Step: 86540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:43:58,128-Speed 18268.26 samples/sec Loss 5.0883 LearningRate 0.0135 Epoch: 16 Global Step: 86550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:02,536-Speed 18588.03 samples/sec Loss 5.0814 LearningRate 0.0135 Epoch: 16 Global Step: 86560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:07,012-Speed 18307.66 samples/sec Loss 5.0922 LearningRate 0.0134 Epoch: 16 Global Step: 86570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:11,444-Speed 18489.74 samples/sec Loss 5.1166 LearningRate 0.0134 Epoch: 16 
Global Step: 86580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:15,860-Speed 18558.48 samples/sec Loss 5.0851 LearningRate 0.0134 Epoch: 16 Global Step: 86590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:20,274-Speed 18562.31 samples/sec Loss 5.1120 LearningRate 0.0134 Epoch: 16 Global Step: 86600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:24,766-Speed 18245.94 samples/sec Loss 5.1107 LearningRate 0.0134 Epoch: 16 Global Step: 86610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:29,165-Speed 18627.34 samples/sec Loss 5.0899 LearningRate 0.0134 Epoch: 16 Global Step: 86620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:33,591-Speed 18516.11 samples/sec Loss 5.1368 LearningRate 0.0134 Epoch: 16 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:44:38,001-Speed 18578.24 samples/sec Loss 5.0944 LearningRate 0.0133 Epoch: 16 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:44:42,403-Speed 18613.53 samples/sec Loss 5.1188 LearningRate 0.0133 Epoch: 16 Global Step: 86650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:46,786-Speed 18695.35 samples/sec Loss 5.0506 LearningRate 0.0133 Epoch: 16 Global Step: 86660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:51,203-Speed 18551.61 samples/sec Loss 5.0923 LearningRate 0.0133 Epoch: 16 Global Step: 86670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:44:55,787-Speed 17873.52 samples/sec Loss 5.0989 LearningRate 0.0133 Epoch: 16 Global Step: 86680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:00,219-Speed 18487.23 samples/sec Loss 5.0352 LearningRate 0.0133 Epoch: 16 Global Step: 86690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:04,659-Speed 18459.07 samples/sec Loss 5.0970 LearningRate 0.0132 Epoch: 16 Global Step: 86700 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 08:45:09,094-Speed 18473.54 samples/sec Loss 5.0769 LearningRate 0.0132 Epoch: 16 Global Step: 86710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:13,518-Speed 18520.93 samples/sec Loss 5.0573 LearningRate 0.0132 Epoch: 16 Global Step: 86720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:17,963-Speed 18434.74 samples/sec Loss 5.0954 LearningRate 0.0132 Epoch: 16 Global Step: 86730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:22,405-Speed 18448.28 samples/sec Loss 5.0552 LearningRate 0.0132 Epoch: 16 Global Step: 86740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:26,809-Speed 18609.55 samples/sec Loss 5.1036 LearningRate 0.0132 Epoch: 16 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:45:31,216-Speed 18593.38 samples/sec Loss 5.0726 LearningRate 0.0132 Epoch: 16 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:45:35,641-Speed 18517.14 samples/sec Loss 5.0774 LearningRate 0.0131 Epoch: 16 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:45:40,065-Speed 18523.48 samples/sec Loss 5.1147 LearningRate 0.0131 Epoch: 16 Global Step: 86780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:44,471-Speed 18594.60 samples/sec Loss 5.0832 LearningRate 0.0131 Epoch: 16 Global Step: 86790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:48,882-Speed 18585.21 samples/sec Loss 5.0859 LearningRate 0.0131 Epoch: 16 Global Step: 86800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:53,297-Speed 18556.25 samples/sec Loss 5.0609 LearningRate 0.0131 Epoch: 16 Global Step: 86810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:45:57,733-Speed 18475.48 samples/sec Loss 5.0849 LearningRate 0.0131 Epoch: 16 Global Step: 86820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
08:46:02,163-Speed 18494.16 samples/sec Loss 5.1106 LearningRate 0.0130 Epoch: 16 Global Step: 86830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:06,616-Speed 18402.01 samples/sec Loss 5.1043 LearningRate 0.0130 Epoch: 16 Global Step: 86840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:11,042-Speed 18518.81 samples/sec Loss 5.0601 LearningRate 0.0130 Epoch: 16 Global Step: 86850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:15,458-Speed 18556.34 samples/sec Loss 5.0879 LearningRate 0.0130 Epoch: 16 Global Step: 86860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:19,886-Speed 18506.67 samples/sec Loss 5.0944 LearningRate 0.0130 Epoch: 16 Global Step: 86870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:24,340-Speed 18400.70 samples/sec Loss 5.0795 LearningRate 0.0130 Epoch: 16 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:46:28,739-Speed 18630.71 samples/sec Loss 5.0533 LearningRate 0.0130 Epoch: 16 Global Step: 86890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:33,199-Speed 18371.76 samples/sec Loss 5.0938 LearningRate 0.0129 Epoch: 16 Global Step: 86900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:37,674-Speed 18312.12 samples/sec Loss 5.0805 LearningRate 0.0129 Epoch: 16 Global Step: 86910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:42,119-Speed 18433.89 samples/sec Loss 5.0748 LearningRate 0.0129 Epoch: 16 Global Step: 86920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:46,578-Speed 18380.33 samples/sec Loss 5.0942 LearningRate 0.0129 Epoch: 16 Global Step: 86930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:51,020-Speed 18445.99 samples/sec Loss 5.1177 LearningRate 0.0129 Epoch: 16 Global Step: 86940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:55,443-Speed 18526.87 samples/sec 
Loss 5.0451 LearningRate 0.0129 Epoch: 16 Global Step: 86950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:46:59,876-Speed 18484.67 samples/sec Loss 5.1139 LearningRate 0.0128 Epoch: 16 Global Step: 86960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:47:04,273-Speed 18637.14 samples/sec Loss 5.0771 LearningRate 0.0128 Epoch: 16 Global Step: 86970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:47:08,697-Speed 18521.53 samples/sec Loss 5.1065 LearningRate 0.0128 Epoch: 16 Global Step: 86980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:13,122-Speed 18516.04 samples/sec Loss 5.1020 LearningRate 0.0128 Epoch: 16 Global Step: 86990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:17,561-Speed 18458.67 samples/sec Loss 5.0874 LearningRate 0.0128 Epoch: 16 Global Step: 87000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:21,989-Speed 18507.14 samples/sec Loss 5.0971 LearningRate 0.0128 Epoch: 16 Global Step: 87010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:26,402-Speed 18568.55 samples/sec Loss 5.0912 LearningRate 0.0128 Epoch: 16 Global Step: 87020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:30,843-Speed 18450.38 samples/sec Loss 5.0815 LearningRate 0.0127 Epoch: 16 Global Step: 87030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:35,304-Speed 18367.08 samples/sec Loss 5.0934 LearningRate 0.0127 Epoch: 16 Global Step: 87040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:39,691-Speed 18675.82 samples/sec Loss 5.0797 LearningRate 0.0127 Epoch: 16 Global Step: 87050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:44,138-Speed 18424.65 samples/sec Loss 5.1118 LearningRate 0.0127 Epoch: 16 Global Step: 87060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:48,526-Speed 18677.03 samples/sec Loss 5.0856 LearningRate 0.0127 Epoch: 16 
Global Step: 87070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:47:52,910-Speed 18691.71 samples/sec Loss 5.0451 LearningRate 0.0127 Epoch: 16 Global Step: 87080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:47:57,330-Speed 18538.73 samples/sec Loss 5.0650 LearningRate 0.0126 Epoch: 16 Global Step: 87090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:48:01,770-Speed 18458.09 samples/sec Loss 5.0800 LearningRate 0.0126 Epoch: 16 Global Step: 87100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:48:06,198-Speed 18504.13 samples/sec Loss 5.0459 LearningRate 0.0126 Epoch: 16 Global Step: 87110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:48:10,623-Speed 18518.54 samples/sec Loss 5.0703 LearningRate 0.0126 Epoch: 16 Global Step: 87120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:48:15,035-Speed 18572.77 samples/sec Loss 5.1087 LearningRate 0.0126 Epoch: 16 Global Step: 87130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:48:19,448-Speed 18567.36 samples/sec Loss 5.0699 LearningRate 0.0126 Epoch: 16 Global Step: 87140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:48:23,899-Speed 18410.11 samples/sec Loss 5.0363 LearningRate 0.0126 Epoch: 16 Global Step: 87150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:48:28,321-Speed 18533.47 samples/sec Loss 5.0680 LearningRate 0.0125 Epoch: 16 Global Step: 87160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:48:32,725-Speed 18603.59 samples/sec Loss 5.0648 LearningRate 0.0125 Epoch: 16 Global Step: 87170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:48:37,166-Speed 18457.77 samples/sec Loss 5.0427 LearningRate 0.0125 Epoch: 16 Global Step: 87180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:48:41,617-Speed 18408.60 samples/sec Loss 5.0601 LearningRate 0.0125 Epoch: 16 Global Step: 87190 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-14 08:48:46,107-Speed 18250.51 samples/sec Loss 5.0531 LearningRate 0.0125 Epoch: 16 Global Step: 87200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:48:50,526-Speed 18545.53 samples/sec Loss 5.1086 LearningRate 0.0125 Epoch: 16 Global Step: 87210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:48:54,915-Speed 18672.34 samples/sec Loss 5.0607 LearningRate 0.0124 Epoch: 16 Global Step: 87220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:48:59,353-Speed 18463.53 samples/sec Loss 5.0989 LearningRate 0.0124 Epoch: 16 Global Step: 87230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:49:03,733-Speed 18710.81 samples/sec Loss 5.0926 LearningRate 0.0124 Epoch: 16 Global Step: 87240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:49:08,153-Speed 18537.19 samples/sec Loss 5.0599 LearningRate 0.0124 Epoch: 16 Global Step: 87250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:49:12,585-Speed 18490.94 samples/sec Loss 5.0526 LearningRate 0.0124 Epoch: 16 Global Step: 87260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:49:17,096-Speed 18163.56 samples/sec Loss 5.1099 LearningRate 0.0124 Epoch: 16 Global Step: 87270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:49:21,546-Speed 18419.10 samples/sec Loss 5.0575 LearningRate 0.0124 Epoch: 16 Global Step: 87280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:49:25,979-Speed 18490.38 samples/sec Loss 5.0672 LearningRate 0.0123 Epoch: 16 Global Step: 87290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:49:30,376-Speed 18634.90 samples/sec Loss 5.0731 LearningRate 0.0123 Epoch: 16 Global Step: 87300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:49:34,807-Speed 18497.88 samples/sec Loss 5.0606 LearningRate 0.0123 Epoch: 16 Global Step: 87310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
08:49:39,255-Speed 18418.75 samples/sec Loss 5.0628 LearningRate 0.0123 Epoch: 16 Global Step: 87320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:49:43,680-Speed 18520.25 samples/sec Loss 5.0448 LearningRate 0.0123 Epoch: 16 Global Step: 87330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:49:48,105-Speed 18515.85 samples/sec Loss 5.0314 LearningRate 0.0123 Epoch: 16 Global Step: 87340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:49:52,522-Speed 18552.20 samples/sec Loss 5.0617 LearningRate 0.0123 Epoch: 16 Global Step: 87350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:49:56,923-Speed 18618.83 samples/sec Loss 5.0796 LearningRate 0.0122 Epoch: 16 Global Step: 87360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:01,316-Speed 18656.23 samples/sec Loss 5.0810 LearningRate 0.0122 Epoch: 16 Global Step: 87370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:50:05,727-Speed 18582.86 samples/sec Loss 5.0637 LearningRate 0.0122 Epoch: 16 Global Step: 87380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:10,112-Speed 18689.10 samples/sec Loss 5.0641 LearningRate 0.0122 Epoch: 16 Global Step: 87390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:14,544-Speed 18491.36 samples/sec Loss 5.0577 LearningRate 0.0122 Epoch: 16 Global Step: 87400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:18,937-Speed 18652.49 samples/sec Loss 5.0813 LearningRate 0.0122 Epoch: 16 Global Step: 87410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:23,384-Speed 18428.25 samples/sec Loss 5.0867 LearningRate 0.0121 Epoch: 16 Global Step: 87420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:27,800-Speed 18557.61 samples/sec Loss 5.0718 LearningRate 0.0121 Epoch: 16 Global Step: 87430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:32,242-Speed 18442.08 samples/sec 
Loss 5.0443 LearningRate 0.0121 Epoch: 16 Global Step: 87440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:36,675-Speed 18486.15 samples/sec Loss 5.0642 LearningRate 0.0121 Epoch: 16 Global Step: 87450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:41,079-Speed 18604.89 samples/sec Loss 5.0670 LearningRate 0.0121 Epoch: 16 Global Step: 87460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:45,528-Speed 18419.79 samples/sec Loss 5.0722 LearningRate 0.0121 Epoch: 16 Global Step: 87470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:50:49,946-Speed 18548.84 samples/sec Loss 5.0654 LearningRate 0.0121 Epoch: 16 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:50:54,342-Speed 18639.86 samples/sec Loss 5.0383 LearningRate 0.0120 Epoch: 16 Global Step: 87490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:50:58,764-Speed 18532.24 samples/sec Loss 5.0459 LearningRate 0.0120 Epoch: 16 Global Step: 87500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:51:03,175-Speed 18576.58 samples/sec Loss 5.0741 LearningRate 0.0120 Epoch: 16 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:51:07,573-Speed 18634.97 samples/sec Loss 5.0579 LearningRate 0.0120 Epoch: 16 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:51:11,973-Speed 18620.58 samples/sec Loss 5.0935 LearningRate 0.0120 Epoch: 16 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:51:16,395-Speed 18527.81 samples/sec Loss 5.0297 LearningRate 0.0120 Epoch: 16 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:51:20,809-Speed 18573.41 samples/sec Loss 5.0375 LearningRate 0.0120 Epoch: 16 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:51:25,262-Speed 18403.70 samples/sec Loss 5.0577 LearningRate 0.0119 Epoch: 16 
Global Step: 87560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:51:29,683-Speed 18535.95 samples/sec Loss 5.0629 LearningRate 0.0119 Epoch: 16 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:51:34,122-Speed 18460.13 samples/sec Loss 5.0646 LearningRate 0.0119 Epoch: 16 Global Step: 87580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:51:38,538-Speed 18552.85 samples/sec Loss 5.0357 LearningRate 0.0119 Epoch: 16 Global Step: 87590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:51:42,976-Speed 18461.70 samples/sec Loss 5.0328 LearningRate 0.0119 Epoch: 16 Global Step: 87600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:51:47,426-Speed 18417.25 samples/sec Loss 5.0819 LearningRate 0.0119 Epoch: 16 Global Step: 87610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:51:51,845-Speed 18548.45 samples/sec Loss 5.0378 LearningRate 0.0118 Epoch: 16 Global Step: 87620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:51:56,299-Speed 18401.54 samples/sec Loss 5.1361 LearningRate 0.0118 Epoch: 16 Global Step: 87630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:52:00,724-Speed 18516.67 samples/sec Loss 5.0839 LearningRate 0.0118 Epoch: 16 Global Step: 87640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:52:05,151-Speed 18507.91 samples/sec Loss 5.0918 LearningRate 0.0118 Epoch: 16 Global Step: 87650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:52:09,555-Speed 18607.27 samples/sec Loss 5.0567 LearningRate 0.0118 Epoch: 16 Global Step: 87660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:52:13,961-Speed 18597.12 samples/sec Loss 5.0371 LearningRate 0.0118 Epoch: 16 Global Step: 87670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:52:18,379-Speed 18547.71 samples/sec Loss 5.0430 LearningRate 0.0118 Epoch: 16 Global Step: 87680 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 08:52:22,768-Speed 18667.68 samples/sec Loss 5.0519 LearningRate 0.0117 Epoch: 16 Global Step: 87690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:52:27,216-Speed 18439.52 samples/sec Loss 5.0419 LearningRate 0.0117 Epoch: 16 Global Step: 87700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:52:31,618-Speed 18613.41 samples/sec Loss 5.0361 LearningRate 0.0117 Epoch: 16 Global Step: 87710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:52:36,110-Speed 18241.32 samples/sec Loss 5.0470 LearningRate 0.0117 Epoch: 16 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:52:40,602-Speed 18241.04 samples/sec Loss 5.0437 LearningRate 0.0117 Epoch: 16 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:52:44,990-Speed 18673.77 samples/sec Loss 5.0051 LearningRate 0.0117 Epoch: 16 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:52:49,430-Speed 18455.69 samples/sec Loss 5.0561 LearningRate 0.0117 Epoch: 16 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:52:53,900-Speed 18331.19 samples/sec Loss 5.0771 LearningRate 0.0116 Epoch: 16 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:52:58,402-Speed 18202.76 samples/sec Loss 5.0581 LearningRate 0.0116 Epoch: 16 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:53:02,814-Speed 18570.95 samples/sec Loss 5.0469 LearningRate 0.0116 Epoch: 16 Global Step: 87780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 08:53:07,299-Speed 18269.20 samples/sec Loss 5.0015 LearningRate 0.0116 Epoch: 16 Global Step: 87790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:53:11,701-Speed 18616.36 samples/sec Loss 5.0595 LearningRate 0.0116 Epoch: 16 Global Step: 87800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
08:53:16,160-Speed 18375.70 samples/sec Loss 4.9988 LearningRate 0.0116 Epoch: 16 Global Step: 87810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:53:20,615-Speed 18391.70 samples/sec Loss 5.0375 LearningRate 0.0116 Epoch: 16 Global Step: 87820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:53:25,005-Speed 18666.90 samples/sec Loss 5.0634 LearningRate 0.0115 Epoch: 16 Global Step: 87830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:53:29,404-Speed 18629.29 samples/sec Loss 5.0376 LearningRate 0.0115 Epoch: 16 Global Step: 87840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:53:38,434-Speed 9072.45 samples/sec Loss 5.0384 LearningRate 0.0115 Epoch: 16 Global Step: 87850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:53:42,838-Speed 18608.62 samples/sec Loss 5.0611 LearningRate 0.0115 Epoch: 16 Global Step: 87860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:53:47,255-Speed 18545.14 samples/sec Loss 5.0673 LearningRate 0.0115 Epoch: 16 Global Step: 87870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:53:51,680-Speed 18519.27 samples/sec Loss 5.1014 LearningRate 0.0115 Epoch: 16 Global Step: 87880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:53:56,067-Speed 18678.69 samples/sec Loss 5.0180 LearningRate 0.0115 Epoch: 16 Global Step: 87890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:54:00,495-Speed 18502.06 samples/sec Loss 5.0356 LearningRate 0.0114 Epoch: 16 Global Step: 87900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:54:04,907-Speed 18574.19 samples/sec Loss 5.0781 LearningRate 0.0114 Epoch: 16 Global Step: 87910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:54:09,331-Speed 18517.29 samples/sec Loss 5.0169 LearningRate 0.0114 Epoch: 16 Global Step: 87920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:54:13,754-Speed 18530.79 samples/sec Loss 
5.0504 LearningRate 0.0114 Epoch: 16 Global Step: 87930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:54:18,171-Speed 18550.22 samples/sec Loss 5.0542 LearningRate 0.0114 Epoch: 16 Global Step: 87940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 08:54:22,588-Speed 18567.60 samples/sec Loss 5.0620 LearningRate 0.0114 Epoch: 16 Global Step: 87950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:54:27,006-Speed 18546.00 samples/sec Loss 5.0498 LearningRate 0.0114 Epoch: 16 Global Step: 87960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:54:31,384-Speed 18717.43 samples/sec Loss 5.0171 LearningRate 0.0113 Epoch: 16 Global Step: 87970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:54:35,788-Speed 18605.54 samples/sec Loss 5.0160 LearningRate 0.0113 Epoch: 16 Global Step: 87980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:54:40,224-Speed 18476.63 samples/sec Loss 5.0351 LearningRate 0.0113 Epoch: 16 Global Step: 87990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:54:44,651-Speed 18507.12 samples/sec Loss 5.0512 LearningRate 0.0113 Epoch: 16 Global Step: 88000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:54:49,088-Speed 18469.55 samples/sec Loss 5.0629 LearningRate 0.0113 Epoch: 16 Global Step: 88010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:54:53,586-Speed 18217.42 samples/sec Loss 5.0416 LearningRate 0.0113 Epoch: 16 Global Step: 88020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:54:58,010-Speed 18519.05 samples/sec Loss 5.0682 LearningRate 0.0113 Epoch: 16 Global Step: 88030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:55:02,415-Speed 18602.92 samples/sec Loss 5.0515 LearningRate 0.0112 Epoch: 16 Global Step: 88040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:55:06,845-Speed 18497.65 samples/sec Loss 5.0211 LearningRate 0.0112 Epoch: 16 Global 
Step: 88050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:55:11,316-Speed 18328.61 samples/sec Loss 5.0743 LearningRate 0.0112 Epoch: 16 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:55:15,825-Speed 18168.59 samples/sec Loss 5.0541 LearningRate 0.0112 Epoch: 16 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:55:20,221-Speed 18643.57 samples/sec Loss 5.0332 LearningRate 0.0112 Epoch: 16 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:55:24,621-Speed 18621.73 samples/sec Loss 5.0355 LearningRate 0.0112 Epoch: 16 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:55:29,033-Speed 18570.72 samples/sec Loss 5.0102 LearningRate 0.0112 Epoch: 16 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:55:33,453-Speed 18542.51 samples/sec Loss 5.0470 LearningRate 0.0111 Epoch: 16 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:55:37,862-Speed 18584.93 samples/sec Loss 5.0882 LearningRate 0.0111 Epoch: 16 Global Step: 88120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:55:42,259-Speed 18637.45 samples/sec Loss 5.0554 LearningRate 0.0111 Epoch: 16 Global Step: 88130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:55:46,645-Speed 18684.38 samples/sec Loss 5.0418 LearningRate 0.0111 Epoch: 16 Global Step: 88140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:56:04,795-Speed 4513.92 samples/sec Loss 5.0597 LearningRate 0.0111 Epoch: 17 Global Step: 88150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:56:09,318-Speed 18115.46 samples/sec Loss 5.0271 LearningRate 0.0111 Epoch: 17 Global Step: 88160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:56:13,689-Speed 18751.19 samples/sec Loss 5.0255 LearningRate 0.0111 Epoch: 17 Global Step: 88170 Fp16 Grad Scale: 32768 Required: 
2 hours Training: 2022-01-14 08:56:18,098-Speed 18583.32 samples/sec Loss 5.0305 LearningRate 0.0110 Epoch: 17 Global Step: 88180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:56:22,488-Speed 18664.61 samples/sec Loss 5.0449 LearningRate 0.0110 Epoch: 17 Global Step: 88190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:56:26,885-Speed 18637.19 samples/sec Loss 5.0262 LearningRate 0.0110 Epoch: 17 Global Step: 88200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:56:31,300-Speed 18558.75 samples/sec Loss 5.0071 LearningRate 0.0110 Epoch: 17 Global Step: 88210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:56:35,723-Speed 18528.76 samples/sec Loss 5.0349 LearningRate 0.0110 Epoch: 17 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:56:40,140-Speed 18550.61 samples/sec Loss 4.9774 LearningRate 0.0110 Epoch: 17 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:56:44,548-Speed 18589.72 samples/sec Loss 5.0547 LearningRate 0.0110 Epoch: 17 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:56:48,932-Speed 18689.10 samples/sec Loss 5.0244 LearningRate 0.0109 Epoch: 17 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:56:53,355-Speed 18530.62 samples/sec Loss 5.0447 LearningRate 0.0109 Epoch: 17 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:56:57,730-Speed 18730.18 samples/sec Loss 5.0166 LearningRate 0.0109 Epoch: 17 Global Step: 88270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:02,121-Speed 18661.49 samples/sec Loss 5.0207 LearningRate 0.0109 Epoch: 17 Global Step: 88280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:06,523-Speed 18614.04 samples/sec Loss 5.0304 LearningRate 0.0109 Epoch: 17 Global Step: 88290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
08:57:10,945-Speed 18532.42 samples/sec Loss 5.0361 LearningRate 0.0109 Epoch: 17 Global Step: 88300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:15,387-Speed 18447.55 samples/sec Loss 5.0117 LearningRate 0.0109 Epoch: 17 Global Step: 88310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:19,815-Speed 18505.83 samples/sec Loss 5.0266 LearningRate 0.0108 Epoch: 17 Global Step: 88320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:24,212-Speed 18632.73 samples/sec Loss 5.0044 LearningRate 0.0108 Epoch: 17 Global Step: 88330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:28,655-Speed 18441.92 samples/sec Loss 5.0299 LearningRate 0.0108 Epoch: 17 Global Step: 88340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:33,090-Speed 18478.14 samples/sec Loss 5.0262 LearningRate 0.0108 Epoch: 17 Global Step: 88350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:37,507-Speed 18547.08 samples/sec Loss 5.0273 LearningRate 0.0108 Epoch: 17 Global Step: 88360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:41,921-Speed 18566.23 samples/sec Loss 5.0278 LearningRate 0.0108 Epoch: 17 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:57:46,348-Speed 18511.42 samples/sec Loss 5.0375 LearningRate 0.0108 Epoch: 17 Global Step: 88380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:50,788-Speed 18456.61 samples/sec Loss 5.0558 LearningRate 0.0107 Epoch: 17 Global Step: 88390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:55,251-Speed 18366.41 samples/sec Loss 5.0501 LearningRate 0.0107 Epoch: 17 Global Step: 88400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:57:59,643-Speed 18662.23 samples/sec Loss 5.0060 LearningRate 0.0107 Epoch: 17 Global Step: 88410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:58:04,056-Speed 18569.83 samples/sec 
Loss 5.0161 LearningRate 0.0107 Epoch: 17 Global Step: 88420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:58:08,483-Speed 18510.81 samples/sec Loss 5.0158 LearningRate 0.0107 Epoch: 17 Global Step: 88430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:58:12,856-Speed 18736.18 samples/sec Loss 5.0328 LearningRate 0.0107 Epoch: 17 Global Step: 88440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:58:17,306-Speed 18413.54 samples/sec Loss 4.9992 LearningRate 0.0107 Epoch: 17 Global Step: 88450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:58:21,723-Speed 18556.20 samples/sec Loss 5.0414 LearningRate 0.0106 Epoch: 17 Global Step: 88460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:58:26,144-Speed 18541.82 samples/sec Loss 5.0116 LearningRate 0.0106 Epoch: 17 Global Step: 88470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:58:30,561-Speed 18551.40 samples/sec Loss 5.0286 LearningRate 0.0106 Epoch: 17 Global Step: 88480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:58:34,977-Speed 18555.93 samples/sec Loss 4.9984 LearningRate 0.0106 Epoch: 17 Global Step: 88490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:58:39,380-Speed 18609.98 samples/sec Loss 5.0235 LearningRate 0.0106 Epoch: 17 Global Step: 88500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:58:43,824-Speed 18438.72 samples/sec Loss 5.0498 LearningRate 0.0106 Epoch: 17 Global Step: 88510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:58:48,225-Speed 18623.79 samples/sec Loss 5.0037 LearningRate 0.0106 Epoch: 17 Global Step: 88520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:58:52,631-Speed 18598.14 samples/sec Loss 5.0252 LearningRate 0.0105 Epoch: 17 Global Step: 88530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:58:57,061-Speed 18494.98 samples/sec Loss 5.0058 LearningRate 0.0105 Epoch: 17 
Global Step: 88540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:01,473-Speed 18576.91 samples/sec Loss 5.0358 LearningRate 0.0105 Epoch: 17 Global Step: 88550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:05,877-Speed 18605.96 samples/sec Loss 5.0016 LearningRate 0.0105 Epoch: 17 Global Step: 88560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:10,297-Speed 18537.36 samples/sec Loss 5.0396 LearningRate 0.0105 Epoch: 17 Global Step: 88570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:14,714-Speed 18551.96 samples/sec Loss 5.0192 LearningRate 0.0105 Epoch: 17 Global Step: 88580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:19,147-Speed 18484.14 samples/sec Loss 5.0369 LearningRate 0.0105 Epoch: 17 Global Step: 88590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:23,622-Speed 18312.58 samples/sec Loss 5.0309 LearningRate 0.0104 Epoch: 17 Global Step: 88600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:28,036-Speed 18564.98 samples/sec Loss 4.9878 LearningRate 0.0104 Epoch: 17 Global Step: 88610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:32,432-Speed 18640.38 samples/sec Loss 5.0219 LearningRate 0.0104 Epoch: 17 Global Step: 88620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:36,858-Speed 18513.54 samples/sec Loss 5.0086 LearningRate 0.0104 Epoch: 17 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:59:41,314-Speed 18402.21 samples/sec Loss 5.0385 LearningRate 0.0104 Epoch: 17 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 08:59:45,822-Speed 18178.92 samples/sec Loss 5.0731 LearningRate 0.0104 Epoch: 17 Global Step: 88650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:50,269-Speed 18423.52 samples/sec Loss 5.0078 LearningRate 0.0104 Epoch: 17 Global Step: 88660 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 08:59:54,744-Speed 18315.56 samples/sec Loss 5.0403 LearningRate 0.0104 Epoch: 17 Global Step: 88670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 08:59:59,199-Speed 18392.13 samples/sec Loss 5.0049 LearningRate 0.0103 Epoch: 17 Global Step: 88680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:03,646-Speed 18422.85 samples/sec Loss 5.0372 LearningRate 0.0103 Epoch: 17 Global Step: 88690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:08,063-Speed 18552.60 samples/sec Loss 4.9988 LearningRate 0.0103 Epoch: 17 Global Step: 88700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:12,505-Speed 18446.21 samples/sec Loss 5.0105 LearningRate 0.0103 Epoch: 17 Global Step: 88710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:16,971-Speed 18350.75 samples/sec Loss 4.9988 LearningRate 0.0103 Epoch: 17 Global Step: 88720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:21,390-Speed 18544.01 samples/sec Loss 5.0215 LearningRate 0.0103 Epoch: 17 Global Step: 88730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:25,822-Speed 18490.08 samples/sec Loss 4.9932 LearningRate 0.0103 Epoch: 17 Global Step: 88740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:30,287-Speed 18349.50 samples/sec Loss 5.0276 LearningRate 0.0102 Epoch: 17 Global Step: 88750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:00:34,714-Speed 18508.82 samples/sec Loss 4.9980 LearningRate 0.0102 Epoch: 17 Global Step: 88760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:00:39,213-Speed 18215.78 samples/sec Loss 5.0212 LearningRate 0.0102 Epoch: 17 Global Step: 88770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:00:43,648-Speed 18476.97 samples/sec Loss 5.0094 LearningRate 0.0102 Epoch: 17 Global Step: 88780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
09:00:48,062-Speed 18564.06 samples/sec Loss 5.0097 LearningRate 0.0102 Epoch: 17 Global Step: 88790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:52,558-Speed 18226.26 samples/sec Loss 5.0028 LearningRate 0.0102 Epoch: 17 Global Step: 88800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:00:56,987-Speed 18502.71 samples/sec Loss 4.9804 LearningRate 0.0102 Epoch: 17 Global Step: 88810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:01:01,449-Speed 18364.48 samples/sec Loss 5.0048 LearningRate 0.0101 Epoch: 17 Global Step: 88820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:01:05,915-Speed 18349.45 samples/sec Loss 4.9855 LearningRate 0.0101 Epoch: 17 Global Step: 88830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:01:10,350-Speed 18474.95 samples/sec Loss 4.9945 LearningRate 0.0101 Epoch: 17 Global Step: 88840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:01:14,813-Speed 18362.42 samples/sec Loss 4.9943 LearningRate 0.0101 Epoch: 17 Global Step: 88850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:01:19,283-Speed 18331.91 samples/sec Loss 4.9792 LearningRate 0.0101 Epoch: 17 Global Step: 88860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:01:23,690-Speed 18592.09 samples/sec Loss 5.0100 LearningRate 0.0101 Epoch: 17 Global Step: 88870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:01:28,129-Speed 18461.87 samples/sec Loss 5.0118 LearningRate 0.0101 Epoch: 17 Global Step: 88880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:01:32,618-Speed 18255.11 samples/sec Loss 4.9925 LearningRate 0.0100 Epoch: 17 Global Step: 88890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:01:37,026-Speed 18588.71 samples/sec Loss 5.0284 LearningRate 0.0100 Epoch: 17 Global Step: 88900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:01:41,426-Speed 18625.36 samples/sec 
Loss 4.9918 LearningRate 0.0100 Epoch: 17 Global Step: 88910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:01:45,841-Speed 18566.97 samples/sec Loss 4.9939 LearningRate 0.0100 Epoch: 17 Global Step: 88920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:01:50,287-Speed 18429.76 samples/sec Loss 4.9880 LearningRate 0.0100 Epoch: 17 Global Step: 88930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:01:54,722-Speed 18472.00 samples/sec Loss 4.9462 LearningRate 0.0100 Epoch: 17 Global Step: 88940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:01:59,158-Speed 18472.61 samples/sec Loss 5.0003 LearningRate 0.0100 Epoch: 17 Global Step: 88950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:02:03,622-Speed 18359.36 samples/sec Loss 5.0027 LearningRate 0.0100 Epoch: 17 Global Step: 88960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:02:08,035-Speed 18568.82 samples/sec Loss 4.9773 LearningRate 0.0099 Epoch: 17 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:12,452-Speed 18552.03 samples/sec Loss 4.9865 LearningRate 0.0099 Epoch: 17 Global Step: 88980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:16,849-Speed 18635.06 samples/sec Loss 4.9681 LearningRate 0.0099 Epoch: 17 Global Step: 88990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:21,258-Speed 18583.23 samples/sec Loss 5.0123 LearningRate 0.0099 Epoch: 17 Global Step: 89000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:25,700-Speed 18453.39 samples/sec Loss 5.0158 LearningRate 0.0099 Epoch: 17 Global Step: 89010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:30,133-Speed 18484.45 samples/sec Loss 5.0180 LearningRate 0.0099 Epoch: 17 Global Step: 89020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:34,576-Speed 18442.52 samples/sec Loss 5.0327 LearningRate 0.0099 Epoch: 17 
Global Step: 89030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:38,959-Speed 18697.43 samples/sec Loss 5.0385 LearningRate 0.0098 Epoch: 17 Global Step: 89040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:43,445-Speed 18267.03 samples/sec Loss 5.0120 LearningRate 0.0098 Epoch: 17 Global Step: 89050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:47,897-Speed 18408.08 samples/sec Loss 5.0007 LearningRate 0.0098 Epoch: 17 Global Step: 89060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:52,343-Speed 18428.65 samples/sec Loss 4.9710 LearningRate 0.0098 Epoch: 17 Global Step: 89070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:02:56,813-Speed 18332.85 samples/sec Loss 4.9828 LearningRate 0.0098 Epoch: 17 Global Step: 89080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:03:01,273-Speed 18375.79 samples/sec Loss 4.9675 LearningRate 0.0098 Epoch: 17 Global Step: 89090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:03:05,713-Speed 18451.28 samples/sec Loss 4.9701 LearningRate 0.0098 Epoch: 17 Global Step: 89100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:03:10,205-Speed 18247.40 samples/sec Loss 4.9976 LearningRate 0.0098 Epoch: 17 Global Step: 89110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:03:14,618-Speed 18566.34 samples/sec Loss 5.0044 LearningRate 0.0097 Epoch: 17 Global Step: 89120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:03:18,997-Speed 18713.27 samples/sec Loss 4.9988 LearningRate 0.0097 Epoch: 17 Global Step: 89130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:03:23,413-Speed 18557.73 samples/sec Loss 4.9972 LearningRate 0.0097 Epoch: 17 Global Step: 89140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:03:27,878-Speed 18351.69 samples/sec Loss 4.9921 LearningRate 0.0097 Epoch: 17 Global Step: 89150 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-01-14 09:03:32,276-Speed 18633.21 samples/sec Loss 4.9739 LearningRate 0.0097 Epoch: 17 Global Step: 89160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:03:36,679-Speed 18615.52 samples/sec Loss 4.9789 LearningRate 0.0097 Epoch: 17 Global Step: 89170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:03:41,074-Speed 18647.98 samples/sec Loss 4.9875 LearningRate 0.0097 Epoch: 17 Global Step: 89180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:03:45,459-Speed 18689.34 samples/sec Loss 4.9544 LearningRate 0.0096 Epoch: 17 Global Step: 89190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:03:49,911-Speed 18406.56 samples/sec Loss 5.0198 LearningRate 0.0096 Epoch: 17 Global Step: 89200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:03:54,285-Speed 18739.50 samples/sec Loss 4.9676 LearningRate 0.0096 Epoch: 17 Global Step: 89210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:04:03,798-Speed 8612.63 samples/sec Loss 4.9897 LearningRate 0.0096 Epoch: 17 Global Step: 89220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:08,186-Speed 18681.62 samples/sec Loss 4.9866 LearningRate 0.0096 Epoch: 17 Global Step: 89230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:12,613-Speed 18514.74 samples/sec Loss 4.9690 LearningRate 0.0096 Epoch: 17 Global Step: 89240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:17,010-Speed 18634.80 samples/sec Loss 5.0137 LearningRate 0.0096 Epoch: 17 Global Step: 89250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:21,425-Speed 18561.08 samples/sec Loss 4.9744 LearningRate 0.0096 Epoch: 17 Global Step: 89260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:25,835-Speed 18587.75 samples/sec Loss 4.9940 LearningRate 0.0095 Epoch: 17 Global Step: 89270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
09:04:30,265-Speed 18501.54 samples/sec Loss 4.9759 LearningRate 0.0095 Epoch: 17 Global Step: 89280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:34,657-Speed 18655.10 samples/sec Loss 4.9939 LearningRate 0.0095 Epoch: 17 Global Step: 89290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:39,103-Speed 18433.28 samples/sec Loss 4.9768 LearningRate 0.0095 Epoch: 17 Global Step: 89300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:43,498-Speed 18647.89 samples/sec Loss 4.9733 LearningRate 0.0095 Epoch: 17 Global Step: 89310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:04:47,928-Speed 18498.60 samples/sec Loss 5.0157 LearningRate 0.0095 Epoch: 17 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:04:52,349-Speed 18533.08 samples/sec Loss 5.0028 LearningRate 0.0095 Epoch: 17 Global Step: 89330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:04:56,755-Speed 18595.90 samples/sec Loss 4.9987 LearningRate 0.0094 Epoch: 17 Global Step: 89340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:01,209-Speed 18400.41 samples/sec Loss 4.9891 LearningRate 0.0094 Epoch: 17 Global Step: 89350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:05,624-Speed 18558.92 samples/sec Loss 4.9775 LearningRate 0.0094 Epoch: 17 Global Step: 89360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:10,056-Speed 18488.46 samples/sec Loss 4.9866 LearningRate 0.0094 Epoch: 17 Global Step: 89370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:14,502-Speed 18430.79 samples/sec Loss 5.0002 LearningRate 0.0094 Epoch: 17 Global Step: 89380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:18,995-Speed 18239.54 samples/sec Loss 4.9909 LearningRate 0.0094 Epoch: 17 Global Step: 89390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:23,474-Speed 18293.21 samples/sec 
Loss 5.0090 LearningRate 0.0094 Epoch: 17 Global Step: 89400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:27,934-Speed 18378.63 samples/sec Loss 4.9614 LearningRate 0.0094 Epoch: 17 Global Step: 89410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:32,363-Speed 18500.40 samples/sec Loss 5.0085 LearningRate 0.0093 Epoch: 17 Global Step: 89420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:36,766-Speed 18610.55 samples/sec Loss 4.9481 LearningRate 0.0093 Epoch: 17 Global Step: 89430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:05:41,162-Speed 18639.72 samples/sec Loss 5.0085 LearningRate 0.0093 Epoch: 17 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:05:45,593-Speed 18494.11 samples/sec Loss 4.9957 LearningRate 0.0093 Epoch: 17 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:05:49,981-Speed 18677.62 samples/sec Loss 4.9921 LearningRate 0.0093 Epoch: 17 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:05:54,400-Speed 18540.05 samples/sec Loss 4.9646 LearningRate 0.0093 Epoch: 17 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:05:58,914-Speed 18153.69 samples/sec Loss 4.9853 LearningRate 0.0093 Epoch: 17 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:06:03,356-Speed 18445.06 samples/sec Loss 4.9482 LearningRate 0.0093 Epoch: 17 Global Step: 89490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:07,761-Speed 18605.53 samples/sec Loss 4.9708 LearningRate 0.0092 Epoch: 17 Global Step: 89500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:12,166-Speed 18600.63 samples/sec Loss 4.9740 LearningRate 0.0092 Epoch: 17 Global Step: 89510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:16,580-Speed 18568.05 samples/sec Loss 4.9767 LearningRate 0.0092 Epoch: 17 
Global Step: 89520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:20,989-Speed 18583.75 samples/sec Loss 4.9744 LearningRate 0.0092 Epoch: 17 Global Step: 89530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:25,465-Speed 18307.72 samples/sec Loss 4.9560 LearningRate 0.0092 Epoch: 17 Global Step: 89540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:29,864-Speed 18625.15 samples/sec Loss 4.9970 LearningRate 0.0092 Epoch: 17 Global Step: 89550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:34,264-Speed 18626.88 samples/sec Loss 5.0189 LearningRate 0.0092 Epoch: 17 Global Step: 89560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:38,690-Speed 18520.41 samples/sec Loss 4.9348 LearningRate 0.0091 Epoch: 17 Global Step: 89570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:43,176-Speed 18264.80 samples/sec Loss 4.9749 LearningRate 0.0091 Epoch: 17 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:06:47,641-Speed 18352.20 samples/sec Loss 4.9989 LearningRate 0.0091 Epoch: 17 Global Step: 89590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:06:52,044-Speed 18610.43 samples/sec Loss 5.0020 LearningRate 0.0091 Epoch: 17 Global Step: 89600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:06:56,459-Speed 18561.86 samples/sec Loss 4.9499 LearningRate 0.0091 Epoch: 17 Global Step: 89610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:07:00,905-Speed 18433.23 samples/sec Loss 4.9720 LearningRate 0.0091 Epoch: 17 Global Step: 89620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:07:05,313-Speed 18595.03 samples/sec Loss 4.9967 LearningRate 0.0091 Epoch: 17 Global Step: 89630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:07:09,733-Speed 18537.12 samples/sec Loss 4.9864 LearningRate 0.0091 Epoch: 17 Global Step: 89640 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 09:07:14,223-Speed 18251.02 samples/sec Loss 5.0257 LearningRate 0.0090 Epoch: 17 Global Step: 89650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:07:18,635-Speed 18577.43 samples/sec Loss 4.9298 LearningRate 0.0090 Epoch: 17 Global Step: 89660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:07:23,100-Speed 18352.32 samples/sec Loss 4.9861 LearningRate 0.0090 Epoch: 17 Global Step: 89670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:07:27,636-Speed 18077.02 samples/sec Loss 4.9961 LearningRate 0.0090 Epoch: 17 Global Step: 89680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:07:32,040-Speed 18601.39 samples/sec Loss 4.9757 LearningRate 0.0090 Epoch: 17 Global Step: 89690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:07:36,453-Speed 18570.28 samples/sec Loss 4.9958 LearningRate 0.0090 Epoch: 17 Global Step: 89700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:07:40,853-Speed 18622.74 samples/sec Loss 4.9965 LearningRate 0.0090 Epoch: 17 Global Step: 89710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:07:45,283-Speed 18494.26 samples/sec Loss 5.0000 LearningRate 0.0090 Epoch: 17 Global Step: 89720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:07:49,749-Speed 18350.21 samples/sec Loss 4.9554 LearningRate 0.0089 Epoch: 17 Global Step: 89730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:07:54,158-Speed 18582.55 samples/sec Loss 4.9588 LearningRate 0.0089 Epoch: 17 Global Step: 89740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:07:58,578-Speed 18537.65 samples/sec Loss 5.0030 LearningRate 0.0089 Epoch: 17 Global Step: 89750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:08:03,003-Speed 18519.08 samples/sec Loss 4.9835 LearningRate 0.0089 Epoch: 17 Global Step: 89760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
09:08:07,416-Speed 18563.39 samples/sec Loss 4.9941 LearningRate 0.0089 Epoch: 17 Global Step: 89770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:11,809-Speed 18654.22 samples/sec Loss 4.9841 LearningRate 0.0089 Epoch: 17 Global Step: 89780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:16,322-Speed 18157.10 samples/sec Loss 4.9776 LearningRate 0.0089 Epoch: 17 Global Step: 89790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:20,814-Speed 18241.49 samples/sec Loss 4.9482 LearningRate 0.0089 Epoch: 17 Global Step: 89800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:25,232-Speed 18551.45 samples/sec Loss 4.9516 LearningRate 0.0088 Epoch: 17 Global Step: 89810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:29,662-Speed 18505.37 samples/sec Loss 4.9644 LearningRate 0.0088 Epoch: 17 Global Step: 89820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:34,173-Speed 18164.72 samples/sec Loss 4.9839 LearningRate 0.0088 Epoch: 17 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:38,601-Speed 18504.40 samples/sec Loss 4.9541 LearningRate 0.0088 Epoch: 17 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:43,044-Speed 18448.82 samples/sec Loss 4.9300 LearningRate 0.0088 Epoch: 17 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:47,437-Speed 18653.56 samples/sec Loss 4.9714 LearningRate 0.0088 Epoch: 17 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:51,866-Speed 18498.58 samples/sec Loss 4.9605 LearningRate 0.0088 Epoch: 17 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:08:56,282-Speed 18557.33 samples/sec Loss 4.9801 LearningRate 0.0087 Epoch: 17 Global Step: 89880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:00,716-Speed 18482.40 samples/sec 
Loss 4.9338 LearningRate 0.0087 Epoch: 17 Global Step: 89890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:05,122-Speed 18599.23 samples/sec Loss 4.9805 LearningRate 0.0087 Epoch: 17 Global Step: 89900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:09,544-Speed 18530.10 samples/sec Loss 4.9499 LearningRate 0.0087 Epoch: 17 Global Step: 89910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:14,057-Speed 18156.52 samples/sec Loss 5.0009 LearningRate 0.0087 Epoch: 17 Global Step: 89920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:18,485-Speed 18510.99 samples/sec Loss 4.9448 LearningRate 0.0087 Epoch: 17 Global Step: 89930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:22,893-Speed 18596.87 samples/sec Loss 4.9889 LearningRate 0.0087 Epoch: 17 Global Step: 89940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:27,303-Speed 18581.42 samples/sec Loss 4.9419 LearningRate 0.0087 Epoch: 17 Global Step: 89950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:31,741-Speed 18466.50 samples/sec Loss 4.9775 LearningRate 0.0086 Epoch: 17 Global Step: 89960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:36,173-Speed 18487.24 samples/sec Loss 4.9819 LearningRate 0.0086 Epoch: 17 Global Step: 89970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:09:40,597-Speed 18530.67 samples/sec Loss 4.9809 LearningRate 0.0086 Epoch: 17 Global Step: 89980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:09:44,998-Speed 18619.78 samples/sec Loss 4.9469 LearningRate 0.0086 Epoch: 17 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:09:49,404-Speed 18601.18 samples/sec Loss 4.9680 LearningRate 0.0086 Epoch: 17 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:09:53,831-Speed 18512.69 samples/sec Loss 4.9125 LearningRate 0.0086 Epoch: 17 
Global Step: 90010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:09:58,218-Speed 18680.22 samples/sec Loss 4.9283 LearningRate 0.0086 Epoch: 17 Global Step: 90020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:02,653-Speed 18476.61 samples/sec Loss 4.9623 LearningRate 0.0086 Epoch: 17 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:07,055-Speed 18619.27 samples/sec Loss 4.9641 LearningRate 0.0085 Epoch: 17 Global Step: 90040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:11,529-Speed 18317.65 samples/sec Loss 4.9633 LearningRate 0.0085 Epoch: 17 Global Step: 90050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:15,931-Speed 18612.74 samples/sec Loss 4.9313 LearningRate 0.0085 Epoch: 17 Global Step: 90060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:20,352-Speed 18537.20 samples/sec Loss 4.9351 LearningRate 0.0085 Epoch: 17 Global Step: 90070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:24,753-Speed 18622.53 samples/sec Loss 4.9821 LearningRate 0.0085 Epoch: 17 Global Step: 90080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-14 09:10:29,145-Speed 18658.01 samples/sec Loss 4.9621 LearningRate 0.0085 Epoch: 17 Global Step: 90090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:33,530-Speed 18684.19 samples/sec Loss 5.0007 LearningRate 0.0085 Epoch: 17 Global Step: 90100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:37,944-Speed 18561.80 samples/sec Loss 4.9298 LearningRate 0.0085 Epoch: 17 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:42,338-Speed 18652.99 samples/sec Loss 4.9594 LearningRate 0.0084 Epoch: 17 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:46,744-Speed 18595.46 samples/sec Loss 4.9434 LearningRate 0.0084 Epoch: 17 Global Step: 90130 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-14 09:10:51,141-Speed 18643.62 samples/sec Loss 4.9106 LearningRate 0.0084 Epoch: 17 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:10:55,589-Speed 18424.15 samples/sec Loss 4.9438 LearningRate 0.0084 Epoch: 17 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:11:00,016-Speed 18510.52 samples/sec Loss 4.9875 LearningRate 0.0084 Epoch: 17 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:11:04,431-Speed 18562.91 samples/sec Loss 4.9273 LearningRate 0.0084 Epoch: 17 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:11:08,845-Speed 18569.39 samples/sec Loss 4.9246 LearningRate 0.0084 Epoch: 17 Global Step: 90180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:13,272-Speed 18514.40 samples/sec Loss 4.9622 LearningRate 0.0084 Epoch: 17 Global Step: 90190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:17,653-Speed 18705.95 samples/sec Loss 4.9826 LearningRate 0.0083 Epoch: 17 Global Step: 90200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:22,067-Speed 18561.86 samples/sec Loss 4.9670 LearningRate 0.0083 Epoch: 17 Global Step: 90210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:26,497-Speed 18498.42 samples/sec Loss 5.0051 LearningRate 0.0083 Epoch: 17 Global Step: 90220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:30,911-Speed 18563.84 samples/sec Loss 4.9625 LearningRate 0.0083 Epoch: 17 Global Step: 90230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:35,322-Speed 18578.56 samples/sec Loss 4.9552 LearningRate 0.0083 Epoch: 17 Global Step: 90240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:39,752-Speed 18505.45 samples/sec Loss 4.9235 LearningRate 0.0083 Epoch: 17 Global Step: 90250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
09:11:44,228-Speed 18307.90 samples/sec Loss 4.9437 LearningRate 0.0083 Epoch: 17 Global Step: 90260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:48,703-Speed 18315.45 samples/sec Loss 4.9614 LearningRate 0.0083 Epoch: 17 Global Step: 90270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:11:53,126-Speed 18524.96 samples/sec Loss 4.9516 LearningRate 0.0082 Epoch: 17 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:11:57,515-Speed 18669.98 samples/sec Loss 4.9494 LearningRate 0.0082 Epoch: 17 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:12:01,932-Speed 18557.58 samples/sec Loss 4.9287 LearningRate 0.0082 Epoch: 17 Global Step: 90300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:06,317-Speed 18685.57 samples/sec Loss 4.9635 LearningRate 0.0082 Epoch: 17 Global Step: 90310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:10,740-Speed 18530.38 samples/sec Loss 4.9490 LearningRate 0.0082 Epoch: 17 Global Step: 90320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:15,164-Speed 18521.06 samples/sec Loss 4.9354 LearningRate 0.0082 Epoch: 17 Global Step: 90330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:19,600-Speed 18472.91 samples/sec Loss 4.9455 LearningRate 0.0082 Epoch: 17 Global Step: 90340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:24,014-Speed 18563.22 samples/sec Loss 4.9309 LearningRate 0.0082 Epoch: 17 Global Step: 90350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:28,441-Speed 18514.21 samples/sec Loss 4.9338 LearningRate 0.0082 Epoch: 17 Global Step: 90360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:32,841-Speed 18620.29 samples/sec Loss 4.9298 LearningRate 0.0081 Epoch: 17 Global Step: 90370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:37,275-Speed 18479.09 samples/sec 
Loss 4.9340 LearningRate 0.0081 Epoch: 17 Global Step: 90380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:41,697-Speed 18534.50 samples/sec Loss 4.9446 LearningRate 0.0081 Epoch: 17 Global Step: 90390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:12:46,057-Speed 18791.49 samples/sec Loss 4.9532 LearningRate 0.0081 Epoch: 17 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:12:50,470-Speed 18568.62 samples/sec Loss 4.9721 LearningRate 0.0081 Epoch: 17 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:12:54,856-Speed 18684.70 samples/sec Loss 4.9747 LearningRate 0.0081 Epoch: 17 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:12:59,234-Speed 18716.06 samples/sec Loss 4.9463 LearningRate 0.0081 Epoch: 17 Global Step: 90430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:03,649-Speed 18557.62 samples/sec Loss 4.9262 LearningRate 0.0081 Epoch: 17 Global Step: 90440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:08,077-Speed 18507.57 samples/sec Loss 4.9463 LearningRate 0.0080 Epoch: 17 Global Step: 90450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:12,551-Speed 18314.44 samples/sec Loss 4.9529 LearningRate 0.0080 Epoch: 17 Global Step: 90460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:16,970-Speed 18544.62 samples/sec Loss 4.9440 LearningRate 0.0080 Epoch: 17 Global Step: 90470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:21,369-Speed 18626.70 samples/sec Loss 4.9530 LearningRate 0.0080 Epoch: 17 Global Step: 90480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:25,838-Speed 18337.61 samples/sec Loss 4.9383 LearningRate 0.0080 Epoch: 17 Global Step: 90490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:30,224-Speed 18681.44 samples/sec Loss 4.9421 LearningRate 0.0080 Epoch: 17 
Global Step: 90500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:34,641-Speed 18549.70 samples/sec Loss 4.9545 LearningRate 0.0080 Epoch: 17 Global Step: 90510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:39,081-Speed 18456.24 samples/sec Loss 4.9438 LearningRate 0.0080 Epoch: 17 Global Step: 90520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:13:43,479-Speed 18630.68 samples/sec Loss 4.9320 LearningRate 0.0079 Epoch: 17 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:13:47,905-Speed 18516.06 samples/sec Loss 4.9360 LearningRate 0.0079 Epoch: 17 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:13:52,322-Speed 18550.11 samples/sec Loss 4.9575 LearningRate 0.0079 Epoch: 17 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:13:56,693-Speed 18745.21 samples/sec Loss 4.9837 LearningRate 0.0079 Epoch: 17 Global Step: 90560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:01,111-Speed 18547.32 samples/sec Loss 4.9739 LearningRate 0.0079 Epoch: 17 Global Step: 90570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:05,573-Speed 18366.85 samples/sec Loss 4.9575 LearningRate 0.0079 Epoch: 17 Global Step: 90580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:10,002-Speed 18501.98 samples/sec Loss 4.9645 LearningRate 0.0079 Epoch: 17 Global Step: 90590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:14,389-Speed 18677.03 samples/sec Loss 4.9350 LearningRate 0.0079 Epoch: 17 Global Step: 90600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:18,807-Speed 18547.35 samples/sec Loss 4.9755 LearningRate 0.0078 Epoch: 17 Global Step: 90610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:23,238-Speed 18498.16 samples/sec Loss 4.9006 LearningRate 0.0078 Epoch: 17 Global Step: 90620 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 09:14:27,708-Speed 18335.33 samples/sec Loss 4.9344 LearningRate 0.0078 Epoch: 17 Global Step: 90630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:36,024-Speed 9853.82 samples/sec Loss 4.9099 LearningRate 0.0078 Epoch: 17 Global Step: 90640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:40,420-Speed 18641.38 samples/sec Loss 4.9385 LearningRate 0.0078 Epoch: 17 Global Step: 90650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:44,839-Speed 18545.42 samples/sec Loss 4.9120 LearningRate 0.0078 Epoch: 17 Global Step: 90660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:49,251-Speed 18575.79 samples/sec Loss 4.9656 LearningRate 0.0078 Epoch: 17 Global Step: 90670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:53,667-Speed 18555.46 samples/sec Loss 4.9364 LearningRate 0.0078 Epoch: 17 Global Step: 90680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:14:58,082-Speed 18561.92 samples/sec Loss 4.9554 LearningRate 0.0078 Epoch: 17 Global Step: 90690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:15:02,511-Speed 18497.26 samples/sec Loss 4.9130 LearningRate 0.0077 Epoch: 17 Global Step: 90700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:15:06,950-Speed 18461.30 samples/sec Loss 4.9250 LearningRate 0.0077 Epoch: 17 Global Step: 90710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:15:11,435-Speed 18274.24 samples/sec Loss 4.9751 LearningRate 0.0077 Epoch: 17 Global Step: 90720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:15:15,881-Speed 18426.98 samples/sec Loss 4.9353 LearningRate 0.0077 Epoch: 17 Global Step: 90730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:15:20,310-Speed 18504.22 samples/sec Loss 4.9512 LearningRate 0.0077 Epoch: 17 Global Step: 90740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
09:15:24,770-Speed 18370.50 samples/sec Loss 4.9559 LearningRate 0.0077 Epoch: 17 Global Step: 90750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:15:29,238-Speed 18340.05 samples/sec Loss 4.9335 LearningRate 0.0077 Epoch: 17 Global Step: 90760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:15:33,650-Speed 18571.85 samples/sec Loss 4.9091 LearningRate 0.0077 Epoch: 17 Global Step: 90770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:15:38,080-Speed 18502.37 samples/sec Loss 4.9413 LearningRate 0.0076 Epoch: 17 Global Step: 90780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:15:42,528-Speed 18424.93 samples/sec Loss 4.9433 LearningRate 0.0076 Epoch: 17 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:15:46,944-Speed 18556.29 samples/sec Loss 4.9435 LearningRate 0.0076 Epoch: 17 Global Step: 90800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:15:51,336-Speed 18657.51 samples/sec Loss 4.9088 LearningRate 0.0076 Epoch: 17 Global Step: 90810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:15:55,721-Speed 18687.69 samples/sec Loss 4.9533 LearningRate 0.0076 Epoch: 17 Global Step: 90820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:00,187-Speed 18347.24 samples/sec Loss 4.8934 LearningRate 0.0076 Epoch: 17 Global Step: 90830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:04,587-Speed 18625.97 samples/sec Loss 4.9139 LearningRate 0.0076 Epoch: 17 Global Step: 90840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:08,978-Speed 18665.05 samples/sec Loss 4.9612 LearningRate 0.0076 Epoch: 17 Global Step: 90850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:13,365-Speed 18679.83 samples/sec Loss 4.9689 LearningRate 0.0076 Epoch: 17 Global Step: 90860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:17,788-Speed 18527.88 samples/sec 
Loss 4.9297 LearningRate 0.0075 Epoch: 17 Global Step: 90870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:22,249-Speed 18370.70 samples/sec Loss 4.9351 LearningRate 0.0075 Epoch: 17 Global Step: 90880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:26,698-Speed 18424.17 samples/sec Loss 4.9262 LearningRate 0.0075 Epoch: 17 Global Step: 90890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:31,118-Speed 18535.38 samples/sec Loss 4.9500 LearningRate 0.0075 Epoch: 17 Global Step: 90900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:35,609-Speed 18244.44 samples/sec Loss 4.9268 LearningRate 0.0075 Epoch: 17 Global Step: 90910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:40,030-Speed 18537.56 samples/sec Loss 4.9380 LearningRate 0.0075 Epoch: 17 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:16:44,493-Speed 18356.36 samples/sec Loss 4.9434 LearningRate 0.0075 Epoch: 17 Global Step: 90930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:16:48,939-Speed 18431.93 samples/sec Loss 4.9097 LearningRate 0.0075 Epoch: 17 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:16:53,349-Speed 18581.75 samples/sec Loss 4.9246 LearningRate 0.0074 Epoch: 17 Global Step: 90950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:16:57,745-Speed 18640.24 samples/sec Loss 4.9411 LearningRate 0.0074 Epoch: 17 Global Step: 90960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:17:02,178-Speed 18481.15 samples/sec Loss 4.9094 LearningRate 0.0074 Epoch: 17 Global Step: 90970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:17:06,552-Speed 18734.04 samples/sec Loss 4.8921 LearningRate 0.0074 Epoch: 17 Global Step: 90980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:17:10,941-Speed 18669.04 samples/sec Loss 4.9354 LearningRate 0.0074 Epoch: 17 
Global Step: 90990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:15,380-Speed 18455.84 samples/sec Loss 4.9202 LearningRate 0.0074 Epoch: 17 Global Step: 91000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:19,762-Speed 18704.20 samples/sec Loss 4.9361 LearningRate 0.0074 Epoch: 17 Global Step: 91010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:24,195-Speed 18482.55 samples/sec Loss 4.9544 LearningRate 0.0074 Epoch: 17 Global Step: 91020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:28,606-Speed 18597.79 samples/sec Loss 4.9227 LearningRate 0.0074 Epoch: 17 Global Step: 91030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:33,008-Speed 18612.87 samples/sec Loss 4.9610 LearningRate 0.0073 Epoch: 17 Global Step: 91040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:37,424-Speed 18549.88 samples/sec Loss 4.9410 LearningRate 0.0073 Epoch: 17 Global Step: 91050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:41,848-Speed 18523.99 samples/sec Loss 4.8965 LearningRate 0.0073 Epoch: 17 Global Step: 91060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:46,248-Speed 18622.39 samples/sec Loss 4.9097 LearningRate 0.0073 Epoch: 17 Global Step: 91070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:50,676-Speed 18507.18 samples/sec Loss 4.9574 LearningRate 0.0073 Epoch: 17 Global Step: 91080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-01-14 09:17:55,116-Speed 18454.65 samples/sec Loss 4.9467 LearningRate 0.0073 Epoch: 17 Global Step: 91090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:17:59,536-Speed 18538.66 samples/sec Loss 4.9423 LearningRate 0.0073 Epoch: 17 Global Step: 91100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:03,993-Speed 18387.81 samples/sec Loss 4.9438 LearningRate 0.0073 Epoch: 17 Global Step: 91110 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 09:18:08,373-Speed 18702.86 samples/sec Loss 4.9629 LearningRate 0.0072 Epoch: 17 Global Step: 91120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:12,770-Speed 18640.32 samples/sec Loss 4.9226 LearningRate 0.0072 Epoch: 17 Global Step: 91130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:17,181-Speed 18578.56 samples/sec Loss 4.9350 LearningRate 0.0072 Epoch: 17 Global Step: 91140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:21,592-Speed 18577.01 samples/sec Loss 4.8954 LearningRate 0.0072 Epoch: 17 Global Step: 91150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:26,017-Speed 18517.83 samples/sec Loss 4.9066 LearningRate 0.0072 Epoch: 17 Global Step: 91160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:30,402-Speed 18686.56 samples/sec Loss 4.9347 LearningRate 0.0072 Epoch: 17 Global Step: 91170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:34,790-Speed 18677.74 samples/sec Loss 4.9357 LearningRate 0.0072 Epoch: 17 Global Step: 91180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:39,250-Speed 18369.95 samples/sec Loss 4.9256 LearningRate 0.0072 Epoch: 17 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:18:43,648-Speed 18635.51 samples/sec Loss 4.9144 LearningRate 0.0072 Epoch: 17 Global Step: 91200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:18:48,013-Speed 18774.50 samples/sec Loss 4.9216 LearningRate 0.0071 Epoch: 17 Global Step: 91210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:52,405-Speed 18655.56 samples/sec Loss 4.8819 LearningRate 0.0071 Epoch: 17 Global Step: 91220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:18:56,818-Speed 18574.53 samples/sec Loss 4.8939 LearningRate 0.0071 Epoch: 17 Global Step: 91230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 
09:19:01,266-Speed 18426.27 samples/sec Loss 4.9547 LearningRate 0.0071 Epoch: 17 Global Step: 91240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:05,718-Speed 18403.85 samples/sec Loss 4.9091 LearningRate 0.0071 Epoch: 17 Global Step: 91250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:10,141-Speed 18528.06 samples/sec Loss 4.9235 LearningRate 0.0071 Epoch: 17 Global Step: 91260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:14,536-Speed 18647.68 samples/sec Loss 4.9188 LearningRate 0.0071 Epoch: 17 Global Step: 91270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:18,959-Speed 18528.22 samples/sec Loss 4.9343 LearningRate 0.0071 Epoch: 17 Global Step: 91280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:23,378-Speed 18545.11 samples/sec Loss 4.9416 LearningRate 0.0071 Epoch: 17 Global Step: 91290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:27,830-Speed 18406.44 samples/sec Loss 4.9009 LearningRate 0.0070 Epoch: 17 Global Step: 91300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:32,274-Speed 18439.71 samples/sec Loss 4.9181 LearningRate 0.0070 Epoch: 17 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:19:36,656-Speed 18703.25 samples/sec Loss 4.9488 LearningRate 0.0070 Epoch: 17 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:19:41,098-Speed 18449.59 samples/sec Loss 4.9134 LearningRate 0.0070 Epoch: 17 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:19:45,516-Speed 18549.41 samples/sec Loss 4.9245 LearningRate 0.0070 Epoch: 17 Global Step: 91340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:49,970-Speed 18397.20 samples/sec Loss 4.8790 LearningRate 0.0070 Epoch: 17 Global Step: 91350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:54,449-Speed 18293.87 samples/sec 
Loss 4.9242 LearningRate 0.0070 Epoch: 17 Global Step: 91360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:19:58,897-Speed 18424.77 samples/sec Loss 4.9117 LearningRate 0.0070 Epoch: 17 Global Step: 91370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:03,396-Speed 18212.25 samples/sec Loss 4.8936 LearningRate 0.0070 Epoch: 17 Global Step: 91380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:07,851-Speed 18392.05 samples/sec Loss 4.8983 LearningRate 0.0069 Epoch: 17 Global Step: 91390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:12,253-Speed 18623.05 samples/sec Loss 4.9003 LearningRate 0.0069 Epoch: 17 Global Step: 91400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:16,723-Speed 18328.09 samples/sec Loss 4.8952 LearningRate 0.0069 Epoch: 17 Global Step: 91410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:21,187-Speed 18361.52 samples/sec Loss 4.9003 LearningRate 0.0069 Epoch: 17 Global Step: 91420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:25,610-Speed 18523.67 samples/sec Loss 4.8908 LearningRate 0.0069 Epoch: 17 Global Step: 91430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:30,045-Speed 18478.67 samples/sec Loss 4.9278 LearningRate 0.0069 Epoch: 17 Global Step: 91440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:20:34,449-Speed 18604.09 samples/sec Loss 4.9135 LearningRate 0.0069 Epoch: 17 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:20:38,871-Speed 18532.66 samples/sec Loss 4.8644 LearningRate 0.0069 Epoch: 17 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:20:43,387-Speed 18146.05 samples/sec Loss 4.8970 LearningRate 0.0068 Epoch: 17 Global Step: 91470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:20:47,798-Speed 18578.55 samples/sec Loss 4.9016 LearningRate 0.0068 Epoch: 17 
Global Step: 91480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:52,327-Speed 18091.92 samples/sec Loss 4.9561 LearningRate 0.0068 Epoch: 17 Global Step: 91490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:20:56,770-Speed 18442.84 samples/sec Loss 4.8848 LearningRate 0.0068 Epoch: 17 Global Step: 91500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:01,195-Speed 18519.80 samples/sec Loss 4.9049 LearningRate 0.0068 Epoch: 17 Global Step: 91510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:05,666-Speed 18323.82 samples/sec Loss 4.9304 LearningRate 0.0068 Epoch: 17 Global Step: 91520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:10,061-Speed 18646.44 samples/sec Loss 4.9501 LearningRate 0.0068 Epoch: 17 Global Step: 91530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:14,448-Speed 18678.51 samples/sec Loss 4.9064 LearningRate 0.0068 Epoch: 17 Global Step: 91540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:18,896-Speed 18424.02 samples/sec Loss 4.9304 LearningRate 0.0068 Epoch: 17 Global Step: 91550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:23,309-Speed 18565.13 samples/sec Loss 4.9051 LearningRate 0.0067 Epoch: 17 Global Step: 91560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:27,732-Speed 18525.24 samples/sec Loss 4.9128 LearningRate 0.0067 Epoch: 17 Global Step: 91570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:32,144-Speed 18574.16 samples/sec Loss 4.9295 LearningRate 0.0067 Epoch: 17 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-14 09:21:36,540-Speed 18638.63 samples/sec Loss 4.8991 LearningRate 0.0067 Epoch: 17 Global Step: 91590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:40,949-Speed 18585.17 samples/sec Loss 4.9372 LearningRate 0.0067 Epoch: 17 Global Step: 91600 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-14 09:21:45,383-Speed 18479.58 samples/sec Loss 4.9027 LearningRate 0.0067 Epoch: 17 Global Step: 91610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:49,794-Speed 18577.90 samples/sec Loss 4.8787 LearningRate 0.0067 Epoch: 17 Global Step: 91620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-14 09:21:54,237-Speed 18441.37 samples/sec Loss 4.9355 LearningRate 0.0067 Epoch: 17 Global Step: 91630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:21:58,666-Speed 18502.55 samples/sec Loss 4.8870 LearningRate 0.0067 Epoch: 17 Global Step: 91640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:03,076-Speed 18578.90 samples/sec Loss 4.9227 LearningRate 0.0066 Epoch: 17 Global Step: 91650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:07,475-Speed 18625.49 samples/sec Loss 4.8895 LearningRate 0.0066 Epoch: 17 Global Step: 91660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:11,957-Speed 18281.96 samples/sec Loss 4.8934 LearningRate 0.0066 Epoch: 17 Global Step: 91670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:16,416-Speed 18376.44 samples/sec Loss 4.9043 LearningRate 0.0066 Epoch: 17 Global Step: 91680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:20,836-Speed 18538.20 samples/sec Loss 4.9063 LearningRate 0.0066 Epoch: 17 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:22:25,215-Speed 18713.56 samples/sec Loss 4.8838 LearningRate 0.0066 Epoch: 17 Global Step: 91700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:29,600-Speed 18701.02 samples/sec Loss 4.9379 LearningRate 0.0066 Epoch: 17 Global Step: 91710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:33,986-Speed 18681.37 samples/sec Loss 4.9181 LearningRate 0.0066 Epoch: 17 Global Step: 91720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:22:38,397-Speed 18576.89 samples/sec Loss 4.9182 LearningRate 0.0066 Epoch: 17 Global Step: 91730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:42,827-Speed 18498.47 samples/sec Loss 4.9149 LearningRate 0.0065 Epoch: 17 Global Step: 91740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:47,328-Speed 18202.58 samples/sec Loss 4.8904 LearningRate 0.0065 Epoch: 17 Global Step: 91750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:51,744-Speed 18558.92 samples/sec Loss 4.8945 LearningRate 0.0065 Epoch: 17 Global Step: 91760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:22:56,148-Speed 18607.47 samples/sec Loss 4.9258 LearningRate 0.0065 Epoch: 17 Global Step: 91770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:00,555-Speed 18594.87 samples/sec Loss 4.8938 LearningRate 0.0065 Epoch: 17 Global Step: 91780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:04,959-Speed 18604.52 samples/sec Loss 4.8821 LearningRate 0.0065 Epoch: 17 Global Step: 91790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:09,375-Speed 18556.67 samples/sec Loss 4.8538 LearningRate 0.0065 Epoch: 17 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:23:13,794-Speed 18543.16 samples/sec Loss 4.9032 LearningRate 0.0065 Epoch: 17 Global Step: 91810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:18,252-Speed 18382.31 samples/sec Loss 4.9080 LearningRate 0.0065 Epoch: 17 Global Step: 91820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:22,726-Speed 18312.23 samples/sec Loss 4.8820 LearningRate 0.0065 Epoch: 17 Global Step: 91830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:27,170-Speed 18439.96 samples/sec Loss 4.9027 LearningRate 0.0064 Epoch: 17 Global Step: 91840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:31,576-Speed 18598.66 samples/sec 
Loss 4.9210 LearningRate 0.0064 Epoch: 17 Global Step: 91850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:36,000-Speed 18519.01 samples/sec Loss 4.8851 LearningRate 0.0064 Epoch: 17 Global Step: 91860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:40,460-Speed 18378.92 samples/sec Loss 4.9220 LearningRate 0.0064 Epoch: 17 Global Step: 91870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:44,841-Speed 18706.31 samples/sec Loss 4.9142 LearningRate 0.0064 Epoch: 17 Global Step: 91880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:49,257-Speed 18555.35 samples/sec Loss 4.9031 LearningRate 0.0064 Epoch: 17 Global Step: 91890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:53,755-Speed 18218.89 samples/sec Loss 4.8783 LearningRate 0.0064 Epoch: 17 Global Step: 91900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:23:58,183-Speed 18508.11 samples/sec Loss 4.8945 LearningRate 0.0064 Epoch: 17 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:24:02,624-Speed 18449.63 samples/sec Loss 4.8835 LearningRate 0.0064 Epoch: 17 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:24:07,024-Speed 18619.90 samples/sec Loss 4.8919 LearningRate 0.0063 Epoch: 17 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:24:11,419-Speed 18644.38 samples/sec Loss 4.8842 LearningRate 0.0063 Epoch: 17 Global Step: 91940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:15,818-Speed 18630.01 samples/sec Loss 4.8768 LearningRate 0.0063 Epoch: 17 Global Step: 91950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:20,247-Speed 18502.58 samples/sec Loss 4.8696 LearningRate 0.0063 Epoch: 17 Global Step: 91960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:24,684-Speed 18465.28 samples/sec Loss 4.9220 LearningRate 0.0063 Epoch: 17 
Global Step: 91970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:29,130-Speed 18428.51 samples/sec Loss 4.8893 LearningRate 0.0063 Epoch: 17 Global Step: 91980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:33,539-Speed 18585.42 samples/sec Loss 4.9122 LearningRate 0.0063 Epoch: 17 Global Step: 91990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:37,970-Speed 18492.37 samples/sec Loss 4.8886 LearningRate 0.0063 Epoch: 17 Global Step: 92000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:42,373-Speed 18612.45 samples/sec Loss 4.8965 LearningRate 0.0063 Epoch: 17 Global Step: 92010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:46,823-Speed 18417.52 samples/sec Loss 4.8670 LearningRate 0.0062 Epoch: 17 Global Step: 92020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:51,236-Speed 18568.84 samples/sec Loss 4.8769 LearningRate 0.0062 Epoch: 17 Global Step: 92030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:24:55,640-Speed 18601.89 samples/sec Loss 4.8966 LearningRate 0.0062 Epoch: 17 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:25:00,072-Speed 18490.81 samples/sec Loss 4.8782 LearningRate 0.0062 Epoch: 17 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:25:04,557-Speed 18268.82 samples/sec Loss 4.8851 LearningRate 0.0062 Epoch: 17 Global Step: 92060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:25:08,966-Speed 18587.05 samples/sec Loss 4.8912 LearningRate 0.0062 Epoch: 17 Global Step: 92070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:25:13,362-Speed 18640.77 samples/sec Loss 4.9061 LearningRate 0.0062 Epoch: 17 Global Step: 92080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:25:17,774-Speed 18571.64 samples/sec Loss 4.8860 LearningRate 0.0062 Epoch: 17 Global Step: 92090 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-01-14 09:25:22,158-Speed 18694.67 samples/sec Loss 4.8688 LearningRate 0.0062 Epoch: 17 Global Step: 92100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:25:26,572-Speed 18561.49 samples/sec Loss 4.8731 LearningRate 0.0061 Epoch: 17 Global Step: 92110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:25:30,969-Speed 18636.21 samples/sec Loss 4.9120 LearningRate 0.0061 Epoch: 17 Global Step: 92120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:25:35,366-Speed 18637.08 samples/sec Loss 4.8788 LearningRate 0.0061 Epoch: 17 Global Step: 92130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:25:39,804-Speed 18464.54 samples/sec Loss 4.8939 LearningRate 0.0061 Epoch: 17 Global Step: 92140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:25:44,266-Speed 18372.56 samples/sec Loss 4.9417 LearningRate 0.0061 Epoch: 17 Global Step: 92150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:25:48,669-Speed 18609.67 samples/sec Loss 4.8728 LearningRate 0.0061 Epoch: 17 Global Step: 92160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:25:53,078-Speed 18585.55 samples/sec Loss 4.9007 LearningRate 0.0061 Epoch: 17 Global Step: 92170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:25:57,491-Speed 18568.63 samples/sec Loss 4.9166 LearningRate 0.0061 Epoch: 17 Global Step: 92180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:26:01,915-Speed 18525.26 samples/sec Loss 4.8851 LearningRate 0.0061 Epoch: 17 Global Step: 92190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:26:06,358-Speed 18440.43 samples/sec Loss 4.8702 LearningRate 0.0061 Epoch: 17 Global Step: 92200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:26:10,816-Speed 18382.55 samples/sec Loss 4.9306 LearningRate 0.0060 Epoch: 17 Global Step: 92210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:26:15,264-Speed 18418.59 samples/sec Loss 4.8757 LearningRate 0.0060 Epoch: 17 Global Step: 92220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:26:19,777-Speed 18158.17 samples/sec Loss 4.8743 LearningRate 0.0060 Epoch: 17 Global Step: 92230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:26:24,167-Speed 18665.98 samples/sec Loss 4.8664 LearningRate 0.0060 Epoch: 17 Global Step: 92240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:26:28,666-Speed 18217.10 samples/sec Loss 4.8698 LearningRate 0.0060 Epoch: 17 Global Step: 92250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:26:33,090-Speed 18520.46 samples/sec Loss 4.9071 LearningRate 0.0060 Epoch: 17 Global Step: 92260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:26:37,500-Speed 18580.00 samples/sec Loss 4.8745 LearningRate 0.0060 Epoch: 17 Global Step: 92270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:26:41,924-Speed 18530.97 samples/sec Loss 4.9023 LearningRate 0.0060 Epoch: 17 Global Step: 92280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:26:46,354-Speed 18496.88 samples/sec Loss 4.8757 LearningRate 0.0060 Epoch: 17 Global Step: 92290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:26:50,773-Speed 18544.70 samples/sec Loss 4.9095 LearningRate 0.0059 Epoch: 17 Global Step: 92300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:26:55,185-Speed 18575.22 samples/sec Loss 4.9173 LearningRate 0.0059 Epoch: 17 Global Step: 92310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:26:59,597-Speed 18572.15 samples/sec Loss 4.9037 LearningRate 0.0059 Epoch: 17 Global Step: 92320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:27:04,039-Speed 18444.37 samples/sec Loss 4.8601 LearningRate 0.0059 Epoch: 17 Global Step: 92330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:27:08,471-Speed 18489.11 samples/sec 
Loss 4.8402 LearningRate 0.0059 Epoch: 17 Global Step: 92340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:27:12,893-Speed 18535.32 samples/sec Loss 4.9140 LearningRate 0.0059 Epoch: 17 Global Step: 92350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:27:17,340-Speed 18427.24 samples/sec Loss 4.8822 LearningRate 0.0059 Epoch: 17 Global Step: 92360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:21,734-Speed 18649.74 samples/sec Loss 4.8932 LearningRate 0.0059 Epoch: 17 Global Step: 92370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:26,185-Speed 18411.40 samples/sec Loss 4.8838 LearningRate 0.0059 Epoch: 17 Global Step: 92380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:30,583-Speed 18633.40 samples/sec Loss 4.9021 LearningRate 0.0059 Epoch: 17 Global Step: 92390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:34,980-Speed 18634.99 samples/sec Loss 4.9107 LearningRate 0.0058 Epoch: 17 Global Step: 92400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:39,395-Speed 18562.19 samples/sec Loss 4.8729 LearningRate 0.0058 Epoch: 17 Global Step: 92410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:43,811-Speed 18557.00 samples/sec Loss 4.8640 LearningRate 0.0058 Epoch: 17 Global Step: 92420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:48,215-Speed 18606.14 samples/sec Loss 4.8983 LearningRate 0.0058 Epoch: 17 Global Step: 92430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:52,647-Speed 18491.36 samples/sec Loss 4.8842 LearningRate 0.0058 Epoch: 17 Global Step: 92440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:27:57,088-Speed 18450.27 samples/sec Loss 4.9056 LearningRate 0.0058 Epoch: 17 Global Step: 92450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:01,529-Speed 18450.59 samples/sec Loss 4.8761 LearningRate 0.0058 Epoch: 17 
Global Step: 92460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:28:05,932-Speed 18613.48 samples/sec Loss 4.8616 LearningRate 0.0058 Epoch: 17 Global Step: 92470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:28:10,340-Speed 18588.91 samples/sec Loss 4.8922 LearningRate 0.0058 Epoch: 17 Global Step: 92480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:28:14,766-Speed 18514.53 samples/sec Loss 4.8552 LearningRate 0.0058 Epoch: 17 Global Step: 92490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:19,231-Speed 18353.41 samples/sec Loss 4.8885 LearningRate 0.0057 Epoch: 17 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:23,668-Speed 18467.93 samples/sec Loss 4.8638 LearningRate 0.0057 Epoch: 17 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:28,158-Speed 18252.35 samples/sec Loss 4.8287 LearningRate 0.0057 Epoch: 17 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:32,656-Speed 18215.34 samples/sec Loss 4.8735 LearningRate 0.0057 Epoch: 17 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:37,074-Speed 18549.79 samples/sec Loss 4.8799 LearningRate 0.0057 Epoch: 17 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:41,519-Speed 18434.95 samples/sec Loss 4.8729 LearningRate 0.0057 Epoch: 17 Global Step: 92550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:45,954-Speed 18473.20 samples/sec Loss 4.8830 LearningRate 0.0057 Epoch: 17 Global Step: 92560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:50,448-Speed 18236.53 samples/sec Loss 4.8851 LearningRate 0.0057 Epoch: 17 Global Step: 92570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:28:54,881-Speed 18481.75 samples/sec Loss 4.8768 LearningRate 0.0057 Epoch: 17 Global Step: 92580 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 09:28:59,275-Speed 18650.47 samples/sec Loss 4.8613 LearningRate 0.0056 Epoch: 17 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:29:03,692-Speed 18550.33 samples/sec Loss 4.8905 LearningRate 0.0056 Epoch: 17 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:29:08,158-Speed 18349.37 samples/sec Loss 4.8854 LearningRate 0.0056 Epoch: 17 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:29:12,543-Speed 18687.17 samples/sec Loss 4.8832 LearningRate 0.0056 Epoch: 17 Global Step: 92620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:16,920-Speed 18719.12 samples/sec Loss 4.8661 LearningRate 0.0056 Epoch: 17 Global Step: 92630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:21,315-Speed 18646.31 samples/sec Loss 4.8910 LearningRate 0.0056 Epoch: 17 Global Step: 92640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:25,766-Speed 18409.83 samples/sec Loss 4.8647 LearningRate 0.0056 Epoch: 17 Global Step: 92650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:30,167-Speed 18623.95 samples/sec Loss 4.8735 LearningRate 0.0056 Epoch: 17 Global Step: 92660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:34,565-Speed 18628.49 samples/sec Loss 4.8901 LearningRate 0.0056 Epoch: 17 Global Step: 92670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:38,983-Speed 18549.85 samples/sec Loss 4.9164 LearningRate 0.0056 Epoch: 17 Global Step: 92680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:43,415-Speed 18487.54 samples/sec Loss 4.8763 LearningRate 0.0055 Epoch: 17 Global Step: 92690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:47,816-Speed 18616.17 samples/sec Loss 4.8533 LearningRate 0.0055 Epoch: 17 Global Step: 92700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:29:52,220-Speed 18610.27 samples/sec Loss 4.9056 LearningRate 0.0055 Epoch: 17 Global Step: 92710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:29:56,674-Speed 18398.66 samples/sec Loss 4.8864 LearningRate 0.0055 Epoch: 17 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:30:01,083-Speed 18586.54 samples/sec Loss 4.8463 LearningRate 0.0055 Epoch: 17 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:30:05,493-Speed 18578.30 samples/sec Loss 4.8684 LearningRate 0.0055 Epoch: 17 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:30:09,919-Speed 18514.27 samples/sec Loss 4.8630 LearningRate 0.0055 Epoch: 17 Global Step: 92750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:30:14,327-Speed 18589.24 samples/sec Loss 4.9038 LearningRate 0.0055 Epoch: 17 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:30:18,755-Speed 18505.16 samples/sec Loss 4.8911 LearningRate 0.0055 Epoch: 17 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:30:23,176-Speed 18535.39 samples/sec Loss 4.8576 LearningRate 0.0055 Epoch: 17 Global Step: 92780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:30:27,621-Speed 18436.15 samples/sec Loss 4.8815 LearningRate 0.0054 Epoch: 17 Global Step: 92790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:30:32,197-Speed 17903.67 samples/sec Loss 4.8593 LearningRate 0.0054 Epoch: 17 Global Step: 92800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:30:36,648-Speed 18410.19 samples/sec Loss 4.8984 LearningRate 0.0054 Epoch: 17 Global Step: 92810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:30:41,086-Speed 18472.09 samples/sec Loss 4.8738 LearningRate 0.0054 Epoch: 17 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:30:45,509-Speed 18526.06 samples/sec 
Loss 4.8520 LearningRate 0.0054 Epoch: 17 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:30:49,902-Speed 18657.75 samples/sec Loss 4.8798 LearningRate 0.0054 Epoch: 17 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:30:54,325-Speed 18529.23 samples/sec Loss 4.8450 LearningRate 0.0054 Epoch: 17 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:30:58,739-Speed 18569.71 samples/sec Loss 4.8688 LearningRate 0.0054 Epoch: 17 Global Step: 92860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:03,223-Speed 18274.48 samples/sec Loss 4.8826 LearningRate 0.0054 Epoch: 17 Global Step: 92870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:07,647-Speed 18520.89 samples/sec Loss 4.8567 LearningRate 0.0054 Epoch: 17 Global Step: 92880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:31:12,069-Speed 18528.49 samples/sec Loss 4.8423 LearningRate 0.0053 Epoch: 17 Global Step: 92890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:31:16,512-Speed 18444.36 samples/sec Loss 4.8643 LearningRate 0.0053 Epoch: 17 Global Step: 92900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:31:20,962-Speed 18412.96 samples/sec Loss 4.8651 LearningRate 0.0053 Epoch: 17 Global Step: 92910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:31:25,372-Speed 18582.64 samples/sec Loss 4.8870 LearningRate 0.0053 Epoch: 17 Global Step: 92920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:29,788-Speed 18559.60 samples/sec Loss 4.8619 LearningRate 0.0053 Epoch: 17 Global Step: 92930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:34,208-Speed 18554.22 samples/sec Loss 4.8579 LearningRate 0.0053 Epoch: 17 Global Step: 92940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:38,618-Speed 18580.01 samples/sec Loss 4.8585 LearningRate 0.0053 Epoch: 17 
Global Step: 92950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:43,067-Speed 18420.28 samples/sec Loss 4.8337 LearningRate 0.0053 Epoch: 17 Global Step: 92960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:47,471-Speed 18616.27 samples/sec Loss 4.8619 LearningRate 0.0053 Epoch: 17 Global Step: 92970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:51,872-Speed 18621.03 samples/sec Loss 4.8558 LearningRate 0.0053 Epoch: 17 Global Step: 92980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:31:56,293-Speed 18530.65 samples/sec Loss 4.8646 LearningRate 0.0052 Epoch: 17 Global Step: 92990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:32:00,718-Speed 18520.58 samples/sec Loss 4.8403 LearningRate 0.0052 Epoch: 17 Global Step: 93000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:32:05,152-Speed 18477.32 samples/sec Loss 4.8929 LearningRate 0.0052 Epoch: 17 Global Step: 93010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:32:09,542-Speed 18666.64 samples/sec Loss 4.8970 LearningRate 0.0052 Epoch: 17 Global Step: 93020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:13,973-Speed 18493.79 samples/sec Loss 4.8769 LearningRate 0.0052 Epoch: 17 Global Step: 93030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:18,450-Speed 18302.10 samples/sec Loss 4.8859 LearningRate 0.0052 Epoch: 17 Global Step: 93040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:22,865-Speed 18562.09 samples/sec Loss 4.8778 LearningRate 0.0052 Epoch: 17 Global Step: 93050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:27,261-Speed 18636.46 samples/sec Loss 4.8581 LearningRate 0.0052 Epoch: 17 Global Step: 93060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:31,682-Speed 18534.16 samples/sec Loss 4.8551 LearningRate 0.0052 Epoch: 17 Global Step: 93070 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-01-14 09:32:36,112-Speed 18498.38 samples/sec Loss 4.8600 LearningRate 0.0052 Epoch: 17 Global Step: 93080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:40,522-Speed 18580.92 samples/sec Loss 4.8441 LearningRate 0.0052 Epoch: 17 Global Step: 93090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:44,951-Speed 18498.54 samples/sec Loss 4.8835 LearningRate 0.0051 Epoch: 17 Global Step: 93100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:49,350-Speed 18627.78 samples/sec Loss 4.8416 LearningRate 0.0051 Epoch: 17 Global Step: 93110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:32:53,773-Speed 18525.66 samples/sec Loss 4.8689 LearningRate 0.0051 Epoch: 17 Global Step: 93120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:32:58,174-Speed 18623.48 samples/sec Loss 4.8657 LearningRate 0.0051 Epoch: 17 Global Step: 93130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:02,601-Speed 18506.24 samples/sec Loss 4.8380 LearningRate 0.0051 Epoch: 17 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:07,019-Speed 18561.63 samples/sec Loss 4.8843 LearningRate 0.0051 Epoch: 17 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:11,441-Speed 18537.98 samples/sec Loss 4.8760 LearningRate 0.0051 Epoch: 17 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:15,861-Speed 18542.11 samples/sec Loss 4.8580 LearningRate 0.0051 Epoch: 17 Global Step: 93170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:20,299-Speed 18462.31 samples/sec Loss 4.8777 LearningRate 0.0051 Epoch: 17 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:24,693-Speed 18647.11 samples/sec Loss 4.8509 LearningRate 0.0051 Epoch: 17 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:33:29,102-Speed 18590.49 samples/sec Loss 4.8621 LearningRate 0.0050 Epoch: 17 Global Step: 93200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:33,590-Speed 18263.44 samples/sec Loss 4.8765 LearningRate 0.0050 Epoch: 17 Global Step: 93210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:38,050-Speed 18372.62 samples/sec Loss 4.8726 LearningRate 0.0050 Epoch: 17 Global Step: 93220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:42,483-Speed 18482.56 samples/sec Loss 4.8580 LearningRate 0.0050 Epoch: 17 Global Step: 93230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:46,883-Speed 18623.64 samples/sec Loss 4.8498 LearningRate 0.0050 Epoch: 17 Global Step: 93240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:33:51,265-Speed 18699.15 samples/sec Loss 4.8224 LearningRate 0.0050 Epoch: 17 Global Step: 93250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:33:55,703-Speed 18464.65 samples/sec Loss 4.8753 LearningRate 0.0050 Epoch: 17 Global Step: 93260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:00,143-Speed 18456.75 samples/sec Loss 4.8705 LearningRate 0.0050 Epoch: 17 Global Step: 93270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:04,645-Speed 18203.13 samples/sec Loss 4.8759 LearningRate 0.0050 Epoch: 17 Global Step: 93280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:09,075-Speed 18502.23 samples/sec Loss 4.8692 LearningRate 0.0050 Epoch: 17 Global Step: 93290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:13,479-Speed 18623.26 samples/sec Loss 4.8722 LearningRate 0.0049 Epoch: 17 Global Step: 93300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:17,903-Speed 18522.68 samples/sec Loss 4.8834 LearningRate 0.0049 Epoch: 17 Global Step: 93310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:22,314-Speed 18574.95 samples/sec 
Loss 4.8574 LearningRate 0.0049 Epoch: 17 Global Step: 93320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:40,674-Speed 4462.27 samples/sec Loss 4.8919 LearningRate 0.0049 Epoch: 18 Global Step: 93330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:45,154-Speed 18295.01 samples/sec Loss 4.8891 LearningRate 0.0049 Epoch: 18 Global Step: 93340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 09:34:49,580-Speed 18514.63 samples/sec Loss 4.8746 LearningRate 0.0049 Epoch: 18 Global Step: 93350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:34:53,989-Speed 18584.68 samples/sec Loss 4.8373 LearningRate 0.0049 Epoch: 18 Global Step: 93360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:34:58,399-Speed 18582.45 samples/sec Loss 4.8703 LearningRate 0.0049 Epoch: 18 Global Step: 93370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:02,797-Speed 18633.15 samples/sec Loss 4.8678 LearningRate 0.0049 Epoch: 18 Global Step: 93380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:07,216-Speed 18543.17 samples/sec Loss 4.8718 LearningRate 0.0049 Epoch: 18 Global Step: 93390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:11,627-Speed 18578.20 samples/sec Loss 4.8515 LearningRate 0.0049 Epoch: 18 Global Step: 93400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:16,069-Speed 18450.10 samples/sec Loss 4.8230 LearningRate 0.0048 Epoch: 18 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:20,500-Speed 18490.83 samples/sec Loss 4.8324 LearningRate 0.0048 Epoch: 18 Global Step: 93420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:24,922-Speed 18532.09 samples/sec Loss 4.8169 LearningRate 0.0048 Epoch: 18 Global Step: 93430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:29,370-Speed 18424.98 samples/sec Loss 4.8368 LearningRate 0.0048 Epoch: 18 
Global Step: 93440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:33,828-Speed 18381.00 samples/sec Loss 4.8487 LearningRate 0.0048 Epoch: 18 Global Step: 93450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:35:38,215-Speed 18677.73 samples/sec Loss 4.8126 LearningRate 0.0048 Epoch: 18 Global Step: 93460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:42,600-Speed 18683.27 samples/sec Loss 4.8128 LearningRate 0.0048 Epoch: 18 Global Step: 93470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:47,026-Speed 18520.74 samples/sec Loss 4.8162 LearningRate 0.0048 Epoch: 18 Global Step: 93480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:51,459-Speed 18486.23 samples/sec Loss 4.8733 LearningRate 0.0048 Epoch: 18 Global Step: 93490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:35:55,860-Speed 18615.83 samples/sec Loss 4.8440 LearningRate 0.0048 Epoch: 18 Global Step: 93500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:00,270-Speed 18581.88 samples/sec Loss 4.8702 LearningRate 0.0048 Epoch: 18 Global Step: 93510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:04,663-Speed 18653.25 samples/sec Loss 4.8054 LearningRate 0.0047 Epoch: 18 Global Step: 93520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:09,065-Speed 18617.05 samples/sec Loss 4.8704 LearningRate 0.0047 Epoch: 18 Global Step: 93530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:13,554-Speed 18254.03 samples/sec Loss 4.8459 LearningRate 0.0047 Epoch: 18 Global Step: 93540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:18,009-Speed 18390.82 samples/sec Loss 4.8403 LearningRate 0.0047 Epoch: 18 Global Step: 93550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:22,484-Speed 18309.78 samples/sec Loss 4.8773 LearningRate 0.0047 Epoch: 18 Global Step: 93560 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 09:36:26,935-Speed 18411.09 samples/sec Loss 4.8139 LearningRate 0.0047 Epoch: 18 Global Step: 93570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:31,346-Speed 18574.68 samples/sec Loss 4.8532 LearningRate 0.0047 Epoch: 18 Global Step: 93580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:35,785-Speed 18459.38 samples/sec Loss 4.8517 LearningRate 0.0047 Epoch: 18 Global Step: 93590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:40,215-Speed 18498.55 samples/sec Loss 4.8647 LearningRate 0.0047 Epoch: 18 Global Step: 93600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:44,629-Speed 18560.80 samples/sec Loss 4.8457 LearningRate 0.0047 Epoch: 18 Global Step: 93610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:49,056-Speed 18511.23 samples/sec Loss 4.8698 LearningRate 0.0046 Epoch: 18 Global Step: 93620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:53,483-Speed 18509.24 samples/sec Loss 4.8400 LearningRate 0.0046 Epoch: 18 Global Step: 93630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:36:58,001-Speed 18135.87 samples/sec Loss 4.8548 LearningRate 0.0046 Epoch: 18 Global Step: 93640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:02,436-Speed 18477.35 samples/sec Loss 4.8342 LearningRate 0.0046 Epoch: 18 Global Step: 93650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:06,875-Speed 18457.04 samples/sec Loss 4.8675 LearningRate 0.0046 Epoch: 18 Global Step: 93660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:37:11,288-Speed 18569.03 samples/sec Loss 4.8484 LearningRate 0.0046 Epoch: 18 Global Step: 93670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:15,721-Speed 18484.60 samples/sec Loss 4.8725 LearningRate 0.0046 Epoch: 18 Global Step: 93680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:37:20,200-Speed 18297.99 samples/sec Loss 4.8534 LearningRate 0.0046 Epoch: 18 Global Step: 93690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:24,586-Speed 18683.61 samples/sec Loss 4.8270 LearningRate 0.0046 Epoch: 18 Global Step: 93700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:28,991-Speed 18598.81 samples/sec Loss 4.8727 LearningRate 0.0046 Epoch: 18 Global Step: 93710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:33,403-Speed 18586.09 samples/sec Loss 4.8233 LearningRate 0.0046 Epoch: 18 Global Step: 93720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:37,816-Speed 18568.29 samples/sec Loss 4.9029 LearningRate 0.0045 Epoch: 18 Global Step: 93730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:42,246-Speed 18497.89 samples/sec Loss 4.8658 LearningRate 0.0045 Epoch: 18 Global Step: 93740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:46,705-Speed 18376.24 samples/sec Loss 4.8342 LearningRate 0.0045 Epoch: 18 Global Step: 93750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:51,148-Speed 18446.85 samples/sec Loss 4.8415 LearningRate 0.0045 Epoch: 18 Global Step: 93760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:37:55,630-Speed 18278.46 samples/sec Loss 4.8227 LearningRate 0.0045 Epoch: 18 Global Step: 93770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:38:00,021-Speed 18669.95 samples/sec Loss 4.8799 LearningRate 0.0045 Epoch: 18 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:04,432-Speed 18579.86 samples/sec Loss 4.8620 LearningRate 0.0045 Epoch: 18 Global Step: 93790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:08,904-Speed 18330.16 samples/sec Loss 4.8186 LearningRate 0.0045 Epoch: 18 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:13,338-Speed 18482.51 samples/sec 
Loss 4.8497 LearningRate 0.0045 Epoch: 18 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:17,794-Speed 18390.39 samples/sec Loss 4.8360 LearningRate 0.0045 Epoch: 18 Global Step: 93820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:22,264-Speed 18336.34 samples/sec Loss 4.8551 LearningRate 0.0045 Epoch: 18 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:26,678-Speed 18563.67 samples/sec Loss 4.8348 LearningRate 0.0044 Epoch: 18 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:31,109-Speed 18490.03 samples/sec Loss 4.8576 LearningRate 0.0044 Epoch: 18 Global Step: 93850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:35,548-Speed 18469.55 samples/sec Loss 4.8566 LearningRate 0.0044 Epoch: 18 Global Step: 93860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:39,941-Speed 18657.67 samples/sec Loss 4.8083 LearningRate 0.0044 Epoch: 18 Global Step: 93870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:44,468-Speed 18100.97 samples/sec Loss 4.8727 LearningRate 0.0044 Epoch: 18 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:38:48,864-Speed 18642.02 samples/sec Loss 4.8200 LearningRate 0.0044 Epoch: 18 Global Step: 93890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:53,274-Speed 18577.27 samples/sec Loss 4.8443 LearningRate 0.0044 Epoch: 18 Global Step: 93900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:38:57,659-Speed 18690.23 samples/sec Loss 4.8245 LearningRate 0.0044 Epoch: 18 Global Step: 93910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:39:02,114-Speed 18392.97 samples/sec Loss 4.8614 LearningRate 0.0044 Epoch: 18 Global Step: 93920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:39:06,595-Speed 18287.16 samples/sec Loss 4.8634 LearningRate 0.0044 Epoch: 18 
Global Step: 93930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:39:11,022-Speed 18508.87 samples/sec Loss 4.8545 LearningRate 0.0044 Epoch: 18 Global Step: 93940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:39:15,405-Speed 18696.01 samples/sec Loss 4.8563 LearningRate 0.0043 Epoch: 18 Global Step: 93950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:39:19,864-Speed 18376.25 samples/sec Loss 4.8225 LearningRate 0.0043 Epoch: 18 Global Step: 93960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:39:24,272-Speed 18594.69 samples/sec Loss 4.8079 LearningRate 0.0043 Epoch: 18 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:39:28,678-Speed 18602.21 samples/sec Loss 4.8036 LearningRate 0.0043 Epoch: 18 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:39:33,123-Speed 18433.93 samples/sec Loss 4.8374 LearningRate 0.0043 Epoch: 18 Global Step: 93990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:39:37,518-Speed 18640.67 samples/sec Loss 4.8047 LearningRate 0.0043 Epoch: 18 Global Step: 94000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:39:41,998-Speed 18298.07 samples/sec Loss 4.8116 LearningRate 0.0043 Epoch: 18 Global Step: 94010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:39:46,461-Speed 18362.39 samples/sec Loss 4.8155 LearningRate 0.0043 Epoch: 18 Global Step: 94020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:39:50,904-Speed 18444.28 samples/sec Loss 4.8635 LearningRate 0.0043 Epoch: 18 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:39:55,302-Speed 18633.52 samples/sec Loss 4.8234 LearningRate 0.0043 Epoch: 18 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:39:59,687-Speed 18689.45 samples/sec Loss 4.8550 LearningRate 0.0043 Epoch: 18 Global Step: 94050 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-14 09:40:04,158-Speed 18327.31 samples/sec Loss 4.8431 LearningRate 0.0043 Epoch: 18 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:40:08,617-Speed 18379.18 samples/sec Loss 4.8603 LearningRate 0.0042 Epoch: 18 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:40:13,074-Speed 18382.75 samples/sec Loss 4.8369 LearningRate 0.0042 Epoch: 18 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:40:17,509-Speed 18481.48 samples/sec Loss 4.8290 LearningRate 0.0042 Epoch: 18 Global Step: 94090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 09:40:21,977-Speed 18339.12 samples/sec Loss 4.8406 LearningRate 0.0042 Epoch: 18 Global Step: 94100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:40:26,366-Speed 18676.83 samples/sec Loss 4.8059 LearningRate 0.0042 Epoch: 18 Global Step: 94110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:40:30,763-Speed 18635.54 samples/sec Loss 4.8198 LearningRate 0.0042 Epoch: 18 Global Step: 94120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:40:35,151-Speed 18675.89 samples/sec Loss 4.8079 LearningRate 0.0042 Epoch: 18 Global Step: 94130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:40:39,585-Speed 18487.28 samples/sec Loss 4.8147 LearningRate 0.0042 Epoch: 18 Global Step: 94140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:40:44,019-Speed 18480.27 samples/sec Loss 4.8140 LearningRate 0.0042 Epoch: 18 Global Step: 94150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:40:48,443-Speed 18525.94 samples/sec Loss 4.8498 LearningRate 0.0042 Epoch: 18 Global Step: 94160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:40:52,872-Speed 18499.70 samples/sec Loss 4.8215 LearningRate 0.0042 Epoch: 18 Global Step: 94170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:40:57,279-Speed 18592.79 samples/sec Loss 4.8061 LearningRate 0.0041 Epoch: 18 Global Step: 94180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:41:01,698-Speed 18547.81 samples/sec Loss 4.8031 LearningRate 0.0041 Epoch: 18 Global Step: 94190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:41:06,154-Speed 18385.94 samples/sec Loss 4.8421 LearningRate 0.0041 Epoch: 18 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:41:10,550-Speed 18640.71 samples/sec Loss 4.8286 LearningRate 0.0041 Epoch: 18 Global Step: 94210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:41:14,980-Speed 18497.76 samples/sec Loss 4.8106 LearningRate 0.0041 Epoch: 18 Global Step: 94220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:41:19,395-Speed 18560.20 samples/sec Loss 4.8252 LearningRate 0.0041 Epoch: 18 Global Step: 94230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:41:23,804-Speed 18585.69 samples/sec Loss 4.8419 LearningRate 0.0041 Epoch: 18 Global Step: 94240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:41:28,221-Speed 18550.34 samples/sec Loss 4.8219 LearningRate 0.0041 Epoch: 18 Global Step: 94250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:41:32,616-Speed 18643.50 samples/sec Loss 4.8875 LearningRate 0.0041 Epoch: 18 Global Step: 94260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:41:37,008-Speed 18660.19 samples/sec Loss 4.8084 LearningRate 0.0041 Epoch: 18 Global Step: 94270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:41:41,408-Speed 18621.53 samples/sec Loss 4.8508 LearningRate 0.0041 Epoch: 18 Global Step: 94280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:41:45,856-Speed 18426.33 samples/sec Loss 4.8397 LearningRate 0.0041 Epoch: 18 Global Step: 94290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:41:57,701-Speed 6917.03 samples/sec Loss 
4.8233 LearningRate 0.0040 Epoch: 18 Global Step: 94300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:42:02,152-Speed 18406.00 samples/sec Loss 4.7650 LearningRate 0.0040 Epoch: 18 Global Step: 94310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:42:06,542-Speed 18670.89 samples/sec Loss 4.7938 LearningRate 0.0040 Epoch: 18 Global Step: 94320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:42:10,921-Speed 18716.26 samples/sec Loss 4.8259 LearningRate 0.0040 Epoch: 18 Global Step: 94330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:42:15,325-Speed 18603.46 samples/sec Loss 4.8053 LearningRate 0.0040 Epoch: 18 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:42:19,739-Speed 18564.02 samples/sec Loss 4.8295 LearningRate 0.0040 Epoch: 18 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:42:24,168-Speed 18502.42 samples/sec Loss 4.8334 LearningRate 0.0040 Epoch: 18 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:42:28,579-Speed 18575.85 samples/sec Loss 4.8270 LearningRate 0.0040 Epoch: 18 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:42:32,976-Speed 18635.44 samples/sec Loss 4.8058 LearningRate 0.0040 Epoch: 18 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:42:37,440-Speed 18357.65 samples/sec Loss 4.8398 LearningRate 0.0040 Epoch: 18 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:42:41,909-Speed 18334.91 samples/sec Loss 4.8051 LearningRate 0.0040 Epoch: 18 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:42:46,319-Speed 18577.97 samples/sec Loss 4.8232 LearningRate 0.0039 Epoch: 18 Global Step: 94410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:42:50,714-Speed 18648.33 samples/sec Loss 4.8492 LearningRate 0.0039 Epoch: 18 Global 
Step: 94420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:42:55,145-Speed 18493.86 samples/sec Loss 4.8187 LearningRate 0.0039 Epoch: 18 Global Step: 94430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:42:59,583-Speed 18461.41 samples/sec Loss 4.8700 LearningRate 0.0039 Epoch: 18 Global Step: 94440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:03,977-Speed 18650.36 samples/sec Loss 4.8506 LearningRate 0.0039 Epoch: 18 Global Step: 94450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:08,399-Speed 18530.22 samples/sec Loss 4.8704 LearningRate 0.0039 Epoch: 18 Global Step: 94460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:12,869-Speed 18329.75 samples/sec Loss 4.8038 LearningRate 0.0039 Epoch: 18 Global Step: 94470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:17,301-Speed 18490.97 samples/sec Loss 4.8593 LearningRate 0.0039 Epoch: 18 Global Step: 94480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:21,705-Speed 18604.09 samples/sec Loss 4.7831 LearningRate 0.0039 Epoch: 18 Global Step: 94490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:26,122-Speed 18550.61 samples/sec Loss 4.8117 LearningRate 0.0039 Epoch: 18 Global Step: 94500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:30,518-Speed 18642.18 samples/sec Loss 4.8137 LearningRate 0.0039 Epoch: 18 Global Step: 94510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:43:34,952-Speed 18483.63 samples/sec Loss 4.8443 LearningRate 0.0039 Epoch: 18 Global Step: 94520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:43:39,430-Speed 18297.70 samples/sec Loss 4.8293 LearningRate 0.0038 Epoch: 18 Global Step: 94530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:43:43,859-Speed 18511.88 samples/sec Loss 4.8338 LearningRate 0.0038 Epoch: 18 Global Step: 94540 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 09:43:48,293-Speed 18486.20 samples/sec Loss 4.8217 LearningRate 0.0038 Epoch: 18 Global Step: 94550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:52,762-Speed 18338.64 samples/sec Loss 4.8310 LearningRate 0.0038 Epoch: 18 Global Step: 94560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:43:57,161-Speed 18629.43 samples/sec Loss 4.8206 LearningRate 0.0038 Epoch: 18 Global Step: 94570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:01,567-Speed 18596.29 samples/sec Loss 4.7798 LearningRate 0.0038 Epoch: 18 Global Step: 94580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:05,989-Speed 18530.86 samples/sec Loss 4.8189 LearningRate 0.0038 Epoch: 18 Global Step: 94590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:10,422-Speed 18488.21 samples/sec Loss 4.8054 LearningRate 0.0038 Epoch: 18 Global Step: 94600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:14,903-Speed 18285.34 samples/sec Loss 4.8440 LearningRate 0.0038 Epoch: 18 Global Step: 94610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:19,392-Speed 18253.86 samples/sec Loss 4.8071 LearningRate 0.0038 Epoch: 18 Global Step: 94620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:23,809-Speed 18552.98 samples/sec Loss 4.7731 LearningRate 0.0038 Epoch: 18 Global Step: 94630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:28,219-Speed 18580.64 samples/sec Loss 4.8270 LearningRate 0.0038 Epoch: 18 Global Step: 94640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:32,698-Speed 18296.25 samples/sec Loss 4.8308 LearningRate 0.0037 Epoch: 18 Global Step: 94650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:37,148-Speed 18411.96 samples/sec Loss 4.8502 LearningRate 0.0037 Epoch: 18 Global Step: 94660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:44:41,580-Speed 18490.08 samples/sec Loss 4.8256 LearningRate 0.0037 Epoch: 18 Global Step: 94670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:45,991-Speed 18579.79 samples/sec Loss 4.7926 LearningRate 0.0037 Epoch: 18 Global Step: 94680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:50,439-Speed 18422.39 samples/sec Loss 4.7900 LearningRate 0.0037 Epoch: 18 Global Step: 94690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:54,842-Speed 18612.99 samples/sec Loss 4.8436 LearningRate 0.0037 Epoch: 18 Global Step: 94700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:44:59,289-Speed 18432.35 samples/sec Loss 4.8021 LearningRate 0.0037 Epoch: 18 Global Step: 94710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:03,713-Speed 18520.30 samples/sec Loss 4.8388 LearningRate 0.0037 Epoch: 18 Global Step: 94720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:08,112-Speed 18627.72 samples/sec Loss 4.7921 LearningRate 0.0037 Epoch: 18 Global Step: 94730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:12,551-Speed 18461.80 samples/sec Loss 4.8196 LearningRate 0.0037 Epoch: 18 Global Step: 94740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:45:16,987-Speed 18470.72 samples/sec Loss 4.8243 LearningRate 0.0037 Epoch: 18 Global Step: 94750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:45:21,442-Speed 18395.86 samples/sec Loss 4.8232 LearningRate 0.0037 Epoch: 18 Global Step: 94760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:45:25,841-Speed 18628.97 samples/sec Loss 4.8478 LearningRate 0.0036 Epoch: 18 Global Step: 94770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:30,264-Speed 18525.26 samples/sec Loss 4.8359 LearningRate 0.0036 Epoch: 18 Global Step: 94780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:34,704-Speed 18461.91 samples/sec 
Loss 4.8020 LearningRate 0.0036 Epoch: 18 Global Step: 94790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:39,122-Speed 18552.72 samples/sec Loss 4.8441 LearningRate 0.0036 Epoch: 18 Global Step: 94800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:43,509-Speed 18677.86 samples/sec Loss 4.8472 LearningRate 0.0036 Epoch: 18 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:47,949-Speed 18456.50 samples/sec Loss 4.8033 LearningRate 0.0036 Epoch: 18 Global Step: 94820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:52,367-Speed 18546.93 samples/sec Loss 4.8625 LearningRate 0.0036 Epoch: 18 Global Step: 94830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:45:56,842-Speed 18312.26 samples/sec Loss 4.7979 LearningRate 0.0036 Epoch: 18 Global Step: 94840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:01,314-Speed 18323.65 samples/sec Loss 4.8187 LearningRate 0.0036 Epoch: 18 Global Step: 94850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:05,743-Speed 18504.25 samples/sec Loss 4.8049 LearningRate 0.0036 Epoch: 18 Global Step: 94860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:10,208-Speed 18353.37 samples/sec Loss 4.8375 LearningRate 0.0036 Epoch: 18 Global Step: 94870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:46:14,632-Speed 18519.41 samples/sec Loss 4.8400 LearningRate 0.0036 Epoch: 18 Global Step: 94880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:46:19,102-Speed 18334.16 samples/sec Loss 4.8248 LearningRate 0.0035 Epoch: 18 Global Step: 94890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:46:23,502-Speed 18625.76 samples/sec Loss 4.8350 LearningRate 0.0035 Epoch: 18 Global Step: 94900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:27,947-Speed 18433.16 samples/sec Loss 4.7850 LearningRate 0.0035 Epoch: 18 
Global Step: 94910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:32,374-Speed 18510.37 samples/sec Loss 4.7978 LearningRate 0.0035 Epoch: 18 Global Step: 94920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:36,788-Speed 18565.54 samples/sec Loss 4.8351 LearningRate 0.0035 Epoch: 18 Global Step: 94930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:41,193-Speed 18606.80 samples/sec Loss 4.8202 LearningRate 0.0035 Epoch: 18 Global Step: 94940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:45,608-Speed 18563.48 samples/sec Loss 4.8413 LearningRate 0.0035 Epoch: 18 Global Step: 94950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:50,021-Speed 18566.99 samples/sec Loss 4.7864 LearningRate 0.0035 Epoch: 18 Global Step: 94960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:54,456-Speed 18476.43 samples/sec Loss 4.8397 LearningRate 0.0035 Epoch: 18 Global Step: 94970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:46:58,894-Speed 18481.67 samples/sec Loss 4.7902 LearningRate 0.0035 Epoch: 18 Global Step: 94980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:03,312-Speed 18550.47 samples/sec Loss 4.7870 LearningRate 0.0035 Epoch: 18 Global Step: 94990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:07,717-Speed 18599.67 samples/sec Loss 4.8013 LearningRate 0.0035 Epoch: 18 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:47:12,100-Speed 18698.96 samples/sec Loss 4.8131 LearningRate 0.0035 Epoch: 18 Global Step: 95010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:16,497-Speed 18637.36 samples/sec Loss 4.8419 LearningRate 0.0034 Epoch: 18 Global Step: 95020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:20,875-Speed 18718.54 samples/sec Loss 4.7771 LearningRate 0.0034 Epoch: 18 Global Step: 95030 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 09:47:25,323-Speed 18420.19 samples/sec Loss 4.7879 LearningRate 0.0034 Epoch: 18 Global Step: 95040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:29,752-Speed 18502.72 samples/sec Loss 4.7613 LearningRate 0.0034 Epoch: 18 Global Step: 95050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:34,139-Speed 18679.59 samples/sec Loss 4.8163 LearningRate 0.0034 Epoch: 18 Global Step: 95060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:38,530-Speed 18661.25 samples/sec Loss 4.7929 LearningRate 0.0034 Epoch: 18 Global Step: 95070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:42,966-Speed 18474.86 samples/sec Loss 4.8022 LearningRate 0.0034 Epoch: 18 Global Step: 95080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:47,419-Speed 18401.96 samples/sec Loss 4.8342 LearningRate 0.0034 Epoch: 18 Global Step: 95090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:51,830-Speed 18579.64 samples/sec Loss 4.8270 LearningRate 0.0034 Epoch: 18 Global Step: 95100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:47:56,306-Speed 18303.59 samples/sec Loss 4.8277 LearningRate 0.0034 Epoch: 18 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:48:00,724-Speed 18548.99 samples/sec Loss 4.8197 LearningRate 0.0034 Epoch: 18 Global Step: 95120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:48:05,119-Speed 18646.00 samples/sec Loss 4.8172 LearningRate 0.0034 Epoch: 18 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:48:09,525-Speed 18600.45 samples/sec Loss 4.8162 LearningRate 0.0034 Epoch: 18 Global Step: 95140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:13,961-Speed 18471.19 samples/sec Loss 4.7903 LearningRate 0.0033 Epoch: 18 Global Step: 95150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:48:18,365-Speed 18609.12 samples/sec Loss 4.7913 LearningRate 0.0033 Epoch: 18 Global Step: 95160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:22,768-Speed 18610.54 samples/sec Loss 4.7987 LearningRate 0.0033 Epoch: 18 Global Step: 95170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:27,184-Speed 18557.29 samples/sec Loss 4.7835 LearningRate 0.0033 Epoch: 18 Global Step: 95180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:31,631-Speed 18425.53 samples/sec Loss 4.8112 LearningRate 0.0033 Epoch: 18 Global Step: 95190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:36,088-Speed 18386.27 samples/sec Loss 4.8109 LearningRate 0.0033 Epoch: 18 Global Step: 95200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:40,553-Speed 18355.62 samples/sec Loss 4.8022 LearningRate 0.0033 Epoch: 18 Global Step: 95210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:45,004-Speed 18414.08 samples/sec Loss 4.8210 LearningRate 0.0033 Epoch: 18 Global Step: 95220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:49,455-Speed 18411.00 samples/sec Loss 4.8251 LearningRate 0.0033 Epoch: 18 Global Step: 95230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:48:53,980-Speed 18110.83 samples/sec Loss 4.8265 LearningRate 0.0033 Epoch: 18 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:48:58,434-Speed 18396.76 samples/sec Loss 4.8009 LearningRate 0.0033 Epoch: 18 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:49:02,894-Speed 18377.10 samples/sec Loss 4.8060 LearningRate 0.0033 Epoch: 18 Global Step: 95260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:07,314-Speed 18536.71 samples/sec Loss 4.7981 LearningRate 0.0032 Epoch: 18 Global Step: 95270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:11,740-Speed 18525.18 samples/sec 
Loss 4.7827 LearningRate 0.0032 Epoch: 18 Global Step: 95280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:16,162-Speed 18530.32 samples/sec Loss 4.7723 LearningRate 0.0032 Epoch: 18 Global Step: 95290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:20,577-Speed 18563.74 samples/sec Loss 4.7790 LearningRate 0.0032 Epoch: 18 Global Step: 95300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:24,983-Speed 18597.85 samples/sec Loss 4.7456 LearningRate 0.0032 Epoch: 18 Global Step: 95310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:29,447-Speed 18354.39 samples/sec Loss 4.8440 LearningRate 0.0032 Epoch: 18 Global Step: 95320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:33,900-Speed 18402.76 samples/sec Loss 4.7867 LearningRate 0.0032 Epoch: 18 Global Step: 95330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:38,337-Speed 18470.12 samples/sec Loss 4.8130 LearningRate 0.0032 Epoch: 18 Global Step: 95340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:42,747-Speed 18580.05 samples/sec Loss 4.8349 LearningRate 0.0032 Epoch: 18 Global Step: 95350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:49:47,147-Speed 18624.55 samples/sec Loss 4.8080 LearningRate 0.0032 Epoch: 18 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:49:51,577-Speed 18503.93 samples/sec Loss 4.8052 LearningRate 0.0032 Epoch: 18 Global Step: 95370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:49:55,937-Speed 18796.70 samples/sec Loss 4.8137 LearningRate 0.0032 Epoch: 18 Global Step: 95380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:50:00,389-Speed 18406.38 samples/sec Loss 4.8258 LearningRate 0.0032 Epoch: 18 Global Step: 95390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:50:04,816-Speed 18510.67 samples/sec Loss 4.8023 LearningRate 0.0031 Epoch: 18 
Global Step: 95400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:50:09,216-Speed 18621.23 samples/sec Loss 4.7711 LearningRate 0.0031 Epoch: 18 Global Step: 95410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:13,666-Speed 18416.89 samples/sec Loss 4.7954 LearningRate 0.0031 Epoch: 18 Global Step: 95420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:18,090-Speed 18526.66 samples/sec Loss 4.8129 LearningRate 0.0031 Epoch: 18 Global Step: 95430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:22,551-Speed 18367.51 samples/sec Loss 4.8017 LearningRate 0.0031 Epoch: 18 Global Step: 95440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:27,026-Speed 18312.71 samples/sec Loss 4.7824 LearningRate 0.0031 Epoch: 18 Global Step: 95450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:31,523-Speed 18223.07 samples/sec Loss 4.7844 LearningRate 0.0031 Epoch: 18 Global Step: 95460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:36,065-Speed 18040.50 samples/sec Loss 4.7851 LearningRate 0.0031 Epoch: 18 Global Step: 95470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:40,529-Speed 18359.18 samples/sec Loss 4.7652 LearningRate 0.0031 Epoch: 18 Global Step: 95480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:44,940-Speed 18578.30 samples/sec Loss 4.8234 LearningRate 0.0031 Epoch: 18 Global Step: 95490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:49,384-Speed 18440.59 samples/sec Loss 4.8210 LearningRate 0.0031 Epoch: 18 Global Step: 95500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:50:53,829-Speed 18434.75 samples/sec Loss 4.7687 LearningRate 0.0031 Epoch: 18 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:50:58,239-Speed 18582.60 samples/sec Loss 4.7887 LearningRate 0.0031 Epoch: 18 Global Step: 95520 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-14 09:51:02,667-Speed 18505.26 samples/sec Loss 4.8130 LearningRate 0.0031 Epoch: 18 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:51:11,290-Speed 9501.29 samples/sec Loss 4.7681 LearningRate 0.0030 Epoch: 18 Global Step: 95540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:51:15,740-Speed 18418.84 samples/sec Loss 4.8068 LearningRate 0.0030 Epoch: 18 Global Step: 95550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:51:20,190-Speed 18411.36 samples/sec Loss 4.7947 LearningRate 0.0030 Epoch: 18 Global Step: 95560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:51:24,598-Speed 18590.51 samples/sec Loss 4.8065 LearningRate 0.0030 Epoch: 18 Global Step: 95570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:51:29,036-Speed 18461.64 samples/sec Loss 4.7748 LearningRate 0.0030 Epoch: 18 Global Step: 95580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:51:33,438-Speed 18613.15 samples/sec Loss 4.8118 LearningRate 0.0030 Epoch: 18 Global Step: 95590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:51:37,837-Speed 18629.76 samples/sec Loss 4.7878 LearningRate 0.0030 Epoch: 18 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:51:42,266-Speed 18503.49 samples/sec Loss 4.7599 LearningRate 0.0030 Epoch: 18 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:51:46,667-Speed 18619.31 samples/sec Loss 4.8119 LearningRate 0.0030 Epoch: 18 Global Step: 95620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:51:51,053-Speed 18683.04 samples/sec Loss 4.7965 LearningRate 0.0030 Epoch: 18 Global Step: 95630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:51:55,481-Speed 18506.15 samples/sec Loss 4.8047 LearningRate 0.0030 Epoch: 18 Global Step: 95640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:51:59,907-Speed 18509.14 samples/sec Loss 4.8173 LearningRate 0.0030 Epoch: 18 Global Step: 95650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:04,330-Speed 18529.23 samples/sec Loss 4.7835 LearningRate 0.0030 Epoch: 18 Global Step: 95660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:08,763-Speed 18484.73 samples/sec Loss 4.7681 LearningRate 0.0029 Epoch: 18 Global Step: 95670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:13,202-Speed 18460.74 samples/sec Loss 4.7908 LearningRate 0.0029 Epoch: 18 Global Step: 95680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:17,630-Speed 18504.37 samples/sec Loss 4.8256 LearningRate 0.0029 Epoch: 18 Global Step: 95690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:22,041-Speed 18578.73 samples/sec Loss 4.7922 LearningRate 0.0029 Epoch: 18 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:52:26,455-Speed 18565.38 samples/sec Loss 4.7652 LearningRate 0.0029 Epoch: 18 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:52:30,916-Speed 18366.56 samples/sec Loss 4.8104 LearningRate 0.0029 Epoch: 18 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:52:35,332-Speed 18555.82 samples/sec Loss 4.7867 LearningRate 0.0029 Epoch: 18 Global Step: 95730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:39,749-Speed 18551.66 samples/sec Loss 4.7966 LearningRate 0.0029 Epoch: 18 Global Step: 95740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:44,182-Speed 18489.05 samples/sec Loss 4.8377 LearningRate 0.0029 Epoch: 18 Global Step: 95750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:48,614-Speed 18488.75 samples/sec Loss 4.8352 LearningRate 0.0029 Epoch: 18 Global Step: 95760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:53,023-Speed 18586.40 samples/sec 
Loss 4.7928 LearningRate 0.0029 Epoch: 18 Global Step: 95770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:52:57,440-Speed 18551.55 samples/sec Loss 4.8078 LearningRate 0.0029 Epoch: 18 Global Step: 95780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:53:01,842-Speed 18614.89 samples/sec Loss 4.7774 LearningRate 0.0029 Epoch: 18 Global Step: 95790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:53:06,248-Speed 18596.83 samples/sec Loss 4.8201 LearningRate 0.0029 Epoch: 18 Global Step: 95800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:53:10,674-Speed 18517.46 samples/sec Loss 4.7997 LearningRate 0.0028 Epoch: 18 Global Step: 95810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:53:15,095-Speed 18558.02 samples/sec Loss 4.7611 LearningRate 0.0028 Epoch: 18 Global Step: 95820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:53:19,530-Speed 18476.53 samples/sec Loss 4.8150 LearningRate 0.0028 Epoch: 18 Global Step: 95830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:24,033-Speed 18196.72 samples/sec Loss 4.7953 LearningRate 0.0028 Epoch: 18 Global Step: 95840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:28,533-Speed 18209.26 samples/sec Loss 4.7730 LearningRate 0.0028 Epoch: 18 Global Step: 95850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:32,996-Speed 18360.84 samples/sec Loss 4.7837 LearningRate 0.0028 Epoch: 18 Global Step: 95860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:37,432-Speed 18474.68 samples/sec Loss 4.7875 LearningRate 0.0028 Epoch: 18 Global Step: 95870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:41,973-Speed 18047.59 samples/sec Loss 4.7942 LearningRate 0.0028 Epoch: 18 Global Step: 95880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:46,396-Speed 18521.51 samples/sec Loss 4.8193 LearningRate 0.0028 Epoch: 18 
Global Step: 95890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:50,857-Speed 18373.28 samples/sec Loss 4.7758 LearningRate 0.0028 Epoch: 18 Global Step: 95900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:55,352-Speed 18228.29 samples/sec Loss 4.7843 LearningRate 0.0028 Epoch: 18 Global Step: 95910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:53:59,756-Speed 18604.94 samples/sec Loss 4.8198 LearningRate 0.0028 Epoch: 18 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:54:04,193-Speed 18478.92 samples/sec Loss 4.7778 LearningRate 0.0028 Epoch: 18 Global Step: 95930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:08,628-Speed 18480.87 samples/sec Loss 4.8133 LearningRate 0.0028 Epoch: 18 Global Step: 95940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:13,087-Speed 18377.08 samples/sec Loss 4.7805 LearningRate 0.0027 Epoch: 18 Global Step: 95950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:17,504-Speed 18549.46 samples/sec Loss 4.8122 LearningRate 0.0027 Epoch: 18 Global Step: 95960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:21,905-Speed 18622.02 samples/sec Loss 4.8138 LearningRate 0.0027 Epoch: 18 Global Step: 95970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:26,324-Speed 18540.01 samples/sec Loss 4.7947 LearningRate 0.0027 Epoch: 18 Global Step: 95980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:30,732-Speed 18590.65 samples/sec Loss 4.7993 LearningRate 0.0027 Epoch: 18 Global Step: 95990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:35,193-Speed 18370.61 samples/sec Loss 4.7789 LearningRate 0.0027 Epoch: 18 Global Step: 96000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:39,624-Speed 18495.54 samples/sec Loss 4.7617 LearningRate 0.0027 Epoch: 18 Global Step: 96010 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 09:54:44,069-Speed 18434.01 samples/sec Loss 4.7964 LearningRate 0.0027 Epoch: 18 Global Step: 96020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:48,519-Speed 18413.52 samples/sec Loss 4.7977 LearningRate 0.0027 Epoch: 18 Global Step: 96030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:54:52,966-Speed 18434.10 samples/sec Loss 4.7884 LearningRate 0.0027 Epoch: 18 Global Step: 96040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:54:57,437-Speed 18327.26 samples/sec Loss 4.7829 LearningRate 0.0027 Epoch: 18 Global Step: 96050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:01,920-Speed 18278.69 samples/sec Loss 4.7686 LearningRate 0.0027 Epoch: 18 Global Step: 96060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:06,392-Speed 18327.72 samples/sec Loss 4.8094 LearningRate 0.0027 Epoch: 18 Global Step: 96070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:10,904-Speed 18161.69 samples/sec Loss 4.7842 LearningRate 0.0027 Epoch: 18 Global Step: 96080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:15,383-Speed 18305.86 samples/sec Loss 4.7723 LearningRate 0.0026 Epoch: 18 Global Step: 96090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:19,848-Speed 18353.27 samples/sec Loss 4.7710 LearningRate 0.0026 Epoch: 18 Global Step: 96100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:24,312-Speed 18357.80 samples/sec Loss 4.7752 LearningRate 0.0026 Epoch: 18 Global Step: 96110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:28,727-Speed 18562.37 samples/sec Loss 4.7574 LearningRate 0.0026 Epoch: 18 Global Step: 96120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:33,218-Speed 18249.69 samples/sec Loss 4.8237 LearningRate 0.0026 Epoch: 18 Global Step: 96130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
09:55:37,636-Speed 18552.94 samples/sec Loss 4.8234 LearningRate 0.0026 Epoch: 18 Global Step: 96140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:55:42,110-Speed 18318.21 samples/sec Loss 4.7693 LearningRate 0.0026 Epoch: 18 Global Step: 96150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:46,523-Speed 18568.99 samples/sec Loss 4.8209 LearningRate 0.0026 Epoch: 18 Global Step: 96160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:50,920-Speed 18639.16 samples/sec Loss 4.8044 LearningRate 0.0026 Epoch: 18 Global Step: 96170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:55,319-Speed 18628.13 samples/sec Loss 4.7968 LearningRate 0.0026 Epoch: 18 Global Step: 96180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:55:59,739-Speed 18539.60 samples/sec Loss 4.7946 LearningRate 0.0026 Epoch: 18 Global Step: 96190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:56:04,181-Speed 18448.54 samples/sec Loss 4.7515 LearningRate 0.0026 Epoch: 18 Global Step: 96200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:56:08,589-Speed 18587.81 samples/sec Loss 4.7994 LearningRate 0.0026 Epoch: 18 Global Step: 96210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:56:12,980-Speed 18666.31 samples/sec Loss 4.8062 LearningRate 0.0026 Epoch: 18 Global Step: 96220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:56:17,392-Speed 18570.81 samples/sec Loss 4.7894 LearningRate 0.0025 Epoch: 18 Global Step: 96230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:56:21,813-Speed 18540.37 samples/sec Loss 4.7882 LearningRate 0.0025 Epoch: 18 Global Step: 96240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:56:26,235-Speed 18528.40 samples/sec Loss 4.8016 LearningRate 0.0025 Epoch: 18 Global Step: 96250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:56:30,636-Speed 18620.13 samples/sec 
Loss 4.7500 LearningRate 0.0025 Epoch: 18 Global Step: 96260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:56:35,073-Speed 18472.95 samples/sec Loss 4.7690 LearningRate 0.0025 Epoch: 18 Global Step: 96270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:56:39,496-Speed 18530.44 samples/sec Loss 4.8073 LearningRate 0.0025 Epoch: 18 Global Step: 96280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:56:43,912-Speed 18561.96 samples/sec Loss 4.7807 LearningRate 0.0025 Epoch: 18 Global Step: 96290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:56:48,333-Speed 18537.48 samples/sec Loss 4.7759 LearningRate 0.0025 Epoch: 18 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:56:52,766-Speed 18485.41 samples/sec Loss 4.7847 LearningRate 0.0025 Epoch: 18 Global Step: 96310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:56:57,148-Speed 18702.28 samples/sec Loss 4.7850 LearningRate 0.0025 Epoch: 18 Global Step: 96320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:01,584-Speed 18477.46 samples/sec Loss 4.8188 LearningRate 0.0025 Epoch: 18 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:05,995-Speed 18580.27 samples/sec Loss 4.7895 LearningRate 0.0025 Epoch: 18 Global Step: 96340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:10,438-Speed 18439.85 samples/sec Loss 4.7771 LearningRate 0.0025 Epoch: 18 Global Step: 96350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 09:57:14,927-Speed 18257.55 samples/sec Loss 4.7916 LearningRate 0.0025 Epoch: 18 Global Step: 96360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 09:57:19,441-Speed 18150.79 samples/sec Loss 4.7873 LearningRate 0.0025 Epoch: 18 Global Step: 96370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:23,909-Speed 18344.06 samples/sec Loss 4.8043 LearningRate 0.0024 Epoch: 18 
Global Step: 96380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:28,365-Speed 18392.08 samples/sec Loss 4.8009 LearningRate 0.0024 Epoch: 18 Global Step: 96390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:32,821-Speed 18389.52 samples/sec Loss 4.7750 LearningRate 0.0024 Epoch: 18 Global Step: 96400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:37,264-Speed 18444.58 samples/sec Loss 4.7774 LearningRate 0.0024 Epoch: 18 Global Step: 96410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:41,705-Speed 18455.74 samples/sec Loss 4.7606 LearningRate 0.0024 Epoch: 18 Global Step: 96420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:46,160-Speed 18393.18 samples/sec Loss 4.7618 LearningRate 0.0024 Epoch: 18 Global Step: 96430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:57:50,615-Speed 18394.30 samples/sec Loss 4.7528 LearningRate 0.0024 Epoch: 18 Global Step: 96440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:57:55,095-Speed 18290.24 samples/sec Loss 4.7867 LearningRate 0.0024 Epoch: 18 Global Step: 96450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:57:59,596-Speed 18208.62 samples/sec Loss 4.8003 LearningRate 0.0024 Epoch: 18 Global Step: 96460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:58:04,048-Speed 18408.53 samples/sec Loss 4.7672 LearningRate 0.0024 Epoch: 18 Global Step: 96470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:58:08,494-Speed 18432.11 samples/sec Loss 4.7799 LearningRate 0.0024 Epoch: 18 Global Step: 96480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:58:12,938-Speed 18440.42 samples/sec Loss 4.8156 LearningRate 0.0024 Epoch: 18 Global Step: 96490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:58:17,389-Speed 18410.64 samples/sec Loss 4.7770 LearningRate 0.0024 Epoch: 18 Global Step: 96500 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 09:58:21,844-Speed 18393.00 samples/sec Loss 4.7963 LearningRate 0.0024 Epoch: 18 Global Step: 96510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:58:26,312-Speed 18336.89 samples/sec Loss 4.7498 LearningRate 0.0024 Epoch: 18 Global Step: 96520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:58:30,725-Speed 18568.06 samples/sec Loss 4.7700 LearningRate 0.0023 Epoch: 18 Global Step: 96530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 09:58:35,140-Speed 18563.35 samples/sec Loss 4.7514 LearningRate 0.0023 Epoch: 18 Global Step: 96540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:58:39,622-Speed 18282.65 samples/sec Loss 4.7512 LearningRate 0.0023 Epoch: 18 Global Step: 96550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:58:44,092-Speed 18331.27 samples/sec Loss 4.7727 LearningRate 0.0023 Epoch: 18 Global Step: 96560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:58:48,544-Speed 18405.60 samples/sec Loss 4.7809 LearningRate 0.0023 Epoch: 18 Global Step: 96570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:58:52,966-Speed 18535.13 samples/sec Loss 4.7685 LearningRate 0.0023 Epoch: 18 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:58:57,406-Speed 18453.88 samples/sec Loss 4.8331 LearningRate 0.0023 Epoch: 18 Global Step: 96590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:01,827-Speed 18532.84 samples/sec Loss 4.7492 LearningRate 0.0023 Epoch: 18 Global Step: 96600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:06,259-Speed 18488.90 samples/sec Loss 4.7988 LearningRate 0.0023 Epoch: 18 Global Step: 96610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:10,670-Speed 18576.23 samples/sec Loss 4.7720 LearningRate 0.0023 Epoch: 18 Global Step: 96620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 
09:59:15,104-Speed 18485.33 samples/sec Loss 4.7766 LearningRate 0.0023 Epoch: 18 Global Step: 96630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:19,519-Speed 18557.40 samples/sec Loss 4.7505 LearningRate 0.0023 Epoch: 18 Global Step: 96640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:24,048-Speed 18095.34 samples/sec Loss 4.7990 LearningRate 0.0023 Epoch: 18 Global Step: 96650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:28,528-Speed 18289.23 samples/sec Loss 4.7495 LearningRate 0.0023 Epoch: 18 Global Step: 96660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:33,117-Speed 17858.46 samples/sec Loss 4.7518 LearningRate 0.0023 Epoch: 18 Global Step: 96670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:37,618-Speed 18210.25 samples/sec Loss 4.7626 LearningRate 0.0023 Epoch: 18 Global Step: 96680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:42,031-Speed 18570.75 samples/sec Loss 4.7718 LearningRate 0.0022 Epoch: 18 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:46,550-Speed 18131.53 samples/sec Loss 4.7671 LearningRate 0.0022 Epoch: 18 Global Step: 96700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:51,019-Speed 18340.45 samples/sec Loss 4.7718 LearningRate 0.0022 Epoch: 18 Global Step: 96710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:55,443-Speed 18517.94 samples/sec Loss 4.7642 LearningRate 0.0022 Epoch: 18 Global Step: 96720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 09:59:59,869-Speed 18514.31 samples/sec Loss 4.7607 LearningRate 0.0022 Epoch: 18 Global Step: 96730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:00:04,325-Speed 18392.06 samples/sec Loss 4.7859 LearningRate 0.0022 Epoch: 18 Global Step: 96740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-14 10:00:08,741-Speed 18552.67 samples/sec 
Loss 4.7852 LearningRate 0.0022 Epoch: 18 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:00:13,148-Speed 18597.37 samples/sec Loss 4.7700 LearningRate 0.0022 Epoch: 18 Global Step: 96760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:00:17,561-Speed 18573.81 samples/sec Loss 4.7713 LearningRate 0.0022 Epoch: 18 Global Step: 96770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:00:22,687-Speed 15984.81 samples/sec Loss 4.7513 LearningRate 0.0022 Epoch: 18 Global Step: 96780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:00:27,173-Speed 18265.67 samples/sec Loss 4.7635 LearningRate 0.0022 Epoch: 18 Global Step: 96790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:00:31,620-Speed 18427.19 samples/sec Loss 4.7565 LearningRate 0.0022 Epoch: 18 Global Step: 96800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:00:36,076-Speed 18388.87 samples/sec Loss 4.8077 LearningRate 0.0022 Epoch: 18 Global Step: 96810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:00:40,529-Speed 18401.98 samples/sec Loss 4.7806 LearningRate 0.0022 Epoch: 18 Global Step: 96820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:00:44,965-Speed 18472.37 samples/sec Loss 4.7659 LearningRate 0.0022 Epoch: 18 Global Step: 96830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:00:49,399-Speed 18479.97 samples/sec Loss 4.7813 LearningRate 0.0021 Epoch: 18 Global Step: 96840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:00:53,815-Speed 18557.45 samples/sec Loss 4.8082 LearningRate 0.0021 Epoch: 18 Global Step: 96850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:00:58,255-Speed 18456.80 samples/sec Loss 4.7885 LearningRate 0.0021 Epoch: 18 Global Step: 96860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:01:02,697-Speed 18444.48 samples/sec Loss 4.7808 LearningRate 0.0021 Epoch: 18 
Global Step: 96870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:01:07,162-Speed 18357.39 samples/sec Loss 4.7773 LearningRate 0.0021 Epoch: 18 Global Step: 96880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:01:11,638-Speed 18305.42 samples/sec Loss 4.7362 LearningRate 0.0021 Epoch: 18 Global Step: 96890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:01:16,056-Speed 18547.67 samples/sec Loss 4.7364 LearningRate 0.0021 Epoch: 18 Global Step: 96900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:01:20,483-Speed 18513.92 samples/sec Loss 4.7656 LearningRate 0.0021 Epoch: 18 Global Step: 96910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:01:24,961-Speed 18297.67 samples/sec Loss 4.7685 LearningRate 0.0021 Epoch: 18 Global Step: 96920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:01:29,382-Speed 18531.14 samples/sec Loss 4.7915 LearningRate 0.0021 Epoch: 18 Global Step: 96930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:01:33,851-Speed 18339.33 samples/sec Loss 4.7409 LearningRate 0.0021 Epoch: 18 Global Step: 96940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:01:38,352-Speed 18205.60 samples/sec Loss 4.7908 LearningRate 0.0021 Epoch: 18 Global Step: 96950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:01:42,771-Speed 18544.68 samples/sec Loss 4.7736 LearningRate 0.0021 Epoch: 18 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:01:47,179-Speed 18590.71 samples/sec Loss 4.7533 LearningRate 0.0021 Epoch: 18 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:01:51,570-Speed 18659.79 samples/sec Loss 4.7549 LearningRate 0.0021 Epoch: 18 Global Step: 96980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:01:55,978-Speed 18589.15 samples/sec Loss 4.7477 LearningRate 0.0021 Epoch: 18 Global Step: 96990 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 10:02:00,397-Speed 18549.90 samples/sec Loss 4.7629 LearningRate 0.0020 Epoch: 18 Global Step: 97000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:02:04,800-Speed 18617.51 samples/sec Loss 4.7940 LearningRate 0.0020 Epoch: 18 Global Step: 97010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:09,210-Speed 18588.30 samples/sec Loss 4.7753 LearningRate 0.0020 Epoch: 18 Global Step: 97020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:13,631-Speed 18537.11 samples/sec Loss 4.7659 LearningRate 0.0020 Epoch: 18 Global Step: 97030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:17,981-Speed 18840.93 samples/sec Loss 4.8114 LearningRate 0.0020 Epoch: 18 Global Step: 97040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:22,421-Speed 18456.09 samples/sec Loss 4.7688 LearningRate 0.0020 Epoch: 18 Global Step: 97050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:26,841-Speed 18540.94 samples/sec Loss 4.7570 LearningRate 0.0020 Epoch: 18 Global Step: 97060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:31,245-Speed 18604.85 samples/sec Loss 4.7513 LearningRate 0.0020 Epoch: 18 Global Step: 97070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:35,678-Speed 18488.27 samples/sec Loss 4.7738 LearningRate 0.0020 Epoch: 18 Global Step: 97080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:40,096-Speed 18549.04 samples/sec Loss 4.7708 LearningRate 0.0020 Epoch: 18 Global Step: 97090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:44,575-Speed 18294.92 samples/sec Loss 4.7545 LearningRate 0.0020 Epoch: 18 Global Step: 97100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:02:49,074-Speed 18213.07 samples/sec Loss 4.7724 LearningRate 0.0020 Epoch: 18 Global Step: 97110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
10:02:53,475-Speed 18623.55 samples/sec Loss 4.7128 LearningRate 0.0020 Epoch: 18 Global Step: 97120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:02:57,923-Speed 18423.10 samples/sec Loss 4.7814 LearningRate 0.0020 Epoch: 18 Global Step: 97130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:03:02,318-Speed 18644.28 samples/sec Loss 4.7653 LearningRate 0.0020 Epoch: 18 Global Step: 97140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:03:06,754-Speed 18475.13 samples/sec Loss 4.7812 LearningRate 0.0020 Epoch: 18 Global Step: 97150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:03:11,141-Speed 18674.93 samples/sec Loss 4.7842 LearningRate 0.0020 Epoch: 18 Global Step: 97160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:03:15,561-Speed 18542.68 samples/sec Loss 4.7723 LearningRate 0.0019 Epoch: 18 Global Step: 97170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:03:19,960-Speed 18624.84 samples/sec Loss 4.7664 LearningRate 0.0019 Epoch: 18 Global Step: 97180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:03:24,403-Speed 18442.78 samples/sec Loss 4.7559 LearningRate 0.0019 Epoch: 18 Global Step: 97190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:03:28,853-Speed 18414.54 samples/sec Loss 4.7771 LearningRate 0.0019 Epoch: 18 Global Step: 97200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:03:33,306-Speed 18400.65 samples/sec Loss 4.7601 LearningRate 0.0019 Epoch: 18 Global Step: 97210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:03:37,769-Speed 18364.01 samples/sec Loss 4.7722 LearningRate 0.0019 Epoch: 18 Global Step: 97220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:03:42,303-Speed 18073.79 samples/sec Loss 4.7939 LearningRate 0.0019 Epoch: 18 Global Step: 97230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:03:46,712-Speed 18586.38 samples/sec 
Loss 4.7444 LearningRate 0.0019 Epoch: 18 Global Step: 97240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:03:51,232-Speed 18132.20 samples/sec Loss 4.7496 LearningRate 0.0019 Epoch: 18 Global Step: 97250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:03:55,711-Speed 18302.03 samples/sec Loss 4.7835 LearningRate 0.0019 Epoch: 18 Global Step: 97260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:04:00,145-Speed 18479.90 samples/sec Loss 4.8040 LearningRate 0.0019 Epoch: 18 Global Step: 97270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:04:04,558-Speed 18572.31 samples/sec Loss 4.7715 LearningRate 0.0019 Epoch: 18 Global Step: 97280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:04:08,989-Speed 18488.28 samples/sec Loss 4.7539 LearningRate 0.0019 Epoch: 18 Global Step: 97290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:04:13,395-Speed 18601.62 samples/sec Loss 4.7659 LearningRate 0.0019 Epoch: 18 Global Step: 97300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:04:17,863-Speed 18337.31 samples/sec Loss 4.7396 LearningRate 0.0019 Epoch: 18 Global Step: 97310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:04:22,351-Speed 18259.94 samples/sec Loss 4.7845 LearningRate 0.0019 Epoch: 18 Global Step: 97320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:04:26,772-Speed 18533.86 samples/sec Loss 4.7724 LearningRate 0.0019 Epoch: 18 Global Step: 97330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:04:31,258-Speed 18265.35 samples/sec Loss 4.7638 LearningRate 0.0018 Epoch: 18 Global Step: 97340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:04:35,705-Speed 18428.01 samples/sec Loss 4.8001 LearningRate 0.0018 Epoch: 18 Global Step: 97350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:04:43,258-Speed 10851.20 samples/sec Loss 4.7394 LearningRate 0.0018 Epoch: 18 
Global Step: 97360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:04:47,665-Speed 18595.29 samples/sec Loss 4.7654 LearningRate 0.0018 Epoch: 18 Global Step: 97370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:04:52,076-Speed 18577.77 samples/sec Loss 4.7330 LearningRate 0.0018 Epoch: 18 Global Step: 97380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:04:56,483-Speed 18598.90 samples/sec Loss 4.7699 LearningRate 0.0018 Epoch: 18 Global Step: 97390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:00,868-Speed 18685.41 samples/sec Loss 4.7637 LearningRate 0.0018 Epoch: 18 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:05,270-Speed 18611.40 samples/sec Loss 4.7712 LearningRate 0.0018 Epoch: 18 Global Step: 97410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:05:09,691-Speed 18534.57 samples/sec Loss 4.7959 LearningRate 0.0018 Epoch: 18 Global Step: 97420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:05:14,141-Speed 18417.17 samples/sec Loss 4.7562 LearningRate 0.0018 Epoch: 18 Global Step: 97430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:05:18,594-Speed 18400.50 samples/sec Loss 4.7438 LearningRate 0.0018 Epoch: 18 Global Step: 97440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:23,055-Speed 18370.98 samples/sec Loss 4.7865 LearningRate 0.0018 Epoch: 18 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:27,513-Speed 18380.22 samples/sec Loss 4.7813 LearningRate 0.0018 Epoch: 18 Global Step: 97460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:31,949-Speed 18471.42 samples/sec Loss 4.7159 LearningRate 0.0018 Epoch: 18 Global Step: 97470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:36,350-Speed 18622.57 samples/sec Loss 4.7683 LearningRate 0.0018 Epoch: 18 Global Step: 97480 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 10:05:40,770-Speed 18536.81 samples/sec Loss 4.7570 LearningRate 0.0018 Epoch: 18 Global Step: 97490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:45,185-Speed 18562.71 samples/sec Loss 4.7635 LearningRate 0.0018 Epoch: 18 Global Step: 97500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:49,576-Speed 18662.07 samples/sec Loss 4.7478 LearningRate 0.0017 Epoch: 18 Global Step: 97510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:53,975-Speed 18626.98 samples/sec Loss 4.7422 LearningRate 0.0017 Epoch: 18 Global Step: 97520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:05:58,375-Speed 18627.90 samples/sec Loss 4.7248 LearningRate 0.0017 Epoch: 18 Global Step: 97530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:02,835-Speed 18377.39 samples/sec Loss 4.7557 LearningRate 0.0017 Epoch: 18 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:06:07,254-Speed 18545.15 samples/sec Loss 4.7855 LearningRate 0.0017 Epoch: 18 Global Step: 97550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:06:11,661-Speed 18592.30 samples/sec Loss 4.7445 LearningRate 0.0017 Epoch: 18 Global Step: 97560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:16,085-Speed 18524.20 samples/sec Loss 4.7526 LearningRate 0.0017 Epoch: 18 Global Step: 97570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:20,567-Speed 18278.74 samples/sec Loss 4.8010 LearningRate 0.0017 Epoch: 18 Global Step: 97580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:24,959-Speed 18660.46 samples/sec Loss 4.7617 LearningRate 0.0017 Epoch: 18 Global Step: 97590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:29,414-Speed 18391.92 samples/sec Loss 4.7478 LearningRate 0.0017 Epoch: 18 Global Step: 97600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
10:06:33,835-Speed 18534.55 samples/sec Loss 4.7675 LearningRate 0.0017 Epoch: 18 Global Step: 97610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:38,268-Speed 18484.43 samples/sec Loss 4.7770 LearningRate 0.0017 Epoch: 18 Global Step: 97620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:42,690-Speed 18532.55 samples/sec Loss 4.8044 LearningRate 0.0017 Epoch: 18 Global Step: 97630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:47,120-Speed 18495.56 samples/sec Loss 4.7398 LearningRate 0.0017 Epoch: 18 Global Step: 97640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:51,538-Speed 18546.25 samples/sec Loss 4.7349 LearningRate 0.0017 Epoch: 18 Global Step: 97650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:06:55,992-Speed 18395.20 samples/sec Loss 4.7405 LearningRate 0.0017 Epoch: 18 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:07:00,431-Speed 18461.83 samples/sec Loss 4.7336 LearningRate 0.0017 Epoch: 18 Global Step: 97670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:04,895-Speed 18359.09 samples/sec Loss 4.7736 LearningRate 0.0017 Epoch: 18 Global Step: 97680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:09,306-Speed 18574.34 samples/sec Loss 4.7657 LearningRate 0.0016 Epoch: 18 Global Step: 97690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:13,730-Speed 18525.25 samples/sec Loss 4.7546 LearningRate 0.0016 Epoch: 18 Global Step: 97700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:18,142-Speed 18570.72 samples/sec Loss 4.7545 LearningRate 0.0016 Epoch: 18 Global Step: 97710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:22,582-Speed 18457.27 samples/sec Loss 4.7653 LearningRate 0.0016 Epoch: 18 Global Step: 97720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:27,040-Speed 18381.25 samples/sec 
Loss 4.7420 LearningRate 0.0016 Epoch: 18 Global Step: 97730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:31,447-Speed 18593.70 samples/sec Loss 4.7733 LearningRate 0.0016 Epoch: 18 Global Step: 97740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:35,878-Speed 18493.84 samples/sec Loss 4.8100 LearningRate 0.0016 Epoch: 18 Global Step: 97750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:40,313-Speed 18473.03 samples/sec Loss 4.7394 LearningRate 0.0016 Epoch: 18 Global Step: 97760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:44,727-Speed 18565.62 samples/sec Loss 4.7173 LearningRate 0.0016 Epoch: 18 Global Step: 97770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:49,122-Speed 18644.65 samples/sec Loss 4.7541 LearningRate 0.0016 Epoch: 18 Global Step: 97780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:53,562-Speed 18454.05 samples/sec Loss 4.7571 LearningRate 0.0016 Epoch: 18 Global Step: 97790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:07:58,013-Speed 18411.59 samples/sec Loss 4.7651 LearningRate 0.0016 Epoch: 18 Global Step: 97800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:02,469-Speed 18387.20 samples/sec Loss 4.7612 LearningRate 0.0016 Epoch: 18 Global Step: 97810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:06,900-Speed 18497.16 samples/sec Loss 4.7702 LearningRate 0.0016 Epoch: 18 Global Step: 97820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:11,334-Speed 18487.65 samples/sec Loss 4.7665 LearningRate 0.0016 Epoch: 18 Global Step: 97830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:15,899-Speed 17955.29 samples/sec Loss 4.7487 LearningRate 0.0016 Epoch: 18 Global Step: 97840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:20,344-Speed 18433.65 samples/sec Loss 4.7463 LearningRate 0.0016 Epoch: 18 
Global Step: 97850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:24,757-Speed 18569.00 samples/sec Loss 4.7010 LearningRate 0.0016 Epoch: 18 Global Step: 97860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:29,163-Speed 18600.01 samples/sec Loss 4.7384 LearningRate 0.0016 Epoch: 18 Global Step: 97870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:08:33,577-Speed 18564.07 samples/sec Loss 4.7544 LearningRate 0.0015 Epoch: 18 Global Step: 97880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:08:38,030-Speed 18407.88 samples/sec Loss 4.7447 LearningRate 0.0015 Epoch: 18 Global Step: 97890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:42,437-Speed 18596.59 samples/sec Loss 4.7574 LearningRate 0.0015 Epoch: 18 Global Step: 97900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:46,862-Speed 18518.34 samples/sec Loss 4.7604 LearningRate 0.0015 Epoch: 18 Global Step: 97910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:51,287-Speed 18521.68 samples/sec Loss 4.7601 LearningRate 0.0015 Epoch: 18 Global Step: 97920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:08:55,735-Speed 18420.30 samples/sec Loss 4.7338 LearningRate 0.0015 Epoch: 18 Global Step: 97930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:00,157-Speed 18532.53 samples/sec Loss 4.7116 LearningRate 0.0015 Epoch: 18 Global Step: 97940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:04,585-Speed 18505.15 samples/sec Loss 4.7655 LearningRate 0.0015 Epoch: 18 Global Step: 97950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:08,994-Speed 18588.28 samples/sec Loss 4.7601 LearningRate 0.0015 Epoch: 18 Global Step: 97960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:13,427-Speed 18487.91 samples/sec Loss 4.7418 LearningRate 0.0015 Epoch: 18 Global Step: 97970 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 10:09:17,854-Speed 18508.91 samples/sec Loss 4.7705 LearningRate 0.0015 Epoch: 18 Global Step: 97980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:22,297-Speed 18444.99 samples/sec Loss 4.7635 LearningRate 0.0015 Epoch: 18 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:09:26,719-Speed 18530.69 samples/sec Loss 4.7492 LearningRate 0.0015 Epoch: 18 Global Step: 98000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:31,195-Speed 18306.36 samples/sec Loss 4.7529 LearningRate 0.0015 Epoch: 18 Global Step: 98010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:35,683-Speed 18260.17 samples/sec Loss 4.7234 LearningRate 0.0015 Epoch: 18 Global Step: 98020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:40,128-Speed 18433.44 samples/sec Loss 4.7440 LearningRate 0.0015 Epoch: 18 Global Step: 98030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:44,607-Speed 18296.19 samples/sec Loss 4.7808 LearningRate 0.0015 Epoch: 18 Global Step: 98040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:49,008-Speed 18615.34 samples/sec Loss 4.7628 LearningRate 0.0015 Epoch: 18 Global Step: 98050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:53,495-Speed 18267.12 samples/sec Loss 4.7272 LearningRate 0.0015 Epoch: 18 Global Step: 98060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:09:57,932-Speed 18468.71 samples/sec Loss 4.7412 LearningRate 0.0014 Epoch: 18 Global Step: 98070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:10:02,345-Speed 18563.47 samples/sec Loss 4.7851 LearningRate 0.0014 Epoch: 18 Global Step: 98080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:10:06,783-Speed 18467.34 samples/sec Loss 4.7641 LearningRate 0.0014 Epoch: 18 Global Step: 98090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
10:10:11,233-Speed 18411.55 samples/sec Loss 4.7655 LearningRate 0.0014 Epoch: 18 Global Step: 98100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:10:15,668-Speed 18477.53 samples/sec Loss 4.7207 LearningRate 0.0014 Epoch: 18 Global Step: 98110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:10:20,066-Speed 18633.32 samples/sec Loss 4.7224 LearningRate 0.0014 Epoch: 18 Global Step: 98120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:10:24,477-Speed 18576.87 samples/sec Loss 4.7411 LearningRate 0.0014 Epoch: 18 Global Step: 98130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:10:28,948-Speed 18329.66 samples/sec Loss 4.7883 LearningRate 0.0014 Epoch: 18 Global Step: 98140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:10:33,353-Speed 18601.40 samples/sec Loss 4.7484 LearningRate 0.0014 Epoch: 18 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:10:37,763-Speed 18583.57 samples/sec Loss 4.7483 LearningRate 0.0014 Epoch: 18 Global Step: 98160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:10:42,313-Speed 18010.06 samples/sec Loss 4.7108 LearningRate 0.0014 Epoch: 18 Global Step: 98170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:10:46,770-Speed 18387.57 samples/sec Loss 4.7488 LearningRate 0.0014 Epoch: 18 Global Step: 98180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:10:51,180-Speed 18582.27 samples/sec Loss 4.7647 LearningRate 0.0014 Epoch: 18 Global Step: 98190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:10:55,607-Speed 18511.50 samples/sec Loss 4.7325 LearningRate 0.0014 Epoch: 18 Global Step: 98200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:11:00,017-Speed 18583.51 samples/sec Loss 4.7744 LearningRate 0.0014 Epoch: 18 Global Step: 98210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:11:04,426-Speed 18590.50 samples/sec 
Loss 4.7686 LearningRate 0.0014 Epoch: 18 Global Step: 98220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:11:08,836-Speed 18582.04 samples/sec Loss 4.7119 LearningRate 0.0014 Epoch: 18 Global Step: 98230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:11:13,320-Speed 18274.37 samples/sec Loss 4.7093 LearningRate 0.0014 Epoch: 18 Global Step: 98240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:11:17,770-Speed 18416.73 samples/sec Loss 4.7555 LearningRate 0.0014 Epoch: 18 Global Step: 98250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:11:22,199-Speed 18500.53 samples/sec Loss 4.7651 LearningRate 0.0013 Epoch: 18 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:11:26,637-Speed 18463.16 samples/sec Loss 4.7482 LearningRate 0.0013 Epoch: 18 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:11:31,088-Speed 18409.31 samples/sec Loss 4.7410 LearningRate 0.0013 Epoch: 18 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:11:35,533-Speed 18436.48 samples/sec Loss 4.7611 LearningRate 0.0013 Epoch: 18 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:11:39,982-Speed 18416.68 samples/sec Loss 4.8050 LearningRate 0.0013 Epoch: 18 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:11:44,396-Speed 18565.43 samples/sec Loss 4.7618 LearningRate 0.0013 Epoch: 18 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:11:48,812-Speed 18558.60 samples/sec Loss 4.7786 LearningRate 0.0013 Epoch: 18 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:11:53,249-Speed 18466.37 samples/sec Loss 4.7660 LearningRate 0.0013 Epoch: 18 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:11:57,674-Speed 18519.14 samples/sec Loss 4.7459 LearningRate 0.0013 Epoch: 18 
Global Step: 98340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:02,150-Speed 18305.69 samples/sec Loss 4.7192 LearningRate 0.0013 Epoch: 18 Global Step: 98350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:06,634-Speed 18270.30 samples/sec Loss 4.7298 LearningRate 0.0013 Epoch: 18 Global Step: 98360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:11,066-Speed 18488.05 samples/sec Loss 4.7498 LearningRate 0.0013 Epoch: 18 Global Step: 98370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:15,480-Speed 18567.62 samples/sec Loss 4.7825 LearningRate 0.0013 Epoch: 18 Global Step: 98380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:19,886-Speed 18596.23 samples/sec Loss 4.7685 LearningRate 0.0013 Epoch: 18 Global Step: 98390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:24,339-Speed 18403.95 samples/sec Loss 4.7636 LearningRate 0.0013 Epoch: 18 Global Step: 98400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:28,770-Speed 18491.80 samples/sec Loss 4.7494 LearningRate 0.0013 Epoch: 18 Global Step: 98410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:33,189-Speed 18542.77 samples/sec Loss 4.7254 LearningRate 0.0013 Epoch: 18 Global Step: 98420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:37,620-Speed 18493.47 samples/sec Loss 4.7618 LearningRate 0.0013 Epoch: 18 Global Step: 98430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:42,075-Speed 18389.16 samples/sec Loss 4.7538 LearningRate 0.0013 Epoch: 18 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:12:46,512-Speed 18469.49 samples/sec Loss 4.7556 LearningRate 0.0013 Epoch: 18 Global Step: 98450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:50,921-Speed 18586.31 samples/sec Loss 4.7665 LearningRate 0.0013 Epoch: 18 Global Step: 98460 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 10:12:55,331-Speed 18579.79 samples/sec Loss 4.7547 LearningRate 0.0012 Epoch: 18 Global Step: 98470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:12:59,795-Speed 18352.93 samples/sec Loss 4.7490 LearningRate 0.0012 Epoch: 18 Global Step: 98480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:13:04,254-Speed 18377.29 samples/sec Loss 4.7329 LearningRate 0.0012 Epoch: 18 Global Step: 98490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:13:08,687-Speed 18487.55 samples/sec Loss 4.7493 LearningRate 0.0012 Epoch: 18 Global Step: 98500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:13:13,063-Speed 18725.57 samples/sec Loss 4.7480 LearningRate 0.0012 Epoch: 18 Global Step: 98510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:13:31,650-Speed 4407.49 samples/sec Loss 4.7417 LearningRate 0.0012 Epoch: 19 Global Step: 98520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:13:36,120-Speed 18334.97 samples/sec Loss 4.7762 LearningRate 0.0012 Epoch: 19 Global Step: 98530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:13:40,570-Speed 18413.42 samples/sec Loss 4.7483 LearningRate 0.0012 Epoch: 19 Global Step: 98540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:13:44,959-Speed 18671.30 samples/sec Loss 4.7409 LearningRate 0.0012 Epoch: 19 Global Step: 98550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:13:49,388-Speed 18499.87 samples/sec Loss 4.7487 LearningRate 0.0012 Epoch: 19 Global Step: 98560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:13:54,326-Speed 16594.55 samples/sec Loss 4.7609 LearningRate 0.0012 Epoch: 19 Global Step: 98570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:13:58,731-Speed 18603.76 samples/sec Loss 4.7194 LearningRate 0.0012 Epoch: 19 Global Step: 98580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 
10:14:03,169-Speed 18462.77 samples/sec Loss 4.7128 LearningRate 0.0012 Epoch: 19 Global Step: 98590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:14:07,563-Speed 18651.26 samples/sec Loss 4.7250 LearningRate 0.0012 Epoch: 19 Global Step: 98600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:14:11,998-Speed 18476.58 samples/sec Loss 4.7319 LearningRate 0.0012 Epoch: 19 Global Step: 98610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:14:16,407-Speed 18585.68 samples/sec Loss 4.7712 LearningRate 0.0012 Epoch: 19 Global Step: 98620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:20,872-Speed 18353.69 samples/sec Loss 4.7024 LearningRate 0.0012 Epoch: 19 Global Step: 98630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:25,324-Speed 18407.33 samples/sec Loss 4.7577 LearningRate 0.0012 Epoch: 19 Global Step: 98640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:29,760-Speed 18474.68 samples/sec Loss 4.7297 LearningRate 0.0012 Epoch: 19 Global Step: 98650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:34,147-Speed 18673.59 samples/sec Loss 4.7231 LearningRate 0.0012 Epoch: 19 Global Step: 98660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:38,540-Speed 18659.51 samples/sec Loss 4.7760 LearningRate 0.0012 Epoch: 19 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:42,941-Speed 18626.31 samples/sec Loss 4.6941 LearningRate 0.0011 Epoch: 19 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:47,329-Speed 18673.35 samples/sec Loss 4.7332 LearningRate 0.0011 Epoch: 19 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:51,790-Speed 18370.19 samples/sec Loss 4.7394 LearningRate 0.0011 Epoch: 19 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:14:56,227-Speed 18467.47 samples/sec 
Loss 4.7419 LearningRate 0.0011 Epoch: 19 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:00,721-Speed 18239.51 samples/sec Loss 4.7053 LearningRate 0.0011 Epoch: 19 Global Step: 98720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:15:05,166-Speed 18437.48 samples/sec Loss 4.7685 LearningRate 0.0011 Epoch: 19 Global Step: 98730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:15:09,620-Speed 18396.80 samples/sec Loss 4.7642 LearningRate 0.0011 Epoch: 19 Global Step: 98740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:15:14,097-Speed 18300.78 samples/sec Loss 4.7242 LearningRate 0.0011 Epoch: 19 Global Step: 98750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:18,541-Speed 18441.79 samples/sec Loss 4.7191 LearningRate 0.0011 Epoch: 19 Global Step: 98760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:22,982-Speed 18454.29 samples/sec Loss 4.7579 LearningRate 0.0011 Epoch: 19 Global Step: 98770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:27,424-Speed 18445.78 samples/sec Loss 4.7660 LearningRate 0.0011 Epoch: 19 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:31,918-Speed 18233.59 samples/sec Loss 4.7244 LearningRate 0.0011 Epoch: 19 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:36,341-Speed 18523.35 samples/sec Loss 4.7489 LearningRate 0.0011 Epoch: 19 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:40,771-Speed 18492.69 samples/sec Loss 4.7639 LearningRate 0.0011 Epoch: 19 Global Step: 98810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:45,190-Speed 18547.72 samples/sec Loss 4.7459 LearningRate 0.0011 Epoch: 19 Global Step: 98820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:49,648-Speed 18379.18 samples/sec Loss 4.7643 LearningRate 0.0011 Epoch: 19 
Global Step: 98830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:54,056-Speed 18587.18 samples/sec Loss 4.7633 LearningRate 0.0011 Epoch: 19 Global Step: 98840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:15:58,486-Speed 18498.07 samples/sec Loss 4.7564 LearningRate 0.0011 Epoch: 19 Global Step: 98850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:16:02,888-Speed 18616.28 samples/sec Loss 4.7452 LearningRate 0.0011 Epoch: 19 Global Step: 98860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:16:07,308-Speed 18540.70 samples/sec Loss 4.6979 LearningRate 0.0011 Epoch: 19 Global Step: 98870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:16:11,709-Speed 18616.99 samples/sec Loss 4.7223 LearningRate 0.0011 Epoch: 19 Global Step: 98880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:16:16,137-Speed 18504.06 samples/sec Loss 4.7693 LearningRate 0.0011 Epoch: 19 Global Step: 98890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:16:20,531-Speed 18649.68 samples/sec Loss 4.7606 LearningRate 0.0010 Epoch: 19 Global Step: 98900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:16:24,987-Speed 18387.25 samples/sec Loss 4.7382 LearningRate 0.0010 Epoch: 19 Global Step: 98910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:16:29,429-Speed 18448.45 samples/sec Loss 4.7409 LearningRate 0.0010 Epoch: 19 Global Step: 98920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:16:33,829-Speed 18621.49 samples/sec Loss 4.7613 LearningRate 0.0010 Epoch: 19 Global Step: 98930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:16:38,221-Speed 18656.05 samples/sec Loss 4.7412 LearningRate 0.0010 Epoch: 19 Global Step: 98940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:16:42,641-Speed 18534.81 samples/sec Loss 4.7730 LearningRate 0.0010 Epoch: 19 Global Step: 98950 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-01-14 10:16:47,047-Speed 18600.90 samples/sec Loss 4.7377 LearningRate 0.0010 Epoch: 19 Global Step: 98960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:16:51,433-Speed 18679.67 samples/sec Loss 4.7184 LearningRate 0.0010 Epoch: 19 Global Step: 98970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:16:55,841-Speed 18588.94 samples/sec Loss 4.7394 LearningRate 0.0010 Epoch: 19 Global Step: 98980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:17:00,272-Speed 18490.35 samples/sec Loss 4.7716 LearningRate 0.0010 Epoch: 19 Global Step: 98990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:17:04,781-Speed 18176.41 samples/sec Loss 4.7526 LearningRate 0.0010 Epoch: 19 Global Step: 99000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:17:09,250-Speed 18336.16 samples/sec Loss 4.7568 LearningRate 0.0010 Epoch: 19 Global Step: 99010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-01-14 10:17:13,681-Speed 18491.55 samples/sec Loss 4.7159 LearningRate 0.0010 Epoch: 19 Global Step: 99020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:18,140-Speed 18376.52 samples/sec Loss 4.7414 LearningRate 0.0010 Epoch: 19 Global Step: 99030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:22,535-Speed 18643.52 samples/sec Loss 4.7207 LearningRate 0.0010 Epoch: 19 Global Step: 99040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:26,957-Speed 18534.96 samples/sec Loss 4.7410 LearningRate 0.0010 Epoch: 19 Global Step: 99050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:31,356-Speed 18623.30 samples/sec Loss 4.7210 LearningRate 0.0010 Epoch: 19 Global Step: 99060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:35,776-Speed 18545.86 samples/sec Loss 4.7249 LearningRate 0.0010 Epoch: 19 Global Step: 99070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
10:17:40,208-Speed 18491.66 samples/sec Loss 4.7570 LearningRate 0.0010 Epoch: 19 Global Step: 99080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:44,648-Speed 18455.65 samples/sec Loss 4.7503 LearningRate 0.0010 Epoch: 19 Global Step: 99090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:49,058-Speed 18580.80 samples/sec Loss 4.7426 LearningRate 0.0010 Epoch: 19 Global Step: 99100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:53,486-Speed 18506.28 samples/sec Loss 4.7562 LearningRate 0.0010 Epoch: 19 Global Step: 99110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:17:57,892-Speed 18598.42 samples/sec Loss 4.7132 LearningRate 0.0010 Epoch: 19 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:18:02,350-Speed 18385.71 samples/sec Loss 4.7413 LearningRate 0.0010 Epoch: 19 Global Step: 99130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:18:06,761-Speed 18580.16 samples/sec Loss 4.7362 LearningRate 0.0009 Epoch: 19 Global Step: 99140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:18:11,224-Speed 18365.95 samples/sec Loss 4.7140 LearningRate 0.0009 Epoch: 19 Global Step: 99150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:15,640-Speed 18556.28 samples/sec Loss 4.7246 LearningRate 0.0009 Epoch: 19 Global Step: 99160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:20,058-Speed 18548.66 samples/sec Loss 4.7329 LearningRate 0.0009 Epoch: 19 Global Step: 99170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:24,471-Speed 18568.52 samples/sec Loss 4.6992 LearningRate 0.0009 Epoch: 19 Global Step: 99180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:28,911-Speed 18457.13 samples/sec Loss 4.7224 LearningRate 0.0009 Epoch: 19 Global Step: 99190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:33,337-Speed 18513.56 samples/sec 
Loss 4.7192 LearningRate 0.0009 Epoch: 19 Global Step: 99200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:37,814-Speed 18304.85 samples/sec Loss 4.6900 LearningRate 0.0009 Epoch: 19 Global Step: 99210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:42,229-Speed 18557.70 samples/sec Loss 4.6940 LearningRate 0.0009 Epoch: 19 Global Step: 99220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:46,666-Speed 18476.78 samples/sec Loss 4.7803 LearningRate 0.0009 Epoch: 19 Global Step: 99230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:51,081-Speed 18557.14 samples/sec Loss 4.7025 LearningRate 0.0009 Epoch: 19 Global Step: 99240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:18:55,465-Speed 18691.29 samples/sec Loss 4.7542 LearningRate 0.0009 Epoch: 19 Global Step: 99250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:18:59,856-Speed 18661.87 samples/sec Loss 4.7191 LearningRate 0.0009 Epoch: 19 Global Step: 99260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:19:04,318-Speed 18368.75 samples/sec Loss 4.7486 LearningRate 0.0009 Epoch: 19 Global Step: 99270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:08,707-Speed 18669.39 samples/sec Loss 4.7175 LearningRate 0.0009 Epoch: 19 Global Step: 99280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:13,134-Speed 18508.06 samples/sec Loss 4.7449 LearningRate 0.0009 Epoch: 19 Global Step: 99290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:17,596-Speed 18369.19 samples/sec Loss 4.7237 LearningRate 0.0009 Epoch: 19 Global Step: 99300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:22,091-Speed 18227.94 samples/sec Loss 4.7371 LearningRate 0.0009 Epoch: 19 Global Step: 99310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:26,572-Speed 18287.26 samples/sec Loss 4.7032 LearningRate 0.0009 Epoch: 19 
Global Step: 99320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:30,997-Speed 18518.56 samples/sec Loss 4.7327 LearningRate 0.0009 Epoch: 19 Global Step: 99330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:35,455-Speed 18383.64 samples/sec Loss 4.7393 LearningRate 0.0009 Epoch: 19 Global Step: 99340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:39,869-Speed 18564.59 samples/sec Loss 4.7310 LearningRate 0.0009 Epoch: 19 Global Step: 99350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:44,274-Speed 18600.23 samples/sec Loss 4.7509 LearningRate 0.0009 Epoch: 19 Global Step: 99360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:48,691-Speed 18551.93 samples/sec Loss 4.7160 LearningRate 0.0009 Epoch: 19 Global Step: 99370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:53,098-Speed 18595.07 samples/sec Loss 4.7326 LearningRate 0.0008 Epoch: 19 Global Step: 99380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:19:57,510-Speed 18573.85 samples/sec Loss 4.7191 LearningRate 0.0008 Epoch: 19 Global Step: 99390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:01,890-Speed 18712.15 samples/sec Loss 4.7254 LearningRate 0.0008 Epoch: 19 Global Step: 99400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:06,315-Speed 18524.29 samples/sec Loss 4.7001 LearningRate 0.0008 Epoch: 19 Global Step: 99410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:10,772-Speed 18387.24 samples/sec Loss 4.7110 LearningRate 0.0008 Epoch: 19 Global Step: 99420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:15,220-Speed 18422.95 samples/sec Loss 4.7497 LearningRate 0.0008 Epoch: 19 Global Step: 99430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:19,645-Speed 18523.67 samples/sec Loss 4.7850 LearningRate 0.0008 Epoch: 19 Global Step: 99440 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-14 10:20:24,057-Speed 18573.12 samples/sec Loss 4.7363 LearningRate 0.0008 Epoch: 19 Global Step: 99450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:28,545-Speed 18256.55 samples/sec Loss 4.7412 LearningRate 0.0008 Epoch: 19 Global Step: 99460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:32,981-Speed 18475.52 samples/sec Loss 4.7290 LearningRate 0.0008 Epoch: 19 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:20:37,392-Speed 18578.32 samples/sec Loss 4.7286 LearningRate 0.0008 Epoch: 19 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:20:41,803-Speed 18574.19 samples/sec Loss 4.7372 LearningRate 0.0008 Epoch: 19 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:20:46,216-Speed 18573.66 samples/sec Loss 4.7172 LearningRate 0.0008 Epoch: 19 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:20:50,629-Speed 18568.69 samples/sec Loss 4.7121 LearningRate 0.0008 Epoch: 19 Global Step: 99510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:55,025-Speed 18644.94 samples/sec Loss 4.7116 LearningRate 0.0008 Epoch: 19 Global Step: 99520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:20:59,413-Speed 18668.74 samples/sec Loss 4.7314 LearningRate 0.0008 Epoch: 19 Global Step: 99530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:21:03,890-Speed 18304.75 samples/sec Loss 4.6914 LearningRate 0.0008 Epoch: 19 Global Step: 99540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:21:08,354-Speed 18354.98 samples/sec Loss 4.7193 LearningRate 0.0008 Epoch: 19 Global Step: 99550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:21:12,734-Speed 18706.85 samples/sec Loss 4.7360 LearningRate 0.0008 Epoch: 19 Global Step: 99560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 
10:21:17,161-Speed 18510.87 samples/sec Loss 4.7128 LearningRate 0.0008 Epoch: 19 Global Step: 99570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:21:21,616-Speed 18391.02 samples/sec Loss 4.7234 LearningRate 0.0008 Epoch: 19 Global Step: 99580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:21:26,008-Speed 18659.12 samples/sec Loss 4.7500 LearningRate 0.0008 Epoch: 19 Global Step: 99590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:21:30,444-Speed 18472.75 samples/sec Loss 4.7232 LearningRate 0.0008 Epoch: 19 Global Step: 99600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-14 10:21:34,873-Speed 18503.05 samples/sec Loss 4.7498 LearningRate 0.0008 Epoch: 19 Global Step: 99610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:21:39,274-Speed 18618.37 samples/sec Loss 4.7180 LearningRate 0.0008 Epoch: 19 Global Step: 99620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:21:43,688-Speed 18563.81 samples/sec Loss 4.7112 LearningRate 0.0008 Epoch: 19 Global Step: 99630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:21:48,118-Speed 18498.23 samples/sec Loss 4.7525 LearningRate 0.0007 Epoch: 19 Global Step: 99640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:21:52,553-Speed 18479.18 samples/sec Loss 4.7401 LearningRate 0.0007 Epoch: 19 Global Step: 99650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:21:56,990-Speed 18464.60 samples/sec Loss 4.7826 LearningRate 0.0007 Epoch: 19 Global Step: 99660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-14 10:22:01,403-Speed 18572.51 samples/sec Loss 4.7279 LearningRate 0.0007 Epoch: 19 Global Step: 99670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:05,812-Speed 18593.66 samples/sec Loss 4.7436 LearningRate 0.0007 Epoch: 19 Global Step: 99680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:10,235-Speed 18523.44 samples/sec 
Loss 4.7224 LearningRate 0.0007 Epoch: 19 Global Step: 99690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:14,711-Speed 18307.35 samples/sec Loss 4.7082 LearningRate 0.0007 Epoch: 19 Global Step: 99700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:19,104-Speed 18677.54 samples/sec Loss 4.7273 LearningRate 0.0007 Epoch: 19 Global Step: 99710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:23,470-Speed 18769.36 samples/sec Loss 4.7520 LearningRate 0.0007 Epoch: 19 Global Step: 99720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:27,861-Speed 18660.59 samples/sec Loss 4.7400 LearningRate 0.0007 Epoch: 19 Global Step: 99730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:32,264-Speed 18610.78 samples/sec Loss 4.7277 LearningRate 0.0007 Epoch: 19 Global Step: 99740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:36,636-Speed 18741.01 samples/sec Loss 4.7207 LearningRate 0.0007 Epoch: 19 Global Step: 99750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:41,071-Speed 18480.09 samples/sec Loss 4.7133 LearningRate 0.0007 Epoch: 19 Global Step: 99760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:22:45,495-Speed 18523.05 samples/sec Loss 4.7186 LearningRate 0.0007 Epoch: 19 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:22:49,925-Speed 18499.30 samples/sec Loss 4.7461 LearningRate 0.0007 Epoch: 19 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:22:54,310-Speed 18688.37 samples/sec Loss 4.7045 LearningRate 0.0007 Epoch: 19 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:22:58,723-Speed 18566.97 samples/sec Loss 4.7195 LearningRate 0.0007 Epoch: 19 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:23:03,137-Speed 18568.44 samples/sec Loss 4.7258 LearningRate 0.0007 Epoch: 19 
Global Step: 99810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:23:07,551-Speed 18566.55 samples/sec Loss 4.6956 LearningRate 0.0007 Epoch: 19 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:23:11,965-Speed 18560.98 samples/sec Loss 4.7255 LearningRate 0.0007 Epoch: 19 Global Step: 99830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:23:16,467-Speed 18205.44 samples/sec Loss 4.7241 LearningRate 0.0007 Epoch: 19 Global Step: 99840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:23:20,915-Speed 18423.43 samples/sec Loss 4.7222 LearningRate 0.0007 Epoch: 19 Global Step: 99850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:23:25,338-Speed 18525.43 samples/sec Loss 4.7279 LearningRate 0.0007 Epoch: 19 Global Step: 99860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:23:29,788-Speed 18412.38 samples/sec Loss 4.7451 LearningRate 0.0007 Epoch: 19 Global Step: 99870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:23:34,203-Speed 18564.98 samples/sec Loss 4.7290 LearningRate 0.0007 Epoch: 19 Global Step: 99880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:23:38,643-Speed 18452.76 samples/sec Loss 4.7247 LearningRate 0.0007 Epoch: 19 Global Step: 99890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:23:43,046-Speed 18608.84 samples/sec Loss 4.7306 LearningRate 0.0007 Epoch: 19 Global Step: 99900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:23:47,450-Speed 18608.14 samples/sec Loss 4.6989 LearningRate 0.0007 Epoch: 19 Global Step: 99910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:23:51,857-Speed 18594.30 samples/sec Loss 4.7290 LearningRate 0.0006 Epoch: 19 Global Step: 99920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:23:56,240-Speed 18694.64 samples/sec Loss 4.7338 LearningRate 0.0006 Epoch: 19 Global Step: 99930 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-01-14 10:24:00,648-Speed 18590.19 samples/sec Loss 4.7193 LearningRate 0.0006 Epoch: 19 Global Step: 99940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:24:05,051-Speed 18610.45 samples/sec Loss 4.7672 LearningRate 0.0006 Epoch: 19 Global Step: 99950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:24:09,520-Speed 18330.36 samples/sec Loss 4.7581 LearningRate 0.0006 Epoch: 19 Global Step: 99960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:24:13,921-Speed 18619.58 samples/sec Loss 4.7103 LearningRate 0.0006 Epoch: 19 Global Step: 99970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:24:18,335-Speed 18563.93 samples/sec Loss 4.6937 LearningRate 0.0006 Epoch: 19 Global Step: 99980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:24:22,818-Speed 18276.44 samples/sec Loss 4.7235 LearningRate 0.0006 Epoch: 19 Global Step: 99990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:24:27,219-Speed 18618.58 samples/sec Loss 4.7429 LearningRate 0.0006 Epoch: 19 Global Step: 100000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:24:31,661-Speed 18444.41 samples/sec Loss 4.7244 LearningRate 0.0006 Epoch: 19 Global Step: 100010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:24:36,058-Speed 18638.34 samples/sec Loss 4.7403 LearningRate 0.0006 Epoch: 19 Global Step: 100020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:24:40,449-Speed 18657.20 samples/sec Loss 4.7387 LearningRate 0.0006 Epoch: 19 Global Step: 100030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:24:44,894-Speed 18436.30 samples/sec Loss 4.6838 LearningRate 0.0006 Epoch: 19 Global Step: 100040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:24:49,319-Speed 18520.30 samples/sec Loss 4.7166 LearningRate 0.0006 Epoch: 19 Global Step: 100050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 
2022-01-14 10:24:53,724-Speed 18603.16 samples/sec Loss 4.7118 LearningRate 0.0006 Epoch: 19 Global Step: 100060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:24:58,174-Speed 18412.20 samples/sec Loss 4.7534 LearningRate 0.0006 Epoch: 19 Global Step: 100070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:25:02,610-Speed 18474.71 samples/sec Loss 4.7626 LearningRate 0.0006 Epoch: 19 Global Step: 100080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:25:07,078-Speed 18339.87 samples/sec Loss 4.7408 LearningRate 0.0006 Epoch: 19 Global Step: 100090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:25:11,473-Speed 18645.06 samples/sec Loss 4.7338 LearningRate 0.0006 Epoch: 19 Global Step: 100100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:25:15,914-Speed 18450.51 samples/sec Loss 4.7482 LearningRate 0.0006 Epoch: 19 Global Step: 100110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:25:20,373-Speed 18375.47 samples/sec Loss 4.6993 LearningRate 0.0006 Epoch: 19 Global Step: 100120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:25:24,797-Speed 18524.00 samples/sec Loss 4.7283 LearningRate 0.0006 Epoch: 19 Global Step: 100130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:25:29,330-Speed 18075.34 samples/sec Loss 4.7198 LearningRate 0.0006 Epoch: 19 Global Step: 100140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:25:33,713-Speed 18697.51 samples/sec Loss 4.7583 LearningRate 0.0006 Epoch: 19 Global Step: 100150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:25:38,118-Speed 18600.98 samples/sec Loss 4.7305 LearningRate 0.0006 Epoch: 19 Global Step: 100160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:25:42,499-Speed 18704.14 samples/sec Loss 4.7344 LearningRate 0.0006 Epoch: 19 Global Step: 100170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:25:46,954-Speed 
18392.46 samples/sec Loss 4.6906 LearningRate 0.0006 Epoch: 19 Global Step: 100180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:25:51,376-Speed 18528.78 samples/sec Loss 4.7183 LearningRate 0.0006 Epoch: 19 Global Step: 100190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:25:55,787-Speed 18582.13 samples/sec Loss 4.7353 LearningRate 0.0006 Epoch: 19 Global Step: 100200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:00,210-Speed 18529.71 samples/sec Loss 4.7223 LearningRate 0.0006 Epoch: 19 Global Step: 100210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:04,625-Speed 18557.04 samples/sec Loss 4.7231 LearningRate 0.0005 Epoch: 19 Global Step: 100220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:09,067-Speed 18448.32 samples/sec Loss 4.6951 LearningRate 0.0005 Epoch: 19 Global Step: 100230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:26:13,513-Speed 18429.10 samples/sec Loss 4.7287 LearningRate 0.0005 Epoch: 19 Global Step: 100240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:26:17,940-Speed 18507.82 samples/sec Loss 4.7141 LearningRate 0.0005 Epoch: 19 Global Step: 100250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:26:22,358-Speed 18545.27 samples/sec Loss 4.7264 LearningRate 0.0005 Epoch: 19 Global Step: 100260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:26:26,835-Speed 18307.39 samples/sec Loss 4.7115 LearningRate 0.0005 Epoch: 19 Global Step: 100270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:26:31,248-Speed 18565.31 samples/sec Loss 4.7300 LearningRate 0.0005 Epoch: 19 Global Step: 100280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:35,638-Speed 18667.46 samples/sec Loss 4.6794 LearningRate 0.0005 Epoch: 19 Global Step: 100290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:40,019-Speed 18707.51 samples/sec Loss 
4.7083 LearningRate 0.0005 Epoch: 19 Global Step: 100300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:44,429-Speed 18581.59 samples/sec Loss 4.7455 LearningRate 0.0005 Epoch: 19 Global Step: 100310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:48,861-Speed 18492.65 samples/sec Loss 4.7225 LearningRate 0.0005 Epoch: 19 Global Step: 100320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:53,269-Speed 18585.55 samples/sec Loss 4.7268 LearningRate 0.0005 Epoch: 19 Global Step: 100330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:26:57,668-Speed 18627.77 samples/sec Loss 4.6883 LearningRate 0.0005 Epoch: 19 Global Step: 100340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:27:02,074-Speed 18597.60 samples/sec Loss 4.7281 LearningRate 0.0005 Epoch: 19 Global Step: 100350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:27:06,510-Speed 18473.99 samples/sec Loss 4.7259 LearningRate 0.0005 Epoch: 19 Global Step: 100360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:27:10,900-Speed 18665.31 samples/sec Loss 4.7229 LearningRate 0.0005 Epoch: 19 Global Step: 100370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:15,322-Speed 18527.55 samples/sec Loss 4.7041 LearningRate 0.0005 Epoch: 19 Global Step: 100380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:19,738-Speed 18556.55 samples/sec Loss 4.7293 LearningRate 0.0005 Epoch: 19 Global Step: 100390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:24,147-Speed 18588.58 samples/sec Loss 4.7362 LearningRate 0.0005 Epoch: 19 Global Step: 100400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:28,566-Speed 18545.00 samples/sec Loss 4.7069 LearningRate 0.0005 Epoch: 19 Global Step: 100410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:32,957-Speed 18670.56 samples/sec Loss 4.7082 LearningRate 0.0005 
Epoch: 19 Global Step: 100420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:37,372-Speed 18560.26 samples/sec Loss 4.7569 LearningRate 0.0005 Epoch: 19 Global Step: 100430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:41,855-Speed 18279.88 samples/sec Loss 4.7039 LearningRate 0.0005 Epoch: 19 Global Step: 100440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:46,272-Speed 18556.12 samples/sec Loss 4.6922 LearningRate 0.0005 Epoch: 19 Global Step: 100450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:50,697-Speed 18518.26 samples/sec Loss 4.7285 LearningRate 0.0005 Epoch: 19 Global Step: 100460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:27:55,155-Speed 18381.43 samples/sec Loss 4.7394 LearningRate 0.0005 Epoch: 19 Global Step: 100470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:27:59,554-Speed 18626.84 samples/sec Loss 4.7363 LearningRate 0.0005 Epoch: 19 Global Step: 100480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:04,021-Speed 18349.01 samples/sec Loss 4.7170 LearningRate 0.0005 Epoch: 19 Global Step: 100490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:08,493-Speed 18319.54 samples/sec Loss 4.6964 LearningRate 0.0005 Epoch: 19 Global Step: 100500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:12,925-Speed 18489.20 samples/sec Loss 4.7262 LearningRate 0.0005 Epoch: 19 Global Step: 100510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:17,337-Speed 18577.05 samples/sec Loss 4.7255 LearningRate 0.0005 Epoch: 19 Global Step: 100520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:21,760-Speed 18525.64 samples/sec Loss 4.7280 LearningRate 0.0005 Epoch: 19 Global Step: 100530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:26,161-Speed 18617.95 samples/sec Loss 4.6611 LearningRate 0.0005 Epoch: 19 Global Step: 100540 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:30,586-Speed 18518.90 samples/sec Loss 4.6996 LearningRate 0.0005 Epoch: 19 Global Step: 100550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:35,040-Speed 18399.73 samples/sec Loss 4.7392 LearningRate 0.0004 Epoch: 19 Global Step: 100560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:39,480-Speed 18455.44 samples/sec Loss 4.7575 LearningRate 0.0004 Epoch: 19 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:28:43,902-Speed 18529.57 samples/sec Loss 4.7169 LearningRate 0.0004 Epoch: 19 Global Step: 100580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:28:48,345-Speed 18442.34 samples/sec Loss 4.7065 LearningRate 0.0004 Epoch: 19 Global Step: 100590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:52,818-Speed 18319.81 samples/sec Loss 4.6956 LearningRate 0.0004 Epoch: 19 Global Step: 100600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:28:57,231-Speed 18570.26 samples/sec Loss 4.7363 LearningRate 0.0004 Epoch: 19 Global Step: 100610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:01,666-Speed 18473.10 samples/sec Loss 4.7302 LearningRate 0.0004 Epoch: 19 Global Step: 100620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:06,086-Speed 18538.66 samples/sec Loss 4.7076 LearningRate 0.0004 Epoch: 19 Global Step: 100630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:10,471-Speed 18691.44 samples/sec Loss 4.6979 LearningRate 0.0004 Epoch: 19 Global Step: 100640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:14,868-Speed 18634.98 samples/sec Loss 4.6484 LearningRate 0.0004 Epoch: 19 Global Step: 100650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:19,289-Speed 18536.59 samples/sec Loss 4.7187 LearningRate 0.0004 Epoch: 19 Global Step: 100660 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 10:29:23,739-Speed 18413.39 samples/sec Loss 4.7419 LearningRate 0.0004 Epoch: 19 Global Step: 100670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:28,234-Speed 18233.86 samples/sec Loss 4.7520 LearningRate 0.0004 Epoch: 19 Global Step: 100680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:32,662-Speed 18508.81 samples/sec Loss 4.7007 LearningRate 0.0004 Epoch: 19 Global Step: 100690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:37,093-Speed 18492.40 samples/sec Loss 4.7057 LearningRate 0.0004 Epoch: 19 Global Step: 100700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:41,489-Speed 18642.34 samples/sec Loss 4.7470 LearningRate 0.0004 Epoch: 19 Global Step: 100710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:45,892-Speed 18605.71 samples/sec Loss 4.7238 LearningRate 0.0004 Epoch: 19 Global Step: 100720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:50,365-Speed 18321.77 samples/sec Loss 4.6877 LearningRate 0.0004 Epoch: 19 Global Step: 100730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:54,806-Speed 18453.96 samples/sec Loss 4.7581 LearningRate 0.0004 Epoch: 19 Global Step: 100740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:29:59,316-Speed 18169.47 samples/sec Loss 4.7172 LearningRate 0.0004 Epoch: 19 Global Step: 100750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:30:03,728-Speed 18574.30 samples/sec Loss 4.6919 LearningRate 0.0004 Epoch: 19 Global Step: 100760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:30:08,118-Speed 18666.34 samples/sec Loss 4.7164 LearningRate 0.0004 Epoch: 19 Global Step: 100770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:30:12,506-Speed 18669.58 samples/sec Loss 4.7430 LearningRate 0.0004 Epoch: 19 Global Step: 100780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 10:30:16,927-Speed 18538.38 samples/sec Loss 4.7472 LearningRate 0.0004 Epoch: 19 Global Step: 100790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:30:21,328-Speed 18619.89 samples/sec Loss 4.7195 LearningRate 0.0004 Epoch: 19 Global Step: 100800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:30:25,732-Speed 18609.72 samples/sec Loss 4.7036 LearningRate 0.0004 Epoch: 19 Global Step: 100810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:30:30,139-Speed 18591.27 samples/sec Loss 4.6839 LearningRate 0.0004 Epoch: 19 Global Step: 100820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:30:34,556-Speed 18549.85 samples/sec Loss 4.7218 LearningRate 0.0004 Epoch: 19 Global Step: 100830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:30:38,998-Speed 18450.88 samples/sec Loss 4.7412 LearningRate 0.0004 Epoch: 19 Global Step: 100840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:30:43,415-Speed 18552.75 samples/sec Loss 4.7282 LearningRate 0.0004 Epoch: 19 Global Step: 100850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:30:47,869-Speed 18396.94 samples/sec Loss 4.7375 LearningRate 0.0004 Epoch: 19 Global Step: 100860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:30:52,331-Speed 18366.23 samples/sec Loss 4.7288 LearningRate 0.0004 Epoch: 19 Global Step: 100870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:30:56,765-Speed 18486.34 samples/sec Loss 4.7080 LearningRate 0.0004 Epoch: 19 Global Step: 100880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:31:01,205-Speed 18451.20 samples/sec Loss 4.6998 LearningRate 0.0004 Epoch: 19 Global Step: 100890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:31:05,605-Speed 18628.01 samples/sec Loss 4.7091 LearningRate 0.0004 Epoch: 19 Global Step: 100900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:31:10,066-Speed 
18370.08 samples/sec Loss 4.7238 LearningRate 0.0004 Epoch: 19 Global Step: 100910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:31:14,491-Speed 18514.36 samples/sec Loss 4.6875 LearningRate 0.0003 Epoch: 19 Global Step: 100920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:31:18,872-Speed 18703.30 samples/sec Loss 4.7469 LearningRate 0.0003 Epoch: 19 Global Step: 100930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:31:23,254-Speed 18698.89 samples/sec Loss 4.7060 LearningRate 0.0003 Epoch: 19 Global Step: 100940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:31:27,660-Speed 18605.24 samples/sec Loss 4.7232 LearningRate 0.0003 Epoch: 19 Global Step: 100950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:31:32,116-Speed 18390.01 samples/sec Loss 4.7213 LearningRate 0.0003 Epoch: 19 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:31:36,523-Speed 18593.25 samples/sec Loss 4.7070 LearningRate 0.0003 Epoch: 19 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:31:40,947-Speed 18521.28 samples/sec Loss 4.7212 LearningRate 0.0003 Epoch: 19 Global Step: 100980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:31:45,420-Speed 18318.90 samples/sec Loss 4.7122 LearningRate 0.0003 Epoch: 19 Global Step: 100990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:31:49,828-Speed 18589.37 samples/sec Loss 4.7293 LearningRate 0.0003 Epoch: 19 Global Step: 101000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:31:54,215-Speed 18675.35 samples/sec Loss 4.7373 LearningRate 0.0003 Epoch: 19 Global Step: 101010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:31:58,615-Speed 18627.38 samples/sec Loss 4.7170 LearningRate 0.0003 Epoch: 19 Global Step: 101020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:03,009-Speed 18642.70 samples/sec Loss 
4.6797 LearningRate 0.0003 Epoch: 19 Global Step: 101030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:07,401-Speed 18660.58 samples/sec Loss 4.6827 LearningRate 0.0003 Epoch: 19 Global Step: 101040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:11,814-Speed 18565.86 samples/sec Loss 4.7132 LearningRate 0.0003 Epoch: 19 Global Step: 101050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:16,243-Speed 18501.19 samples/sec Loss 4.7413 LearningRate 0.0003 Epoch: 19 Global Step: 101060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:20,649-Speed 18593.64 samples/sec Loss 4.7132 LearningRate 0.0003 Epoch: 19 Global Step: 101070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:25,116-Speed 18345.96 samples/sec Loss 4.7012 LearningRate 0.0003 Epoch: 19 Global Step: 101080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:29,512-Speed 18638.65 samples/sec Loss 4.7392 LearningRate 0.0003 Epoch: 19 Global Step: 101090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:33,895-Speed 18696.91 samples/sec Loss 4.7095 LearningRate 0.0003 Epoch: 19 Global Step: 101100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:38,284-Speed 18667.05 samples/sec Loss 4.7348 LearningRate 0.0003 Epoch: 19 Global Step: 101110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:42,697-Speed 18563.97 samples/sec Loss 4.7359 LearningRate 0.0003 Epoch: 19 Global Step: 101120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:32:47,131-Speed 18479.69 samples/sec Loss 4.7357 LearningRate 0.0003 Epoch: 19 Global Step: 101130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:51,558-Speed 18507.25 samples/sec Loss 4.6901 LearningRate 0.0003 Epoch: 19 Global Step: 101140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:32:55,949-Speed 18660.73 samples/sec Loss 4.6992 LearningRate 0.0003 
Epoch: 19 Global Step: 101150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:00,364-Speed 18558.75 samples/sec Loss 4.7177 LearningRate 0.0003 Epoch: 19 Global Step: 101160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:04,802-Speed 18463.77 samples/sec Loss 4.6938 LearningRate 0.0003 Epoch: 19 Global Step: 101170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:09,239-Speed 18471.20 samples/sec Loss 4.7125 LearningRate 0.0003 Epoch: 19 Global Step: 101180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:13,696-Speed 18384.83 samples/sec Loss 4.7103 LearningRate 0.0003 Epoch: 19 Global Step: 101190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:18,119-Speed 18526.50 samples/sec Loss 4.7170 LearningRate 0.0003 Epoch: 19 Global Step: 101200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:22,548-Speed 18496.03 samples/sec Loss 4.6976 LearningRate 0.0003 Epoch: 19 Global Step: 101210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:27,008-Speed 18373.49 samples/sec Loss 4.7116 LearningRate 0.0003 Epoch: 19 Global Step: 101220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:31,508-Speed 18208.91 samples/sec Loss 4.7114 LearningRate 0.0003 Epoch: 19 Global Step: 101230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:33:35,951-Speed 18449.01 samples/sec Loss 4.7197 LearningRate 0.0003 Epoch: 19 Global Step: 101240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:33:40,356-Speed 18605.32 samples/sec Loss 4.6906 LearningRate 0.0003 Epoch: 19 Global Step: 101250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:33:44,750-Speed 18648.59 samples/sec Loss 4.7653 LearningRate 0.0003 Epoch: 19 Global Step: 101260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:49,148-Speed 18634.71 samples/sec Loss 4.7019 LearningRate 0.0003 Epoch: 19 Global Step: 101270 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:53,562-Speed 18565.25 samples/sec Loss 4.7273 LearningRate 0.0003 Epoch: 19 Global Step: 101280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:33:57,943-Speed 18707.19 samples/sec Loss 4.7444 LearningRate 0.0003 Epoch: 19 Global Step: 101290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:02,388-Speed 18431.30 samples/sec Loss 4.7358 LearningRate 0.0003 Epoch: 19 Global Step: 101300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:06,815-Speed 18509.17 samples/sec Loss 4.6994 LearningRate 0.0003 Epoch: 19 Global Step: 101310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:11,216-Speed 18621.23 samples/sec Loss 4.7352 LearningRate 0.0003 Epoch: 19 Global Step: 101320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:15,624-Speed 18593.23 samples/sec Loss 4.6805 LearningRate 0.0003 Epoch: 19 Global Step: 101330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:20,092-Speed 18343.40 samples/sec Loss 4.7037 LearningRate 0.0003 Epoch: 19 Global Step: 101340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:24,477-Speed 18687.14 samples/sec Loss 4.6983 LearningRate 0.0002 Epoch: 19 Global Step: 101350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:28,907-Speed 18499.71 samples/sec Loss 4.7236 LearningRate 0.0002 Epoch: 19 Global Step: 101360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:34:33,316-Speed 18586.44 samples/sec Loss 4.7390 LearningRate 0.0002 Epoch: 19 Global Step: 101370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:34:37,735-Speed 18543.65 samples/sec Loss 4.7239 LearningRate 0.0002 Epoch: 19 Global Step: 101380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:34:42,138-Speed 18607.73 samples/sec Loss 4.7108 LearningRate 0.0002 Epoch: 19 Global Step: 101390 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 10:34:46,541-Speed 18608.52 samples/sec Loss 4.6897 LearningRate 0.0002 Epoch: 19 Global Step: 101400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:50,975-Speed 18481.86 samples/sec Loss 4.7229 LearningRate 0.0002 Epoch: 19 Global Step: 101410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:55,362-Speed 18683.10 samples/sec Loss 4.7042 LearningRate 0.0002 Epoch: 19 Global Step: 101420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:34:59,772-Speed 18578.46 samples/sec Loss 4.7374 LearningRate 0.0002 Epoch: 19 Global Step: 101430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:04,174-Speed 18614.57 samples/sec Loss 4.6929 LearningRate 0.0002 Epoch: 19 Global Step: 101440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:09,372-Speed 15765.94 samples/sec Loss 4.7004 LearningRate 0.0002 Epoch: 19 Global Step: 101450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:13,787-Speed 18559.93 samples/sec Loss 4.7551 LearningRate 0.0002 Epoch: 19 Global Step: 101460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:18,208-Speed 18535.76 samples/sec Loss 4.6972 LearningRate 0.0002 Epoch: 19 Global Step: 101470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:22,596-Speed 18674.77 samples/sec Loss 4.7093 LearningRate 0.0002 Epoch: 19 Global Step: 101480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:27,001-Speed 18603.16 samples/sec Loss 4.7194 LearningRate 0.0002 Epoch: 19 Global Step: 101490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:35:31,436-Speed 18478.75 samples/sec Loss 4.7171 LearningRate 0.0002 Epoch: 19 Global Step: 101500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:35,848-Speed 18569.73 samples/sec Loss 4.7223 LearningRate 0.0002 Epoch: 19 Global Step: 101510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 10:35:40,310-Speed 18364.73 samples/sec Loss 4.7410 LearningRate 0.0002 Epoch: 19 Global Step: 101520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:44,779-Speed 18333.51 samples/sec Loss 4.7633 LearningRate 0.0002 Epoch: 19 Global Step: 101530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:49,176-Speed 18637.85 samples/sec Loss 4.7291 LearningRate 0.0002 Epoch: 19 Global Step: 101540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:53,557-Speed 18701.19 samples/sec Loss 4.7248 LearningRate 0.0002 Epoch: 19 Global Step: 101550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:35:57,972-Speed 18560.89 samples/sec Loss 4.7132 LearningRate 0.0002 Epoch: 19 Global Step: 101560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:02,421-Speed 18417.45 samples/sec Loss 4.6639 LearningRate 0.0002 Epoch: 19 Global Step: 101570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:06,809-Speed 18676.65 samples/sec Loss 4.7079 LearningRate 0.0002 Epoch: 19 Global Step: 101580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:11,254-Speed 18435.79 samples/sec Loss 4.7457 LearningRate 0.0002 Epoch: 19 Global Step: 101590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:15,698-Speed 18439.10 samples/sec Loss 4.7207 LearningRate 0.0002 Epoch: 19 Global Step: 101600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:36:20,105-Speed 18595.56 samples/sec Loss 4.7177 LearningRate 0.0002 Epoch: 19 Global Step: 101610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:36:24,603-Speed 18213.24 samples/sec Loss 4.6933 LearningRate 0.0002 Epoch: 19 Global Step: 101620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:36:29,081-Speed 18301.84 samples/sec Loss 4.7065 LearningRate 0.0002 Epoch: 19 Global Step: 101630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:36:33,475-Speed 
18648.09 samples/sec Loss 4.7267 LearningRate 0.0002 Epoch: 19 Global Step: 101640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:37,874-Speed 18630.33 samples/sec Loss 4.6874 LearningRate 0.0002 Epoch: 19 Global Step: 101650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:42,303-Speed 18501.49 samples/sec Loss 4.7205 LearningRate 0.0002 Epoch: 19 Global Step: 101660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:46,709-Speed 18597.58 samples/sec Loss 4.7043 LearningRate 0.0002 Epoch: 19 Global Step: 101670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:51,155-Speed 18432.53 samples/sec Loss 4.7201 LearningRate 0.0002 Epoch: 19 Global Step: 101680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:55,542-Speed 18679.07 samples/sec Loss 4.7106 LearningRate 0.0002 Epoch: 19 Global Step: 101690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:36:59,935-Speed 18652.25 samples/sec Loss 4.7282 LearningRate 0.0002 Epoch: 19 Global Step: 101700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:37:04,376-Speed 18455.30 samples/sec Loss 4.7251 LearningRate 0.0002 Epoch: 19 Global Step: 101710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:37:08,869-Speed 18235.93 samples/sec Loss 4.6571 LearningRate 0.0002 Epoch: 19 Global Step: 101720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:37:13,269-Speed 18622.44 samples/sec Loss 4.6686 LearningRate 0.0002 Epoch: 19 Global Step: 101730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:37:17,681-Speed 18576.27 samples/sec Loss 4.7050 LearningRate 0.0002 Epoch: 19 Global Step: 101740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:22,105-Speed 18520.07 samples/sec Loss 4.6884 LearningRate 0.0002 Epoch: 19 Global Step: 101750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:26,509-Speed 18607.08 samples/sec Loss 
4.7177 LearningRate 0.0002 Epoch: 19 Global Step: 101760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:30,916-Speed 18591.23 samples/sec Loss 4.7326 LearningRate 0.0002 Epoch: 19 Global Step: 101770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:35,353-Speed 18468.66 samples/sec Loss 4.7094 LearningRate 0.0002 Epoch: 19 Global Step: 101780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:39,777-Speed 18524.64 samples/sec Loss 4.7128 LearningRate 0.0002 Epoch: 19 Global Step: 101790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:44,221-Speed 18439.19 samples/sec Loss 4.7159 LearningRate 0.0002 Epoch: 19 Global Step: 101800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:48,647-Speed 18509.29 samples/sec Loss 4.7134 LearningRate 0.0002 Epoch: 19 Global Step: 101810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:53,090-Speed 18444.69 samples/sec Loss 4.7053 LearningRate 0.0002 Epoch: 19 Global Step: 101820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:37:57,485-Speed 18645.15 samples/sec Loss 4.7119 LearningRate 0.0002 Epoch: 19 Global Step: 101830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:01,906-Speed 18533.77 samples/sec Loss 4.7313 LearningRate 0.0002 Epoch: 19 Global Step: 101840 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-14 10:38:06,323-Speed 18553.43 samples/sec Loss 4.6858 LearningRate 0.0002 Epoch: 19 Global Step: 101850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:10,737-Speed 18561.24 samples/sec Loss 4.7036 LearningRate 0.0002 Epoch: 19 Global Step: 101860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:15,163-Speed 18515.17 samples/sec Loss 4.7190 LearningRate 0.0002 Epoch: 19 Global Step: 101870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:19,573-Speed 18583.63 samples/sec Loss 4.6964 LearningRate 0.0001 
Epoch: 19 Global Step: 101880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:24,005-Speed 18487.83 samples/sec Loss 4.7427 LearningRate 0.0001 Epoch: 19 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:28,444-Speed 18461.39 samples/sec Loss 4.7114 LearningRate 0.0001 Epoch: 19 Global Step: 101900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:32,855-Speed 18572.41 samples/sec Loss 4.7071 LearningRate 0.0001 Epoch: 19 Global Step: 101910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:37,260-Speed 18606.77 samples/sec Loss 4.6842 LearningRate 0.0001 Epoch: 19 Global Step: 101920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:41,662-Speed 18611.62 samples/sec Loss 4.7181 LearningRate 0.0001 Epoch: 19 Global Step: 101930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:46,071-Speed 18586.57 samples/sec Loss 4.7118 LearningRate 0.0001 Epoch: 19 Global Step: 101940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:38:50,524-Speed 18402.61 samples/sec Loss 4.7401 LearningRate 0.0001 Epoch: 19 Global Step: 101950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:38:54,922-Speed 18632.72 samples/sec Loss 4.7239 LearningRate 0.0001 Epoch: 19 Global Step: 101960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:38:59,358-Speed 18476.81 samples/sec Loss 4.6893 LearningRate 0.0001 Epoch: 19 Global Step: 101970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:39:03,828-Speed 18326.75 samples/sec Loss 4.7169 LearningRate 0.0001 Epoch: 19 Global Step: 101980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:39:08,287-Speed 18377.99 samples/sec Loss 4.6877 LearningRate 0.0001 Epoch: 19 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:39:12,731-Speed 18439.27 samples/sec Loss 4.7241 LearningRate 0.0001 Epoch: 19 Global Step: 102000 
Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:17,135-Speed 18608.38 samples/sec Loss 4.7090 LearningRate 0.0001 Epoch: 19 Global Step: 102010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:21,554-Speed 18541.57 samples/sec Loss 4.6908 LearningRate 0.0001 Epoch: 19 Global Step: 102020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:25,936-Speed 18700.25 samples/sec Loss 4.6780 LearningRate 0.0001 Epoch: 19 Global Step: 102030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:30,333-Speed 18634.38 samples/sec Loss 4.7171 LearningRate 0.0001 Epoch: 19 Global Step: 102040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:34,777-Speed 18439.67 samples/sec Loss 4.7355 LearningRate 0.0001 Epoch: 19 Global Step: 102050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:39,229-Speed 18405.85 samples/sec Loss 4.7080 LearningRate 0.0001 Epoch: 19 Global Step: 102060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:43,650-Speed 18537.16 samples/sec Loss 4.7041 LearningRate 0.0001 Epoch: 19 Global Step: 102070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:48,034-Speed 18688.81 samples/sec Loss 4.6980 LearningRate 0.0001 Epoch: 19 Global Step: 102080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:52,436-Speed 18616.50 samples/sec Loss 4.7081 LearningRate 0.0001 Epoch: 19 Global Step: 102090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:39:56,830-Speed 18648.27 samples/sec Loss 4.7376 LearningRate 0.0001 Epoch: 19 Global Step: 102100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:01,217-Speed 18679.23 samples/sec Loss 4.7253 LearningRate 0.0001 Epoch: 19 Global Step: 102110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:05,597-Speed 18712.76 samples/sec Loss 4.7009 LearningRate 0.0001 Epoch: 19 Global Step: 102120 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 10:40:09,997-Speed 18622.92 samples/sec Loss 4.6930 LearningRate 0.0001 Epoch: 19 Global Step: 102130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:14,396-Speed 18625.33 samples/sec Loss 4.6789 LearningRate 0.0001 Epoch: 19 Global Step: 102140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:18,773-Speed 18721.45 samples/sec Loss 4.7221 LearningRate 0.0001 Epoch: 19 Global Step: 102150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:23,151-Speed 18718.91 samples/sec Loss 4.7252 LearningRate 0.0001 Epoch: 19 Global Step: 102160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:27,532-Speed 18702.13 samples/sec Loss 4.6881 LearningRate 0.0001 Epoch: 19 Global Step: 102170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:31,917-Speed 18687.82 samples/sec Loss 4.6855 LearningRate 0.0001 Epoch: 19 Global Step: 102180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:36,371-Speed 18404.54 samples/sec Loss 4.6925 LearningRate 0.0001 Epoch: 19 Global Step: 102190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:40,768-Speed 18639.99 samples/sec Loss 4.6904 LearningRate 0.0001 Epoch: 19 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:40:45,169-Speed 18618.20 samples/sec Loss 4.6973 LearningRate 0.0001 Epoch: 19 Global Step: 102210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:40:49,574-Speed 18602.26 samples/sec Loss 4.7043 LearningRate 0.0001 Epoch: 19 Global Step: 102220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:40:53,975-Speed 18622.41 samples/sec Loss 4.7038 LearningRate 0.0001 Epoch: 19 Global Step: 102230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:40:58,395-Speed 18541.62 samples/sec Loss 4.7017 LearningRate 0.0001 Epoch: 19 Global Step: 102240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 10:41:02,828-Speed 18483.01 samples/sec Loss 4.7340 LearningRate 0.0001 Epoch: 19 Global Step: 102250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:07,243-Speed 18565.03 samples/sec Loss 4.7285 LearningRate 0.0001 Epoch: 19 Global Step: 102260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:11,643-Speed 18622.29 samples/sec Loss 4.6945 LearningRate 0.0001 Epoch: 19 Global Step: 102270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:16,073-Speed 18495.73 samples/sec Loss 4.7093 LearningRate 0.0001 Epoch: 19 Global Step: 102280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:20,492-Speed 18543.21 samples/sec Loss 4.6833 LearningRate 0.0001 Epoch: 19 Global Step: 102290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:24,889-Speed 18636.79 samples/sec Loss 4.7243 LearningRate 0.0001 Epoch: 19 Global Step: 102300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:29,326-Speed 18468.12 samples/sec Loss 4.7002 LearningRate 0.0001 Epoch: 19 Global Step: 102310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:33,814-Speed 18256.36 samples/sec Loss 4.6938 LearningRate 0.0001 Epoch: 19 Global Step: 102320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:38,211-Speed 18639.45 samples/sec Loss 4.7020 LearningRate 0.0001 Epoch: 19 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:41:42,641-Speed 18495.27 samples/sec Loss 4.6763 LearningRate 0.0001 Epoch: 19 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:41:47,032-Speed 18664.01 samples/sec Loss 4.7049 LearningRate 0.0001 Epoch: 19 Global Step: 102350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:51,536-Speed 18193.11 samples/sec Loss 4.7343 LearningRate 0.0001 Epoch: 19 Global Step: 102360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:41:55,974-Speed 
18465.61 samples/sec Loss 4.7360 LearningRate 0.0001 Epoch: 19 Global Step: 102370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:00,409-Speed 18478.14 samples/sec Loss 4.7004 LearningRate 0.0001 Epoch: 19 Global Step: 102380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:04,847-Speed 18465.12 samples/sec Loss 4.6894 LearningRate 0.0001 Epoch: 19 Global Step: 102390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:09,249-Speed 18614.94 samples/sec Loss 4.7177 LearningRate 0.0001 Epoch: 19 Global Step: 102400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:13,663-Speed 18564.48 samples/sec Loss 4.6965 LearningRate 0.0001 Epoch: 19 Global Step: 102410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:18,082-Speed 18548.04 samples/sec Loss 4.6849 LearningRate 0.0001 Epoch: 19 Global Step: 102420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:22,509-Speed 18507.21 samples/sec Loss 4.7052 LearningRate 0.0001 Epoch: 19 Global Step: 102430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:26,934-Speed 18520.28 samples/sec Loss 4.7136 LearningRate 0.0001 Epoch: 19 Global Step: 102440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:31,332-Speed 18629.80 samples/sec Loss 4.6761 LearningRate 0.0001 Epoch: 19 Global Step: 102450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:35,771-Speed 18461.66 samples/sec Loss 4.7633 LearningRate 0.0001 Epoch: 19 Global Step: 102460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:40,240-Speed 18337.40 samples/sec Loss 4.7193 LearningRate 0.0001 Epoch: 19 Global Step: 102470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:44,707-Speed 18341.56 samples/sec Loss 4.7049 LearningRate 0.0001 Epoch: 19 Global Step: 102480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:49,095-Speed 18673.98 samples/sec Loss 
4.7209 LearningRate 0.0001 Epoch: 19 Global Step: 102490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:53,485-Speed 18667.87 samples/sec Loss 4.7082 LearningRate 0.0001 Epoch: 19 Global Step: 102500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:42:57,919-Speed 18483.22 samples/sec Loss 4.7305 LearningRate 0.0001 Epoch: 19 Global Step: 102510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:02,361-Speed 18443.91 samples/sec Loss 4.7182 LearningRate 0.0001 Epoch: 19 Global Step: 102520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:06,855-Speed 18237.49 samples/sec Loss 4.7182 LearningRate 0.0001 Epoch: 19 Global Step: 102530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:11,267-Speed 18573.20 samples/sec Loss 4.7147 LearningRate 0.0001 Epoch: 19 Global Step: 102540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:15,682-Speed 18556.66 samples/sec Loss 4.7193 LearningRate 0.0001 Epoch: 19 Global Step: 102550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:43:20,162-Speed 18293.25 samples/sec Loss 4.6818 LearningRate 0.0001 Epoch: 19 Global Step: 102560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:43:24,579-Speed 18547.94 samples/sec Loss 4.7248 LearningRate 0.0001 Epoch: 19 Global Step: 102570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:43:28,964-Speed 18688.61 samples/sec Loss 4.7055 LearningRate 0.0001 Epoch: 19 Global Step: 102580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:33,359-Speed 18646.33 samples/sec Loss 4.6913 LearningRate 0.0001 Epoch: 19 Global Step: 102590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:37,793-Speed 18479.34 samples/sec Loss 4.6783 LearningRate 0.0001 Epoch: 19 Global Step: 102600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:42,234-Speed 18448.54 samples/sec Loss 4.6988 LearningRate 0.0001 
Epoch: 19 Global Step: 102610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:46,645-Speed 18576.41 samples/sec Loss 4.7094 LearningRate 0.0001 Epoch: 19 Global Step: 102620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:51,061-Speed 18556.21 samples/sec Loss 4.7005 LearningRate 0.0001 Epoch: 19 Global Step: 102630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:55,492-Speed 18493.22 samples/sec Loss 4.7173 LearningRate 0.0000 Epoch: 19 Global Step: 102640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:43:59,906-Speed 18565.10 samples/sec Loss 4.7161 LearningRate 0.0000 Epoch: 19 Global Step: 102650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:44:04,328-Speed 18529.04 samples/sec Loss 4.7012 LearningRate 0.0000 Epoch: 19 Global Step: 102660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:44:08,731-Speed 18613.94 samples/sec Loss 4.7189 LearningRate 0.0000 Epoch: 19 Global Step: 102670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:44:13,142-Speed 18575.39 samples/sec Loss 4.7327 LearningRate 0.0000 Epoch: 19 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:17,552-Speed 18580.89 samples/sec Loss 4.6795 LearningRate 0.0000 Epoch: 19 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:21,984-Speed 18487.39 samples/sec Loss 4.6859 LearningRate 0.0000 Epoch: 19 Global Step: 102700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:26,378-Speed 18651.44 samples/sec Loss 4.6838 LearningRate 0.0000 Epoch: 19 Global Step: 102710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:30,860-Speed 18280.90 samples/sec Loss 4.7274 LearningRate 0.0000 Epoch: 19 Global Step: 102720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:35,360-Speed 18209.95 samples/sec Loss 4.6747 LearningRate 0.0000 Epoch: 19 Global Step: 102730 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:39,828-Speed 18340.39 samples/sec Loss 4.7016 LearningRate 0.0000 Epoch: 19 Global Step: 102740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:44,215-Speed 18675.57 samples/sec Loss 4.7185 LearningRate 0.0000 Epoch: 19 Global Step: 102750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:48,602-Speed 18678.31 samples/sec Loss 4.7115 LearningRate 0.0000 Epoch: 19 Global Step: 102760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:53,001-Speed 18627.42 samples/sec Loss 4.6831 LearningRate 0.0000 Epoch: 19 Global Step: 102770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:44:57,418-Speed 18556.02 samples/sec Loss 4.7215 LearningRate 0.0000 Epoch: 19 Global Step: 102780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:45:02,229-Speed 17044.18 samples/sec Loss 4.7143 LearningRate 0.0000 Epoch: 19 Global Step: 102790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:45:06,657-Speed 18508.30 samples/sec Loss 4.7243 LearningRate 0.0000 Epoch: 19 Global Step: 102800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:45:11,064-Speed 18593.26 samples/sec Loss 4.7333 LearningRate 0.0000 Epoch: 19 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:45:15,479-Speed 18560.07 samples/sec Loss 4.7048 LearningRate 0.0000 Epoch: 19 Global Step: 102820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:45:19,896-Speed 18553.90 samples/sec Loss 4.6917 LearningRate 0.0000 Epoch: 19 Global Step: 102830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:45:24,316-Speed 18540.36 samples/sec Loss 4.6954 LearningRate 0.0000 Epoch: 19 Global Step: 102840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:45:28,726-Speed 18583.65 samples/sec Loss 4.6933 LearningRate 0.0000 Epoch: 19 Global Step: 102850 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 10:45:33,141-Speed 18559.13 samples/sec Loss 4.6950 LearningRate 0.0000 Epoch: 19 Global Step: 102860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:45:37,577-Speed 18474.96 samples/sec Loss 4.6947 LearningRate 0.0000 Epoch: 19 Global Step: 102870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:45:41,963-Speed 18682.83 samples/sec Loss 4.7211 LearningRate 0.0000 Epoch: 19 Global Step: 102880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:45:46,428-Speed 18350.09 samples/sec Loss 4.7009 LearningRate 0.0000 Epoch: 19 Global Step: 102890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:45:50,814-Speed 18687.45 samples/sec Loss 4.7200 LearningRate 0.0000 Epoch: 19 Global Step: 102900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:45:55,264-Speed 18414.13 samples/sec Loss 4.6795 LearningRate 0.0000 Epoch: 19 Global Step: 102910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:45:59,750-Speed 18268.03 samples/sec Loss 4.6854 LearningRate 0.0000 Epoch: 19 Global Step: 102920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:04,165-Speed 18561.81 samples/sec Loss 4.7004 LearningRate 0.0000 Epoch: 19 Global Step: 102930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:08,584-Speed 18544.65 samples/sec Loss 4.7362 LearningRate 0.0000 Epoch: 19 Global Step: 102940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:13,084-Speed 18210.01 samples/sec Loss 4.7303 LearningRate 0.0000 Epoch: 19 Global Step: 102950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:46:17,472-Speed 18675.91 samples/sec Loss 4.7018 LearningRate 0.0000 Epoch: 19 Global Step: 102960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:21,901-Speed 18503.82 samples/sec Loss 4.7001 LearningRate 0.0000 Epoch: 19 Global Step: 102970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-01-14 10:46:26,309-Speed 18590.15 samples/sec Loss 4.7141 LearningRate 0.0000 Epoch: 19 Global Step: 102980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:30,722-Speed 18565.88 samples/sec Loss 4.7053 LearningRate 0.0000 Epoch: 19 Global Step: 102990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:35,130-Speed 18591.19 samples/sec Loss 4.7396 LearningRate 0.0000 Epoch: 19 Global Step: 103000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:39,555-Speed 18519.44 samples/sec Loss 4.7013 LearningRate 0.0000 Epoch: 19 Global Step: 103010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:43,960-Speed 18598.46 samples/sec Loss 4.7258 LearningRate 0.0000 Epoch: 19 Global Step: 103020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:48,390-Speed 18501.08 samples/sec Loss 4.7060 LearningRate 0.0000 Epoch: 19 Global Step: 103030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:52,806-Speed 18555.32 samples/sec Loss 4.7031 LearningRate 0.0000 Epoch: 19 Global Step: 103040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:46:57,192-Speed 18682.93 samples/sec Loss 4.6905 LearningRate 0.0000 Epoch: 19 Global Step: 103050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:01,716-Speed 18114.39 samples/sec Loss 4.7268 LearningRate 0.0000 Epoch: 19 Global Step: 103060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:47:06,245-Speed 18093.71 samples/sec Loss 4.7297 LearningRate 0.0000 Epoch: 19 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:47:10,685-Speed 18455.27 samples/sec Loss 4.7276 LearningRate 0.0000 Epoch: 19 Global Step: 103080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:15,105-Speed 18538.38 samples/sec Loss 4.6968 LearningRate 0.0000 Epoch: 19 Global Step: 103090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:19,490-Speed 
18688.29 samples/sec Loss 4.7373 LearningRate 0.0000 Epoch: 19 Global Step: 103100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:23,949-Speed 18381.01 samples/sec Loss 4.7083 LearningRate 0.0000 Epoch: 19 Global Step: 103110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:28,344-Speed 18645.52 samples/sec Loss 4.6948 LearningRate 0.0000 Epoch: 19 Global Step: 103120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:32,758-Speed 18561.10 samples/sec Loss 4.6740 LearningRate 0.0000 Epoch: 19 Global Step: 103130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:37,150-Speed 18657.44 samples/sec Loss 4.6933 LearningRate 0.0000 Epoch: 19 Global Step: 103140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:41,538-Speed 18677.88 samples/sec Loss 4.7126 LearningRate 0.0000 Epoch: 19 Global Step: 103150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:45,953-Speed 18558.02 samples/sec Loss 4.7014 LearningRate 0.0000 Epoch: 19 Global Step: 103160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:50,404-Speed 18412.10 samples/sec Loss 4.7396 LearningRate 0.0000 Epoch: 19 Global Step: 103170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:47:54,889-Speed 18272.51 samples/sec Loss 4.6748 LearningRate 0.0000 Epoch: 19 Global Step: 103180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:47:59,347-Speed 18384.50 samples/sec Loss 4.7120 LearningRate 0.0000 Epoch: 19 Global Step: 103190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:03,769-Speed 18529.87 samples/sec Loss 4.6996 LearningRate 0.0000 Epoch: 19 Global Step: 103200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:08,174-Speed 18602.53 samples/sec Loss 4.6713 LearningRate 0.0000 Epoch: 19 Global Step: 103210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:12,616-Speed 18447.01 samples/sec Loss 
4.7219 LearningRate 0.0000 Epoch: 19 Global Step: 103220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:17,080-Speed 18356.13 samples/sec Loss 4.7173 LearningRate 0.0000 Epoch: 19 Global Step: 103230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:21,510-Speed 18497.42 samples/sec Loss 4.6916 LearningRate 0.0000 Epoch: 19 Global Step: 103240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:25,998-Speed 18258.39 samples/sec Loss 4.7124 LearningRate 0.0000 Epoch: 19 Global Step: 103250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:30,461-Speed 18362.24 samples/sec Loss 4.6874 LearningRate 0.0000 Epoch: 19 Global Step: 103260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:34,981-Speed 18126.70 samples/sec Loss 4.6914 LearningRate 0.0000 Epoch: 19 Global Step: 103270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:39,408-Speed 18512.61 samples/sec Loss 4.6911 LearningRate 0.0000 Epoch: 19 Global Step: 103280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:48:43,806-Speed 18630.49 samples/sec Loss 4.6912 LearningRate 0.0000 Epoch: 19 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:48:48,273-Speed 18340.71 samples/sec Loss 4.6912 LearningRate 0.0000 Epoch: 19 Global Step: 103300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:48:52,705-Speed 18493.31 samples/sec Loss 4.7171 LearningRate 0.0000 Epoch: 19 Global Step: 103310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:48:57,094-Speed 18670.41 samples/sec Loss 4.7154 LearningRate 0.0000 Epoch: 19 Global Step: 103320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:49:01,462-Speed 18758.35 samples/sec Loss 4.6886 LearningRate 0.0000 Epoch: 19 Global Step: 103330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:05,904-Speed 18444.84 samples/sec Loss 4.7515 LearningRate 0.0000 
Epoch: 19 Global Step: 103340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:10,415-Speed 18162.27 samples/sec Loss 4.7380 LearningRate 0.0000 Epoch: 19 Global Step: 103350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:14,845-Speed 18498.31 samples/sec Loss 4.6465 LearningRate 0.0000 Epoch: 19 Global Step: 103360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:19,265-Speed 18537.38 samples/sec Loss 4.6922 LearningRate 0.0000 Epoch: 19 Global Step: 103370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:23,666-Speed 18621.37 samples/sec Loss 4.6948 LearningRate 0.0000 Epoch: 19 Global Step: 103380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:28,071-Speed 18602.29 samples/sec Loss 4.7172 LearningRate 0.0000 Epoch: 19 Global Step: 103390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:32,485-Speed 18563.97 samples/sec Loss 4.7030 LearningRate 0.0000 Epoch: 19 Global Step: 103400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:36,898-Speed 18569.30 samples/sec Loss 4.6899 LearningRate 0.0000 Epoch: 19 Global Step: 103410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:41,297-Speed 18628.60 samples/sec Loss 4.7373 LearningRate 0.0000 Epoch: 19 Global Step: 103420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:49:45,693-Speed 18646.13 samples/sec Loss 4.7123 LearningRate 0.0000 Epoch: 19 Global Step: 103430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:49:50,081-Speed 18679.55 samples/sec Loss 4.7352 LearningRate 0.0000 Epoch: 19 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:49:54,513-Speed 18487.18 samples/sec Loss 4.7225 LearningRate 0.0000 Epoch: 19 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:49:58,918-Speed 18602.53 samples/sec Loss 4.7398 LearningRate 0.0000 Epoch: 19 Global Step: 103460 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:50:03,352-Speed 18482.60 samples/sec Loss 4.7029 LearningRate 0.0000 Epoch: 19 Global Step: 103470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:50:07,793-Speed 18450.50 samples/sec Loss 4.7224 LearningRate 0.0000 Epoch: 19 Global Step: 103480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:50:12,258-Speed 18350.47 samples/sec Loss 4.6913 LearningRate 0.0000 Epoch: 19 Global Step: 103490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:50:16,696-Speed 18463.96 samples/sec Loss 4.6738 LearningRate 0.0000 Epoch: 19 Global Step: 103500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:50:21,075-Speed 18714.57 samples/sec Loss 4.6849 LearningRate 0.0000 Epoch: 19 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:50:25,508-Speed 18491.69 samples/sec Loss 4.6749 LearningRate 0.0000 Epoch: 19 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:50:29,939-Speed 18494.60 samples/sec Loss 4.6944 LearningRate 0.0000 Epoch: 19 Global Step: 103530 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-14 10:50:34,382-Speed 18447.18 samples/sec Loss 4.7053 LearningRate 0.0000 Epoch: 19 Global Step: 103540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:50:38,897-Speed 18147.68 samples/sec Loss 4.6647 LearningRate 0.0000 Epoch: 19 Global Step: 103550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:50:43,342-Speed 18433.42 samples/sec Loss 4.7134 LearningRate 0.0000 Epoch: 19 Global Step: 103560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:50:47,764-Speed 18531.27 samples/sec Loss 4.7122 LearningRate 0.0000 Epoch: 19 Global Step: 103570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:50:55,370-Speed 10772.32 samples/sec Loss 4.7054 LearningRate 0.0000 Epoch: 19 Global Step: 103580 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-14 10:50:59,759-Speed 18675.86 samples/sec Loss 4.6960 LearningRate 0.0000 Epoch: 19 Global Step: 103590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:51:04,170-Speed 18583.46 samples/sec Loss 4.7255 LearningRate 0.0000 Epoch: 19 Global Step: 103600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:51:08,583-Speed 18570.14 samples/sec Loss 4.7084 LearningRate 0.0000 Epoch: 19 Global Step: 103610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:51:12,999-Speed 18558.48 samples/sec Loss 4.7387 LearningRate 0.0000 Epoch: 19 Global Step: 103620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:51:17,390-Speed 18663.05 samples/sec Loss 4.7307 LearningRate 0.0000 Epoch: 19 Global Step: 103630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:51:21,759-Speed 18767.40 samples/sec Loss 4.7594 LearningRate 0.0000 Epoch: 19 Global Step: 103640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-14 10:51:26,261-Speed 18203.91 samples/sec Loss 4.7095 LearningRate 0.0000 Epoch: 19 Global Step: 103650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:51:30,683-Speed 18538.01 samples/sec Loss 4.7055 LearningRate 0.0000 Epoch: 19 Global Step: 103660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-14 10:51:35,097-Speed 18568.12 samples/sec Loss 4.7273 LearningRate 0.0000 Epoch: 19 Global Step: 103670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-14 10:51:39,518-Speed 18537.82 samples/sec Loss 4.7214 LearningRate 0.0000 Epoch: 19 Global Step: 103680 Fp16 Grad Scale: 16384 Required: -0 hours Training: 2022-01-14 10:51:43,936-Speed 18544.12 samples/sec Loss 4.6962 LearningRate 0.0000 Epoch: 19 Global Step: 103690 Fp16 Grad Scale: 16384 Required: -0 hours